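arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.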

2605.31596 2026-06-01 cs.CV cs.LG (version update)

KLIP: localized distribution shift detection via KL-divergence with diffusion priors in Inverse Problems

Alireza Kheirandish, Jihoon Hong, Sara Fridovich-Keil

Affiliations: Georgia Institute of Technology

AI summary: Proposes a KL-divergence-based OOD detection metric that requires no calibration data or knowledge of the shifted distribution and can detect and localize local distribution shifts within an image.

Comments CVPR 2026

Abstract

Diffusion models have shown promising performance as data-driven priors for computational imaging, as well as some capacity to detect out-of-distribution (OOD) images. However, existing approaches to OOD detection often require some knowledge of the shifted distribution, fail to detect subtle or localized distribution shifts, and operate on full images, rather than the indirect measurements available in inverse problems. We propose an OOD detection metric based on the Kullback-Leibler divergence between the diffusion prior and the posterior distribution, that (i) does not require any calibration data or knowledge of the shifted distribution, and (ii) can detect whole images as OOD as well as localize OOD patches within an image. Experimentally, we show that this metric can detect subtle yet semantically meaningful distribution shifts, such as the shift from healthy liver CT scans to those with tumors, and generalizes across different types of diffusion models, datasets, and inverse problems. Our code can be found at https://github.com/voilalab/KLIP.
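To make the idea of a patch-level KL score concrete, here is a minimal sketch; it is not the authors' implementation. It assumes, purely for illustration, that the prior and posterior at each pixel are summarized by scalar Gaussians. The names `gaussian_kl` and `patch_scores`, the direction KL(posterior || prior), and the toy numbers are all assumptions rather than details from the paper.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalar Gaussians."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


def patch_scores(posterior, prior):
    """Sum per-pixel KL inside each patch.

    Both arguments map a (hypothetical) patch id to a list of (mean, std) pairs.
    """
    scores = {}
    for patch_id, q_pixels in posterior.items():
        p_pixels = prior[patch_id]
        scores[patch_id] = sum(
            gaussian_kl(mq, sq, mp, sp)
            for (mq, sq), (mp, sp) in zip(q_pixels, p_pixels)
        )
    return scores


# Toy data: patch_a agrees with the prior, patch_b is pulled away from it.
prior = {"patch_a": [(0.0, 1.0), (0.0, 1.0)], "patch_b": [(0.0, 1.0), (0.0, 1.0)]}
posterior = {"patch_a": [(0.0, 1.0), (0.0, 1.0)], "patch_b": [(2.0, 0.5), (1.5, 0.5)]}

scores = patch_scores(posterior, prior)
flagged = max(scores, key=scores.get)  # patch with the largest divergence
```

In this toy setting the patch whose posterior departs most from the prior receives the highest score, which is the intuition behind using a divergence to localize a shift; how the actual distributions are obtained from a diffusion model is outside the scope of this sketch.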


2605.31594 2026-06-01 cs.LG math.OC (version update)

A Tight Theory of Error Feedback Algorithms in Distributed Optimization

Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut

Affiliations: Inria, D.I. ENS, CNRS, PSL Research University, Paris, France

AI summary: For the two mainstream error feedback algorithms in distributed optimization (EF and EF21), the paper gives a tight convergence analysis by identifying optimal step sizes and constructing optimal Lyapunov functions; the results are independent of the number of agents and recover the known optimal guarantees in the single-agent case.

2605.31584 2026-06-01 cs.CL cs.AI cs.LG (version update)

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

Affiliations: Tsinghua University

AI summary: Proposes the LongTraceRL framework, which generates multi-hop questions via knowledge graph random walks, builds tiered distractors from search agent trajectories, and applies an entity-chain-based rubric reward as process supervision to improve LLM long-context reasoning.

Abstract

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability distractors and sparse, outcome-only reward signals that cannot supervise intermediate reasoning steps. To address these issues, we introduce LongTraceRL. For data construction, we generate multi-hop questions via knowledge graph random walks and leverage search agent trajectories to build tiered distractors: documents the agent read but did not cite (high confusability) and documents that appeared in search results but were never opened (low confusability), producing training contexts that are far more challenging than those built by random sampling or one-shot search. For reward design, we propose a rubric reward that uses the gold entities along each reasoning chain as fine-grained, entity-level process supervision. This rubric reward is applied only to responses with correct final answers (positive-only strategy), distinguishing the reasoning quality among correct responses and preventing reward hacking. Experiments on three reasoning LLMs (4B-30B) across five long-context benchmarks demonstrate that LongTraceRL consistently outperforms strong baselines and encourages comprehensive, evidence-grounded reasoning. Codes, datasets and models are available at https://github.com/THU-KEG/LongTraceRL.
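The positive-only rubric reward described in the abstract can be illustrated with a small sketch. This is a hypothetical reading, not the released code: the function name `rubric_reward`, the base reward of 1.0 for a correct answer, the `bonus` weight, and the substring-based entity matching are all assumptions made for illustration.

```python
def rubric_reward(response, final_answer, gold_answer, gold_entities, bonus=0.5):
    """Outcome reward plus an entity-coverage bonus, granted only when the answer is correct."""
    # Positive-only strategy: an incorrect final answer earns nothing,
    # no matter how many gold entities the reasoning mentions.
    if final_answer.strip().lower() != gold_answer.strip().lower():
        return 0.0

    text = response.lower()
    hits = sum(1 for entity in gold_entities if entity.lower() in text)
    coverage = hits / len(gold_entities) if gold_entities else 0.0
    return 1.0 + bonus * coverage


chain = ["Marie Curie", "Poland"]
partial = rubric_reward("Marie Curie was born in Warsaw.", "Warsaw", "warsaw", chain)
full = rubric_reward("Marie Curie was born in Warsaw, Poland.", "Warsaw", "warsaw", chain)
wrong = rubric_reward("Marie Curie was born in Poland.", "Paris", "warsaw", chain)
```

Because the bonus is only reachable through a correct answer, a response cannot raise its reward by listing entities alone, which is one plausible way to read the abstract's claim about preventing reward hacking.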


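2605.31580 2026-06-01 cs.LG (version update)

Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad, Henrik Ohlsson

Affiliations: C3 AI

AI summary: Proposes CHARM, which combines channel-level text descriptions with a Transformer encoder and uses a Joint Embedding Predictive Architecture (JEPA) to learn semantic time-series embeddings, achieving strong performance on anomaly detection, classification, and forecasting with only a linear probe.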

Comments 9 pages, 5 figures, accepted at ICML 2026. arXiv admin note: substantial text overlap with arXiv:2505.14543

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML), PMLR 306, 2026

Abstract

Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a Transformer encoder equivariant to channel order. CHARM is trained with a Joint Embedding Predictive Architecture (JEPA) and a novel loss promoting informative, temporally stable embeddings; latent-space prediction encourages robustness to sensor noise while description-aware gating provides interpretability through learned inter-channel relationships. Across anomaly detection, classification, and short- and long-term forecasting, the learned embeddings achieve strong performance using only a linear probe. Performance is driven primarily by the JEPA objective and conditioning architecture, with text descriptions serving as channel identifiers for cross-dataset generalization.


2605.31562 2026-06-01 cs.LG (version update)

Effective Biological Representation Learning by Masking Gene Expression

Kian Kenyon-Dean, Alina Selega, Ihab Bendidi, Jordan M. Sorokin, Luca Bertinetto, David Errington, Hayley Donnella, Oren Kraus

Affiliations: Recursion; Valence Labs; École Normale Supérieure PSL

AI summary: Proposes TxFM, a self-supervised model that applies masked autoencoding to RNA-seq data; ablation studies identify the key architecture choices, and training on the curated DiverseRNA-1.4M dataset yields gene representations that outperform large-scale foundation models.

Comments 31 pages, 11 figures. Preprint; presented at ICLR 2026 2nd Workshop on Foundation Models for Science: Real-World Impact and Science-First Design

Abstract

RNA sequencing produces rich and diverse datasets of gene expression, offering compelling insights into cellular state and function that have many applications in drug discovery. Modeling such data is challenging due to inherent technical noise and experimental batch effects, as evidenced by many existing transcriptomic foundation models (FMs) underperforming relative to linear baselines. Such results raise the question of whether deep representation learning provides a distinct advantage over the direct use of raw transcript counts. Our work explores this by developing a new self-supervised model, TxFM, with a focus on inductive representation learning evaluations. TxFM employs a masked autoencoding approach tailored to diverse RNA-seq count data, and our ablation study empirically identifies crucial architecture configurations required for strong transfer performance. Additionally, we curate a public training corpus, DiverseRNA-1.4M, and find that TxFM trained on this curated dataset yields high-fidelity gene representations that outperform FMs trained on atlas-scale corpora over 100x larger. Overall, our results indicate that inductive self-supervised learning is a viable modeling approach for transcriptomics representation, provided a careful synthesis of model architecture and training data curation.


2605.31559 2026-06-01 cs.LG (version update)

Functional Attention: From Pairwise Affinities to Functional Correspondences

Jiefang Xiao, Maolin Gao, Simon Weber, Guandao Yang, Daniel Cremers

Affiliations: Technical University of Munich, Germany; Munich Center for Machine Learning (MCML), Germany; PIXL, Department of Computer Science, University of Oxford, United Kingdom; ECE, University of Texas at Austin, USA

AI summary: Proposes Functional Attention, which reinterprets attention as a functional correspondence between adaptive bases and replaces softmax affinities with structured linear operators, giving a compact, generalizable, resolution-invariant representation of global dependencies; it matches state-of-the-art performance on operator learning tasks including PDE solving, 3D segmentation, and regression.

Comments 26 pages, 12 figures. Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Learning mappings between infinite-dimensional function spaces, or operator learning, is essential for many machine learning applications. Although transformer-based operators are popular, they often rely on token-wise attention. These methods treat continuous fields as discrete tokens and usually ignore the global functional structure. We introduce Functional Attention, which reinterprets attention as a functional correspondence between adaptive bases. Inspired by geometric functional maps, our method replaces softmax affinities with structured linear operators. This yields a compact, generalizable, resolution-invariant representation that explicitly captures global dependencies. Experiments demonstrate that Functional Attention can match state-of-the-art performance in many operator learning tasks, including solving PDEs, 3D segmentation, and regression, while remaining robust to varying discretizations. Project page is available at https://github.com/xjffff/FUNCATTN.


2605.31558 2026-06-01 cs.LG cs.AI (version update)

Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Felipe Urrutia, Juan José Alegría, Cinthia Sanchez Macias, Jorge Salas, Cristian B. Calderon, Cristobal Rojas

Affiliations: CENIA & Faculty of Mathematics UC Santiago; IMC UC & CENIA Santiago

AI summary: Uses controlled experiments to study the learning dynamics of Transformer attention heads on positional and symbolic reasoning tasks, finding that positional and symbolic heads rely on different mechanisms with different consequences for length generalization.

Abstract

Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task requiring positional reasoning and a letter task requiring symbolic reasoning. Using a recently introduced metric that classifies attention-head behavior as positional or symbolic for a given prompt, we show that successful learning is associated with the emergence of pure heads, i.e., heads that express themselves as either positional or symbolic. Despite the tasks' structural equivalence, they impose different mechanistic demands: the number task requires both positional and symbolic heads, whereas the letter task requires only symbolic heads. We then identify the computational roles of these heads, characterize the basic functions they implement, and give theoretical constructions showing how single-layer RoPE-based attention can realize these functions through geometrically interpretable query, key, and value operations. This analysis yields a quantitative separation between positional and symbolic mechanisms in their robustness to longer sequences, formalized through a novel notion of discrepancy. We empirically validate the resulting predictions in both controlled and real-world models, showing that symbolic mechanisms extrapolate more reliably to longer sequences while positional mechanisms face sharper limitations.
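As background for the RoPE geometry mentioned in the abstract, the sketch below shows the standard property that a RoPE attention logit depends only on the relative offset between query and key positions. It uses a single 2-D frequency pair; the function names, the frequency `theta`, and the example vectors are illustrative assumptions, and the code does not reproduce the paper's constructions.

```python
import math


def rope_rotate(vec, position, theta=0.1):
    """Rotate a 2-D vector by the angle position * theta (one RoPE frequency pair)."""
    x, y = vec
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def attention_logit(query, key, q_pos, k_pos, theta=0.1):
    """Dot product of the rotated query and rotated key."""
    q = rope_rotate(query, q_pos, theta)
    k = rope_rotate(key, k_pos, theta)
    return q[0] * k[0] + q[1] * k[1]


query, key = (1.0, 0.5), (0.3, -0.8)

# Same relative offset (3) at very different absolute positions.
near = attention_logit(query, key, q_pos=5, k_pos=2)
far = attention_logit(query, key, q_pos=105, k_pos=102)

# A different relative offset (2) changes the logit.
other = attention_logit(query, key, q_pos=5, k_pos=3)
```

The first two logits agree up to floating-point error, while the third differs. A head that needs to attend "k tokens back" can therefore be expressed through this relative-offset dependence, whereas a symbolic head would rely on the content of `query` and `key` instead.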


2605.31547 2026-06-01 cs.LG math.DS stat.ML (version update)

The Dynamic-Probabilistic Consistency Gap in Chaotic Surrogate Modeling

Andre Herz, Matthijs Pals, Daniel Durstewitz, Georgia Koppe

Affiliations: Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Germany; Dept. of Theoretical Neuroscience, Central Institute of Mental Health (CIMH), Mannheim, Germany; Faculty of Physics and Astronomy, Heidelberg University, Germany; Hector Institute for AI in Psychiatry and Dept. of Psychiatry and Psychotherapy, CIMH, Mannheim, Germany; Hertie Institute for AI in Brain Health, University of Tübingen, Germany

AI summary: Addresses the inconsistency between dynamical and probabilistic objectives in surrogate modeling of chaotic systems by proposing KAFFEE, a framework based on a differentiable extended Kalman filter that narrows the gap by evaluating likelihood on local predictive residuals and propagating covariance through learned local Jacobians.

Abstract

Dynamical systems reconstruction (DSR) aims to learn surrogate models that capture the dynamics underlying time-series data. Reliably deploying these surrogates requires uncertainty estimates consistent with the learned dynamics. We expose a dynamic-probabilistic consistency (DPC) gap: the pursuit of finite-horizon probabilistic objectives can degrade dynamics or decouple predictive uncertainty from the local tangent dynamics it ought to reflect. We isolate three mechanisms behind this gap: core collapse, noise masking, and blind uncertainty. Specifically, we show that open-loop Gaussian rollout objectives can penalize Jacobian-generated covariance growth in chaotic systems, encouraging optimization shortcuts that weaken physical expansion or decouple uncertainty from it. To mitigate this gap, we propose KAFFEE (Kalman-Aware Framework For Ergodic Emulation), a differentiable extended Kalman filter-based training framework that evaluates likelihood on local predictive residuals (innovations) while transporting covariance through learned local Jacobians. On stochastic hyperchaotic Lorenz-96, KAFFEE reduces the identified failure modes, improves reconstruction of dynamical invariants relative to open-loop objectives, and maintains competitive predictive scores. We further show that the DPC gap appears when probabilistically adapting a DSR foundation model across 13 chaotic systems, where KAFFEE enables in-context Bayesian filtering while largely preserving zero-shot dynamics.
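The two ingredients named in the abstract, likelihood evaluated on innovations and covariance transported through a local Jacobian, are the core of a textbook extended Kalman filter step. The scalar sketch below illustrates that step only. It is not KAFFEE itself: the logistic map stands in for a learned surrogate, the noise levels `q` and `r_obs` are made-up values, and a real training framework would write this in an autodiff library so the loss is differentiable.

```python
import math


def logistic_map(x, r=3.9):
    """Stand-in chaotic dynamics (assumption; a learned surrogate would go here)."""
    return r * x * (1.0 - x)


def logistic_jacobian(x, r=3.9):
    """Derivative of the logistic map, i.e. the local (scalar) Jacobian."""
    return r * (1.0 - 2.0 * x)


def ekf_step(x, p, y, q=1e-4, r_obs=1e-3):
    """One scalar EKF step with identity observation model.

    Returns (x_new, p_new, innovation_nll, x_pred, p_pred).
    """
    jac = logistic_jacobian(x)
    x_pred = logistic_map(x)
    p_pred = jac * p * jac + q          # covariance transported through the Jacobian
    innovation = y - x_pred             # local predictive residual
    s = p_pred + r_obs                  # innovation variance
    nll = 0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
    gain = p_pred / s
    x_new = x_pred + gain * innovation  # measurement update
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, nll, x_pred, p_pred


x_new, p_new, nll, x_pred, p_pred = ekf_step(x=0.2, p=0.01, y=0.6)
```

Here the local Jacobian has magnitude above one, so the predicted variance grows before the measurement update shrinks it again. Scoring only the one-step innovation means this Jacobian-driven growth is not penalized over a long open-loop rollout, which is the contrast the abstract draws with open-loop Gaussian rollout objectives.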


2605.31539 2026-06-01 cs.CV cs.LG q-bio.QM (version update)

Automated Prediction of Postoperative Pancreatic Fistula Using Preoperative Computed Tomography

Ashok Choudhary, Chris Varghese, Leo Y. Li-Han, Frank G. Lee, Ellen L. Larson, Elizabeth B. Habermann, Cornelius A. Thiels, Hojjat Salehinejad

Affiliations: Department of Surgery, Mayo Clinic, Rochester, MN, USA; Department of Surgery, University of Auckland, Auckland, NZ; Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA; Department of Artificial Intelligence

AI summary: Proposes an end-to-end deep learning pipeline, from pancreas segmentation to classification, that uses preoperative CT scans to automatically predict the risk of postoperative pancreatic fistula, providing a tool for clinical decision-making and a methodological baseline.

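2605.31535 2026-06-01 cs.CV cs.AI cs.LG (version update)

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

Affiliations: Munich Center for Machine Learning (MCML)

AI summary: Proposes RayDer, a unified feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, giving self-supervised novel view synthesis clean power-law scaling and zero-shot open-set performance competitive with supervised methods.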

Comments Project Page: https://compvis.github.io/rayder

Abstract

Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, turning self-supervised NVS into a well-posed single-model scaling problem. A minimal dynamic state, treated as a nuisance factor, absorbs time-varying content and enables stable training on unconstrained real-world video. Importantly, RayDer keeps static-scene NVS as its target task: dynamic content is leveraged purely as scalable supervision, not reconstructed as in dynamic-scene (4D) NVS. Across multiple model sizes and orders of magnitude in data, RayDer exhibits clean power-law scaling with data and compute, and outperforms static-scene data mixtures. On a large number of benchmarks, RayDer achieves strong zero-shot open-set performance competitive with state-of-the-art supervised approaches. Project Page: https://compvis.github.io/rayder


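Efficient Scaling of GNNs via IO-Aware Layer Implementations

Daria Fomina, Daniil Krasylnikov, Alexey Boykov, Andrey Dolgovyazov, Vyacheslav Zhdanovskiy, Fedor Velikonivtsev

Affiliations: HSE University; ITMO University

AI summary: Targeting the sparse, irregular memory-access bottleneck in GNNs, proposes GPU kernels for three kernel families (SpMM-based convolutions, reduction-based aggregations, attention-based layers) that reduce data movement and improve locality, reaching up to 8.5x speedup and up to 76x lower peak memory (GATv2) on realistic graphs.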

Comments International Conference on Machine Learning (ICML) 2026, Spotlight Paper

Abstract

Graph Neural Networks (GNNs) are bottlenecked by sparse, irregular memory access. Popular frameworks such as DGL and PyTorch Geometric support general message passing, but complex layers often materialize edge-wise intermediates, increasing memory traffic and limiting scalability on large graphs. We take an I/O- and arithmetic-intensity--centric view and show that widely used layers fall into three kernel families: SpMM-based convolutions, reduction-based aggregations, and attention-based layers (GATv2/Graph Transformer). For each family, we develop GPU kernels that reduce data movement, improve locality, and remain robust across realistic graphs. We also study graph reordering and find that its impact depends on the kernel mapping: it benefits neighbor-parallel (gather-dominated) kernels more consistently than feature-parallel designs. Empirically, our fused attention kernels reach up to $\textbf{3.9}\times$ speedup for Graph Transformer (median $\textbf{1.6}\times$), with Tensor Core (block-sparse) variants up to $\textbf{7.3}\times$ on locally dense graphs; for GATv2 we reach up to $\textbf{8.5}\times$ speedup (median $\textbf{2.0}\times$) while reducing peak memory by up to $\textbf{76}\times$ (median $\textbf{6}\times$). Our degree-aware reduction kernels achieve up to $\textbf{10}\times$ speedup (median $\textbf{2.6}\times$). For SpMM-based layers, properly cached cuSPARSE achieves up to $\textbf{8}\times$ speedup over DGL and outperforms evaluated custom baselines in the majority of evaluations. We release our implementations as drop-in replacements to support reproducible, hardware-aware GNN acceleration.
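To illustrate what an SpMM-based convolution computes, the sketch below performs neighbor aggregation directly from a CSR adjacency structure, accumulating into the output row without building a per-edge message tensor. It is a plain-Python reference for readability only, not one of the paper's GPU kernels; the function name `spmm_csr` and the toy graph are assumptions.

```python
def spmm_csr(indptr, indices, values, features):
    """Compute out = A @ X where A is given in CSR form and X is a dense list of rows."""
    n_rows = len(indptr) - 1
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(n_rows)]
    for row in range(n_rows):
        acc = out[row]
        # Walk this row's nonzeros and accumulate in place:
        # no edge-wise intermediate is ever materialized.
        for k in range(indptr[row], indptr[row + 1]):
            src = indices[k]
            weight = values[k]
            for d in range(dim):
                acc[d] += weight * features[src][d]
    return out


# Toy 3-node graph: node 0 averages nodes 1 and 2, node 1 copies node 0,
# node 2 has no incoming edges.
indptr = [0, 2, 3, 3]
indices = [1, 2, 0]
values = [0.5, 0.5, 1.0]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

aggregated = spmm_csr(indptr, indices, values, features)
```

The irregular reads of `features[src]` are the sparse, gather-dominated memory accesses the abstract identifies as the bottleneck; graph reordering aims to make consecutive `src` values closer together in memory.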


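2605.31497 2026-06-01 cs.LG stat.ML (version update)

Assign and Add: A Mechanistic Study of Compositional Arithmetic

Brady Exoo, Alberto Bietti, John Sous

Affiliations: Yale University; Flatiron Institute

AI summary: Studies the mechanism of compositional generalization in Transformers using variable assignment and modular addition tasks, finding that the model uses the same modular addition module for direct and indirect inputs and revealing three-phase learning dynamics.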

Abstract

Large language models are able to compose skills in order to perform complex tasks, many of which might not have been seen during training. The details of how exactly this composition occurs remain elusive. In this paper, we study a mechanism for compositional generalization in transformers by considering a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers are able to generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the same "modular addition" MLP module is used whether the inputs are given directly or indirectly through a separate variable assignment mechanism. We also analyze the training dynamics from an empirical lens, which reveals three phases of learning: first, modular addition is learned, then the structure required for variable assignment, and finally a refinement phase where the model generalizes to some hard sequences not seen in training. Finally, we provide a theoretical framework to explain how compositionality emerges from training dynamics. These results suggest that compositional generalization can be a natural consequence of the compositionality of internal mechanisms in transformers.


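2605.31494 2026-06-01 cs.CL cs.LG (version update)

Consolidating Rewarded Perturbations for LLM Post-Training

Zheyu Zhang, Shuo Yang, Gjergji Kasneci

Affiliations: Technical University of Munich; Munich Center for Machine Learning (MCML)

AI summary: Proposes CoRP, which consolidates rewarded perturbations into a single model through reward-weighted aggregation, compatibility-aware reweighting, and a held-out validation gate, without gradients, improving the base model by 8.1 points on average at one forward pass per test example.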

Abstract

Post-training of language models is commonly framed as a sample-score-update loop implemented by gradient descent. A recent line of work, exemplified by RandOpt, relocates this loop to weight space, sampling Gaussian perturbations around a pretrained model and ensembling the top-K rewarded specialists at inference. While competitive with PPO and GRPO under matched training compute, this prediction-level ensemble incurs K forward passes per test example and does not extend cleanly to free-form generation. We ask whether the rewarded population can instead be folded into a single deployable model, replacing the inference-time ensemble with one consolidated update. A split-half analysis over 25 model-task pairs reveals reproducible low-rank structure in every case. We turn this geometry into CoRP (Consolidating Rewarded Perturbations), a gradient-free operator that combines reward-weighted aggregation, compatibility-aware reweighting, and a held-out validation gate, with no gradient flowing through the language model. Across five language models from 0.5B to 8B and five tasks covering math, code, and creative writing, CoRP improves the base model by 8.1 points on average. Using one tenth of RandOpt's perturbation budget, CoRP exceeds single-inference RandOpt by 6.5 points and recovers more than half of the gain of the 50-pass majority-vote ensemble, at one forward pass per test example.
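A rough sketch of gradient-free, reward-weighted consolidation of Gaussian perturbations is given below. It is a toy under stated assumptions, not the CoRP operator: the softmax weighting with a `temperature`, the quadratic toy `reward`, and the reuse of that same reward as a stand-in for a held-out validation gate are all hypothetical choices, and compatibility-aware reweighting is omitted entirely.

```python
import math
import random


def reward(weights, target):
    """Toy reward (assumption): negative squared distance to a target vector."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def consolidate(base, target, n_perturb=64, sigma=0.1, temperature=0.05, seed=0):
    """Fold a population of rewarded perturbations into one weight update."""
    rng = random.Random(seed)
    perturbations = [[rng.gauss(0.0, sigma) for _ in base] for _ in range(n_perturb)]
    rewards = [
        reward([b + e for b, e in zip(base, eps)], target) for eps in perturbations
    ]

    # Reward-weighted aggregation: softmax over rewards (shifted for stability).
    top = max(rewards)
    unnorm = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(unnorm)
    update = [
        sum(u / total * eps[i] for u, eps in zip(unnorm, perturbations))
        for i in range(len(base))
    ]
    candidate = [b + u for b, u in zip(base, update)]

    # Stand-in for a held-out validation gate: keep the update only if it helps.
    # (Here the same toy reward is reused; a real gate would use held-out data.)
    if reward(candidate, target) > reward(base, target):
        return candidate
    return list(base)


base = [0.0, 0.0, 0.0]
target = [0.3, -0.2, 0.1]
consolidated = consolidate(base, target)
```

No gradient is computed anywhere: the update is a weighted average of sampled perturbations, and the gate ensures the returned weights never score worse than the base under the gating reward. The single consolidated vector is then used for one forward pass, in contrast to ensembling the top-K perturbed models at inference.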


2605.31485 2026-06-01 cs.LG math.CT (version update)

Graphical einops: bridging tensor networks and computation graphs

Vincent Wang-Maścianica, Nikhil Khatri

Affiliations: Laboratory for Human-Centered AI, Department of Philosophy, University of Oxford; Machine Learning Research Group, Department of Engineering Science, University of Oxford

AI summary: Proposes a formal graphical calculus for the structural fragment of einops tensor programming, enabling diagrammatic proofs of tensor equivariance via rank-naturality rewrites, and applies it to attention-mask transformations to optimize sparse attention implementations.

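2605.31484 2026-06-01 cs.LG (version update)

Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence

Valérie Castin, Kimia Nadjahi, Pierre Ablin, Gabriel Peyré

Affiliations: Apple, Paris, France; CNRS

AI summary: Because LoRA is overparameterized, different pairs of low-rank factors can have very different condition numbers, which affects convergence speed; BaLoRA projects onto a balanced manifold to improve the conditioning of the loss landscape, yielding faster convergence and better performance.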

Comments Accepted at ICML 2026

2605.31464 2026-06-01 cs.LG cs.AI (version update)

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

Zaid Khan, Justin Chih-Yao Chen, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal

Affiliations: UNC Chapel Hill; AI2; Johns Hopkins University; University of Texas at Austin

AI summary: Studies language models as selective surrogates for GPU kernel performance, uses reinforcement learning to improve forecast accuracy and calibration, and accelerates kernel search under a limited GPU evaluation budget.

Comments Code: https://github.com/codezakh/gpu-forecasters

Abstract

GPU kernels are the workhorse of modern deep learning, and optimizing them (via evolutionary search or coding agents) usually requires repeated measurement on target hardware. While these measurements provide the ground-truth signal necessary for kernel search, they are costly, because each evaluation of a kernel requires compilation and repeated execution on a GPU. As improvements in LLM inference reduce the cost of writing novel kernels and LLM-driven searches scale to large search budgets, on-device evaluation becomes a bottleneck. To address this, we study how LLMs can serve as selective GPU surrogates for kernel evaluation, by forecasting the performance of proposed kernels. A useful surrogate should be accurate, and it should be selective, by knowing when it could be wrong, and deferring to the GPU. To evaluate surrogates, we measure whether their forecasts are accurate, calibrated, and practically useful for recovering fast kernels under limited GPU-measurement budgets. Next, we study whether reinforcement learning can improve forecast accuracy and confidence calibration. Our experiments demonstrate that LLMs can accurately forecast relative kernel performance, that their utility can be improved through reinforcement learning. Used inside a kernel search, the surrogate lets the search consider several times as many candidates under the same GPU evaluation budget, and that leads to finding faster kernels than an equal-budget baseline. These results suggest that LLMs can play a broader role in kernel optimization, by acting as virtual models of a GPU rather than solely as kernel generators for search.


2605.31463 2026-06-01 cs.LG cs.AI cs.CL cs.DC (version update)

PithTrain: A Compact and Agent-Native MoE Training System

Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy, Zichun Yu, Junru Shao, Todd C. Mowry, Chenyan Xiong, Tianqi Chen

Affiliations: Carnegie Mellon University; Xlue; NVIDIA

AI summary: Proposes PithTrain, a compact MoE training framework built on agent-native design principles, and introduces ATE-Bench to evaluate agent-task efficiency; PithTrain matches the throughput of production frameworks while cutting Agent Turns by up to 62% and Active GPU Time by 64%.

Abstract

Mixture-of-Experts (MoE) has become the dominant architecture for frontier language models. To meet this demand, production frameworks have built optimized MoE training stacks over years of engineering effort. Yet evolving these stacks for new architectures and system optimizations remains expensive. With the rise of AI coding agents, they could automate parts of training-framework development and accelerate this evolution. But applying them to these existing frameworks carries hidden costs, invisible to today's throughput-only evaluations. We name this missing dimension agent-task efficiency (ATE): the cost of using coding agents to understand, operate, and extend a framework. Grounded in four agent-native design principles, we build PithTrain, a compact, agent-native MoE training framework. We further introduce ATE-Bench, covering real-world training-framework tasks. Our evaluation shows PithTrain matches the throughput of production frameworks, and on ATE-Bench, PithTrain enables higher agent-task efficiency, with up to 62% fewer Agent Turns and 64% less Active GPU Time.


2605.31455 2026-06-01 cs.LG cs.CL 版本更新

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

DRIFT: 解耦的轨迹采样与重要性加权微调以实现高效的多轮优化

Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai, Yao Shu

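Affiliations: The Hong Kong University of Science and Technology; The Chinese University of Hong Kong, Shenzhen

AI summary: Online reinforcement learning is costly in multi-turn interaction, while offline supervised fine-tuning suffers from distribution shift. Proposes DRIFT, a framework that recasts the KL-regularized RL objective as an equivalent importance-weighted supervised learning problem, giving efficient and stable multi-turn optimization.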

AI中文摘要

大型语言模型越来越多地部署在多轮交互环境中，用户或环境可以迭代地提供轻量级反馈。不幸的是，优化这种行为在实践中面临一个尖锐的困境：在线强化学习能够有效处理多轮动态，但由于每次更新时生成完整修正轨迹的成本过高而变得昂贵，而离线监督微调（SFT）虽然高效，但存在分布偏移和行为崩溃的问题。为此，我们创新性地提出了DRIFT（解耦的轨迹采样与重要性加权微调）框架，该框架实现了KL正则化强化学习目标等价于重要性加权监督学习的理论洞察。DRIFT通过从固定参考策略中采样离线交互轨迹，推导基于回报的重要性权重，并通过在所得数据集上进行加权SFT来优化策略，从而将轨迹采样与优化解耦。实验表明，DRIFT在多轮强化学习基线中达到或超越其性能，同时保持了标准监督微调的训练效率和简单性。代码可在 https://github.com/2020-qqtcg/DRIFT 获取。

英文摘要

Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Unfortunately, optimizing such behavior presents a sharp dilemma in practice: online reinforcement learning is able to effectively address multi-turn dynamics but is prohibitively expensive due to the cost of generating full correction trajectories at every update, whereas offline supervised fine-tuning (SFT) is efficient but suffers from distribution shift and behavioral collapse. To this end, we novelly propose DRIFT (Decoupled Rollouts and Importance-Weighted Fine-Tuning), a framework that operationalizes the theoretical insight that the KL-regularized RL objective is equivalent to importance-weighted supervised learning. DRIFT decouples rollout from optimization by sampling offline interaction trajectories from a fixed reference policy, deriving return-based importance weights, and optimizing the policy via weighted SFT on the resulting dataset. Empirically, we demonstrate that DRIFT matches or exceeds the performance of multi-turn reinforcement learning baselines while maintaining the training efficiency and simplicity of standard supervised fine-tuning. Code is available at https://github.com/2020-qqtcg/DRIFT.
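The equivalence named in the abstract can be illustrated with the standard closed form for KL-regularized objectives: maximizing expected return minus `beta` times the KL divergence to a reference policy is solved by the reference policy reweighted by `exp(R / beta)`. The sketch below is a minimal illustration under that assumption, not DRIFT's actual implementation. The function names, the temperature `beta`, and the self-normalization of the weights are hypothetical choices that the abstract does not specify.

```python
import math

def importance_weights(returns, beta):
    """Self-normalized weights proportional to exp(R_i / beta).

    Assumes the trajectories were sampled from a fixed reference policy.
    """
    scaled = [r / beta for r in returns]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sft_loss(log_probs, weights):
    """Importance-weighted negative log-likelihood over trajectories."""
    return -sum(w * lp for w, lp in zip(weights, log_probs))

# Toy data: three offline trajectories with their returns and the
# current policy's log-probabilities of each trajectory.
returns = [1.0, 0.0, 2.0]
log_probs = [-1.2, -0.7, -2.0]
weights = importance_weights(returns, beta=0.5)
loss = weighted_sft_loss(log_probs, weights)
```

A smaller `beta` concentrates the weight on the highest-return trajectories, while a larger `beta` approaches plain, unweighted SFT on the reference samples.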

2605.31445 2026-06-01 cs.GT cs.AI cs.CL cs.LG 版本更新

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

二手车销售机器人？作为讨价还价代理的LLM在部分信息下的诚实与轻信

Antonio Valerio Miceli-Barone, Vaishak Belle, Shay B. Cohen

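Affiliations: University of Edinburgh

AI summary: Studies LLM agents in simulated bargaining scenarios. Off-the-shelf agents deviate from game-theoretic equilibria and attempt to lie but cannot efficiently exploit information asymmetries; fine-tuning on financial utility makes them stronger negotiators but also more dishonest.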

Comments 18 pages, 14 figures

AI中文摘要

在这项工作中，我们研究了模拟讨价还价场景中的代理，其中买方和卖方通过文本渠道进行通信，并试图在不同信息制度（完全信息、信息不对称或相互不确定性）下谈判互利交易。我们评估了它们相对于博弈论解决方案的表现，并进一步调查了它们的诚实性（披露或隐瞒信息、误导或欺骗的倾向）以及轻信性（信任或不信任对方提供信息的倾向）。我们研究了零样本LLM代理（使用简单的提示脚手架）以及微调代理，以探讨优化代理以最大化财务利润是否使它们成为更强的谈判者，但也更不诚实和更不信任。我们发现，现成的LLM都显著偏离博弈论均衡，它们试图对自己的私人信息撒谎，但无法有效利用信息不对称。对财务效用的微调使代理在达成更好交易方面更强，但也更不诚实，这突显了优化代理任务对其安全性可能带来的风险。我们发布了我们的代码和一个讨价还价场景数据集。

英文摘要

In this work we study agents in simulated bargaining scenarios, where a buyer and a seller communicate through a text channel and attempt to negotiate mutually beneficial trades, under different information regimes (complete information, information asymmetry or mutual uncertainty). We evaluate their performance w.r.t. game-theoretical solutions and further investigate their honesty (their tendency to disclose or withhold information or to mislead and deceive) as well as their credulity (their tendency to trust or distrust information provided by the other agent). We study zero-shot LLM agents with simple prompting scaffolding as well as fine-tuned agents, in order to investigate whether optimising the agents to maximise financial profits makes them stronger negotiators but also more dishonest and less trusting. We find that off-the-shelf LLMs all substantially deviate from game-theoretical equilibria, they attempt to lie about their private information but cannot efficiently exploit information asymmetries. Fine-tuning on financial utility makes the agents stronger at achieving better deals but also more dishonest, highlighting the risks that optimising agents for a task can have on their safety. We release our code and a dataset of bargaining scenarios.

2605.31443 2026-06-01 stat.ME cs.LG econ.EM math.ST stat.TH 版本更新

Modeling Covariate Transition for Efficient Estimation of Longitudinal Treatment Effects in Randomized Experiments

建模协变量转移以高效估计随机实验中的纵向处理效应

Naoki Chihara, Tatsushi Oka, Yasuko Matsubara, Yasushi Sakurai, Shota Yasui

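Affiliations: SANKEN, The University of Osaka, Osaka, Japan; CyberAgent, Inc., Tokyo, Japan; Keio University, Tokyo, Japan

AI summary: Proposes a regression-adjustment framework that models covariate transitions to estimate longitudinal treatment effects in randomized experiments, and establishes asymptotic normality and the semiparametric efficiency bound for the estimator.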

Comments Accepted by ICML'26

Journal ref: The 43rd International Conference on Machine Learning, 2026

AI中文摘要

我们提出一个回归调整框架，用于在静态制度下估计随机实验中的纵向处理效应。虽然回归调整方法通过使用预处理协变量有助于随机实验中的方差减少，但它们通常只关注平均效应，从中我们无法获得关于效应何时出现以及持续多久的有价值见解。为了解决这个问题，我们考虑随时间变化的中间结果和事后协变量，并使用转移核表示这些动态轨迹。此外，我们建立了估计量的渐近正态性和半参数效率界，从而实现更强大的统计推断。使用日本某流媒体平台的A/B测试数据进行的模拟研究和实证分析显示了我们的方法的实际优势。

英文摘要

We present a regression-adjustment framework designed for the estimation of longitudinal treatment effects in randomized experiments under static regimes. While regression-adjustment methods are useful for variance reduction in randomized experiments by using pre-treatment covariates, they usually focus only on average effects, from which we cannot obtain valuable insights into when the effects appear and how long they continue. To address this issue, we consider intermediate outcomes and evolving post-treatment covariates over time, and we represent such dynamic trajectories using transition kernels. Furthermore, we establish the asymptotic normality and the semiparametric efficiency bound for our estimator, enabling more powerful statistical inference. Simulation studies and empirical analysis using A/B test data from a streaming platform in Japan show the practical advantages of our method.
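As background for the abstract's starting point, the sketch below shows the basic single-outcome form of regression adjustment with one pre-treatment covariate: the difference in means is corrected by the estimated slope times the covariate imbalance between arms. It is only an illustration of that classical idea. It does not implement the paper's longitudinal estimator or its transition kernels, and the function names and the noise-free toy data are assumptions made for the example.

```python
def mean(xs):
    return sum(xs) / len(xs)

def difference_in_means(y, t):
    """Unadjusted effect estimate: mean treated outcome minus mean control outcome."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return mean(treated) - mean(control)

def regression_adjusted_effect(y, x, t):
    """ANCOVA-style estimate: subtract slope * covariate imbalance between arms."""
    num = 0.0
    den = 0.0
    arm_means = {}
    for arm in (0, 1):
        xs = [xi for xi, ti in zip(x, t) if ti == arm]
        ys = [yi for yi, ti in zip(y, t) if ti == arm]
        x_bar, y_bar = mean(xs), mean(ys)
        arm_means[arm] = (x_bar, y_bar)
        # Pool the within-arm covariance and variance to estimate a common slope.
        num += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
        den += sum((xi - x_bar) ** 2 for xi in xs)
    slope = num / den
    (x0, y0), (x1, y1) = arm_means[0], arm_means[1]
    return (y1 - y0) - slope * (x1 - x0)

# Toy data: y = 2 + 3 * x + 5 * t, with a chance covariate imbalance between arms.
x = [0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0]
t = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.0 + 3.0 * xi + 5.0 * ti for xi, ti in zip(x, t)]

naive = difference_in_means(y, t)
adjusted = regression_adjusted_effect(y, x, t)
```

On this toy data the unadjusted estimate absorbs the covariate imbalance, while the adjusted estimate recovers the effect of 5 used to generate the outcomes.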

2605.31438 2026-06-01 cs.LG 版本更新

Flow map learning in nonlinear vector autoregressive models: influence of the feature-library structure on the training error

非线性向量自回归模型中的流映射学习：特征库结构对训练误差的影响

Markus Gross

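Affiliations: Institute for AI Safety and Security, German Aerospace Center (DLR)

AI summary: Studies how the feature-library structure of nonlinear vector autoregressive models (NVAR/NG-RC) affects the training error. The training error follows scaling laws in the time resolution, and its behavior depends on whether the library can represent the early Lie-series coefficients of the flow map exactly.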

Comments 35 pages, 12 figures

AI中文摘要

时间序列预测通常需要学习非线性和时滞依赖关系。一类典型的预测模型是非线性向量自回归过程（NVAR），也称为下一代储层计算机（NG-RC）。这些模型在其显式特征库张成的空间上近似Koopman算子。我们考虑学习马尔可夫非线性动力系统的可辨识性问题，并表明训练误差作为时间分辨率的函数遵循特征性的（预）渐近标度律。这些定律取决于特征库能否精确或仅近似表示流映射（传播子）的早期Lie级数系数。对于由多项式向量场控制的动力系统，我们展示了具有单项式和傅里叶特征库的NVAR/NG-RC模型的机制。我们确定了训练误差对时间分辨率、涉及的非线性阶数和延迟项数量的依赖性。虽然延迟项减少了最优单步训练误差，但只有当库提供足够的非线性时，它们才能改善长期预测。因此，当模型类与真实数据生成过程不匹配时，小的训练误差与弱的泛化能力共存。在各种混沌动力系统上的数值实验证实了理论预测。

英文摘要

Time series forecasting often requires learning nonlinear and time-delayed dependencies. A paradigmatic class of forecasting models are nonlinear vector autoregressive processes (NVAR), also known as next-generation reservoir computers (NG-RCs). These models approximate the Koopman operator on the space spanned by their explicit feature library. We consider the identifiability problem for learning Markovian nonlinear dynamical systems and show that the training error as a function of time resolution follows characteristic (pre-)asymptotic scaling laws. These laws depend on whether the feature library can represent the early Lie-series coefficients of the flow map (propagator) exactly or merely approximately. For dynamical systems governed by polynomial vector fields, we demonstrate the mechanism for NVAR/NG-RC models with monomial and Fourier feature libraries. We determine the dependence of the training error on the temporal resolution, the involved nonlinear degree, and the number of delay terms. While delay terms reduce the optimal one-step training error, they improve long-horizon forecasts only when the library provides sufficient nonlinearity. Thus, small training error coexists with weak generalization as the model class is mismatched to the true data-generating process. Numerical experiments on various chaotic dynamical systems confirm the theoretical predictions.
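To make the notion of an explicit feature library concrete, the sketch below builds a monomial library of the kind commonly used in NVAR/NG-RC models: a constant, the current and delayed states, and all monomials of those entries up to a chosen degree. It is a minimal illustration, not the paper's code. The function name `nvar_features` and its parameters `delays` and `degree` are hypothetical.

```python
from itertools import combinations_with_replacement

def nvar_features(series, t, delays, degree):
    """Monomial feature vector at time t.

    series: list of state vectors, one per time step.
    delays: number of time steps used (current state plus delays - 1 lags).
    degree: maximum monomial degree.
    Requires t >= delays - 1.
    """
    linear = []
    for k in range(delays):
        linear.extend(series[t - k])
    feats = [1.0]  # constant term
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(linear)), d):
            prod = 1.0
            for i in combo:
                prod *= linear[i]
            feats.append(prod)
    return feats

# Two-dimensional state, two time steps, quadratic library.
series = [[1.0, 2.0], [3.0, 4.0]]
feats = nvar_features(series, t=1, delays=2, degree=2)
```

With state dimension 2 and 2 delays there are 4 linear entries, so the quadratic library has 1 + 4 + 10 = 15 features. A linear readout fitted on such features (for example by ridge regression) would then give the one-step predictor.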

2605.31427 2026-06-01 cs.LG cs.DC 版本更新

DG-CoLearn: An Efficient Collaborative Learning Framework for Dynamic Graphs

DG-CoLearn：一种高效的动态图协同学习框架

Ashley Hoi-Ting Au, Zikun Zhang, Ligang He, Qiang Ni

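Affiliations: Department of Computer Science, The University of Warwick; School of Computing and Communications, Lancaster University

AI summary: Repeated full-snapshot retraining in dynamic graph learning is computationally expensive and ill-suited to collaborative settings with partitioned data. Proposes DG-CoLearn, a client-agnostic collaborative learning framework based on incremental graph-snapshot processing, which uses a server-mediated embedding exchange mechanism for accurate multi-hop message passing and reports significant gains in training speed, communication overhead, and predictive performance.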

AI中文摘要

Deep-Learning-Based Low-Energy Trigger Algorithms for the Hyper-Kamiokande Experiment

Katharina Lachner, Saúl Alonso-Monsalve, Benjamin Richards, Davide Sgalaberna

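Affiliations: University of Warwick

AI summary: For low-energy neutrino events (below 7 MeV) in the Hyper-Kamiokande experiment, proposes and compares trigger algorithms based on a supervised neural network and on anomaly detection (an autoencoder and MPDR). On 3 MeV single-electron signals the supervised model and MPDR reach efficiencies of 76.7% and 31.8%, against 26.4% for a traditional hit-count trigger, and per-window GPU inference latency is well below a millisecond, which makes real-time operation feasible.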

Comments 16 pages, 6 figures

AI中文摘要

现代机器学习技术因其强大的模式识别能力在粒子物理学中变得越来越重要，包括在具有严格运行时间约束的实时数据采集中。本文详细介绍了针对大型水切伦科夫探测器（如Hyper-Kamiokande）的低能中微子事件（低于7 MeV）的基于深度学习的触发算法的性能。展示了自定义神经网络监督分类器的性能，以及两种仅基于探测器噪声训练的异常检测方法：纯自编码器和基于流形投影-扩散恢复（MPDR）的能量模型。监督模型对动能为3 MeV的单电子信号识别效率为76.7%，显著超过了传统基于命中计数触发的26.4%的信号效率，MPDR方法也达到了31.8%。在GPU上的运行时间评估显示，每窗口推理延迟远低于毫秒量级，表明实时操作是可行的。

英文摘要

Modern machine learning techniques have become increasingly important in particle physics because of their powerful pattern-recognition capabilities, including in real-time data acquisition where stringent runtime constraints apply. This paper details the performance of deep-learning-based trigger algorithms for a large water Cherenkov detector such as Hyper-Kamiokande aimed at low-energy neutrino events (below 7 MeV). The performance of custom neural-network supervised classifiers is shown alongside two anomaly-detection approaches trained solely on detector noise: a pure autoencoder and an energy-based model based on Manifold Projection--Diffusion Recovery (MPDR). The supervised model shows signal identification efficiencies of 76.7% for single electrons of 3 MeV kinetic energy, significantly exceeding signal efficiencies obtained from a traditional hit-count-based trigger of 26.4%, as does the MPDR approach with 31.8%. Runtime evaluations on GPU yield per-window inference latencies well below the millisecond scale, indicating that real-time operation is feasible.

2605.31388 2026-06-01 cs.LG 版本更新

Constrained Multi-Objective Reinforcement Learning with Max-Min Criterion

David Fernández-Narro, Pablo Ferri, Ángel Sánchez-García, Juan M. García-Gómez, Carlos Sáez

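Affiliations: Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València

AI summary: Introduces dashi, an open-source Python library that explores, quantifies, and characterizes dataset shifts through an unsupervised approach (based on information geometry and non-parametric statistical manifolds) and a supervised approach, to support trustworthiness assessment across the AI life cycle.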

AI中文摘要

人工智能（AI）生命周期需要对底层数据动态有透彻理解，以实现稳健、安全且经济高效的AI开发和使用。数据集偏移定义为训练和测试数据分布之间的变化。无论是随时间（时间性）还是跨不同站点（多源）发生，它们都可能严重降低模型性能并损害数据质量。这在健康AI中尤为重要，因为不受控制的偏移在训练和操作阶段都可能严重影响患者的安全和基本权利。虽然协变量偏移、先验偏移和概念偏移的理论基础已很完善，但缺乏可访问且全面的软件工具来执行其分析。我们介绍了dashi，一个开源Python库，旨在对数据集偏移进行探索、量化和表征。dashi提供双重方法：一种无监督方法，利用信息几何和非参数统计流形进行数据变异性表征和分析（例如，信息几何时间图和多源变异性指标，如全局概率偏差和源概率异常度）；以及一种有监督方法，量化和表征模型性能退化。无监督和有监督方法均适用于用户定义的时间批次和域/源批次。我们在三个模拟和真实世界的健康AI案例研究（妊娠期糖尿病、COVID-19和紧急医疗调度）中展示了dashi的实用性。通过提供交互式视觉分析和变异性指标，dashi支持AI生命周期阶段的可信度，通过评估数据一致性和AI性能实现稳健且安全的机器学习管道。

英文摘要

The Artificial Intelligence (AI) life cycle requires a thorough understanding of the underlying data dynamics for robust, safe and cost-effective AI development and use. Dataset shifts are defined as changes between train and test data distributions. Whether occurring over time (temporal) or across different sites (multi-source), they can severely degrade model performance and compromise data quality. This is particularly important in health AI, where the safety and fundamental rights of patients can be severely affected by uncontrolled shifts both at training and operational stages. While the theoretical foundations of covariate, prior, and concept shifts are well established, there is a lack of accessible and comprehensive software tools to perform their analysis. We introduce dashi, an open-source Python library designed for the exploration, quantification, and characterization of dataset shifts. dashi provides a dual approach: an unsupervised approach that leverages information geometry and non-parametric statistical manifolds to data variability characterization and analysis (e.g., Information Geometric Temporal plots and Multi-Source Variability metrics like Global Probabilistic Deviation and Source Probabilistic Outlyingness), and a supervised approach that quantifies and characterizes model performance degradation. Both unsupervised and supervised approaches work across user-defined temporal and domain/source batches. We demonstrate the utility of dashi on three simulated and real-world health AI case studies on gestational diabetes mellitus, COVID-19 and emergency medical dispatch. By providing interactive visual analytics and variability metrics, dashi supports trustworthiness of AI life cycle stages enabling robust and safe machine learning pipelines through the assessment of data coherence and AI performance.

2605.31354 2026-06-01 cs.AI cs.LG 版本更新

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

资源受限视觉代理中共享状态协作的故障模式诊断

Yunpeng Zhou

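Affiliations: Nanjing University of Information Science & Technology, Nanjing, China

AI summary: Studies failure modes of collaborative reasoning with weak learners (4B-8B models) over shared working memory through the lens of noise accumulation. Proposes CoSee, an auditing framework that traces information flow in document visual question answering, finds that naive shared workspaces amplify hallucinations rather than resolve them, and identifies two dominant failure modes: noise reinforcement and policy collapse.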

AI中文摘要

模块化视觉推理系统越来越依赖共享工作记忆进行多步协作，但低容量场景下中间状态演化的故障动态仍未被充分探索。我们通过噪声累积的视角研究弱学习者（4B-8B模型）的协作推理故障模式。我们引入了CoSee，一个审计框架，形式化了读-写-验证循环以追踪文档视觉问答中的信息流。在多页、图表和基于网页的基准测试中，我们发现了一个反直觉的退化：朴素的共享工作空间往往放大而非解决幻觉。我们识别出两种主要的故障模式：噪声强化（未基于事实的笔记被重新用作证据）和策略崩溃（添加的上下文使模型转向欠指定的短形式答案）。使用成本-准确率帕累托前沿，我们表明增加计算量在没有显式验证的情况下可能与性能负相关。我们的发现表明，对于资源受限的代理，瓶颈不在于推理深度而在于通信保真度，为可靠的模块化设计提供了轨迹级诊断和机制基线。

英文摘要

Modular visual reasoning systems increasingly rely on shared working memory for multi-step collaboration, yet the failure dynamics of intermediate state evolution in low-capacity regimes remain underexplored. We study failure modes of collaborative reasoning with weak learners (4B--8B models) through the lens of noise accumulation. We introduce CoSee, an auditing framework that formalizes the read-write-verify loop to trace information flow in document visual question answering. Across multi-page, chart, and web-based benchmarks, we find a counter-intuitive degradation: naive shared workspaces often amplify hallucinations rather than resolve them. We identify two dominant failure modes: Noise Reinforcement, where ungrounded notes are reused as evidence, and Policy Collapse, where added context shifts the model toward under-specified, short-form answers. Using cost-accuracy Pareto frontiers, we show that increased compute can correlate negatively with performance without explicit verification. Our findings suggest that for resource-constrained agents, the bottleneck lies not in reasoning depth but in communication fidelity, providing trace-level diagnostics and a mechanistic baseline for reliable modular design.

2605.31346 2026-06-01 math.OC cs.LG 版本更新

Generalized Intent Modeling in Multi-Agent Reinforcement Learning

Mateusz Odrowaz-Sypniewski, Jasmine Bayrooti, Ajay Shankar, Amanda Prorok

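Affiliations: Department of Computer Science and Technology, University of Cambridge, UK

AI summary: Proposes a task-adaptive opponent modeling framework that improves decision-making in non-cooperative multi-agent environments through a performance-driven mixture of multiple intent representations and a new intent representation that maximizes mutual information with the ego-agent's future returns.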

AI中文摘要

在非合作、竞争和一般和的多智能体强化学习中，建模对手的意图对于有效决策至关重要。现有的对手建模方法使用从先验选择的回合信息（如对手的下一个动作或未来环境状态）中提取的嵌入来编码意图，并以此引导自我智能体的行为。这些方法假设所选信息普遍代表意图；然而，我们通过实验证明情况并非如此，因为意图通常依赖于任务和环境。为了解决这个问题，我们引入了一个任务自适应的对手建模框架，该框架学习一种性能驱动的多意图表示混合。此外，我们提出了一种新的意图表示，它最大化与自我智能体未来回报的互信息，从而捕获与性能最直接相关的对手信息。我们的方法在各种任务中始终匹配或超越最先进基线的性能，并揭示了不同对手建模策略何时以及为何成功。

英文摘要

Modeling an opponent's intent is critical for effective decision-making in non-cooperative, competitive, and general-sum multi-agent reinforcement learning. Existing opponent modeling methods encode intent using an embedding derived from episode information chosen a priori, such as the opponent's next action or a future environment state, and use this to guide the ego-agent's behavior. These approaches assume that the chosen information is universally representative of intent; however, we show empirically that this is not the case as intentions are often task- and environment-dependent. To address this, we introduce a task-adaptive opponent modeling framework that learns a performance-driven mixture of multiple intent representations. We further introduce a new intention representation that maximizes mutual information with the ego-agent's future returns, thereby capturing opponent information that is most directly relevant to performance. Our approach consistently matches or exceeds the performance of state-of-the-art baselines across diverse tasks and yields insights into when and why different opponent modeling strategies succeed.

2605.31317 2026-06-01 cs.LG 版本更新

Forgetting Has Neighbors: Localized Collateral Forgetting in Machine Unlearning

遗忘有邻居：机器遗忘中的局部连带遗忘

Polina Dolgova, Sebastian U. Stich

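Affiliations: CISPA Helmholtz Center for Information Security; Universität des Saarlandes

AI summary: Studies localized collateral forgetting caused by gradient-ascent and random-labeling methods in machine unlearning, and proposes Local Teacher Distillation as a mitigation strategy.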

AI中文摘要

机器遗忘旨在无需完全重新训练的情况下移除选定训练样本的影响。标准评估通常使用聚合指标（如准确率和遗忘分数）来概括遗忘质量，这可能会掩盖局部失败。我们通过比较遗忘模型与删除后重新训练模型的预测，在样本级别研究这种失败模式。我们表明，这种逐点差异可能高度不均匀：对于梯度上升和随机标签方法，无论是否进行保留集微调，差异都随着与遗忘集的几何接近度而增大。我们将这种现象称为局部连带遗忘。我们的分析确定了该效应背后的机制：遗忘过程中使用的替代目标可能与重新训练引起的局部预测结构不一致，并且这种不一致通过共享表示传播到邻近样本。受此机制启发，我们提出了局部教师蒸馏，一种简单的缓解策略，用仅在遗忘集的保留邻居上训练的小教师生成的软标签替换随机目标。在CIFAR-100部分类别删除任务中，这种局部教师使遗忘模型更接近重新训练，尤其是在遗忘集附近，同时保持有竞争力的聚合遗忘指标。

英文摘要

Machine unlearning aims to remove the influence of selected training examples without full retraining. Standard evaluations often summarize unlearning quality with aggregate metrics, such as accuracy- and forgetting-based scores, which can hide localized failures. We study this failure mode at the example level by comparing the predictions of an unlearned model to those of the model retrained after deletion. We show that this pointwise discrepancy can be highly non-uniform: for gradient-ascent and random-labeling methods, with and without retain-set fine-tuning, it grows with geometric proximity to the forget set. We call this phenomenon localized collateral forgetting. Our analysis identifies a mechanism behind the effect: surrogate targets used during unlearning can be inconsistent with the local prediction structure induced by retraining, and this inconsistency propagates through shared representations to nearby examples. Motivated by this mechanism, we propose Local Teacher Distillation, a simple mitigation strategy that replaces random targets with soft labels from a small teacher trained only on retained neighbors of the forget set. On CIFAR-100 partial-class deletion, this local teacher brings the unlearned model substantially closer to retraining, especially near the forget set, while maintaining competitive aggregate unlearning metrics.

2605.31315 2026-06-01 cs.LG 版本更新

Graph Neural Networks Are Not Continuous Across Graph Resolutions

图神经网络在图分辨率上不连续

Christian Koke, Yuesong Shen, Abhishek Saroha, Marvin Eisenberger, Bastian Rieck, Michael Bronstein, Daniel Cremers

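Affiliations: Munich Center for Machine Learning; Technical University of Munich, Munich; University of Fribourg; Institute of Computational Biology; University of Oxford

AI summary: Proves that graph neural networks are not continuous under natural modes of graph convergence, and proposes a structural modification based on the information-propagation scheme that makes them continuous across scales, enabling stable integration of, and generalization across, different resolutions.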

Comments arXiv admin note: text overlap with arXiv:2310.00431

2605.31311 2026-06-01 math.OC cs.DC cs.LG 版本更新

mRNAutilus: Multi-Objective Guided Discrete Generation of mRNA with Optimized Therapeutic Properties

Sawan Patel, Sophia Tang, Yesol Kim, Yinuo Zhang, Divya Srijay, Ping-Jung Lin, Shambhavi Shubham, Fengmei Pi, Cedric Wu, Sherwood Yao, Pranam Chatterjee

Affiliations: Atom Bioworks Inc.; Department of Computer and Information Science, University of Pennsylvania; Department of Bioengineering, University of Pennsylvania; Center of Computational Biology, Duke-NUS Medical School; GenScript USA Inc.

AI summary: Proposes the mRNAutilus framework, which combines a masked discrete diffusion model with Monte Carlo Tree Guidance to optimize codons and design UTRs de novo at the same time, generating complete mRNA sequences that are Pareto-efficient under multiple objectives and that substantially improve expression and stability on several targets.

AI中文摘要

治疗性mRNA设计需要协调整个转录本中多个相互作用的序列特征，其中密码子使用、非翻译区（UTR）及其耦合共同决定稳定性、翻译效率和蛋白质表达。在这里，我们提出通过展开轨迹和信息潜在更新生成mRNA（mRNAutilus），这是一个直接从序列进行同时密码子优化和从头UTR设计的框架。mRNAutilus结合了在数百万全长mRNA上训练的掩码离散扩散模型与蒙特卡洛树引导，在多个功能目标下生成帕累托高效序列，使用模型嵌入上的轻量级回归器预测半衰期、翻译效率和蛋白质丰度。与最近分别设计编码序列和UTR或依赖事后组装和筛选的方法不同，mRNAutilus在单个过程中生成完整转录本，并跨属性优化。在多种靶标上，编码P. pyralis荧光素酶的零样本mRNA表达量比野生型高400倍以上，并优于商业和机器学习设计的基线，包括零样本生成方法。零样本SARS-CoV-2 Spike mRNA超过临床使用和商业构建体，并匹配或超越实验室优化设计，同时具有更好的耐久性。我们进一步展示了在治疗环境中的通用性，包括先导编辑（PEMax）和可编程蛋白质组调节，其中mRNAutilus设计的构建体增强了用于β-连环蛋白降解的肽引导E3连接酶（uAbs）的表达。这些结果建立了一个基于序列的多目标框架，用于生成适用于多种生物应用的功能性mRNA。

英文摘要

Therapeutic mRNA design requires coordinating multiple interacting sequence features across the full transcript, where codon usage, untranslated regions (UTRs), and their coupling jointly determine stability, translation efficiency, and protein expression. Here, we present mRNA generation via unrolled trajectories and informed latent updates (mRNAutilus), a framework for simultaneous codon optimization and de novo UTR design directly from sequence. mRNAutilus combines a masked discrete diffusion model trained on millions of full-length mRNAs with Monte Carlo Tree Guidance to generate Pareto-efficient sequences under multiple functional objectives, using lightweight regressors over model embeddings to predict half-life, translation efficiency, and protein abundance. Unlike recent methods that design coding sequences and UTRs separately or rely on post hoc assembly and screening, mRNAutilus generates complete transcripts in a single process optimized across properties. Across diverse targets, zero-shot mRNAs encoding P. pyralis luciferase achieve over 400-fold higher expression than wild-type and outperform commercial and machine learning-designed baselines, including zero-shot generative approaches. Zero-shot SARS-CoV-2 Spike mRNAs exceed clinically used and commercial constructs and match or surpass lab-optimized designs with improved durability. We further demonstrate generality in therapeutic settings, including prime editing (PEMax) and programmable proteome modulation, where mRNAutilus-designed constructs enhance expression of peptide-guided E3 ligases (uAbs) for beta-catenin degradation. These results establish a sequence-based, multi-objective framework for generating functional mRNAs tailored to diverse biological applications.

2605.31295 2026-06-01 cs.SD cs.AI cs.IR cs.LG 版本更新

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

通过激活引导实现潜在空间解缠：符号音乐生成中可解释的属性控制

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

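Affiliations: Athens University of Economics; Innovation Lab, Orfium, Athens, Greece; Department of Music Technology & Acoustics, Hellenic Mediterranean University, Rethymno, Greece; Institute of Informatics & Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece; Department of Informatics, Athens University of Economics

AI summary: Uses the difference-of-means (DiffMean) method to isolate latent directions for pitch and duration in the residual stream of the Multitrack Music Transformer, and applies Gram-Schmidt orthogonalization for dual-attribute steering, giving interpretable, deterministic attribute modulation at inference time.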

Comments Accepted at EUSIPCO 2026 (34th European Signal Processing Conference), 5 pages, 2 figures

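AI abstract

Transformer-based architectures have made significant progress in generating complex symbolic sequences, yet a notable gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper studies the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation that requires no retraining, bridging this gap through activation steering at inference time. Using the difference-of-means (DiffMean) method, we isolate latent directions for signal attributes, specifically pitch and duration, in the residual stream. We validate the linear representation hypothesis in this domain, achieving a high correlation between steering magnitude and attribute shift. To address the feature entanglement inherent in multi-attribute steering, we introduce a dual-steering framework that uses Gram-Schmidt orthogonalization. Experimental results show that, compared with naive vector addition, this geometric decoupling reduces concept interference and signal degradation, enabling independent, deterministic control even under strong autoregressive conditioning.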
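The two geometric steps named in this entry, a difference-of-means direction per attribute and Gram-Schmidt orthogonalization between the two directions, can be sketched in a few lines. This is a toy illustration on hand-made three-dimensional "activations", not the authors' pipeline. All function names, the choice to keep the pitch direction fixed and orthogonalize the duration direction against it, and the steering strengths are assumptions.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def diff_mean(positive, negative):
    """Difference-of-means direction between two sets of activations."""
    mp, mn = mean_vector(positive), mean_vector(negative)
    return [a - b for a, b in zip(mp, mn)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(v, u):
    """One Gram-Schmidt step: remove from v its component along u."""
    coef = dot(v, u) / dot(u, u)
    return [a - coef * b for a, b in zip(v, u)]

def steer(hidden, directions, strengths):
    """Add each steering direction, scaled by its strength, to a hidden state."""
    out = list(hidden)
    for d, s in zip(directions, strengths):
        out = [h + s * di for h, di in zip(out, d)]
    return out

# Toy activations for high/low pitch and long/short duration examples.
pitch_dir = diff_mean([[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]],
                      [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
duration_dir = diff_mean([[1.0, 3.0, 0.0], [3.0, 3.0, 0.0]],
                         [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
duration_orth = orthogonalize(duration_dir, pitch_dir)
steered = steer([0.0, 0.0, 0.0], [pitch_dir, duration_orth], [1.0, 0.5])
```

Gram-Schmidt is order-dependent: whichever direction is processed first is kept unchanged, so the choice of which attribute to treat as primary is a design decision.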

Learning Parametric Nitrogen Fertilizer Response Curves with Neuro-Symbolic Regression

Giorgio Morales, John Sheppard

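Affiliations: Aston Centre for Artificial Intelligence Research and Application; Aston University; Gianforte School of Computing; Montana State University

AI summary: Proposes a neuro-symbolic regression approach that learns nitrogen-response curves without a predefined functional form, and shows on real winter wheat data that it achieves lower fitting errors than traditional models.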

Comments Accepted at the Workshop on Symbolic Regression and Equation Discovery, part of the 2026 IEEE World Congress on Computational Intelligence (WCCI) and the IEEE Congress on Evolutionary Computation (CEC)

AI中文摘要

准确模拟作物对氮肥的响应是精准农业中的基本挑战，因为它影响经济效益和环境可持续性。现有方法要么依赖预定义的参数形式，要么使用不透明的机器学习模型，限制了它们从数据中解释或发现特定地点函数关系的能力。在这项工作中，我们提出了一种神经符号回归方法，无需假设预定义的函数形式即可学习参数化的氮响应曲线。我们的方法集成了基于Transformer的多集符号骨架预测策略，能够发现多个子域或管理区之间的共享函数结构。通过构建多样化的输入子集并强制它们之间的一致性，该方法恢复了稳健的符号骨架，随后使用遗传算法将其拟合到观测数据上。该框架首先在合成一维问题上进行评估，以评估其在不同认知不确定性水平下的稳健性。结果表明，即使在数据稀缺的情况下，所提出的符号回归方法也能恢复正确的表达式。在这项工作中，我们展示了将我们的方法应用于真实冬小麦数据的结果，学习了田间不同管理区的不同参数化氮响应曲线。结果表明，发现的表达式不仅比二次平台和指数函数等传统模型实现了更低的拟合误差，而且还捕捉了不同空间区域的多样化函数行为。这证明了神经符号回归在发现特定地点农学关系和支持精准农业中知情决策方面的潜力。

英文摘要

Accurately modeling crop response to Nitrogen (N) fertilization is a fundamental challenge in precision agriculture, as it impacts both economic returns and environmental sustainability. Existing approaches either rely on predefined parametric forms or opaque machine learning models, limiting their ability to interpret or discover site-specific functional relationships from data. In this work, we propose a neuro symbolic regression (SR) approach to learn parametric N-response curves without assuming a predefined functional form. Our approach integrates a transformer-based Multi-Set Symbolic Skeleton Prediction strategy, enabling the discovery of shared functional structures across multiple subdomains or management zones (MZs). By constructing diverse input subsets and enforcing consistency across them, the method recovers robust symbolic skeletons that are subsequently fitted to observed data using a genetic algorithm. This framework was first evaluated on synthetic one-dimensional problems to assess its robustness under varying levels of epistemic uncertainty. The results demonstrate the ability of the proposed SR approach to recover correct expressions even in data-scarce regimes. In this work, we present the results of applying our method to real-world winter wheat data, learning distinct parametric N-response curves for different MZs within a field. The results show that the discovered expressions not only achieve lower fitting errors than traditional models such as quadratic-plateau and exponential functions, but also capture diverse functional behaviors across spatial regions. This demonstrates the potential that neuro SR has to enable the discovery of site-specific agronomic relationships and support informed decision-making in precision agriculture.

2605.31273 2026-06-01 cs.LG 版本更新

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed, Michael Muehlebach

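Affiliations: EPFL; Max Planck Institute for Intelligent Systems

AI summary: Constructs two linear filters to justify theoretically why linear recurrent neural networks are effective memory units in partially observable reinforcement learning, and extends the results to action-controlled hidden Markov models.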

AI中文摘要

线性循环神经网络家族在部分可观测强化学习中作为循环记忆单元表现出色。我们通过构造并研究两种线性滤波器为其经验有效性提供了理论依据：(i) 第一种在确定性转移矩阵下精确重现隐马尔可夫模型（HMM）中信念向量的预softmax logits，从而作为最优策略学习的充分统计量；(ii) 第二种在近似确定性转移矩阵下实现状态解码误差趋近于零，从而将状态模糊性降至接近零。结果扩展到动作控制的HMM，其中相应的线性滤波器变为随时间变化且依赖于动作的动态。我们通过数值实验说明了主要结果，并进一步展示了所构造的线性滤波器在小型强化学习游戏中作为强特征提取器的能力。

英文摘要

The family of linear recurrent neural networks has shown strong performance as recurrent memory units in partially observable reinforcement learning. We provide a theoretical justification for their empirical effectiveness by constructing and studying two linear filters: (i) the first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) under a deterministic transition matrix, thereby serving as a sufficient statistic for optimal policy learning, (ii) the second achieves vanishing state-decoding error under a nearly deterministic transition matrix, thus reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the corresponding linear filters become time-varying with action-dependent dynamics. We illustrate our main results through numerical experiments and further show that the constructed linear filter serves as a strong feature extractor in a small reinforcement learning game.
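Claim (i) can be checked numerically in a simple special case. The sketch below assumes the deterministic transition is a permutation of the states and that all observation likelihoods are strictly positive. Under those assumptions the unnormalized log-belief obeys a linear recurrence (permute, then add the log-likelihood vector), and its softmax equals the exact Bayesian belief. This is an illustration of the idea, not the paper's construction; the function names and the toy HMM are hypothetical.

```python
import math

def bayes_filter_step(belief, perm, obs_lik):
    """Exact HMM belief update for a deterministic transition s -> perm[s]."""
    n = len(belief)
    predicted = [0.0] * n
    for s in range(n):
        predicted[perm[s]] += belief[s]
    unnorm = [obs_lik[j] * predicted[j] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def linear_logit_step(logits, perm, obs_lik):
    """Linear recurrence on logits: permute, then add the log-likelihoods."""
    n = len(logits)
    moved = [0.0] * n
    for s in range(n):
        moved[perm[s]] = logits[s]
    return [moved[j] + math.log(obs_lik[j]) for j in range(n)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

perm = [1, 2, 0]                                  # state s moves to perm[s]
emission = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]  # emission[state][observation]
observations = [0, 1, 1, 0]

belief = [0.5, 0.3, 0.2]
logits = [math.log(b) for b in belief]
for o in observations:
    lik = [emission[j][o] for j in range(3)]
    belief = bayes_filter_step(belief, perm, lik)
    logits = linear_logit_step(logits, perm, lik)

max_gap = max(abs(a - b) for a, b in zip(belief, softmax(logits)))
```

The normalization constant never enters the recurrence, because the softmax is invariant to adding a constant to every logit. If the deterministic transition merged two states, the prediction step would sum probabilities and the log would no longer commute with it, which is why the permutation assumption is stated explicitly here.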

2605.31259 2026-06-01 cs.LG 版本更新

Lightweight CNN-Based Anomaly Detection for High Voltage Converter Modulators in the Spallation Neutron Source

基于轻量级CNN的散裂中子源高压转换器调制器异常检测

Alberto D. Cencillo, Leonardo Concepción, Julián Luengo, Isaac Triguero

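Affiliations: Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI); Department of Computer Science and Artificial Intelligence (DECSAI), University of Granada

AI summary: For anomaly detection on multi-channel signals from high voltage converter modulators, varies the order of temporal filtering and cross-channel mixing and adds adaptive channel reweighting, reaching a pooled AUC-PR of 0.816 and AUC-ROC of 0.934 on a public dataset and outperforming the state of the art on most subsystems and five of the six fault families.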

Comments 21 pages, 8 figures

AI中文摘要

高功率脉冲转换器的非计划停机是大型加速器设施停机的主要原因。在散裂中子源（SNS）中，高压转换器调制器（HVCM）始终是丢失束流时间的第二大贡献者。每个HVCM脉冲通过跨电流、电压和磁通量的传感器通道记录，这些通道的相互交互编码了系统的运行状态。故障前兆在这些通道中并非均匀显现：根据故障类型，它们可能改变单个信号的时间结构，改变通道间的统计依赖性，或两者兼有。现有的深度学习方法通常使用标准卷积流水线处理多通道信号，该流水线从第一层开始就纠缠时间和跨通道操作，使得模型没有明确的机制来表示通道独立性或结构化的通道间交互。我们假设架构归纳偏差，特别是时间滤波和跨通道混合的顺序，在这类数据的检测性能中起着核心作用。为了验证这一点，我们改变了这两个操作的顺序，并检查每个脉冲的自适应通道重加权是否进一步提高灵敏度。在涵盖所有四个SNS子系统（RFQ、DTL、CCL、SCL）的公开HVCM数据集上评估，我们最好的变体实现了池化AUC-PR为0.816和AUC-ROC为0.934，在大多数子系统和六个故障家族中的五个上优于现有技术。消融实验识别出三个主导输入通道，并将每个故障家族的性能与前兆表现为单个通道的幅度偏移还是需要联合通道表示才能显现的更细微模式联系起来。

英文摘要

Unscheduled trips of high-power pulsed converters are a leading source of downtime at large accelerator facilities. At the Spallation Neutron Source (SNS), the High Voltage Converter Modulators (HVCMs) are consistently the second-largest contributor to lost beam time. Each HVCM pulse is recorded across sensor channels spanning currents, voltages, and magnetic fluxes, whose mutual interactions encode the operating state of the system. Fault precursors do not manifest uniformly across these channels: depending on fault type, they may alter the temporal structure of individual signals, change the statistical dependencies among channels, or both. Existing deep-learning approaches typically process multi-channel signals with standard convolutional pipelines that entangle temporal and cross-channel operations from the first layer, giving the model no explicit mechanism to represent channel independence or structured inter-channel interaction. We hypothesise that architectural inductive bias, specifically the ordering of temporal filtering and cross-channel mixing, plays a central role in detection performance on this class of data. To test this, we vary the order in which these two operations are applied, and examine whether per-pulse adaptive channel reweighting further improves sensitivity. Evaluated on the public HVCM dataset across all four SNS subsystems (RFQ, DTL, CCL, SCL), our best variant achieves a pooled AUC-PR of 0.816 and AUC-ROC of 0.934, outperforming the state of the art on most subsystems and five of the six fault families. Ablations identify three dominant input channels and link per-fault-family performance to whether precursors manifest as amplitude shifts in individual channels or as subtler patterns requiring joint channel representations to surface.

2605.31257 2026-06-01 cs.LG stat.ML 版本更新

Fraud Type Decomposition and the Observation-Mechanism Taxonomy: Class-Specific Detection Limits in Payment Networks

欺诈类型分解与观测机制分类：支付网络中的类别特定检测极限

Gaurav Dhama

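AI summary: Introduces an observation-mechanism taxonomy that partitions fraud into five classes, proves that estimating fraud rates separately by class and then aggregating strictly dominates pooled estimation, and derives the theoretical constraint on detection for each class.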

Comments 59 pages

AI中文摘要

支付网络中的欺诈检测依赖于通过异质且不完美的观测过程生成的标签，但现有方法将欺诈视为同质二元变量。我们证明这一假设在结构上不正确，并导致可证明的低效。我们引入一个观测机制分类，将欺诈分为五类，每类由不同的审查和标记流程定义。我们证明按类别分别估计欺诈率并聚合严格优于整体估计，效率差距由异质观测率导致的Jensen惩罚刻画。对于每类，我们推导了检测的绑定理论约束，包括内生标签腐败、结构不可观测性和特征非信息性。这些结果确立了欺诈检测本质上是一组不同的估计问题，每个问题由其自身的观测结构和检测极限支配。

英文摘要

Fraud detection in payment networks relies on labels generated through heterogeneous and imperfect observation processes, yet existing approaches treat fraud as a homogeneous binary variable. We show that this assumption is structurally incorrect and leads to provable inefficiency. We introduce an observation-mechanism taxonomy that partitions fraud into five classes, each defined by a distinct censorship and labeling pipeline. We prove that estimating fraud rates separately by class and aggregating strictly dominates pooled estimation, with the efficiency gap characterized as a Jensen penalty arising from heterogeneous observation rates. For each class, we derive the binding theoretical constraint on detection, including endogenous label corruption, structural non-observability, and feature non-informativeness. These results establish that fraud detection is fundamentally a collection of distinct estimation problems, each governed by its own observation structure and detection limit.

2605.31250 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Entropic Projection Alignment: Estimating, Explaining, and Improving Model Performance Under Distribution Shift

熵投影对齐：估计、解释和改进分布偏移下的模型性能

Salim I. Amoukou, Emanuele Albini, Tom Bewley, Saumitra Mishra, Manuela Veloso

发表机构 * J.P. Morgan AI Research（摩根大通AI研究所）

AI总结提出熵投影对齐（EPA）方法，通过匹配选定矩并最小化KL散度来对齐源分布与目标分布，从而统一解决分布偏移下的性能估计、解释和改进问题。

Comments Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

2605.31249 2026-06-01 cs.LG cs.AI 版本更新

Learning Cardiac Latent Representations in Vectorcardiogram Space

在向量心电图空间中学习心脏潜在表示

Bosong Huang, Panzhen Zhao, Zengxiang Li, Patricia Lee, Wei Jin, Alan Wee-Chung Liew, Ming Jin, Shirui Pan

发表机构 * Griffith University, Australia（格里菲斯大学）； SingHealth Duke-NUS AI in Medicine Institute, Singapore（新加坡SingHealth Duke-NUS医学人工智能研究所）； Emory University, USA（埃默里大学）

AI总结针对标准十二导联心电图表示学习中的冗余和过拟合问题，提出基于Frank向量心电图模型的LVCG框架，在物理潜在空间中学习视图不变的心脏电活动表示，提升鲁棒性和泛化能力。

AI中文摘要

心电图（ECG）是心脏评估的基石，学习信息丰富的ECG表示对于从疾病诊断到临床报告生成等任务至关重要。然而，现有方法几乎完全在可观测的ECG信号空间中操作。实际上，标准十二导联ECG代表了同一心脏电活动在不同空间方向上的多个投影。因此，在ECG空间中进行表示学习不可避免地引入了大量冗余，可能导致虚假相关性和过拟合风险增加。为了解决这个问题，受Frank向量心电图（VCG）模型启发，我们提出直接在VCG空间中学习心脏电活动的统一潜在表示。我们引入了LVCG，这是第一个设计用于在此物理基础潜在空间中运行的通用自监督表示学习框架。通过学习视图不变的潜在VCG表示而非导联特定伪影，LVCG最小化了冗余并提高了泛化能力。LVCG在各项任务中普遍优于ECG空间基线，展现出增强的鲁棒性和泛化能力，尤其在领域偏移设置中。

英文摘要

Electrocardiography (ECG) is a cornerstone of cardiac assessment, making the learning of informative ECG representations fundamental to tasks ranging from disease diagnosis to clinical report generation. However, existing methods operate almost exclusively in the observable ECG signal space. In practice, the standard twelve-lead ECG represents multiple projections of the same underlying cardiac electrical activity from different spatial orientations. Therefore, representation learning in the ECG space inevitably introduces substantial redundancy, which may lead to spurious correlations and increased risk of overfitting. To address this, and motivated by the Frank vectorcardiogram (VCG) model, we propose learning a unified latent representation of cardiac electrical activity directly in the VCG space. We introduce LVCG, the first general self-supervised representation learning framework designed to operate in this physically grounded latent space. By learning view-invariant latent VCG representations rather than lead-specific artifacts, LVCG minimizes redundancy and improves generalization. LVCG generally outperforms ECG-space baselines across tasks, demonstrating enhanced robustness and generalization, especially in domain shift settings.

2605.31245 2026-06-01 cs.LG 版本更新

Toward Identifiable Sparse Autoencoders

走向可识别的稀疏自编码器

Walter Nelson, Theofanis Karaletsos, Francesco Locatello

发表机构 * Institute of Science and Technology Austria（奥地利科学技术研究所）； Pyramidal Inc.（Pyramidal公司）； Achira Inc., USA（Achira公司，美国）

AI总结针对稀疏自编码器训练不稳定的问题，通过理论分析模型属性并改进架构与训练流程，提出iSAE变体，实现更低重构误差与更高稳定性。

Comments International Conference on Machine Learning (ICML) 2026

AI中文摘要

最近，稀疏自编码器（SAE）已成为解释实际神经网络中的表示并与之交互的有吸引力的工具。SAE高度不稳定虽已是常见的经验共识，但我们还从理论上证明了这一点：不同的训练运行很可能产生不同的概念字典和稀疏编码。我们刻画了阻碍实际SAE稳定性的模型属性，并通过对架构和训练过程的最小改动解决每个问题。这些改动共同产生了两个版本的可识别（identifiable）SAE（iSAE），这是标准TopK SAE的变体，具有更低的重构误差和更高的稳定性。我们通过将SAE与传统字典学习方法联系起来，从理论上解释了这一改进，并表明实践中学习的字典满足近似受限等距条件，从而使这些模型中的相应稀疏编码接近可识别。

英文摘要

Recently, sparse autoencoders (SAEs) have emerged as an attractive tool for interpreting and interacting with representations in practical neural networks. While it is common empirical folklore, we also show theoretically that SAEs are highly unstable: different training runs are likely to produce different concept dictionaries and sparse codes. We characterize the model properties that hinder the stability of real-world SAEs, and address each of these problems through minimal changes to the architecture and training procedure. Together, these changes yield two versions of an identifiable SAE (iSAE), a variant of the standard TopK SAE with lower reconstruction error and improved stability. We explain this improvement theoretically by connecting SAEs with traditional dictionary learning approaches, and show that the dictionaries learned in practice satisfy an approximate restricted isometry condition, rendering the corresponding sparse codes in those models near-identifiable.

2605.31244 2026-06-01 cs.LG physics.comp-ph 版本更新

基于全纯神经网络的调和势控制的三维边值问题框架

Enrico Ballini, Allan Peter Engsig-Karup, Tito Andriollo

发表机构 * Department of Mechanical and Production Engineering, Aarhus University（奥胡斯大学机械与生产工程系）； Department of Applied Mathematics and Computer Science, Technical University of Denmark（丹麦技术大学应用数学与计算机科学系）

AI总结提出一种基于Whittaker积分公式和全纯神经网络的框架，通过构造精确满足偏微分方程的神经网络求解三维调和势边值问题，仅需边界配点训练，在拉普拉斯和线弹性问题中验证了精度。

AI中文摘要

我们提出了一种基于神经网络的框架，用于求解解可表示为调和势的三维边值问题。该方法利用Whittaker积分公式，通过关于合适复变量的全纯函数来表示解。这些函数随后使用全纯神经网络进行逼近，从而保证全纯性要求。该公式的一个关键特征是，控制偏微分方程（PDE）通过构造精确满足。因此，与标准的物理信息神经网络相比，在域内部不需要PDE的残差最小化，训练完全基于边界配点。该方法针对三维拉普拉斯和线弹性问题进行了验证，在后一种情况下，位移和应力场通过Papkovich-Neuber势表示。数值结果表明，标量和矢量场均得到精确逼近，误差在整个域内保持可控。总体而言，该工作表明，将解析结构融入神经网络架构为三维边值问题的无网格逼近提供了一种自然且有效的框架，同时保留了控制方程的基本性质。

英文摘要

We present a neural-network-based framework for the solution of three-dimensional boundary value problems where the solution is expressible in terms of harmonic potentials. The approach leverages the Whittaker integral formula, which allows representing the solution through functions that are holomorphic with respect to a suitable complex variable. These functions are subsequently approximated using holomorphic neural networks, which guarantee fulfillment of the holomorphicity requirement. A key feature of the proposed formulation is that the governing partial differential equations (PDEs) are satisfied exactly by construction. Therefore, in contrast to standard physics-informed neural networks, no residual minimization of PDEs is required in the interior of the domain, and training is based exclusively on boundary collocation points. The method is validated against three-dimensional Laplace and linear elasticity problems, where, in the latter case, displacement and stress fields are expressed via the Papkovich-Neuber potentials. The numerical results show an accurate approximation of both scalar and vector fields, with errors remaining controlled throughout the domain. Overall, the work demonstrates that the incorporation of analytical structures into neural network architectures provides a natural and effective framework for the meshless approximation of three-dimensional boundary value problems while preserving the underlying properties of the governing equations.
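
The statement that the PDE is satisfied by construction can be checked by hand for Laplace's equation. The short derivation below assumes the classical form of Whittaker's formula and that differentiation under the integral sign is permitted; the abstract does not say which exact variant the authors use, so the symbols f, w and t are illustrative.

```latex
u(x,y,z) = \int_{-\pi}^{\pi} f(w, t)\,dt,
\qquad w = z + i\,x\cos t + i\,y\sin t

\frac{\partial^2 f}{\partial x^2} = -\cos^2 t \, f_{ww}, \qquad
\frac{\partial^2 f}{\partial y^2} = -\sin^2 t \, f_{ww}, \qquad
\frac{\partial^2 f}{\partial z^2} = f_{ww}

\nabla^2 u = \int_{-\pi}^{\pi} \bigl(1 - \cos^2 t - \sin^2 t\bigr)\, f_{ww}(w, t)\,dt = 0
```

Because the integrand vanishes for any f that is holomorphic in w, only the boundary conditions remain to be fitted, which is consistent with training on boundary collocation points alone.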

2605.31228 2026-06-01 cs.LG cs.AI 版本更新

EchoRL: Reinforcement Learning via Rollout Echoing

EchoRL：通过回滚回响进行强化学习

Jinhe Bi, Aniri, Minglai Yang, Xingcheng Zhou, Wenke Huang, Sikuan Yan, Yujun Wang, Zixuan Cao, Michael Färber, Xun Xiao, Volker Tresp, Yunpu Ma

发表机构 * Munich Center for Machine Learning（慕尼黑机器学习中心）； Huawei Heisenberg Research Center（华为海森堡研究中心）； University of Arizona（亚利桑那大学）； College of Computing and Data Science, Nanyang Technological University, Singapore（新加坡南洋理工大学计算与数据科学学院）； MemAgents Lab（MemAgents实验室）

AI总结针对RLVR训练中优势退化问题，提出EchoRL模块，通过从成功回滚中提取EchoClip作为辅助监督信号，持续提升训练性能。

Comments ICML 2026

AI中文摘要

基于几何的薛定谔桥用于可信多模态融合

Jiayu Xiong, Jing Wang, Qi Zhang, Wanlong Wang, Jun Xue

发表机构 * Department of Computer Science and Technology, Huaqiao University（华侨大学计算机科学与技术系）； Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University（华侨大学厦门市计算机视觉与模式识别重点实验室）； Tongji University（同济大学）； School of Cyber Science and Engineering, Wuhan University（武汉大学国家网络安全学院）

AI总结提出基于几何的多模态融合方法GMF，利用扩散薛定谔桥的初始速度平方作为独立于预测的可靠性信号，以提升对低质量数据的鲁棒性。

Comments ICML 2026 accepted paper

AI中文摘要

现实世界的多模态系统必须对低质量数据具有鲁棒性，例如传感器噪声、不完整的多模态数据和冲突输入。然而，现有的可信融合方法依赖模型自身的预测置信度来判断数据质量，这造成了循环依赖：当模型自信但错误时，这些方法无法检测到错误。为了打破这一循环，我们提出了基于几何的多模态融合（GMF）。我们不依赖预测，而是通过测量输入在潜在空间中所需的传输校正量来评估可靠性。我们实现了带有整流流的扩散薛定谔桥传输，其中初始速度的平方提供了一个高效的学习校正分数。有效数据具有低的平方速度幅度，而噪声、不完整数据或冲突数据需要更强的传输校正。这种基于几何的可靠性信号充当独立判断，即使在分类器被欺骗时也能有效标记不可靠输入。大量实验表明，与基于置信度的基线相比，GMF显著提高了对严重传感器噪声和语义冲突的鲁棒性。

英文摘要

Real-world multimodal systems must be robust against low-quality data, such as sensor noise, incomplete multimodal data and conflicting inputs. However, existing trustworthy fusion methods rely on the model's own prediction confidence to judge data quality. This creates a circular dependency: when a model is confident but wrong, these methods fail to detect the error. To break this loop, we propose Geometry-based Multimodal Fusion (GMF). Instead of relying on predictions, we evaluate reliability by measuring how much transport correction the input needs in latent space. We implement Diffusion Schrödinger Bridge transport with Rectified Flow, where the squared initial velocity gives an efficient learned correction score. Valid data has low squared velocity magnitude, while noisy, incomplete data or conflicting data requires stronger transport correction. This geometry-based reliability signal acts as an independent judge, effectively flagging unreliable inputs even when the classifier is fooled. Extensive experiments demonstrate that GMF significantly improves robustness against severe sensor noise and semantic conflicts compared to confidence-based baselines.

2605.31191 2026-06-01 cs.LG cs.CV 版本更新

Student Capacity Moderates Knowledge Distillation Effectiveness: A Systematic Study Across ResNet Teacher-Student Pairs on CIFAR-10

学生容量调节知识蒸馏有效性：基于CIFAR-10上ResNet教师-学生对的系统研究

Umut Onur Yasar

发表机构 * GitHub

AI总结通过ResNet教师-学生对在CIFAR-10上的图像分类实验，系统研究学生容量如何调节知识蒸馏（KD）的有效性，发现学生容量是蒸馏增益的关键调节因素，并指出实现正确性和输入分辨率感知架构的重要性。

Comments 9 pages, 2 figures, 5 tables. Code available at https://github.com/umutonuryasar/kd-capacity-gap

AI中文摘要

我们研究了教师-学生容量关系如何调节基于ResNet的CIFAR-10图像分类中知识蒸馏（KD）的有效性。在三个教师-学生对（R50->R18、R34->R18和R50->R34）中，我们在受控、可重复的条件下（3个种子，全程报告均值±标准差）比较了Logit-KD和Feature-KD。我们报告三个主要发现。首先，学生容量是蒸馏增益的关键调节因素：即使教师-学生准确率差距相当，R34学生从KD中获得的收益也远大于R18学生，R50->R34 Feature-KD的最大增益为+0.30个百分点，而R34->R18 Feature-KD为+0.18个百分点，R34->R18 Logit-KD为+0.00个百分点。其次，实现的正确性对Feature-KD至关重要：一个排除了投影层的梯度裁剪错误抑制了Feature-KD的性能，并产生了与Logit-KD的误导性比较。修正后，Feature-KD在三个对中的两个上匹配或优于Logit-KD，在R50->R34上达到95.55%，基线为95.25%。第三，输入分辨率感知架构是有效蒸馏的先决条件：将ResNet主干修正为32x32输入使教师准确率提高超过5个百分点——比任何KD增益高出一个数量级。所有代码和结果可在github.com/umutonuryasar/kd-capacity-gap获取。

英文摘要

We investigate how teacher-student capacity relationships modulate knowledge distillation (KD) effectiveness in ResNet-based image classification on CIFAR-10. Across three teacher-student pairs -- R50->R18, R34->R18, and R50->R34 -- we compare Logit-KD and Feature-KD under controlled, reproducible conditions (3 seeds, mean+/-std reported throughout). We report three main findings. First, student capacity is a key moderating factor in distillation gain: R34 students benefit substantially more from KD than R18 students even when teacher-student accuracy gaps are comparable, with the strongest gain of +0.30pp observed for R50->R34 Feature-KD versus +0.18pp for R34->R18 Feature-KD and +0.00pp for R34->R18 Logit-KD. Second, implementation correctness critically affects Feature-KD: a gradient clipping bug that excluded projection layers suppressed Feature-KD performance and produced misleading comparisons with Logit-KD. After correction, Feature-KD matches or outperforms Logit-KD in two of three pairs, reaching 95.55% on R50->R34 against a baseline of 95.25%. Third, input-resolution-aware architecture is a prerequisite for effective distillation: correcting the ResNet stem for 32x32 inputs raises teacher accuracy by over 5pp -- an order of magnitude larger than any KD gain. All code and results are available at github.com/umutonuryasar/kd-capacity-gap.
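
For readers unfamiliar with Logit-KD, the following is a minimal sketch of the standard temperature-scaled distillation loss (KL divergence between softened teacher and student outputs, mixed with cross-entropy on the true label). The `temperature` and `alpha` values are illustrative assumptions, not settings taken from the paper, and the authors' implementation may differ.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]

def logit_kd_loss(student_logits, teacher_logits, label,
                  temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # Ordinary cross-entropy on the hard label at temperature 1.
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Example with made-up logits for a 3-class problem.
loss = logit_kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], label=0)
```

The `T^2` factor keeps the gradient magnitude of the soft-target term comparable across temperatures, which is the usual reason it appears in this formulation.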

2605.31189 2026-06-01 cs.LG 版本更新

FlagGAM: Rule-Based Generalized Additive Modeling for Explainable Tabular Prediction

FlagGAM：基于规则的可解释表格预测广义加性模型

Zijie Zhao, Roy E. Welsch

发表机构 * EECS Department, Massachusetts Institute of Technology, Cambridge, MA, USA（麻省理工学院电子工程与计算机科学系）； Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA（麻省理工学院斯隆管理学院）

AI总结提出FlagGAM框架，通过规则定义的基函数分离特征级规则构建与预测，在保持可解释性的同时提升对不完美输入的鲁棒性。

AI中文摘要

在高风险领域的表格预测中，需要准确、透明且对不完美输入鲁棒的模型。我们提出FlagGAM，一个规则定义的基函数框架，将特征级规则构建与预测分离。Flag核心模块将数值和分类变量转换为稀疏、可读的单变量基函数，包括阈值标志、类别级标志、尾部偏差基和分类阶跃函数；默认的加性头部随后将这些基函数组合为受限的GAM风格预测器。FlagGAM不是将触发的规则简化为紧凑的计数摘要，而是保留稀疏的规则基矩阵，支持混合类型分类和回归、特征特定权重以及可选的灵活预测头部。在表格基准测试中，默认FlagGAM在透明加性模式下接近EBM，在混合类型回归上显著优于岭回归，并在缺失和噪声扰动下显示出比常见基线更小的AUROC下降。灵活头部进一步提高了准确性，接近强树基线，但需要注意，所得模型应解释为规则基表示后接非线性预测器，而非完全加性GAM。总体而言，FlagGAM为需要竞争性准确性、可传达规则和对不完美输入鲁棒性的表格设置提供了实用的中间地带。

英文摘要

Tabular prediction in high-stakes domains requires models that are accurate, transparent, and robust to imperfect inputs. We propose FlagGAM, a rule-defined basis framework that separates feature-level rule construction from prediction. A Flag Core Module converts numerical and categorical variables into sparse, human-readable univariate bases, including threshold flags, category-level flags, tail-deviation bases, and categorical step functions; a default additive head then combines these bases as a restricted GAM-style predictor. Rather than reducing triggered rules to compact count summaries, FlagGAM retains a sparse rule-basis matrix that supports mixed-type classification and regression, feature-specific weighting, and optional flexible prediction heads. Across tabular benchmarks, default FlagGAM remains close to EBM in transparent additive mode, improves substantially over ridge regression on mixed-type regression, and shows smaller AUROC degradation than common baselines under missing and noisy perturbations. Flexible heads further improve accuracy and approach strong tree-based baselines, with the caveat that the resulting model should be interpreted as a rule-basis representation followed by a nonlinear predictor rather than as a fully additive GAM. Overall, FlagGAM provides a practical middle ground for tabular settings that require competitive accuracy, communicable rules, and robustness to imperfect inputs.
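
The separation between rule construction and prediction described above can be sketched as follows. This is only an illustration of threshold flags, category-level flags and an additive head; the feature names, thresholds and weights are hypothetical, and the paper's Flag Core Module (including tail-deviation bases) is not reproduced here.

```python
def threshold_flags(value, thresholds):
    """One binary flag per threshold: 1 if the value exceeds it."""
    return [1 if value > t else 0 for t in thresholds]

def category_flags(value, categories):
    """One binary flag per known category level."""
    return [1 if value == c else 0 for c in categories]

def additive_score(flags, weights, bias=0.0):
    """Additive head: bias plus a weighted sum of the triggered flags."""
    return bias + sum(w * f for w, f in zip(weights, flags))

# Hypothetical record, thresholds and weights, for illustration only.
row = {"age": 67, "smoker": "yes"}
flags = (threshold_flags(row["age"], [40, 60, 80])
         + category_flags(row["smoker"], ["yes", "no"]))
score = additive_score(flags, [0.2, 0.5, 0.9, 0.7, 0.0], bias=-1.0)
```

Each weight attaches to a single readable rule (for example "age > 60"), so the contribution of every triggered flag to the score can be listed directly.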

2605.31187 2026-06-01 cs.CV cs.LG 版本更新

From Local Geometry to Global Pseudo Labeling for Robust Positive Unlabeled Learning under Covariate Shift

从局部几何到全局伪标注：协变量偏移下鲁棒的正无标记学习

Firas Gabetni, Alexandre Rocchi Henry, Nacim Belkhir, Ziyi Liu, Gianni Franchi

发表机构 * U2IS, ENSTA（U2IS，ENSTA）； Institut Polytechnique de Paris（巴黎理工学院）； AMIAD, Pôle Recherche, Palaiseau（AMIAD研究部，帕莱索）

AI总结提出SPUNA框架，利用局部流形结构逐步发现偏移数据，在协变量偏移下实现正无标记学习，性能达到全监督方法水平。

AI中文摘要

检测协变量偏移对于构建可靠的视觉系统至关重要。虽然大多数先前工作专注于提高对偏移的鲁棒性，但显式检测协变量偏移仍未被充分探索。现有方法通常依赖于全监督训练，需要来自原始分布和偏移分布的有标签样本，这往往不切实际。在本文中，我们表明协变量偏移检测可以通过使用正无标记（PU）学习的弱监督有效解决。然而，在协变量偏移下，分布内数据和偏移数据显著重叠，使得经典PU方法不稳定且对噪声敏感。为克服这一挑战，我们引入了谱PU邻域标注（SPUNA），这是一种几何感知框架，通过利用视觉特征的局部流形结构逐步发现偏移数据。大量实验表明，SPUNA在PU设置中实现了最先进的性能，并且令人瞩目地达到了与全监督方法相当的性能。此外，我们的方法在不同类型的偏移之间鲁棒地迁移，展示了强大的泛化能力。

英文摘要

Detecting covariate shift is critical for building reliable vision systems. While most prior work focuses on improving robustness to shift, explicitly detecting covariate shift remains underexplored. Existing approaches typically rely on fully supervised training, requiring labeled examples from both original and shifted distributions, which is often impractical. In this paper, we show that covariate shift detection can be effectively addressed with weaker supervision using Positive Unlabeled (PU) learning. However, under covariate shift, in-distribution and shifted data overlap significantly, making classical PU methods unstable and sensitive to noise. To overcome this challenge, we introduce Spectral PU Neighborhood Annotation (SPUNA), a geometry-aware framework that progressively discovers shifted data by leveraging the local manifold structure of visual features. Extensive experiments show that SPUNA achieves state-of-the-art performance in PU settings and remarkably matches the performance of fully supervised methods. Moreover, our approach transfers robustly across different types of shifts, demonstrating strong generalization capabilities.

2605.31186 2026-06-01 cs.LG 版本更新

How well does Classification Accuracy capture Concept Drift Detection Quality? An overview of Concept Drift Detection evaluation

分类精度在多大程度上捕捉概念漂移检测质量？概念漂移检测评估综述

Joanna Komorniczak

发表机构 * Department of Systems and Computer Networks（系统与计算机网络系）

AI总结本文综述了概念漂移检测质量度量与分类性能之间的关系，通过七种合成数据流工具研究八种漂移检测质量度量，旨在确定最具信息量的度量集。

AI中文摘要

数据流是当今最常分析的数据结构之一，概念漂移对处理系统构成了重大挑战。尽管提出了许多解决方案来应对概念漂移导致的精度下降，但科学界尚未建立统一的概念漂移检测评估框架。现有研究通常依赖分类质量度量，但这些度量可能受多种因素影响，无法可靠反映漂移检测质量。本文深入概述了合成非平稳数据流中漂移检测质量度量与分类性能之间的关系。研究通过七种合成数据流生成工具，考察了八种漂移检测质量度量与分类器性能的关系，并额外考虑了漂移动态因素。研究旨在识别最具信息量的漂移检测质量度量集，并提供对方法评估的深入理解。

英文摘要

Data streams are among the most frequently analyzed data structures today, and concept drift poses a major challenge for the systems that process them. Although numerous solutions have been proposed to counteract the accuracy degradation caused by concept drift, the scientific community has not yet established a unified framework for evaluating the concept drift detection task. Existing research often relies on classification quality metrics, but these can be affected by multiple factors and may not reliably reflect drift detection quality. In this work, we present an in-depth overview of the relationship between metrics for quantifying drift detection quality and classification performance in synthetic nonstationary data streams. The proposed research studies eight drift detection quality metrics in relation to the classifier's performance across seven synthetic data stream generation tools, additionally considering drift dynamics as a factor. The studies aim to identify the most informative set of drift detection quality metrics and provide a deep understanding of the method's evaluation.

2605.31183 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

引导LLM？实际上，稀疏自编码器可以胜过简单基线

Mikkel Godsk Jørgensen, Lars Kai Hansen

发表机构 * DTU Compute（丹麦技术大学计算学院）

AI总结本文通过监督流水线选择并标注特征，证明稀疏自编码器在模型引导任务上可接近LoRA性能，并发现高稀疏性对基于可解释性的引导并非关键。

AI中文摘要

稀疏自编码器（SAEs）被视为探索大型语言模型（LLMs）内部机制和引导模型输出生成的有前途的途径。当Wu等人（2025）引入模型引导基准AxBench时，SAEs由于相对于一组简单基线的引导性能较差，似乎并未达到最初的期望。本文作为对稀疏自编码器的部分反驳，表明Wu等人（2025）的结果并未完全公正地评价它们。我们发现，当使用我们的监督流水线选择并标注特征时，稀疏自编码器实际上可以在AxBench基准上达到接近参考LoRA性能的水平。我们还发现，当仅使用基于可解释性的组件时，我们的流水线选择的特征与其识别标签具有令人惊讶的因果性。最后，我们提供证据表明，高稀疏性（低l0）可能对于基于可解释性的成功引导并非关键，这与Wang等人（2025）早期的发现相反。

英文摘要

Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench - a model steering benchmark - was introduced in Wu et al. (2025), SAEs did not seem to live up to their original hype due to poor steering performance relative to a set of simple baselines. This work serves as a partial rebuttal for Sparse Autoencoders and suggests that the results of Wu et al. (2025) did not do them full justice. We find that Sparse Autoencoders can, in fact, perform close to on par with the reference LoRA performance on the AxBench benchmark, when features are selected and labelled with our supervised pipeline. We also find that our pipeline selects features that are surprisingly causal of their identified labels when using only its interpretability-based components. Lastly, we present evidence that high sparsity (low l0) may not be crucial for successful steering based on interpretability, which is in contrast to the earlier findings in Wang et al. (2025).

2605.31176 2026-06-01 cs.LG cs.DS 版本更新

Retriever Portfolios: A Principled Approach to Adaptive RAG

检索器组合：一种自适应RAG的原则性方法

Miltiadis Stouras, Vincent Cohen-Addad, Silvio Lattanzi, Ola Svensson

发表机构 * EPFL（瑞士联邦理工学院）； Google Research（谷歌研究院）

AI总结提出从大量候选检索器中自动选择小型多样子集（组合）的方法，通过期望最优k目标优化查询分布，实现自适应RAG，在多个QA基准上优于单检索器和朴素多检索器基线，并降低延迟和令牌成本。

Comments Accepted at ICML 2026. Code available at: https://github.com/mstou/retriever-portfolios

详情

AI中文摘要

检索增强生成（RAG）系统通常依赖单一检索器和一组超参数，尽管面临从简单事实性问题到复杂多跳推理的高度异构查询。我们提出一种方法，从大量候选检索器中自动选择一个小型、多样的子集（组合），以覆盖目标查询分布的不同区域。我们通过查询分布上的期望最优$k$目标形式化这一设置，并证明其存在一个具有近最优保证的高效组合构建算法。在多个QA基准上，我们学习的组合和路由管道在检索指标和答案质量上始终优于单检索器和朴素多检索器基线。此外，与推理时超参数调优方法相比，固定组合支持并行检索和LLM调用，在实现相当（有时更好）准确性的同时，显著降低延迟和令牌成本。

英文摘要

Retrieval-augmented generation (RAG) systems typically rely on a single retriever and a single set of hyperparameters, despite facing highly heterogeneous queries that range from simple factoid questions to complex multi-hop reasoning. We propose a method that automatically selects a small, diverse subset of retrievers (a portfolio) from a large pool of candidates, to cover different regions of the target query distribution. We formalize this setting via an expected best-of-$k$ objective over the query distribution and show that it admits an efficient portfolio construction algorithm with near-optimal guarantees. Across multiple QA benchmarks, our learned portfolios and router pipeline consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality. In addition, compared to inference-time hyperparameter tuning approaches, fixed portfolios enable parallel retrieval and LLM calls, achieving comparable (and sometimes better) accuracy with substantially lower latency and token cost.
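
To make the expected best-of-$k$ objective concrete, here is a minimal sketch that scores a portfolio by the mean, over queries, of the best per-query score among its retrievers, and builds the portfolio greedily. Greedy selection is a standard choice for objectives of this max-coverage type (it carries a 1 - 1/e guarantee for monotone submodular functions), but the abstract does not say which algorithm the authors use, so treat this as an assumption. The retriever names and scores are made up.

```python
def portfolio_value(portfolio, scores):
    """Mean over queries of the best score achieved by any chosen retriever."""
    if not portfolio:
        return 0.0
    n_queries = len(next(iter(scores.values())))
    best_per_query = (max(scores[r][q] for r in portfolio)
                      for q in range(n_queries))
    return sum(best_per_query) / n_queries

def greedy_portfolio(scores, k):
    """Repeatedly add the retriever with the largest marginal gain."""
    portfolio = []
    for _ in range(min(k, len(scores))):
        remaining = [r for r in scores if r not in portfolio]
        best = max(remaining,
                   key=lambda r: portfolio_value(portfolio + [r], scores))
        portfolio.append(best)
    return portfolio

# Hypothetical per-query quality scores for three candidate retrievers.
scores = {
    "bm25":      [0.9, 0.8, 0.1, 0.1],
    "dense":     [0.8, 0.9, 0.3, 0.1],
    "multi_hop": [0.1, 0.2, 0.9, 0.8],
}
portfolio = greedy_portfolio(scores, k=2)
```

In this toy data the second pick is the retriever that covers the queries the first one handles poorly, rather than the second-best retriever overall, which is the behaviour a diversity-seeking portfolio is meant to have.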

2605.31174 2026-06-01 cs.CV cs.LG 版本更新

信任域行为混合用于在线策略蒸馏

Daniil Plyusov, Alexey Gorbatovski, Alexey Malakhov, Nikita Balagansky, Boris Shaposhnikov, Daria Korotyshova, Daniil Gavrilov

发表机构 * T-Tech

AI总结提出信任域行为混合（TRB）预热方法，通过在学生中心的KL信任域内用最接近教师的行为策略替换早期学生策略，解决在线策略蒸馏中早期学生轨迹质量差的问题，在数学推理蒸馏中取得最佳平均性能。

2605.31156 2026-06-01 cs.LG 版本更新

TabCausal: Pretraining Across Causal Environments for Tabular Causal Discovery

TabCausal：跨因果环境的表格因果发现预训练

Zi-Rong Li, Si-Yang Liu, Tian-Zuo Wang, Han-Jia Ye

发表机构 * Nanjing University（南京大学）

AI总结提出TabCausal，一种通过动态任务构建策略在多样化因果环境中进行大规模预训练的因果发现基础模型，在合成和语义基准上优于现有方法。

AI中文摘要

因果发现旨在从观测和干预数据中恢复有向因果关系，为机制理解和可靠决策提供基础。因果发现基础模型（CDFMs）试图通过将数据集直接映射到因果图（单次前向传播）来分摊该问题，避免每个数据集上的测试、搜索或优化。然而，现有的CDFMs仍然有限，常常无法一致地匹配强大的经典方法，我们发现关键瓶颈在于因果预训练任务的构建方式。基于这一观察，我们提出了TabCausal，一种数据驱动的CDFM，在多样化的图先验、结构机制、噪声模型、维度、样本量和干预机制上进行广泛的因果预训练。一种动态任务构建策略将这些因果环境组合成多样的发现任务，使得从观测和混合干预数据中实现更具迁移性的结构学习。在大规模合成基准上，TabCausal实现了比多种因果发现基线更好的宏观平均性能。为了进一步弥合抽象合成生成器与现实因果推理场景之间的差距，我们引入了一个协议引导且LLM审计的语义因果环境基准，其中基于领域的结构因果模型（SCMs）生成可解释的观测和干预数据集，用于分布外分析。在合成和语义环境中，TabCausal均展现出鲁棒的结构恢复能力，尤其是在干预证据下，凸显了广泛因果预训练作为可迁移摊销因果发现的关键要素。

英文摘要

Causal discovery aims to recover directed causal relations from observational and interventional data, providing a basis for mechanistic understanding and reliable decision-making. Causal discovery foundation models (CDFMs) seek to amortize this problem by mapping a dataset directly to a causal graph in a single forward pass, avoiding per-dataset testing, search, or optimization. However, existing CDFMs remain limited, often failing to consistently match strong classical methods, and we find that a key bottleneck is how causal pretraining tasks are constructed. Based on this observation, we propose TabCausal, a data-driven CDFM trained with broad causal pretraining over diverse graph priors, structural mechanisms, noise models, dimensions, sample sizes, and intervention regimes. A dynamic task construction strategy composes these causal environments into varied discovery tasks, enabling more transferable structural learning from observational and mixed-interventional data. On large-scale synthetic benchmarks, TabCausal achieves better macro-averaged performance than a diverse set of causal discovery baselines. To further bridge abstract synthetic generators and realistic causal reasoning scenarios, we introduce a protocol-guided and LLM-audited semantic causal environment benchmark, where domain-grounded SCMs generate interpretable observational and interventional datasets for out-of-distribution analysis. Across both synthetic and semantic environments, TabCausal demonstrates robust structure recovery, especially under interventional evidence, highlighting broad causal pretraining as a key ingredient for transferable amortized causal discovery.

2605.31155 2026-06-01 cs.LG 版本更新

Learning Hyperspherical Time-Frequency Representations for Time-Series Out-of-Distribution Detection

学习超球面时频表示用于时间序列分布外检测

Willian T. Lunardi, Samridha Shrestha, Martin Andreoni

发表机构 * Technology Innovation Institute（技术创新研究所）； Khalifa University（哈利法大学）

AI总结本文提出一种基于超球面嵌入的表示学习方法，通过von Mises-Fisher目标函数结合时频域编码器，实现时间序列的分布外检测，在UCR和UEA数据集上优于对比学习和后处理方法。

Comments 14 pages, 2 figures, 4 tables, accepted at IJCAI-ECAI 2026

AI中文摘要

与视觉和语言领域相比，时间序列数据的分布外（OOD）检测仍然相对未被充分探索，对于如何利用监督时间序列表示在分布偏移下进行可靠检测，缺乏原则性的理解。本文将时间序列OOD检测形式化为具有超球面嵌入的表示学习，其中通过单位球面上的von Mises-Fisher（vMF）似然目标诱导类条件结构。学习到的表示通过特定领域的编码器结合输入信号的时域和频域视图，将它们整合到一个联合嵌入空间中进行OOD检测。检测使用基于距离的分数对学习到的嵌入进行评估，包括k近邻（k-NN）和马氏距离分数。我们在完整的UCR和UEA时间序列存档上，在跨数据集协议下大规模评估该方法。实验结果表明，在相同设置下，与强对比学习和后处理方法基线相比，k-NN和马氏距离评分均取得一致改进。代码可在https://github.com/tiiuae/hypertf-time-series-ood获取。

英文摘要

Out-of-distribution (OOD) detection for time-series data remains underexplored relative to vision and language, with limited principled understanding of how supervised time-series representations can be leveraged for reliable detection under distributional shifts. This work formulates time-series OOD detection as representation learning with hyperspherical embeddings, where class-conditional structure is induced by a von Mises-Fisher (vMF) likelihood-based objective on the unit sphere. The learned representation combines time- and frequency-domain views of the input signal via domain-specific encoders, integrating them into a joint embedding space for OOD detection. Detection uses distance-based scores over the learned embeddings, including k-nearest neighbors (k-NN) and Mahalanobis scores. We evaluate the approach at scale on the complete UCR and UEA time-series archives under a cross-dataset protocol. Empirical results show consistent improvements under both k-NN and Mahalanobis scoring over strong contrastive learning and post-hoc baselines in the same setting. Code is available at https://github.com/tiiuae/hypertf-time-series-ood.
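
The distance-based scoring step can be illustrated with a small k-NN sketch on unit-normalized embeddings. It assumes the embeddings have already been produced by some encoder; the vMF training objective and the time/frequency encoders are not shown, and the example vectors are invented.

```python
import math

def normalize(vector):
    """Project a non-zero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def knn_ood_score(query, train_embeddings, k=1):
    """Distance from the query to its k-th nearest training embedding.

    Larger scores suggest the query lies further from the training data.
    """
    q = normalize(query)
    distances = sorted(math.dist(q, normalize(t)) for t in train_embeddings)
    return distances[k - 1]

# Hypothetical 2-D embeddings of in-distribution training series.
train = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]]
in_dist_score = knn_ood_score([1.0, 0.05], train, k=2)
ood_score = knn_ood_score([-1.0, 0.0], train, k=2)
```

A detection threshold on this score would normally be chosen on held-out in-distribution data, for example at a fixed true-positive rate.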

2605.31152 2026-06-01 stat.ML cs.LG cs.NA math.NA 版本更新

并非所有合成数据都适合学习

Sina Alemohammad, Li Chen, Richard G. Baraniuk, Zhangyang Wang

发表机构 * ECE Department（电气与计算机工程系）； Apple（苹果公司）； The University of Texas at Austin（德克萨斯大学奥斯汀分校）； Rice University（莱斯大学）

AI总结研究无提示、无教师、无验证器、无奖励模型的自训练中，语言模型能否从自身生成的文本中学习，发现合成数据与学生之间的兼容性是关键，并揭示了能力与逐字记忆可分离的现象。

AI中文摘要

语言模型能否从自身采样的纯文本中改进，无需提示、教师、验证器或奖励模型？可以，但仅当合成语料库与学生兼容时，这是一种源-学生对的关联属性，而非数据的内在属性。我们称之为潜在能力重现假说：弱自训练可以放大预训练模型中已有的能力，但仅在这种兼容条件下。我们在无提示无条件自训练的最小设置中研究这一点，其中基础语言模型仅在BOS令牌生成的文本上进行微调，没有任务规范或外部监督。我们报告三个发现。首先，合成效用是关联的而非内在的：自生成数据是最有效的来源，同源迁移优于更强但不同来源的训练，跨家族迁移显著较弱。其次，常见的内在代理失效：基准级别的语义相似性和学生下的平均每令牌似然都不能预测哪些语料库有帮助。第三，这种机制产生了一个令人惊讶的副产品。在受控的Pythia实验中，能力和逐字记忆解耦：基准效用得以保留或改善，而保留的精确匹配提取下降超过95%，无需遗忘集、隐私目标或针对性遗忘。总之，这些结果表明，无提示自训练通过放大学生已知的内容来工作，而不是从数据中导入结构。它们还揭示了一种无需任何显式遗忘目标即可分离能力和逐字记忆的机制。

英文摘要

Can a language model improve from plain text sampled from itself, with no prompts, no teacher, no verifier, and no reward model? Yes, but only when the synthetic corpus is compatible with the student, a relational property of the source-student pair rather than an intrinsic property of the data. We call this the latent capability resurfacing hypothesis: weak self-training can amplify capabilities already present in the pretrained model, but only under this compatibility condition. We study this in the minimal setting of prompt-free unconditional self-training, where base language models are fine-tuned on text generated from the BOS token alone, with no task specification or external supervision. We report three findings. First, synthetic utility is relational rather than intrinsic: self-generated data is the most effective source, same-lineage transfer outperforms stronger but differently trained sources, and cross-family transfer is substantially weaker. Second, common intrinsic proxies fail: neither benchmark-level semantic similarity nor average per-token likelihood under the student predicts which corpora help. Third, this regime produces a surprising byproduct. In controlled Pythia experiments, capability and verbatim memorization decouple: benchmark utility is preserved or improved while held-out exact-match extraction drops by over 95 percent, with no forget set, privacy objective, or targeted unlearning. Together, these results suggest that prompt-free self-training works by amplifying what the student already knows, not by importing structure from the data. They also reveal a regime in which capability and verbatim memorization can be separated without any explicit unlearning objective.

2605.31120 2026-06-01 cs.GR cs.AI cs.LG 版本更新

SWIM: Single-Instance Whole-Body Imitation for swiMming

SWIM：用于游泳的单实例全身模仿

Binglun Wang, Edmond S. L. Ho, He Wang

发表机构 * University College London（伦敦大学学院）； University of Glasgow（格拉斯哥大学）

AI总结提出一种基于物理的游泳动作合成方法SWIM，通过单实例模仿学习实现全身协调与流体连续交互，在数据效率、稳定性、鲁棒性和泛化性上优于现有方法。

AI中文摘要

我们提出了一种合成基于物理的游泳动作的新方法。基于物理的角色动画旨在生成物理有效、可控且自然的动作，能够应对意外干扰，其中难度的一个决定性因素是任务的复杂性，尤其是与所需环境交互的复杂程度。现有研究已在静态和动态环境中的各种任务上取得成功。我们进一步将难度推向游泳，这需要全身协调和与流体的持续交互，这是与环境交互时的一个新复杂性层次。这种复杂性在学习控制时面临挑战，包括在易变的环境力下的控制学习、将控制泛化到不同环境和游泳风格、缺乏数据参考，以及在控制学习过程中不可避免的极其缓慢的物理模拟。为此，我们提出了SWIM，一种新的游泳动作模仿方法，它可以从单个游泳动作中学习，并泛化到未见过的环境、身体条件和游泳风格。广泛的评估和比较表明，SWIM具有数据效率高、稳定、鲁棒和可泛化的特点，在多个任务类别和指标上优于替代方法。

英文摘要

We propose a new method for synthesizing physically-based swimming motions. Physically-based character animation aims to generate physically valid, controllable, and natural-looking motions which can respond to unexpected disturbances, where one dictating factor of difficulty is the complexity of the task, especially the level of sophistication of the required interactions with the environment. Existing research has succeeded in various tasks in static and dynamic environments. We push the difficulty further to swimming, which requires full-body coordination and continuous interactions with fluids, a new level of complexity when it comes to interacting with the environment. This complexity imposes challenges in learning control under volatile environmental forces, generalizing control to different environments and swimming styles, lack of data references, and prohibitively slow physical simulation which is inevitable during control learning. To this end, we propose SWIM, a new imitation method for swimming motions, which can learn from a single swimming motion and generalize to unseen environments, body conditions, and swimming styles. Extensive evaluation and comparison demonstrate that SWIM is data-efficient, stable, robust, and generalizable, outperforming alternative methods across multiple classes of tasks and metrics.

2605.31119 2026-06-01 cs.RO cs.LG 版本更新

Don't Fool Me Twice: Adapting to Adversity in the Wild with Experience-Driven Reasoning

不要愚弄我两次：通过经验驱动推理在野外适应逆境

Navin Sriram Ravie, Andrew Jong, Krrish Jain, John Liu, Omar Alama, Bijo Sebastian, Sebastian Scherer

发表机构 * Department of Engineering Design, Indian Institute of Technology, Madras（印度理工学院工程设计系，马德拉斯）； Robotics Institute, Carnegie Mellon University（卡内基梅隆大学机器人研究所）

AI总结提出一种持续学习框架，使移动机器人能够在线从干扰中学习，通过语义将异常行为归因于原因，从而更好地预测和规划未来。

详情

AI中文摘要

在机器人学中，危险和逆境模式通常取决于具身形态，并因智能体而异。自主移动机器人的一个前沿方向是使智能体能够在未见过的非结构化野外环境中有效运行。此类环境中的一个重大挑战是，可能无法预测特定机器人面临的所有危险。尽管最近的工作使用大型基础视觉语言模型（VLM）来预先预测一个详尽的常识性危险列表，但仍然难以捕捉可能的交互以及依赖于具身形态的逆境。我们提出了一个持续学习框架，使移动具身智能体能够在线从干扰中学习，并通过语义将异常行为归因于原因，从而在未来更好地对世界进行预测和规划。我们的框架“Don't Fool Me Twice”首先观察干扰并描述其对机器人的影响；该描述通过视觉上下文增强，用于查询VLM以预测可能的原因；局部干扰使用核回归进行刻画，从而实现对瞬态异常的高效、少样本建模。我们利用以语义体素为中心的建模来估计认知不确定性，通过将交互驱动的干扰视为可学习的空间行为，实现更丰富的下游恢复。我们提出了四个假设，并在仿真和硬件上跨具身形态和逆境模式进行了验证。

英文摘要

In robotics, dangers and adversity modes are often embodiment-specific and relative to each agent. A frontier of autonomous mobile robotics is to enable agents to operate effectively in the wild in unseen unstructured environments. A significant challenge in unseen unstructured environments is that it may not be possible to predict all the dangers to the specific robot. Although recent work has used large foundation vision-language models (VLMs) to preemptively predict an exhaustive list of common-sense dangers, it remains difficult to capture possible interaction and embodiment-dependent adversities. We propose a continual learning framework for a mobile embodied agent to learn online from disturbances and attribute anomalous behaviours to causes through semantics, enabling better prediction and planning of the world in the future. Our framework, "Don't Fool Me Twice", first observes disturbances and describes their effects on the robot; this description is augmented with visual context to query a VLM to predict possible causes; the local disturbance is characterized using kernel regression, which allows for efficient, few-shot modeling of transient anomalies. We leverage semantic voxel-centric modeling to estimate epistemic uncertainty, enabling richer downstream recovery by treating interaction-driven disturbances as learnable spatial behaviors. We present four hypotheses and validate them in simulation and on hardware across embodiments and adversity modes.

2605.31111 2026-06-01 cs.LG 版本更新

Subspace-Decomposed JEPAs: Disentangling Progression and Content in Latent World Models

子空间分解的JEPA：解耦潜在世界模型中的进展与内容

Lucas Thil, Jesse Read, Rim Kaddah, Guillaume Doquet

发表机构 * LIX, École Polytechnique（巴黎综合理工学院LIX实验室）； IRT SystemX（SystemX技术研究院）； Safran Tech（赛峰科技）

AI总结提出SD-JEPA方法，通过将JEPA潜在空间分解为正交的进展子空间和内容子空间，利用余弦边际三元组损失和SIGReg正则化分别约束，在控制基准上优于LeWM基线，并证明进展坐标可作为场景感知的指南针。

详情

AI中文摘要

联合嵌入预测架构（JEPA）通过预测未来嵌入来学习紧凑的潜在世界模型，但潜在空间的任何单一坐标都未被指定用于编码任务进展。我们将JEPA潜在空间分解为两个具有不相交角色的正交子空间：一个由余弦边际三元组损失塑造的低维进展子空间，以及一个由LeWM现有SIGReg目标正则化的高维内容子空间。我们证明两个抗坍塌力作用于不相交的坐标，因此它们加性组合而非在同一维度上竞争。我们的方法SD-JEPA在大多数控制基准上以匹配的计算量优于LeWM基线，并在Push-T上优于最强的非LeWM JEPA基线；子空间消融验证了分解是关键因素。除了规划之外，得到的一维角进展坐标在潜在空间中充当场景感知的指南针。它随任务进展而前进，当智能体回溯时后退，在受控扰动下既会尖峰也会重新定位到语义上合适的新任务阶段区域，以预测误差标量无法做到的方式将惊讶时刻与其意义分离。三个定量测试支持这一点：在40个保留的立方体情节中，|Δθ_t|在定位语义事件方面优于标准潜在预测误差惊讶度，最高可达+0.18的合并AUROC（在±1步容差下每情节胜率97.5%）；在所有四个环境（每个环境40个情节）的情节内线性探针显示，8维进展子空间（潜在空间的4.2%）解释了72-95%的任务进展方差。

英文摘要

Joint-Embedding Predictive Architectures (JEPAs) learn compact latent world models by predicting future embeddings, but no single coordinate of the latent is designated to encode task progression. We carve the JEPA latent into two orthogonal subspaces with disjoint roles: a low-dimensional progression subspace shaped by a cosine-margin triplet loss, and a high-dimensional content subspace regularised by the existing SIGReg objective of LeWM. We prove that the two anti-collapse forces act on disjoint coordinates, so they compose additively rather than competing on the same dimensions. Our method, SD-JEPA, improves over the LeWM baseline on the majority of its control benchmarks at matched compute, and outperforms the strongest non-LeWM JEPA baseline on Push-T; a subspace-ablation falsifier confirms the split is the load-bearing ingredient. Beyond planning, the resulting 1-D angular progression coordinate functions as a scene-aware compass on the latent. It advances with task progress, regresses when the agent backtracks, and under controlled perturbations both spikes and relocalises to a semantically appropriate new task-phase sector, separating the moment of surprise from its meaning in a way that prediction-error scalars cannot. Three quantitative tests back this up: $|\Delta\theta_t|$ outperforms the standard latent-prediction-error surprise at localising semantic events on 40 held-out cube episodes by up to +0.18 pooled AUROC (97.5% per-episode win rate at $\pm 1$-step tolerance); a within-episode linear probe across all four environments (40 episodes per env) shows the 8-dimensional progression subspace (4.2% of the latent) explains 72-95% of task-progress variance.
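
To make the "cosine-margin triplet loss" mentioned in the abstract concrete, here is a minimal sketch of one common form of such a loss. The abstract does not give the exact formulation, so the function names, the default margin of 0.2, and the choice of anchor/positive/negative vectors are illustrative assumptions rather than the SD-JEPA objective itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_margin_triplet(anchor, positive, negative, margin=0.2):
    """Hinge on cosine similarity (assumed form, not taken from the paper).

    The loss is zero once the anchor is at least `margin` more similar
    to the positive than to the negative.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Hypothetical low-dimensional progression coordinates.
well_separated = cosine_margin_triplet([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
badly_ordered = cosine_margin_triplet([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this toy case the well-separated triplet incurs no loss, while the badly ordered one is penalised by the margin plus the similarity gap.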

2605.31108 2026-06-01 cs.CV cs.LG 版本更新

Remembering by Reconstructing: Domain Incremental Learning With Test-Time Training on Video Streams

通过重建来记忆：视频流上的域增量学习与测试时训练

Jonathan Swinnen, Tinne Tuytelaars

发表机构 * ESAT, KU Leuven（ESAT，比利时鲁汶大学）

AI总结提出一种结合主任务头和自监督掩码自编码器头的域增量学习方法，通过测试时训练识别最佳LoRA适配器以重新记忆域，适用于视频流数据。

详情

AI中文摘要

在这项工作中，我们提出了一种新颖的域增量学习方法，使模型能够随时间适应不断演变的非平稳数据。与其他工作不同，我们不试图避免灾难性遗忘，而是允许并利用它。我们的模型结合了一个主任务头和一个自监督掩码自编码器（MAE）头。然后在增量训练期间学习特定于域的LoRA适配器。每个适配器专攻其域，自然地在两个头上诱导对其他域的遗忘。在推理时，我们在自监督MAE头上进行在线测试时训练，以识别哪些LoRA最匹配当前输入，从而使模型能够再次“记住”该域。我们的方案特别适用于现实世界的流数据，例如视频，其中连续样本高度相关且域变化是渐进的。我们在域增量动作识别和语义分割任务上展示了我们的方法。

英文摘要

In this work we introduce a novel approach to domain incremental learning, adapting models over time to evolving, non-stationary data. In contrast to other works, we do not attempt to avoid catastrophic forgetting, but rather allow it and exploit it. Our model combines a main task head with a self-supervised masked autoencoder (MAE) head. We then learn domain-specific LoRA adapters during incremental training. Each adapter specializes to its domain, naturally inducing forgetting on other domains in both heads. At inference, we perform online test-time training on the self-supervised MAE head to identify which LoRA best matches the current input, so the model can "remember" the domain again. Our scheme is especially well-suited to real-world streaming data, such as video, where consecutive samples are highly correlated and domain shifts are gradual. We demonstrate our method on domain-incremental action recognition and semantic segmentation tasks.

2605.31106 2026-06-01 cs.LG 版本更新

Riemannian Diffusion Models on General Manifolds via Physics-Informed Neural Networks

基于物理信息神经网络的通用流形上的黎曼扩散模型

Gyeonghoon Ko, Juho Lee

发表机构 * Korea Advanced Institute of Science and Technology, Korea（韩国科学技术院）

AI总结针对黎曼流形上热核难以解析计算的问题，提出用物理信息神经网络求解流形热方程来近似热核，从而实现扩散模型的训练与采样。

2605.31070 2026-06-01 cs.LG cs.GT 版本更新

Learning to Bid in FCR Markets: A Best-of-Both-Worlds Approach

在FCR市场中学习投标：一种两全其美的方法

Marius Potfer, Cheng Wan, Pierre Gruet

发表机构 * EDF Lab Paris-Saclay, FiME (Laboratoire de Finance des Marchés de l’Énergie)（EDF巴黎萨克雷实验室，FiME（能源市场金融实验室））

AI总结针对欧洲频率控制储备（FCR）市场中投标者仅能观察到部分反馈（如出清价格和分配数量）的问题，提出了一种将多国FCR出清问题转化为重复多单位统一价格拍卖的方法，并采用两全其美的组合半强盗算法实现对数伪遗憾（随机环境）和平方根遗憾（对抗环境），实验验证了其理论缩放性和实际竞争力。

Comments Algorithms and data available at https://data.mendeley.com/datasets/htprbf47dg/1

详情

AI中文摘要

在欧洲频率控制储备（FCR）市场中，由于竞争报价是隐藏的，投标者只能观察到来自市场的部分反馈，如出清价格和分配数量，因此对于灵活性提供商而言，投标具有挑战性。对于活跃在单个国家的参与者，我们证明多国FCR出清问题可以转化为针对内生对手报价向量的重复多单位统一价格拍卖。这种重新表述产生了一个在线学习问题，并使我们能够适应一种两全其美的组合半强盗算法，该算法可从这种标准市场反馈中实现。由此产生的投标者在随机环境中实现对数伪遗憾，在对抗环境中实现$\mathcal{O}(\sqrt{T})$遗憾。综合实验验证了预期的缩放性，对历史欧洲FCR数据的回测显示了实际中的竞争性能：该方法在稳定产品上表现尤其出色，而EXP3类型的基线在更强的非平稳性下可能更安全。总体而言，结果表明，当学习规则与产品级市场稳定性相匹配时，基于学习的FCR市场投标在理论上是有根据的，在实践中是有用的。

英文摘要

Bidding in the European Frequency Containment Reserve (FCR) market is challenging for flexibility providers because competing offers are hidden and bidders observe only partial feedback from the market, such as the clearing price and awarded quantity. For a participant active in a single country, we show that the multi-country FCR clearing problem can be recast as a repeated multi-unit uniform-price auction against an endogenous vector of opposing bids. This reformulation yields an online learning problem and allows us to adapt a Best-of-Both-Worlds combinatorial semi-bandit algorithm implementable from this standard market feedback. The resulting bidder achieves logarithmic pseudo-regret in stochastic environments and $\mathcal{O}(\sqrt{T})$ regret in adversarial ones. Synthetic experiments confirm the expected scaling, and backtests on historical European FCR data show competitive performance in practice: the method performs especially well on stable products, while EXP3-type baselines can be safer under stronger non-stationarity. Overall, the results show that learning-based bidding in FCR markets is theoretically grounded and practically useful when the learning rule matches product-level market stability.
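
The partial-feedback setting described above can be illustrated with a toy multi-unit uniform-price procurement auction. This is a sketch under assumptions, not the actual FCR clearing procedure: offers are single units, the cheapest offers are accepted until demand is met, and every accepted unit is paid the highest accepted price. The names `clear_uniform_price` and `bidder_feedback` and the sample offers are hypothetical.

```python
def clear_uniform_price(offers, demand):
    """Accept the `demand` cheapest unit offers.

    `offers` is a list of (owner, price) pairs. The clearing price is
    assumed to be the highest accepted price (one common convention).
    """
    ranked = sorted(offers, key=lambda offer: offer[1])
    accepted = ranked[:demand]
    clearing_price = accepted[-1][1] if accepted else None
    return clearing_price, accepted


def bidder_feedback(offers, demand, bidder="me"):
    """Return only what the bidder observes: price and awarded quantity."""
    price, accepted = clear_uniform_price(offers, demand)
    awarded = sum(1 for owner, _ in accepted if owner == bidder)
    return price, awarded


# Rival offers are hidden from the bidder; only the feedback tuple is seen.
offers = [("me", 10.0), ("rival", 8.0), ("me", 14.0),
          ("rival", 12.0), ("rival", 20.0)]
feedback = bidder_feedback(offers, demand=3)
```

Here the bidder learns the clearing price and that one of its two units was awarded, but never sees the individual rival offers, which is the kind of semi-bandit feedback the abstract refers to.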

2605.31063 2026-06-01 stat.ML cs.LG physics.chem-ph physics.comp-ph 版本更新

Free energy Estimation on Any State Space

任意状态空间上的自由能估计

Jiajun He, Zijing Ou, Francisco Vargas, Yingzhen Li, José Miguel Hernández-Lobato, Carles Domingo-Enrich, Yuanqi Du

发表机构 * University of Cambridge（剑桥大学）； Imperial College London（伦敦帝国理工学院）； Xaira Therapeutics（Xaira制药）； Microsoft Research New England（微软研究院新英格兰分部）

AI总结提出一种基于广义神经传输学习的框架，将自由能估计推广到任意状态空间，并揭示时间反演与Doob h-变换的群论结构。

详情

AI中文摘要

自由能估计是一个从物理学到统计学的基础且具有挑战性的问题。经典方法依赖于热力学变换，包括直接估计、准静态积分和有限时间平均。最近的工作[He and Du et al., 2025]通过学习神经传输显著加速了有限时间区间的效率。在本文中，我们将此框架推广到任意状态空间。基于这一观点，我们开发了一种广义神经传输学习方法以实现高效估计。实验验证了所提方法在连续设置之外的有效性和效率，扩展到离散和多模态空间以及自回归设置。除了自由能估计，我们还建立了代数恒等式并揭示了连接无穷小时间反演和广义Doob h-变换的群论结构，表明它们的组合形成一个广义二面体群。

英文摘要

Free energy estimation is a fundamental yet challenging problem, from physics to statistics. Classical approaches rely on thermodynamic transformations, ranging from direct estimation and quasistatic integration to finite-time averaging. Recent work [He and Du et al., 2025] learns neural transports to significantly improve efficiency in the finite-time regime. In this paper, we generalize this framework to arbitrary state spaces. Building on this view, we develop a generalized neural transport learning approach for efficient estimation. Experiments validate the effectiveness and efficiency of the proposed method beyond continuous settings, extending to discrete and multimodal spaces as well as autoregressive settings. Beyond free energy estimation, we establish algebraic identities and reveal a group-theoretic structure linking infinitesimal time reversal and generalized Doob's $h$-transforms, showing that their compositions form a generalized dihedral group.

2605.31061 2026-06-01 cs.LG cs.AI 版本更新

使用强化学习控制工业能源系统的挑战

Tobias Lademann, Théo Vincent, Jan Peters, Matthias Weigold

发表机构 * Institute for Production Management, Technology and Machine Tools (PTW), Technical University of Darmstadt（达姆施塔特工业大学生产管理、技术与机床研究所）； DFKI GmbH, SAIROL（DFKI GmbH，SAIROL）； Department of Computer Science, Technical University of Darmstadt（达姆施塔特工业大学计算机科学系）； Hessian.ai, Technical University of Darmstadt（黑森人工智能中心，达姆施塔特工业大学）

AI总结本文以热力供暖网络为例，研究强化学习在真实工业能源系统部署中的挑战，包括部分可观测性、动作空间设计、奖励设计及仿真到现实的差距，并基于实际部署发现强化学习虽能实现运行稳定性但存在性能差距。

Comments Submitted to Finding the Frame Workshop at RLC 2026

2605.31043 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Routing on the Stiefel Manifold: When Does Adaptive Subspace Selection Help for Cross-Domain EEG Decoding?

Stiefel流形上的路由：自适应子空间选择何时有助于跨域脑电解码？

Isabella Costa Maia, Pedro L. C. Rodrigues, Salem Said, Marco Congedo

发表机构 * GIPSA-lab, University Grenoble Alpes, CNRS, Grenoble-INP（GIPSA实验室，格勒诺布尔阿尔卑斯大学，法国国家科学研究中心，格勒诺布尔-INP）； Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK（格勒诺布尔阿尔卑斯大学，法国国家信息与自动化研究所，法国国家科学研究中心，格勒诺布尔-INP，LJK）； Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK（格勒诺布尔阿尔卑斯大学，法国国家科学研究中心，格勒诺布尔-INP，LJK）

AI总结针对跨域脑电解码中协方差矩阵域偏移问题，提出动态Stiefel路由方法，通过Stiefel流形上的专家投影滤波器池和交叉注意力机制实现自适应子空间选择，并引入三种结构性质避免退化为集成平均，在三个数据集上取得一致提升。

详情

AI中文摘要

尽管黎曼深度学习取得了进展，跨域脑电解码仍然具有挑战性：来自不同受试者的协方差矩阵占据了SPD流形上系统不同的区域，然而现有的域适应方法要么需要目标域校准数据，要么学习无法跨域泛化的受试者特定组件。我们提出了动态Stiefel路由：在Stiefel流形上有一个包含$K$个专家投影滤波器的池，每个滤波器专门处理SPD流形上的不同区域，每个输入协方差通过交叉注意力路由到最合适的滤波器，从而为每个样本自适应调整子空间投影。一个核心发现是，这种朴素实现的方法会退化为集成平均：当路由权重均匀时，自适应滤波器恰好等价于专家的等贡献组合，与单个固定滤波器无法区分。三种结构性质打破了这种退化：一个对称锚点$W_{\mathrm{base}} \in \mathrm{St}(n,k)$消除了专家间的邻近偏差；一个冻结的域判别查询编码器将路由与任务优化解耦；以及一个解耦的键对齐损失，将专家键训练到稳定的域吸引子。它们共同产生了SPD流形上第一个真正承诺且域结构化的路由，在三个数据集上取得一致提升：平衡准确率分别从$0.773\to 0.823$、$0.757\to 0.809$和$0.801\to 0.839$，且对齐策略由单一数据驱动规则自动确定，无需数据集特定的超参数搜索。

英文摘要

Cross-domain EEG decoding remains challenging despite advances in Riemannian deep learning: covariance matrices from different subjects occupy systematically distinct regions of the SPD manifold, yet existing domain adaptation methods either require target-domain calibration data or learn subject-specific components that cannot generalise across domains. We propose dynamic Stiefel routing: a pool of $K$ expert projection filters on the Stiefel manifold, each specialised for a different region of the SPD manifold, with each input covariance routed to the most appropriate filter via cross-attention, adapting the subspace projection per sample. A central finding is that this approach, implemented naively, provably collapses to ensemble averaging: when routing weights are uniform, the adaptive filter reduces exactly to an equal-contribution combination of experts, indistinguishable from a single fixed filter. Three structural properties break this degeneracy: a symmetric anchor $W_{\mathrm{base}} \in \mathrm{St}(n,k)$ that removes proximity bias among experts; a frozen domain-discriminative query encoder that decouples routing from task optimisation; and a decoupled key alignment loss that trains expert keys toward stable domain attractors. Together they produce the first genuinely committed and domain-structured routing on SPD manifolds, with consistent gains across three datasets: balanced accuracy improves from $0.773\to 0.823$, $0.757\to 0.809$, and $0.801\to 0.839$, with the alignment strategy determined automatically by a single data-driven rule and no dataset-specific hyperparameter search.
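
The collapse described in the abstract can be checked numerically in a simplified setting. The sketch below assumes the adaptive filter is a plain softmax-weighted sum of expert matrices and ignores the Stiefel-manifold constraint; the paper's actual construction may differ. The function names and the toy expert matrices are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of routing scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_mix(scores, experts):
    """Combine expert matrices with softmax routing weights."""
    weights = softmax(scores)
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(w * expert[r][c] for w, expert in zip(weights, experts))
             for c in range(cols)] for r in range(rows)]


# Three hypothetical 2x2 expert filters.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]

# Uniform scores, whatever their common value, give the same averaged filter.
uniform_a = route_and_mix([0.3, 0.3, 0.3], experts)
uniform_b = route_and_mix([-5.0, -5.0, -5.0], experts)

# A committed (peaked) router selects essentially one expert instead.
peaked = route_and_mix([10.0, 0.0, 0.0], experts)
```

With uniform weights the output is the element-wise mean of the experts for every input, so routing adds nothing over a single fixed filter; only when the weights commit to particular experts does the filter become input-dependent.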

2605.31040 2026-06-01 cs.LG 版本更新

UniRTL: Unifying Code and Graph for Robust RTL Representation Learning

UniRTL：统一代码和图以实现稳健的RTL表示学习

Yi Liu, Hongji Zhang, Lei Chen, Mingxuan Yuan, Qiang Xu

发表机构 * Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR（计算机科学与工程系，香港中文大学，香港特别行政区）； Noah's Ark Lab, Huawei, Hong Kong SAR（华为诺亚实验室，香港特别行政区）

AI总结提出UniRTL多模态预训练框架，通过互掩码建模和分层训练策略联合利用RTL代码与控制数据流图，实现细粒度对齐，在性能预测和代码检索任务上优于现有方法。

Comments Forty-Third International Conference on Machine Learning (ICML 2026)

详情

AI中文摘要

为寄存器传输级（RTL）设计开发有效的表示对于加速硬件设计工作流至关重要。然而，现有方法通常依赖于单一数据模态，即RTL代码或其相关的基于图的表示，限制了所学表示的表达能力和泛化能力。对于RTL，控制数据流图（CDFG）提供了保留完整信息的全面结构表示，而代码模态显式编码了语义和功能信息。我们认为，整合这些互补模态对于全面理解RTL设计至关重要。为此，我们提出UniRTL，一种多模态预训练框架，通过联合利用代码和CDFG学习统一的RTL表示。UniRTL通过互掩码建模实现代码和图之间的细粒度对齐，并采用分层训练策略，该策略结合了预训练的图感知分词器以及在图集成之前对文本（即功能摘要）和代码进行分阶段对齐。我们在两种下游任务（性能预测和代码检索）的多种设置下评估UniRTL。实验结果表明，UniRTL始终优于先前的方法，使其成为推进硬件设计自动化的更稳健和更强大的基础。

英文摘要

Developing effective representations for register transfer level (RTL) designs is crucial for accelerating the hardware design workflow. Existing approaches, however, typically rely on a single data modality, either the RTL code or its associated graph-based representation, limiting the expressiveness and generalization ability of the learned representations. For RTL, the control data flow graph (CDFG) offers a comprehensive structural representation that preserves complete information, while the code modality explicitly encodes semantic and functional information. We argue that integrating these complementary modalities is essential for a thorough understanding of RTL designs. To this end, we propose UniRTL, a multimodal pretraining framework that learns unified RTL representations by jointly leveraging code and CDFG. UniRTL achieves fine-grained alignment between code and graph through mutual masked modeling and employs a hierarchical training strategy that incorporates a pretrained graph-aware tokenizer and staged alignment of text (i.e., functional summary) and code prior to graph integration. We evaluate UniRTL on two downstream tasks, performance prediction and code retrieval, under multiple settings. Experimental results show that UniRTL consistently outperforms prior methods, establishing it as a more robust and powerful foundation for advancing hardware design automation.

2605.31036 2026-06-01 cs.GT cs.LG 版本更新

Model Monotonicity in Autobidding Auctions: When Do Better Predictions Lead to Better Outcomes?

自动竞价拍卖中的模型单调性：更好的预测何时带来更好的结果？

Ashwinkumar Badanidiyuru

发表机构 * Uber Technologies, Inc.（优步技术公司）

AI总结研究在线广告中推荐系统模型质量、拍卖格式和自动竞价者行为的相互作用，通过聚类精炼定义模型改进，并系统刻画不同竞价者类型、拍卖格式和预算约束下评估指标单调性的条件。

详情

Journal ref: ICML 2026

AI中文摘要

在线广告平台依赖机器学习模型预测点击率（pCTR）和转化率（pCVR）以用于拍卖机制。我们引入了一个新框架来研究推荐系统模型质量、拍卖格式和自动竞价者行为之间的相互作用。我们形式化了模型改进——通过受概率论中滤子启发的精炼关系定义——何时导致平台级评估指标（如收入、福利或流动性福利）的改进。我们的主要贡献是：（1）基于聚类精炼的模型改进的形式化定义，以及（2）跨不同竞价者类型（tCPA、max-CPA）、拍卖格式（第一价格、第二价格、VCG）和预算约束的ECM单调性的系统刻画。我们证明，具有统一竞价的第一价格拍卖保证了无预算的tCPA竞价者的收入单调性（通过Jensen不等式），而第二价格拍卖和预算约束可能破坏这一性质。我们为非单调性结果提供了完整的数值构造。我们的发现对寻求将模型改进与业务成果对齐的广告平台具有实际意义。

英文摘要

Online advertising platforms rely on machine learning models to predict click-through rates (pCTR) and conversion rates (pCVR) for auction mechanisms. We introduce a novel framework to study the interaction between recommender system model quality, auction format, and autobidder behavior. We formalize when model improvements -- defined via a refinement relation inspired by filtrations in probability theory -- lead to improvements in platform-level Evaluation Criteria Metrics (ECM) such as revenue, welfare, or liquid welfare. Our main contributions are: (1) a formal definition of model improvement based on cluster refinement, and (2) a systematic characterization of ECM monotonicity across different combinations of bidder types (tCPA, max-CPA), auction formats (first-price, second-price, VCG), and budget constraints. We show that first-price auctions with uniform bidding guarantee revenue monotonicity for tCPA bidders without budgets (via Jensen's inequality), while second-price auctions and budget constraints can break this property. We provide full numerical constructions for the non-monotonicity results. Our findings have practical implications for advertising platforms seeking to align model improvements with business outcomes.

2605.31034 2026-06-01 cs.LG cs.AI 版本更新

基于人格的生成式AI多元对齐评估框架

Atahan Karagoz

发表机构 * Atahan Karagöz（阿塔汗·卡拉戈兹）

AI总结提出一种状态空间约束仿真框架，通过合成认知轮廓替代单一评估函数，实现反映真实世界共识变异性的多元、视角依赖的基准测试，并分析仿真评估者的稳定性问题，论证动态调节机制的必要性。

详情

AI中文摘要

当前生成式人工智能的对齐范式主要依赖单一基准测试框架，将人类判断的多元性简化为聚合统计基线，从而掩盖了评估中的文化、人口和语境变异性。我们引入一种用于AI评估的状态空间约束仿真框架，用代表不同人类视角的合成认知轮廓的结构化流形替代单一评估函数。我们表明，现代生成架构能够以高度一致性实例化和维护这些评估人格，从而实现一种更接近现实世界共识变异性的多元、视角依赖的基准测试。然而，我们进一步分析了这些模拟评估者在顺序推理和随机提示扰动下的稳定性，揭示了人格一致性的系统性退化，表现为状态空间漂移和语义不一致。这些发现表明，静态对齐约束不足以维持随时间推移的稳健评估行为。相反，我们主张必须在生成系统中嵌入动态的、可行性驱动的调节机制，以保持连贯的认知仿真。通过将基于人格的评估视为潜在表征流形上的结构化动力系统，本研究为更自适应、更符合人类、更注重语境的AI评估方法奠定了基础。

英文摘要

Current alignment paradigms for generative artificial intelligence rely predominantly on monolithic benchmarking frameworks that reduce the plurality of human judgment to aggregated statistical baselines, thereby obscuring cultural, demographic, and contextual variability in evaluation. We introduce a state-space constrained emulation framework for AI evaluation that replaces singular assessment functions with a structured manifold of synthetic cognitive profiles representing diverse human perspectives. We show that modern generative architectures can instantiate and maintain these evaluative personas with high consistency, enabling a form of pluralistic, perspective-dependent benchmarking that more closely reflects real-world consensus variability. However, we further analyze the stability of these simulated evaluators under sequential inference and stochastic prompt perturbations, revealing systematic degradation in persona coherence that manifests as state-space drift and semantic inconsistency. These findings suggest that static alignment constraints are insufficient for sustaining robust evaluative behavior over time. Instead, we argue for the necessity of embedding dynamic, viability-driven regulatory mechanisms within generative systems to preserve coherent cognitive emulation. By framing persona-based evaluation as a structured dynamical system over latent representation manifolds, this study provides a foundation for more adaptive, human-aligned, and context-sensitive approaches to AI evaluation.

2605.31016 2026-06-01 cs.LG 版本更新

HetCCL：实现混合供应商异构集群的集体通信

Yuejie Wang, Tao Chang, Yuanyuan Zhao, Yulong Ao, Zeyu Gu, Zhiyu Li, Yanmin Jia, Yan Zhang, Mingjun Zhang, He Liu, Yongzhe He, Yonghua Lin, Guyue Liu

发表机构 * Peking University（北京大学）； Beijing Academy of Artificial Intelligence（北京人工智能研究院）； Institute of Computing Technology, Chinese Academy of Sciences（中国科学院计算技术研究所）

AI总结提出HetCCL框架，通过高效P2P传输和边界通信器机制，在异构集群中实现跨供应商的集体通信，消除主机-设备内存拷贝开销，并优化带宽利用率。

详情

AI中文摘要

在异构集群上训练大型语言模型（LLM）给集体通信带来了重大挑战，因为来自多个供应商的硬件引入了多样化的网络和计算特性。现有的为同构环境设计的集体通信框架（如NCCL、RCCL）无法处理混合硬件设置，而支持异构的通信库（如Gloo、OpenMPI）在数据路径中引入了大量开销。本文提出了HetCCL，一个通过跨异构设备（如GPU）的高效P2P传输实现异构集体通信的框架，消除了主机-设备内存拷贝开销，同时将控制卸载到CPU。对于组合集体（如AllReduce、ReduceScatter），HetCCL引入了一种边界通信器机制，通过使用供应商集体通信库中组合集体的内在归约来实现供应商独立性。凭借高效的异构P2P传输和可移植的归约机制，HetCCL提出了异构集群的层次拓扑抽象，将集体通信分解为集群级原语，保证了最优的跨集群数据传输量和最优的带宽利用率。我们实现了支持4种不同供应商的HetCCL，并在4种异构设置下使用基准测试和端到端LLM任务进行了评估。评估结果表明，在异构通信中，HetCCL的带宽比Gloo高17-19倍，并且在端到端训练中每步时间加速高达16.9%。

英文摘要

Training Large Language Models (LLMs) on heterogeneous clusters presents significant challenges for collective communication, as hardware from multiple vendors introduces diverse network and computational characteristics. Existing collective communication frameworks (e.g., NCCL, RCCL) designed for homogeneous environments fail to address mixed-hardware setups, while communication libraries with heterogeneous support (e.g., Gloo, OpenMPI) incur heavy overhead in the data path. This paper presents HetCCL, a framework that enables heterogeneous collective communication through efficient P2P transport across heterogeneous devices (e.g., GPUs), eliminating the host-device memory copy overhead while offloading control to the CPUs. For combining collectives (e.g., AllReduce, ReduceScatter), HetCCL introduces a border-communicator mechanism that achieves vendor independence by using the intrinsic reduction of the combining collectives in vendor collective communication libraries. With efficient heterogeneous P2P transport and a portable reduction mechanism, HetCCL proposes a hierarchical topology abstraction for heterogeneous clusters, dissecting collective communication into cluster-level primitives that guarantee optimal cross-cluster data transfer volume and optimal bandwidth utilization. We implement HetCCL with support for 4 different vendors and evaluate it in 4 heterogeneous settings with benchmarks and end-to-end LLM tasks. Our evaluation shows that HetCCL achieves 17-19x higher bandwidth than Gloo in heterogeneous communications, and speeds up end-to-end training by up to 16.9% in per-step time.

2605.30997 2026-06-01 stat.ML cs.LG 版本更新

Hedging on the Frontier: Learning New Tasks with Few Samples

前沿对冲：基于少量样本学习新任务

Tobias Wegel, Federico Di Gennaro, Geelon So, Fanny Yang

发表机构 * Department of Computer Science, ETH Zurich（苏黎世联邦理工学院计算机科学系）； Department of Computer Science and Engineering, UC San Diego（加州大学圣迭戈分校计算机科学与工程系）

AI总结针对新任务样本少的问题，利用弱单调性假设，通过转移学习和模型选择聚合在模型前沿进行对冲，实现可证明的统计增益。

2605.30992 2026-06-01 cs.LG 版本更新

Eigenvectors of Experts are Training-free Non-collapsing Routers

专家特征向量是无需训练的非崩溃路由器

Giang Do, Hung Le, Truyen Tran

发表机构 * Applied Artificial Intelligence Initiative (A2I2), Deakin University, Victoria, Australia（应用人工智能倡议（A2I2），迪肯大学，维多利亚，澳大利亚）

AI总结针对稀疏混合专家模型中专家崩溃问题，提出基于专家权重矩阵特征向量的无需训练路由框架SSMoE，通过奇异值分解利用谱特性提升模型性能。

Comments 24 pages

详情

Journal ref: ICML 2026

AI中文摘要

稀疏混合专家（SMoE）架构通过将输入令牌路由到选定的专家子集来提高大型语言模型（LLMs）的训练效率。尽管取得了显著成功，SMoE模型在训练和推理中仍面临专家崩溃问题（Chi等人，2022），这会降低模型性能。先前研究主要关注改进路由器；然而，这些方法依赖于从头训练或微调，需要高昂的计算和数据处理成本。此外，我们通过理论和实证结果证明，尽管有这些努力，在推进预训练良好的SMoE模型时，该问题仍然存在。为填补这一空白，我们分析了先进的SMoE模型，观察到专家权重矩阵的特征向量编码了丰富的语义信息，指向传统路由策略的有效替代方案。基于这一见解，我们提出了奇异值分解SMoE（SSMoE），一种新颖且无需训练的框架，利用专家权重的谱特性来解决崩溃问题并提升模型性能。在多种语言和视觉任务上的大量实验，包括干净和损坏数据设置，证明了SSMoE的强大泛化能力和鲁棒性。我们的发现强调了更深入理解模型内部结构如何指导开发更有效的SMoE架构。我们的实现已在https://github.com/giangdip2410/SSMoE公开。

英文摘要

Sparse Mixture of Experts (SMoE) architectures improve the training efficiency of Large Language Models (LLMs) by routing input tokens to a selected subset of specialized experts. Despite their remarkable success, both training and inference in SMoE models suffer from the expert collapse issue (Chi et al., 2022), which degrades model performance. Prior studies primarily focus on improving the router; however, such methods rely on training from scratch or fine-tuning, which requires high computational and data-processing costs. Furthermore, we demonstrate that, despite these efforts, the issue persists when advancing well-pretrained SMoE models, as evidenced by both theoretical and empirical results. To fill this gap, we analyze advanced SMoE models and observe that the eigenvectors of expert weight matrices encode rich semantic information, pointing to an effective alternative to conventional routing strategies. Building on this insight, we propose Singular Value Decomposition SMoE (SSMoE), a novel and training-free framework that leverages spectral properties of the expert weights to address the collapse issue and enhance model performance. Extensive experiments across diverse language and vision tasks, under both clean and corrupted data settings, demonstrate the strong generalization and robustness of SSMoE. Our findings highlight how a deeper understanding of model internals can guide the development of more effective SMoE architectures. Our implementation is publicly available at https://github.com/giangdip2410/SSMoE.

2605.30991 2026-06-01 cs.LG cs.CV 版本更新

Parallel Tempering Initial Sampling in Inference-Time Reward Alignment

推理时奖励对齐中的并行回火初始采样

Myeongjun Oh, Gwangho Kim, Sungyoon Lee

发表机构 * Department of Artificial Intelligence（人工智能系）； Department of Computer Science（计算机科学系）

AI总结针对推理时奖励对齐中标准SMC方法因初始采样陷入局部模式的问题，提出基于并行回火的PATHS方法，通过耦合多条回火链实现高效探索，提升对齐质量。

Comments 31 pages, 11 figures

详情

AI中文摘要

推理时奖励对齐无需重新训练即可引导预训练的扩散和基于流的生成模型满足用户指定的奖励。最近，序贯蒙特卡洛（SMC）通过迭代过滤和传播多个粒子成为该任务的有力框架。然而，我们表明基于SMC的标准方法通常性能不佳，因为它们从标准先验初始化粒子，而复杂奖励景观中的高奖励区域极为罕见。此外，我们表明即使最近的奖励感知初始采样方法仍然容易陷入局部模式，因为复杂奖励景观通常是多模态的。为克服这些限制，我们提出PATHS（用于高复杂度奖励采样的并行回火），一种通过并行回火耦合多个采样链的新型初始化方法。PATHS维护一个奖励回火链的阶梯，并定期执行Metropolis交换，从而在平坦化的奖励景观中实现高效探索，缓解模式陷阱问题。我们的分析表明，该机制显著增强了有限预算下对通常难以采样的罕见高奖励区域的探索。在布局到图像和数量感知生成上的实验表明，PATHS在对齐质量上取得了一致的提升，尤其是在复杂提示上。

英文摘要

Inference-time reward alignment steers pretrained diffusion and flow-based generative models to satisfy user-specified rewards without retraining. Recently, Sequential Monte Carlo (SMC) has emerged as a powerful framework for this task by iteratively filtering and propagating multiple particles. However, we show that standard SMC-based methods often suffer from poor performance because they initialize particles from a standard prior, whereas high-reward regions in complex reward landscapes are extremely rare. Further, we show that even recent reward-aware initial sampling approaches remain vulnerable to getting trapped in local modes, as complex reward landscapes are often multi-modal. To overcome these limitations, we propose PATHS (PArallel Tempering for High-complexity reward Sampling), a novel initialization method that couples multiple sampling chains through parallel tempering. PATHS maintains a ladder of reward-tempered chains and periodically performs Metropolis swaps, enabling efficient exploration across flattened reward landscapes, thereby mitigating the mode-trapping issues. Our analysis reveals that this mechanism substantially enhances the finite-budget exploration of rare, high-reward regions that are typically challenging to sample. Experiments on layout-to-image and quantity-aware generation show that PATHS achieves consistent gains in alignment quality, particularly on complex prompts.
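
The Metropolis swap at the heart of parallel tempering can be sketched on a toy problem. The example below is a generic parallel-tempering sampler, not the PATHS implementation: it assumes targets of the form prior(x) · exp(β · reward(x)) on a 1-D state, a hypothetical bimodal reward, and random-walk Metropolis moves within each chain. All names and constants are illustrative.

```python
import math
import random


def reward(x):
    """Hypothetical reward: a narrow high peak at 4 and a broad low peak at -2."""
    return (3.0 * math.exp(-((x - 4.0) ** 2) / 0.1)
            + 1.0 * math.exp(-((x + 2.0) ** 2) / 2.0))


def log_prior(x):
    """Standard normal log-density, up to an additive constant."""
    return -0.5 * x * x


def swap_log_accept(beta_i, beta_j, r_i, r_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (r_j - r_i)


def parallel_tempering(betas, n_steps=2000, swap_every=10, step=0.5, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in betas]
    for t in range(1, n_steps + 1):
        # Within-chain random-walk Metropolis update at each temperature.
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, step)
            log_ratio = ((log_prior(prop) + beta * reward(prop))
                         - (log_prior(xs[k]) + beta * reward(xs[k])))
            if math.log(1.0 - rng.random()) < log_ratio:
                xs[k] = prop
        # Periodically propose swapping a random pair of adjacent chains.
        if t % swap_every == 0:
            k = rng.randrange(len(betas) - 1)
            log_a = swap_log_accept(betas[k], betas[k + 1],
                                    reward(xs[k]), reward(xs[k + 1]))
            if math.log(1.0 - rng.random()) < log_a:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
    return xs


final_states = parallel_tempering([0.0, 0.5, 1.0], n_steps=200)
```

Chains with small β see a flattened reward landscape and move freely, while swaps let a state that has found a high-reward region migrate to the β = 1 chain, which is the exploration benefit the abstract attributes to tempering.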

2605.30981 2026-06-01 cs.CL cs.LG 版本更新

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

自回归Transformer中的认知疲劳：形式化与测量

Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain, Michael Stewart, Amit Sheth

发表机构 * Guru Gobind Singh Indraprastha University, India（古鲁·戈宾德·辛格·因陀罗普拉斯塔大学，印度）； Artificial Intelligence Institute, University of South Carolina, USA（南卡罗来纳大学人工智能研究所，美国）； Indian Institute of Technology, Kanpur, India（印度理工学院坎普尔分校，印度）； Indian AI Research Organization, India（印度人工智能研究组织，印度）

AI总结本文形式化自回归语言模型在长程生成中的退化现象为认知疲劳，并提出轻量级诊断指标疲劳指数（FI），通过聚合注意力衰减、表征漂移和熵校准三个信号实现实时监测，实验表明FI能高精度预测任务退化和重复生成。

Comments 9 pages, 7 figures. Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

详情

AI中文摘要

自回归语言模型在长程生成过程中经常退化，产生重复文本、失去指令遵循能力并表现出不稳定的熵。尽管这些失败普遍存在，但从业者缺乏在线诊断工具来实时检测它们。我们将这种退化形式化为认知疲劳，这是一种可测量的生成时状态，其特征是对原始提示的注意力衰减、表征漂移和熵校准失准。我们引入了疲劳指数（FI），这是一种轻量级、模型无关的诊断方法，在明确的公理（单调性、有界性、可解释性）下聚合这三个信号，从而实现可靠的运行时监控。在九个模型（1B-13B参数）上，FI轨迹表现出结构化的时间动态，能够预测任务退化（AUROC = 0.95）和重复（Spearman rho = 0.94），并揭示了非单调的缩放行为：低于3B的指令微调模型比基础模型崩溃得更快，而在7B时这一趋势逆转。压力分析进一步表明，在更长的上下文、位于中间位置的证据和降低的数值精度下，FI的起始点会提前出现。这些结果确立了认知疲劳是一种连贯且可测量的现象，并将FI定位为生产级LLM系统中运行时可靠性监控的原则性工具。

英文摘要

Autoregressive language models frequently degrade during long-horizon generation, producing repetitive text, losing instruction adherence, and exhibiting unstable entropy. Despite the prevalence of these failures, practitioners lack online diagnostics to detect them in real time as they occur. We formalize this degradation as cognitive fatigue, a measurable generation-time state characterized by decay in attention to the original prompt, representational drift, and entropy miscalibration. We introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that aggregates these three signals under explicit axioms (monotonicity, boundedness, interpretability), enabling reliable runtime monitoring. Across nine models (1B-13B parameters), FI trajectories exhibit structured temporal dynamics, predict task degradation (AUROC = 0.95) and repetition (Spearman rho = 0.94), and reveal non-monotonic scaling behavior: instruction-tuned models below 3B exhibit faster collapse than base models, with this trend reversing at 7B. Stress analyses further show that FI onset accelerates under longer contexts, middle-positioned evidence, and reduced numerical precision. These results establish cognitive fatigue as a coherent and measurable phenomenon, and position FI as a principled tool for runtime reliability monitoring in production LLM systems.

2605.30976 2026-06-01 stat.ML cs.IT cs.LG math.IT 版本更新

Batched Stochastic Linear Bandits with 1-Bit Communication Constraints

具有1比特通信约束的批量随机线性赌博机

Ivan Lau, Daniel McMorrow, Kevin Jamieson, Jonathan Scarlett

发表机构 * National University of Singapore（新加坡国立大学）； University of Washington（华盛顿大学）

AI总结研究在批量大小B和每批仅1比特反馈的通信约束下，随机线性赌博机的遗憾最小化问题，提出了两种基于G-最优设计和1比特均值估计的相位消除算法，实现了接近无约束线性赌博机的最优遗憾。

详情

AI中文摘要

我们研究了在批处理和通信约束的自然组合下的随机线性赌博机：时间范围被划分为大小均为$B$的批次，在每个批次中，学习器向一个智能体发送$B$个请求的臂拉动，智能体观察相应的$B$个奖励，并用单个比特的反馈回复学习器。对于每个批次，学习器指定智能体使用的1比特量化规则，该规则可能依赖于所有先前接收到的比特，但不直接依赖于任何过去的奖励。这一设置解决了先前模型（仅有每轮量化或仅有总比特预算）之间一个显著但尚未探索的“中间地带”。我们建立了一个极小极大下界，表明由于1比特通信瓶颈，即使在没有噪声的情况下，$\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$的遗憾也是不可避免的。结合标准的统计极限，这给出了一个通用的下界$\widetilde{\Omega}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$。我们开发了两种基于$G$-最优设计和1比特均值估计的相位消除算法。第一种算法实现了$\widetilde{O}(dB + d\sqrt{T})$的遗憾，当$\lvert \mathcal{A} \rvert = \exp(\Omega(d))$时，在对数因子内与下界匹配；第二种算法结合了安全臂识别和热启动过程，获得了$\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$的遗憾，在$(\lvert \mathcal{A} \rvert, B, d, T)$的广泛缩放范围内接近最优。总之，我们的结果表明，每批仅需一个比特的反馈就足以在广泛的缩放范围内几乎匹配无约束线性赌博机的极小极大遗憾，即使对于$\Theta(\sqrt{T})$这样大的批量大小也是如此。

英文摘要

We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the corresponding $B$ rewards and responds with a single bit of feedback to the learner. For each batch, the learner specifies the 1-bit quantization rule the agent uses, which may depend on all previously received bits but not on any past rewards directly. This setting addresses a significant yet unexplored "middle ground" between previous models having per-round quantization only or total bit budgets only. We establish a minimax lower bound showing that $Ω(B\min\{d,\log\lvert \mathcal{A} \rvert\})$ regret is unavoidable due to the 1-bit communication bottleneck, even in the absence of noise. Combined with standard statistical limits, this yields a general lower bound of $\widetilde{Ω}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$. We develop two phased-elimination algorithms based on $G$-optimal designs and 1-bit mean estimation. The first achieves $\widetilde{O}(dB + d\sqrt{T})$ regret, matching the lower bound up to logarithmic factors when $\lvert \mathcal{A} \rvert = \exp(Ω(d))$, and the second incorporates a safe-arm identification and warm-start procedure to obtain $\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$ regret, which is near-optimal in broad scaling regimes of $(\lvert \mathcal{A} \rvert, B, d, T)$. Together, our results demonstrate that a single bit of feedback per batch suffices to nearly match the minimax regret of unconstrained linear bandits in broad scaling regimes, even for batch sizes as large as $Θ(\sqrt{T})$.
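
The one-bit feedback model in this abstract can be made concrete with a small noiseless sketch. This is not the paper's algorithm: it only illustrates a learner choosing a threshold rule, an agent compressing a whole batch into one bit, and the learner bisecting on that bit. The function names, the toy arm, and the assumption that rewards lie in [0, 1] are all illustrative.

```python
from typing import Callable, Sequence


def agent_bit(rewards: Sequence[float], threshold: float) -> int:
    """Agent side: compress a whole batch of rewards into a single bit."""
    batch_mean = sum(rewards) / len(rewards)
    return 1 if batch_mean >= threshold else 0


def estimate_mean_by_bisection(
    pull_batch: Callable[[], Sequence[float]], num_batches: int
) -> float:
    """Learner side: one threshold per batch, one bit back, bisect on [0, 1]."""
    low, high = 0.0, 1.0
    for _ in range(num_batches):
        threshold = (low + high) / 2.0
        if agent_bit(pull_batch(), threshold):
            low = threshold   # batch mean is at least the threshold
        else:
            high = threshold  # batch mean is below the threshold
    return (low + high) / 2.0


# Noiseless toy arm whose every pull returns 0.3 (hypothetical environment).
estimate = estimate_mean_by_bisection(lambda: [0.3] * 8, num_batches=20)
```

In this toy setting each batch spends all of its pulls but only halves the learner's uncertainty interval, which gives some intuition for why a regret term that grows with the batch size $B$ might be hard to avoid.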

2605.30960 2026-06-01 cs.LG 版本更新

Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens

重新审视零阶Hessian近似：单步策略优化视角

Junbin Qiu, Zhaowei Hong, Renzhe Xu, Yao Shu

发表机构 * Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Shanghai University of Finance and Economics（上海财经大学）

AI总结本文通过单步策略优化视角统一零阶Hessian估计，提出方差缩减的ZoVH框架，实现全Hessian矩阵、正则化逆及偏差校正逆Hessian-梯度积的高效估计。

AI中文摘要

精确的零阶Hessian估计是无导数方法的基石，对于双层优化、贝叶斯推断和不确定性量化等任务至关重要。然而，在高维设置中获取完整的低方差Hessian及其逆估计器仍然是一个重大挑战。为了解决这一问题，我们提出了一个统一框架，通过单步策略优化的视角重新解释零阶Hessian近似。该视角建立了通用零阶Hessian估计器与平滑策略优化目标Hessian之间的理论等价性，将不同的经典随机估计器统一为基线选择的特定实例。在此基础上，我们引入了ZoVH，一个针对全Hessian矩阵、其正则化逆以及偏差校正的逆Hessian-梯度积的方差缩减估计器套件。ZoVH利用两种关键技术：(1) 推导出的唯一最优基线，可证明最小化方差；(2) 一种查询重用策略，结合历史函数查询以提高样本效率而不增加成本。我们严格的理论分析证实了Hessian估计器的无偏性，验证了基线的方差最优性，提供了整个ZoVH套件的误差界，并为由此产生的曲率感知零阶算法建立了收敛保证。广泛的实证结果验证了我们的理论发现，表明ZoVH在实际应用中实现了卓越的估计精度和收敛性能。代码可在 https://github.com/Qjbtiger/ZoVH 获取。

英文摘要

Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of low-variance estimators for the Hessian and its inverse in high-dimensional settings remains a significant challenge. To address this, we propose a unified framework that reinterprets ZO Hessian approximation through the lens of single-step Policy Optimization (PO). This perspective establishes a theoretical equivalence between general ZO Hessian estimators and the Hessian of a smoothed PO objective, unifying distinct classical randomized estimators as specific instances of baseline selection. Building on this foundation, we introduce ZoVH, a comprehensive suite of variance-reduced estimators for the full Hessian matrix, its regularized inverse, and the bias-corrected inverse Hessian-gradient product. ZoVH leverages two key techniques: (1) a unique optimal baseline derived to provably minimize variance, and (2) a query reuse strategy that incorporates historical function queries to enhance sample efficiency without inflating costs. Our rigorous theoretical analysis confirms the unbiasedness of the Hessian estimator, validates the variance optimality of our baseline, provides error bounds for the entire ZoVH suite, and establishes convergence guarantees for the resulting curvature-aware ZO algorithm. Extensive empirical results validate our theoretical findings, demonstrating that ZoVH achieves superior estimation accuracy and convergence performance in real-world applications. Code is available at https://github.com/Qjbtiger/ZoVH
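
The role of a baseline in a zeroth-order curvature estimate can be sketched in one dimension. The snippet below uses the standard Gaussian-smoothing identity, in which the second derivative of the smoothed function equals the expectation of (f(x + mu*u) - b)(u^2 - 1)/mu^2 for u ~ N(0, 1) and any constant baseline b. It is not the ZoVH estimator: the baseline b = f(x) is just one common choice, and all function names and numbers are illustrative assumptions.

```python
import random
from typing import Callable, List


def zo_second_derivative_samples(
    f: Callable[[float], float],
    x: float,
    mu: float,
    baseline: float,
    num_samples: int,
    seed: int = 0,
) -> List[float]:
    """Per-sample estimates (f(x + mu*u) - baseline) * (u^2 - 1) / mu^2, u ~ N(0, 1)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        u = rng.gauss(0.0, 1.0)
        samples.append((f(x + mu * u) - baseline) * (u * u - 1.0) / (mu * mu))
    return samples


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def variance(values: List[float]) -> float:
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def quadratic(x: float) -> float:
    return x * x  # second derivative is 2 everywhere


# Same seed, so both runs see the same perturbations and differ only in baseline.
with_baseline = zo_second_derivative_samples(quadratic, 1.0, 0.1, quadratic(1.0), 200_000)
without_baseline = zo_second_derivative_samples(quadratic, 1.0, 0.1, 0.0, 200_000)
```

Because E[u^2 - 1] = 0, subtracting a constant baseline leaves the expectation unchanged and only affects the variance, which is why comparing `variance(with_baseline)` against `variance(without_baseline)` isolates the effect of the baseline choice.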

2605.30936 2026-06-01 cs.LG math.OC stat.ML 版本更新

Local linear convergence of gradient methods for overparameterized Gaussian mixtures

过参数化高斯混合模型梯度方法的局部线性收敛性

Jingxing Wang, Vasileios Charisopoulos, Maryam Fazel

发表机构 * Electrical & Computer Engineering, University of Washington（华盛顿大学电气与计算机工程系）； National Institute for Theory and Mathematics in Biology（生物理论与数学国家研究所）； Amazon, Inc.（亚马逊公司）

AI总结针对过参数化高斯混合模型，提出一种交替使用短梯度步和长Polyak步的方法，实现局部线性收敛速率，克服了过参数化导致的慢收敛问题。

Comments 45 pages, 7 figures

AI中文摘要

我们研究了过参数化下学习高斯混合模型的问题。先前的工作表明，虽然过参数化对于避免虚假局部最优和通过梯度EM算法实现全局恢复真实模型至关重要，但它会显著减慢局部收敛速度。在混合权重的某些假设下，我们证明了统计学习过程最小化的标准散度度量具有一个缓慢增长的流形，在该流形上著名的Polyak步长可以几何级地减少损失，并设计了一种基于梯度的方法，该方法以局部线性速率收敛到极小值点。此外，我们表明，对于具有任意权重的混合模型，我们的方法收敛到接近最优的解——直到一个自然的误设阈值。在高层次上，该方法在接近流形的几个“短”梯度下降步和收缩到极小值点距离的“长”Polyak步之间交替。我们的结果表明，慢收敛不是过参数化的内在挑战，而是可以通过利用损失景观的有利结构来克服。

英文摘要

We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.
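
To see why a Polyak stepsize can shrink the loss geometrically on a slowly growing objective, consider a one-dimensional toy problem. The sketch below applies the classical Polyak rule, step = (f(x) - f*) / |f'(x)|^2, to f(x) = x^4. It is not the paper's alternating method and has nothing specific to Gaussian mixtures; the function names and the quartic test function are assumptions chosen for illustration.

```python
from typing import Callable, List


def polyak_descent(
    f: Callable[[float], float],
    grad: Callable[[float], float],
    x0: float,
    f_star: float,
    num_steps: int,
) -> List[float]:
    """Gradient descent with the Polyak stepsize (f(x) - f_star) / grad(x)^2."""
    xs = [x0]
    x = x0
    for _ in range(num_steps):
        g = grad(x)
        if g == 0.0:
            break  # already stationary
        step = (f(x) - f_star) / (g * g)
        x = x - step * g
        xs.append(x)
    return xs


def quartic(x: float) -> float:
    return x ** 4  # flat near its minimizer, so fixed-step descent is slow


def quartic_grad(x: float) -> float:
    return 4.0 * x ** 3


iterates = polyak_descent(quartic, quartic_grad, x0=1.0, f_star=0.0, num_steps=20)
```

On this quartic each Polyak step maps x to 0.75x, so the loss contracts by the constant factor 0.75^4 per step, whereas gradient descent with a fixed stepsize slows down as the gradient 4x^3 vanishes near the minimizer.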

2605.30919 2026-06-01 cs.LG cs.AI 版本更新

De-attribute to Forget for LLM Unlearning

Xinyang Lu, Jiabao Pan, Rachael Hwee Ling Sim, See-Kiong Ng, Anthony Kum Hoe Tung, Bryan Kian Hsiang Low

发表机构 * Department of Computer Science, National University of Singapore（新加坡国立大学计算机科学系）

AI总结本文提出基于数据归因奖励的LLM遗忘框架DareU，通过强化学习降低生成响应与遗忘数据的归因分数，实现有效遗忘并平衡模型效用。

AI中文摘要

大型语言模型（LLM）的快速发展引发了对使用不当数据进行训练的担忧，这导致了对LLM遗忘研究的兴趣日益增长。许多现有的LLM遗忘方法依赖于优化预测损失，例如最大化遗忘集上的损失，但常常面临过度遗忘和模型效用差等关键问题。为了解决这些问题，本文创新地将LLM遗忘的优化目标定义为归零数据归因。具体而言，我们提出了第一个基于数据归因奖励的LLM遗忘框架，称为DareU，该框架通过强化学习来更新LLM，通过降低其生成响应与遗忘数据所有者的归因分数（即去归因）来实现遗忘。使用LLM分类器作为归因的有效近似进行的实证评估表明，DareU在实现有效遗忘的同时，很好地平衡了遗忘质量和模型效用，优于现有基线。

英文摘要

The rapid development of large language models (LLMs) has raised concerns about the use of inappropriate data for training, which has led to growing interest in LLM unlearning. Many existing LLM unlearning approaches rely on optimizing prediction loss(es), such as maximizing the loss on the forget set, but often face critical issues like over-forgetting and poor model utility. To address these issues, this paper instead frames the optimization objective for LLM unlearning as zeroing out data attribution. In particular, we propose DareU, the first LLM unlearning framework based on data attribution rewards, which uses reinforcement learning to update the LLM by reducing the attribution score of its generated responses to the forget data owners (i.e., de-attributing). Empirical evaluation using an LLM classifier as an efficient approximation of attribution shows that DareU outperforms existing baselines by achieving effective unlearning while balancing forget quality and model utility well.

2605.30916 2026-06-01 cs.LG cs.GT econ.TH 版本更新

Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation

福利、可改进性与方差：最优基准测试项聚合的主-代理方法

Andreas Haupt, Justin Hartenstein, Anka Reuel, Mykel Kochenderfer, Sanmi Koyejo

发表机构 * Department of Economics & Computer Science（经济与计算机科学系）； Institute for Computational and Mathematical Engineering（计算与数学工程研究所）； Department of Computer Science（计算机科学系）； Department of Aeronautics & Astronautics（航空与航天系）

AI总结提出将基准测试建模为多任务主-代理博弈，通过福利、可改进性和方差三个维度评估项目，并应用于OLMES数据集识别帕累托劣势项目。

AI中文摘要

AI基准测试存在记录完善的局限性，先前研究探讨了污染、饱和以及构造不明确等问题。聚合受到的关注要少得多：基准测试通常通过统一平均项目级分数来总结，隐含地将每个测试项目视为同等重要。我们将基准测试建模为多任务主-代理博弈，并表明基准测试的福利损失由三个项目级原始要素共同决定：与规范性福利优先级的一致性、边际可改进性和性能方差。我们将该理论转化为一个审计框架，沿这三个轴对项目进行排序，并使用WORKBank（福利）、EvoLM 4B套件（可改进性）和PolyPythias 410M面板（方差）将其应用于OLMES项目。该框架揭示了在OLMES中，在亲工人福利操作化下帕累托劣势的项目。所有代码可在 https://github.com/stair-lab/principal-agent-benchmarks 获取。

英文摘要

AI benchmarks have well-documented limitations, with prior work examining contamination, saturation, and construct underspecification. Aggregation has received far less attention: benchmarks are typically summarized by uniformly averaging item-level scores, implicitly treating every test item as equally valuable. We model benchmarking as a multitask principal-agent game and show that the welfare loss from a benchmark is determined jointly by three item-level primitives: alignment with normative welfare priorities, marginal improvability, and performance variance. We translate the theory into an audit framework that ranks items along each of these three axes, and apply it to OLMES items using WORKBank for welfare, the EvoLM 4B suite for improvability, and the PolyPythias 410M panel for variance. The framework surfaces items that are Pareto-inferior within OLMES subject to a pro-worker welfare operationalization. All code is available at https://github.com/stair-lab/principal-agent-benchmarks.

2605.30914 2026-06-01 cs.LG cs.SE 版本更新

Automating Formal Verification with Reinforcement Learning and Recursive Inference

用强化学习和递归推理自动化形式验证

Max Tan

发表机构 * Department of Electrical Engineering and Computer Science（电气工程与计算机科学系）； Massachusetts Institute of Technology（麻省理工学院）

AI总结研究通过可验证奖励的强化学习和验证器引导的推理搜索，提升大语言模型生成验证程序和证明的能力，在Dafny和Lean上取得显著进展。

Comments Master's thesis, 140 pages, 16 figures, 17 tables

AI中文摘要

自动化形式验证对大语言模型仍然具有挑战性，因为证明助手和验证感知语言的数据稀缺，且正确性取决于满足精确的机器可检查规范，而非生成合理的代码。本文研究验证器环境如何通过可验证奖励的强化学习（RLVR）和验证器引导的推理时搜索，改进大语言模型生成验证程序和证明的能力。首先，我们使用组相对策略优化（GRPO）及相关变体，在Dafny中训练开源模型，将生成的候选程序组装成完整程序，并根据编译器和验证器的结果进行评分。在APPS衍生的Dafny数据集上的初步实验将验证奖励从2.2%提升至58.1%，但发现了规范破解问题，即模型利用弱形式规范而非实现预期解决方案。在过滤掉欠规范和易受攻击的任务后，多轮RLVR在改进的基准上将验证通过率从9.7%提升至31.1%。其次，我们在Lean中开发了一个验证器引导的推理框架，将证明生成视为对分解子目标、验证器反馈、诊断和修复的结构化搜索。使用固定的基础模型，包含证明修订器的完整框架在初始VeriCoding试点集上将通过率从直接修复的46.2%提升至69.2%。在更大的VERINA数据集上，整体任务分解加上证明修订器解决了42个先前未解决任务中的7个。我们还引入了Dalek-Bench，一个从Rust $\texttt{curve25519-dalek}$验证项目派生的仓库级Lean基准；初步结果仍然较弱，表明仍需更强的进度评估和特定任务的工具使用策略。

英文摘要

Automated formal verification remains challenging for large language models because data for proof assistants and verification-aware languages is scarce, and correctness depends on satisfying precise machine-checkable specifications rather than producing plausible code. This thesis studies how verifier environments can improve LLM generation of verified programs and proofs through reinforcement learning from verifiable rewards (RLVR) and verifier-guided inference-time search. First, we train open-source models in Dafny with RLVR using Group Relative Policy Optimization (GRPO) and related variants, assembling generated candidates into complete programs and scoring them with compiler and verifier outcomes. Initial experiments on an APPS-derived Dafny dataset increased verified reward from 2.2% to 58.1%, but revealed specification hacking, where models exploit weak formal specifications instead of implementing the intended solutions. After filtering underspecified and vulnerable tasks, multi-turn RLVR on the refined benchmark improves the verified pass rate from 9.7% to 31.1%. Second, we develop a verifier-guided inference scaffold in Lean that treats proof generation as structured search over decomposed subgoals, verifier feedback, diagnostics, and repair. With a fixed base model, the full scaffold with proof reviser improves pass rate on an initial VeriCoding pilot set from 46.2% under direct repair to 69.2%. On the larger VERINA dataset, whole-task decomposition plus proof reviser solves 7 of 42 previously unsolved tasks. We also introduce Dalek-Bench, a repository-scale Lean benchmark derived from the Rust $\texttt{curve25519-dalek}$ verification project; preliminary results remain weak, indicating that stronger progress evaluation and task-specific tool-use policies are still needed.

2605.30910 2026-06-01 cs.LG 版本更新

PINNs Failure Modes are Overfitting

PINNs 的失败模式是过拟合

Nigel T. Andersen, Takashi Matsubara

发表机构 * Graduate School of Information Science and Technology（信息科学与技术研究生学校）； RIKEN Center for Advanced Intelligence Project (AIP)（RIKEN高级智能项目中心（AIP））

AI总结本文通过可视化残差证明物理信息神经网络的失败模式源于过拟合，并提出基于正则化和双反向传播的方法来消除失败模式，在标准方程上以更少的配置点实现最先进性能。

AI中文摘要

物理信息神经网络（PINNs）是一类常见的基于机器学习的偏微分方程（PDE）求解器，它们通过最小化编码 PDE 的残差损失来训练网络以表示解。尽管取得了成功，但已知它们在某些简单方程上会失败，收敛到不正确的解，尽管损失很低。这些失败模式在过去几年中引起了文献中的广泛关注，激发了基于架构和优化的解决方案。通过直接可视化残差，我们表明失败模式是过拟合的结果：损失在配置点上被最小化，但在其他地方则不然。应用正则化会使失败模式消失。最后，我们将双反向传播扩展到整个残差集，并使用它在四个标准失败模式方程上实现了最先进的性能，配置点数量减少多达 $23\times$，且使用普通架构。

英文摘要

Physics-Informed Neural Networks (PINNs) are a common class of machine learning-based partial differential equation (PDE) solvers which train a network to represent a solution by minimizing a residual loss that encodes the PDE. Despite their successes, they are known to fail on certain simple equations, converging to an incorrect solution despite low loss. These failure modes have garnered significant attention in the literature over the past several years, motivating both architectural and optimization based solutions. By directly visualizing the residual, we show that failure modes are the result of overfitting: the loss is minimized on the collocation points, but not elsewhere. Applying regularization causes the failure modes to vanish. Finally, we extend double backpropagation over the full set of residuals, and use it to achieve state-of-the-art performance on four standard failure mode equations with up to $23\times$ fewer collocation points and a vanilla architecture.

2605.30907 2026-06-01 cs.SE cs.AI cs.CL cs.LG 版本更新

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

BlueFin: 在金融电子表格上对LLM智能体进行基准测试

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta, Case Winter, George Fang, John Ling, Emma Strubell, Zach Kirshner

发表机构 * Longitude Labs Inc.（Longitude Labs公司）； Cornell University（康奈尔大学）； Carnegie Mellon University（卡内基梅隆大学）

AI总结提出BlueFin基准，通过131个真实金融电子表格任务评估LLM智能体的合成、操作和理解能力，并验证了LM评判与人类专家的一致性。

Comments 26 pages

AI中文摘要

我们提出BlueFin，一个基准测试，要求大语言模型（LLM）智能体在专业金融领域的电子表格工作簿上执行合成、操作和理解任务。尽管全球电子表格软件付费用户估计数亿——比全球专业开发人员估计数量高一个数量级——但投入探索和扩展LLM在电子表格领域能力的资源相对较少，而专门用于反映专业金融角色实际职业任务的资源更少。为此，我们整理了131个具有现实相关性的挑战性复杂任务，包含3225个细粒度评分标准；值得注意的是，我们的评分标准和LM评判评估由一组专家人工标注员验证，从而对难以通过编程验证但可由LM评判智能体可靠评估的复杂任务进行高质量、细粒度的评估。我们的评判与专家共识达到一致（α=0.826），宏F1得分为0.839。前沿LLM在此挑战性基准上表现不佳，最强LLM在任务上的平均得分低于50%——模型在动态正确性方面表现出特别弱点。我们的贡献包括：涵盖三类电子表格任务的示例数据集、开源工具包和智能体评估框架，以及现有前沿模型在我们基准上的性能表征。

英文摘要

We present BlueFin, a benchmark that tasks large language model (LLM) agents with synthesis, manipulation, and comprehension tasks over spreadsheet workbooks in the professional finance domain. Though estimates of the global population of paying users of spreadsheet software range in the hundreds of millions -- an order of magnitude more than the estimated global population of professional developers -- comparatively fewer resources have been devoted to exploring and expanding LLM capabilities in the spreadsheet domain, with fewer still dedicated to mirroring real occupational tasks encountered by those in professional finance roles. In response, we curate a set of 131 challenging, complex tasks with real-world relevance in the domain, containing 3,225 granular rubric criteria; notably, our rubric criteria and LM judge evaluations are validated by a team of expert human annotators, resulting in high-quality, granular evaluations of complex tasks that are difficult to verify programmatically but can be reliably evaluated by an LM judge agent. Our judge achieves parity with expert consensus ($α=0.826$) with a macro-F1 score of 0.839. Frontier LLMs demonstrate poor performance on the challenging benchmark, with the strongest LLMs achieving less than 50% average scores across tasks -- models exhibit particular weaknesses in dynamic correctness. Our contributions include a dataset of examples across three categories of spreadsheet tasks, an open source harness and agentic evaluation framework, and a characterization of existing frontier models' performance on our benchmark.

2605.30905 2026-06-01 math.OC cs.LG 版本更新

A Unifying View of Anchoring via Operator-Side Tikhonov Regularization

通过算子侧Tikhonov正则化实现锚定的统一视角

Zihao Chen

发表机构 * UC Berkeley（加州大学伯克利分校）

AI总结本文提出锚定固定点和单调方程方法可通过在基础方法查询的算子上添加消失的Tikhonov正则项来统一构造，并分析了四种变体的残差收敛率。

AI中文摘要

锚定不动点和单调方程方法，包括Halpern迭代、额外锚定梯度及其相关方法，通过向参考点添加消失的拉力来获得最后迭代保证。现有的锚定变体通常能获得尖锐的最后迭代保证，但从更新层面来看，锚点的放置可能是算法特定的且概念上不透明。我们表明锚定允许一个单一的算子侧构造：用消失的Tikhonov项正则化基础方法查询的算子，然后运行未修改的基础方法。应用于Picard迭代，该配方重现了Halpern迭代；应用于前向步、外梯度（EG）和过去外梯度（PEG，也称为Popov方法），它产生了三种变体，其锚点放置继承了基础方法的查询模式。前向步实例化给出了一个新的残差收敛保证，而EG和PEG实例化给出了新的正则化变体。四种分析共享一个残差递推关系，恢复了Halpern残差范数的$O(1/k)$收敛速率，为正则化前向步给出了$O(1/\sqrt{k})$，并在无约束单调Lipschitz设置下为正则化EG和PEG变体给出了$O(1/k)$。

英文摘要

Anchored fixed point and monotone equation methods, including Halpern iteration, extra anchored gradient, and their relatives, add a vanishing pull toward a reference point to obtain last-iterate guarantees. Existing anchored variants often achieve sharp last-iterate guarantees, but from the update-level perspective the placement of the anchor can be algorithm-specific and conceptually opaque. We show that anchoring admits a single operator-side construction: regularize the operator queried by the base method with a vanishing Tikhonov term, then run the unmodified base method. Applied to the Picard iteration, this recipe reproduces the Halpern iteration; applied to the forward step, extragradient (EG), and past extragradient (PEG, also known as Popov's method), it yields three variants whose anchor placements inherit the base method's query pattern. The forward-step instantiation gives a new residual convergence guarantee, while the EG and PEG instantiations give new regularized variants. The four analyses share a residual recurrence, recovering the $O(1/k)$ Halpern residual-norm convergence rate, giving $O(1/\sqrt{k})$ for the regularized forward step, and giving $O(1/k)$ for the regularized EG and PEG variants in the unconstrained monotone Lipschitz setting.
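
The effect of the anchor is easiest to see on a map where plain Picard iteration makes no progress. The sketch below compares Picard iteration with the classical Halpern iteration, x_{k+1} = beta_k x_0 + (1 - beta_k) T(x_k) with beta_k = 1/(k + 2), on a 90-degree rotation of the plane. It illustrates the standard Halpern scheme only, not the operator-side Tikhonov construction proposed in the paper; the choice of map, starting point, and function names are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rotate_quarter_turn(p: Point) -> Point:
    """Nonexpansive map T: rotation by 90 degrees; its only fixed point is the origin."""
    x, y = p
    return (-y, x)


def residual(p: Point) -> float:
    """Fixed-point residual ||p - T(p)||."""
    tx, ty = rotate_quarter_turn(p)
    return math.hypot(p[0] - tx, p[1] - ty)


def picard(p0: Point, num_steps: int) -> Point:
    """Plain iteration x_{k+1} = T(x_k)."""
    p = p0
    for _ in range(num_steps):
        p = rotate_quarter_turn(p)
    return p


def halpern(p0: Point, num_steps: int) -> Point:
    """x_{k+1} = beta_k * x_0 + (1 - beta_k) * T(x_k), with beta_k = 1 / (k + 2)."""
    p = p0
    for k in range(num_steps):
        beta = 1.0 / (k + 2)
        tx, ty = rotate_quarter_turn(p)
        p = (beta * p0[0] + (1.0 - beta) * tx, beta * p0[1] + (1.0 - beta) * ty)
    return p


start = (1.0, 0.0)
picard_residual = residual(picard(start, 200))
halpern_residual = residual(halpern(start, 200))
```

Picard iterates circle the origin forever, so their residual never decreases, while the vanishing pull toward `start` lets the Halpern residual decay, consistent with the $O(1/k)$ residual rate the abstract recovers.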

2605.30903 2026-06-01 cs.LG cs.AI 版本更新

Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach

无最优演示者的逆强化学习：一种可行奖励集方法

Kihyun Kim, Shripad Deshmukh, Nikos Vlassis, Jiawei Zhang

发表机构 * MIT LIDS（麻省理工学院信息与决策系统实验室）； University of Massachusetts, Amherst（马萨诸塞大学阿姆赫斯特分校）； Adobe Research（Adobe研究院）； University of Wisconsin-Madison（威斯康星大学麦迪逊分校）

AI总结针对多个非最优演示者数据，提出可行奖励集框架，通过线性约束联合可行集单调收缩，并给出恢复保证与高维环境离线算法。

AI中文摘要

逆强化学习（IRL）通常假设来自单个最优演示者的演示，但在许多应用中，数据来自多个具有异质次优性水平的非完美演示者。我们通过可行奖励集框架研究这一设置下的奖励学习：对于每个演示者，我们将其声明的次优性水平编码为线性约束，并在演示者之间对所得可行集取交集。我们的理论分析表明，随着数据的增加，联合可行集单调收缩，并且我们精确刻画了新演示者何时严格收紧该集合。我们进一步为真实最优演示者的可行奖励集建立了两个恢复保证：一个界限依赖于与最优占用度的接近程度，而另一个仅需要足够的覆盖且没有接近最优的演示者。在实际方面，我们引入了解决所得奖励集中固有奖励模糊性的策略，并提供了适用于高维环境的函数逼近离线算法。在表格型网格世界和大语言模型（LLM）微调设置中的实验与理论预测一致，并证明了所提框架相对于基线的有效性。

英文摘要

Inverse reinforcement learning (IRL) typically assumes demonstrations from a single optimal demonstrator, but in many applications data come from multiple imperfect demonstrators with heterogeneous suboptimality levels. We study reward learning in this setting through a feasible-reward-set framework: for each demonstrator, we encode its declared suboptimality level as a linear constraint and intersect the resulting feasible sets across demonstrators. Our theoretical analysis shows that the joint feasible set shrinks monotonically as data are added, and we give an exact characterization of when a new demonstrator strictly tightens it. We further establish two recovery guarantees for the feasible reward set of the ground-truth optimal demonstrator: one bound depends on closeness to the optimal occupancy, while the other requires only sufficient coverage and no near-optimal demonstrator. On the practical side, we introduce strategies to address the inherent reward ambiguity in the obtained reward set and provide an offline algorithm with function approximation for high-dimensional environments. Experiments in tabular grid-world and large language model (LLM) fine-tuning settings are consistent with the theoretical predictions and demonstrate the effectiveness of the proposed framework over baselines.
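
The idea of intersecting per-demonstrator linear constraints can be sketched in the simplest possible setting, a single-state problem where a policy is just a distribution over actions. The encoding below is an assumption made for illustration, not the paper's formulation: a demonstrator with policy pi and declared level eps is taken to require r(a) - sum_b pi(b) r(b) <= eps for every action a. The demonstrators and candidate rewards are hypothetical.

```python
from typing import List, Sequence, Tuple

# A demonstrator is (policy over actions, declared suboptimality level).
Demonstrator = Tuple[Sequence[float], float]


def is_feasible(reward: Sequence[float], demonstrators: List[Demonstrator]) -> bool:
    """True if the reward satisfies every demonstrator's linear constraints.

    For a demonstrator with policy pi and level eps, each action a must satisfy
    reward[a] - sum_b pi[b] * reward[b] <= eps, which is linear in the reward.
    """
    for policy, eps in demonstrators:
        expected = sum(p * r for p, r in zip(policy, reward))
        if any(r - expected > eps + 1e-12 for r in reward):
            return False
    return True


# Hypothetical data: two imperfect demonstrators over three actions.
demo_a: Demonstrator = ([0.6, 0.4, 0.0], 0.7)
demo_b: Demonstrator = ([0.5, 0.0, 0.5], 0.6)

candidates = [
    [1.0, 0.0, 0.0],  # action 0 is best
    [0.0, 1.0, 0.0],  # action 1 is best
    [0.0, 0.0, 1.0],  # action 2 is best
]

feasible_a = [r for r in candidates if is_feasible(r, [demo_a])]
feasible_ab = [r for r in candidates if is_feasible(r, [demo_a, demo_b])]
```

Here the first demonstrator alone leaves two candidate rewards feasible, and adding the second removes one of them. Since every new demonstrator only adds constraints, the feasible set can never grow, which mirrors the monotone shrinkage described in the abstract.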

2605.30901 2026-06-01 cs.LG 版本更新

Density-Guided Robust Counterfactual Explanations on Tabular Data under Model Multiplicity

模型多重性下表格数据的密度引导鲁棒反事实解释

Jun Tan, Qing Guo, Zicheng Xu, Jinglin Li, Qi Fang, Ning Gui

发表机构 * School of Computer Science and Engineering, Central South University, Changsha, China（计算机科学与工程学院，中南大学，长沙，中国）

AI总结提出DensityFlow生成框架，利用神经ODE和密度评分构建鲁棒反事实解释，避免低密度区域，并在模型多重性下保持有效性。

Comments 26 pages, 11 figures, accepted by ICML 2026

AI中文摘要

反事实解释（CEs）对于可操作的补救措施至关重要，但其可靠性在低密度区域常常受到损害，因为分类器在这些区域表现出高方差。与依赖昂贵的集成交集来定义稳定性的现有方法不同，我们提出了\textit{DensityFlow}，一种生成框架，通过遵循高置信度数据流形来构建鲁棒的反事实解释。具体来说，我们将反事实生成建模为由神经ODE参数化的连续时间动力学，并由可微密度评分引导，以主动避免不确定的低密度区域。该密度评分通过噪声对比估计学习，有效利用$(K{+}1)$路判别器来估计密度比。对于黑盒设置，我们引入了一种局部代理蒸馏机制，该机制在CE生成的轨迹内严格地将轻量级代理与目标模型对齐，从而实现高效的基于梯度的优化，且查询次数最少。实验表明，与基于集成的基线相比，\textit{DensityFlow}在模型多重性下实现了优越的有效性，同时显著降低了查询成本。我们的实现可在https://github.com/G-AILab/DensityFlow获取。

英文摘要

Counterfactual explanations (CEs) are essential for actionable recourse, yet their reliability is often compromised in low-density regions, where classifiers exhibit high variance. Unlike existing methods that rely on expensive ensemble intersections to define stability, we propose \textit{DensityFlow}, a generative framework that constructs robust CEs by adhering to the high-confidence data manifold. Specifically, we model the counterfactual generation as continuous-time dynamics parameterized by Neural ODE, guided by a differentiable density score to actively avoid uncertain, low-density areas. This density score is learned via Noise Contrastive Estimation, effectively leveraging a $(K{+}1)$-way discriminator to estimate density ratios. For black-box settings, we introduce a local proxy distillation mechanism that aligns a lightweight surrogate with the target model strictly within the trajectory of CE generation, enabling efficient gradient-based optimization with minimal queries. Experiments demonstrate that \textit{DensityFlow} achieves superior validity under model multiplicity while significantly reducing query costs compared to ensemble-based baselines. Our implementation is available at https://github.com/G-AILab/DensityFlow.

2605.30896 2026-06-01 cs.LG 版本更新

Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments

零坍塌：策略梯度方法在不连续奖励环境中的一种失败模式

Nishant Kumar, Enrique Areyan Viqueira, Amy Greenwald

AI总结本文发现策略梯度方法在拍卖等不连续奖励环境中会出现“零坍塌”失败模式，即策略因梯度信号消失而陷入零奖励区域，并提出了缓解策略。

Comments 20 pages, 7 figures; includes Appendix

详情

AI中文摘要

重复拍卖中的竞价是强化学习（RL）的一个核心挑战，它结合了连续控制与数字广告的策略复杂性。尽管策略梯度和基于值的方法似乎适合这些设置，但它们常常难以应对拍卖奖励景观的不连续、“悬崖状”特性。例如，在首价拍卖中，竞拍者在达到特定阈值之前获得零奖励，之后奖励随出价增加而减少。这形成了由尖锐边界分隔的平坦零奖励区域。我们识别出这种设置中一个基本的失败模式，称为“零坍塌”。我们表明，随机探索和基于梯度的更新可能导致策略越过最优高奖励区域，进入平坦的零奖励区域。一旦进入，由于缺乏信息性的梯度信号，恢复变得极其样本低效，有效地困住了智能体。我们发现演员-评论家方法特别容易受到影响，因为偏差的值估计会加速向不稳定区域的移动。我们的贡献包括：（1）对不连续奖励如何导致信号消失和零坍塌的机制解释；（2）对策略随机性和步长之间相互作用的分析；（3）在REINFORCE和演员-评论家变体上对该现象的经验演示。我们提出了涉及初始化和架构选择的实用缓解策略以提高稳定性。最后，我们引入了一个正式的拍卖环境RL框架，突出了其独特的结构特性。

英文摘要

Bidding in repeated auctions is a central challenge for reinforcement learning (RL), combining continuous control with the strategic complexities of digital advertising. While policy gradient and value-based methods seem well-suited for these settings, they often struggle with the discontinuous, "cliff-like" nature of auction reward landscapes. In a first-price auction, for example, a bidder receives zero reward until they cross a specific threshold, after which the reward decreases as the bid increases. This creates a landscape of flat, zero-reward regions separated by sharp boundaries. We identify a fundamental failure mode in this setting termed "zero collapse." We show that stochastic exploration and gradient-based updates can cause policies to overshoot optimal high-reward regions and enter flat, zero-reward regimes. Once there, the lack of an informative gradient signal makes recovery extremely sample-inefficient, effectively trapping the agent. We find that actor-critic methods are particularly susceptible, as biased value estimates can accelerate this movement toward unstable regions. Our contributions include: (1) a mechanistic explanation of how discontinuous rewards lead to vanishing signals and zero collapse; (2) an analysis of the interaction between policy stochasticity and step size; and (3) an empirical demonstration of this phenomenon across REINFORCE and actor-critic variants. We propose practical mitigation strategies involving initialization and architectural choices to improve stability. Finally, we introduce a formal RL framework for auction environments highlighting their unique structural properties.
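
The vanishing-signal mechanism can be illustrated with a score-function (REINFORCE) gradient estimate for a Gaussian bid policy. This is a minimal sketch under simplifying assumptions, not the paper's environment: a single bidder, a fixed winning threshold, and hypothetical numbers for the item value, threshold, and policy parameters.

```python
import random


def first_price_reward(bid: float, value: float, threshold: float) -> float:
    """Win and earn value - bid if the bid clears the threshold, else earn nothing."""
    return value - bid if bid >= threshold else 0.0


def reinforce_mean_gradient(
    mean_bid: float, std: float, value: float, threshold: float,
    num_samples: int, seed: int = 0,
) -> float:
    """Score-function estimate of d E[reward] / d mean_bid for a Gaussian bid policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        bid = rng.gauss(mean_bid, std)
        score = (bid - mean_bid) / (std * std)  # d log N(bid; mean, std) / d mean
        total += first_price_reward(bid, value, threshold) * score
    return total / num_samples


# Hypothetical numbers: item value 1.0, winning threshold 0.5.
grad_in_flat_region = reinforce_mean_gradient(0.1, 0.05, 1.0, 0.5, 10_000)
grad_above_threshold = reinforce_mean_gradient(0.7, 0.05, 1.0, 0.5, 10_000)
```

Above the threshold the estimated gradient is negative, so updates push the mean bid down toward the cliff. Once the policy sits well below the threshold, every sampled reward is zero and the estimate is exactly zero, leaving the learner with no signal to climb back.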

2605.30892 2026-06-01 cs.LG 版本更新

Bandwidth Allocation with Device Partitioning for Federated Learning over Industrial IoT networks

面向工业物联网联邦学习的设备分区带宽分配

Kangmin Kim, Jaeyoung Song

发表机构 * School of Electrical and Electronics Engineering, Pusan National University（釜山国立大学电气与电子工程学院）

AI总结针对联邦学习在工业物联网中的通信瓶颈，提出一种基于设备计算能力分区的带宽分配策略，通过顺序分配全带宽给子集来最小化训练时间，并理论证明其优于无分区方案，同时降低上行能耗。

AI中文摘要

我们考虑一个联邦学习（FL）系统，其中工业物联网（IIoT）设备通过无线信道协作训练全局模型，而不共享本地数据。在此类系统中，通信时间是制约整体训练效率的主要瓶颈。与优先考虑个体服务质量需求的传统网络不同，FL系统旨在尽可能高效地收敛到最优全局模型，这需要一种根本不同的带宽分配方法。本文提出一种新颖的带宽分配策略，利用设备计算能力的异构性来最小化总训练时间。该策略并非同时将所有选定设备的带宽分配出去，而是将参与设备划分为有序子集，并依次授予每个子集全带宽的独占访问权。我们正式证明，无论底层调度算法如何，这种基于分区的策略都能实现比任何无分区带宽分配方案更低的训练时间。此外，通过减少每台设备的传输持续时间，该策略还最小化了上行能耗，这对电池受限的IIoT设备尤其有利。在真实数据集（包括工业表面缺陷基准GC10-Det和标准图像分类基准CIFAR-10）上的大量实验表明，与现有带宽分配方案相比，所提策略持续降低了训练时间和能耗，接近轮次时间的理论下界。

英文摘要

We consider a federated learning (FL) system in which Industrial Internet-of-Things (IIoT) devices collaboratively train a global model over wireless channels without sharing local data. In such systems, communication time is a primary bottleneck that constrains overall training efficiency. Unlike conventional networks that prioritize individual quality-of-service requirements, FL systems collectively aim to converge to an optimal global model as efficiently as possible, which calls for a fundamentally different approach to bandwidth allocation. In this paper, we propose a novel bandwidth allocation policy that exploits the heterogeneity of device computing capabilities to minimize total training time. Rather than distributing bandwidth among all selected devices simultaneously, the proposed policy partitions the participating devices into ordered subsets and sequentially grants each subset exclusive access to the full bandwidth. We formally prove that this partitioning-based policy achieves a strictly lower training time than any bandwidth allocation scheme without partitioning, irrespective of the underlying scheduling algorithm. Furthermore, by reducing per-device transmission duration, the proposed policy also minimizes uplink energy consumption, which is particularly beneficial for battery-constrained IIoT devices. Extensive experiments on real-world datasets - including GC10-Det, an industrial surface defect benchmark, and CIFAR-10, a standard image classification benchmark - demonstrate that the proposed policy consistently reduces training time and energy consumption compared to existing bandwidth allocation schemes, approaching the theoretical lower bound on round time.
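
A toy timing model shows why sequential full-band access can shorten a round when devices finish local computation at different times. The sketch below is not the paper's policy or proof: it assumes that upload time scales inversely with allocated bandwidth, compares only against a static equal split, and treats each device as its own subset. The compute times and upload time are hypothetical.

```python
from typing import List


def round_time_equal_split(compute_times: List[float], upload_time_full_band: float) -> float:
    """All devices share the band equally, so each upload is stretched by the device count."""
    n = len(compute_times)
    return max(c + n * upload_time_full_band for c in compute_times)


def round_time_sequential(compute_times: List[float], upload_time_full_band: float) -> float:
    """Devices upload one at a time with the full band, fastest computers first."""
    channel_free_at = 0.0
    for c in sorted(compute_times):
        start = max(c, channel_free_at)  # wait for own compute and for the channel
        channel_free_at = start + upload_time_full_band
    return channel_free_at


compute_times = [1.0, 2.0, 3.0]  # hypothetical per-device local training times
shared = round_time_equal_split(compute_times, upload_time_full_band=1.0)
sequential = round_time_sequential(compute_times, upload_time_full_band=1.0)
```

With these numbers the equal split takes 6.0 time units while the sequential schedule takes 4.0, because fast devices upload while slow devices are still computing. Each device also transmits for 1.0 instead of 3.0 time units, which is the kind of per-device transmission saving the abstract ties to lower uplink energy.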

2605.30889 2026-06-01 physics.chem-ph cs.LG 版本更新

GlucoFM: 一种用于连续血糖监测的双流基础模型

Zechen Li, Keerthana Natarajan, Weizhi Zhang, Menglian Zhou, Simon A. Lee, Yuwei Zhang, Maxwell A. Xu, Zeinab Esmaeilpour, Flora D. Salim, Mark Malhotra, Lindsey Sunden, Shwetak Patel, Yuzhe Yang, Ahmed A. Metwally

发表机构 * Google Research（谷歌研究）； University of New South Wales（新南威尔士大学）

AI总结提出GlucoFM，一种轻量级CGM基础模型，通过将血糖动态分解为慢生理状态和瞬态事件流，在7个临床预测任务上平均PR-AUC比最佳CGM专用模型提高4.1点。

AI中文摘要

连续血糖监测（CGM）提供了日常代谢生理的密集视图，然而现有的通用时间序列和CGM专用基础模型通常将血糖轨迹编码为纠缠的单流序列，使得血糖动态的独特时间结构仅被隐式建模。我们提出GlucoFM，一种轻量级CGM基础模型，它将不规则记录对齐到24小时时间网格，保留观测掩码，并将血糖动态分解为慢生理状态和瞬态事件流，捕捉低频血糖基线和可能反映急性生理反应或传感器伪影的短期偏差。GlucoFM在来自477名受试者的109,066小时未标记CGM记录上进行了预训练，具有两个互补目标：融合每日表示上的掩码上下文潜在预测以及状态和事件流上的时间动态预测。在四个不同队列和七个临床预测任务中，GlucoFM在评估基线中实现了最强的受试者分离线性探测性能，比最佳CGM专用基础模型平均PR-AUC提高4.1点。其收益在核心代谢结果上最为显著，在所有糖尿病风险和β细胞功能障碍任务以及4个胰岛素抵抗任务中的3个上领先PR-AUC。GlucoFM还在评估方法中实现了最佳的整体跨数据集迁移性能和强大的少样本适应能力，并且在聚合多天进行受试者级别预测时获得一致收益，突出了生理感知分解作为可迁移CGM表示学习的有效归纳偏置。

英文摘要

Continuous glucose monitoring (CGM) provides a dense view of daily metabolic physiology, yet existing generic time-series and CGM-specific foundation models often encode glucose traces as entangled single-stream sequences, leaving the distinct temporal structure of glycemic dynamics only implicitly modeled. We present GlucoFM, a lightweight CGM foundation model that aligns irregular recordings to a 24-hour chronological grid, preserves observation masks, and decomposes glucose dynamics into slow physiological state and transient event streams, capturing low-frequency glycemic baselines and short-term deviations that may reflect acute physiological responses or sensor artifacts. GlucoFM is pretrained on 109,066 hours of unlabeled CGM recordings from 477 subjects with two complementary objectives: masked contextual latent prediction over fused daily representations and temporal dynamics prediction over state and event streams. Across four diverse cohorts and seven clinical prediction tasks, GlucoFM achieves the strongest subject-disjoint linear-probing performance among evaluated baselines, improving average PR-AUC by 4.1 points over the best CGM-specific foundation model. Its gains are most pronounced on core metabolic outcomes, leading PR-AUC on all diabetes-risk and $β$-cell dysfunction tasks and on 3 of 4 insulin-resistance tasks. GlucoFM also achieves the best overall cross-dataset transfer performance and strong few-shot adaptation among evaluated methods, and consistent gains when aggregating multiple days for subject-level prediction, highlighting physiology-aware decomposition as an effective inductive bias for transferable CGM representation learning.

2605.30860 2026-06-01 math.ST cs.LG math.PR stat.TH 版本更新

Bayesian Inference with Shaped Deep Non-linear MLPs

具有形状深度非线性MLP的贝叶斯推断

Boris Hanin, Tianze Jiang

发表机构 * Princeton University（普林斯顿大学）

AI总结本文通过神经协方差SDE分析深度非线性MLP在训练样本数、输入维数、隐藏层宽度和层数均较大时的贝叶斯推断，发现LP/N的一阶准则决定深度对模型证据的益处，并推导出贝叶斯预测后验等价于数据相关核方法。

Comments 35 Pages

AI中文摘要

深度学习理论的一个核心目标是刻画神经网络在模型规模和训练集规模同时较大时的预测行为。由于模型参数数量和数据集大小发散极限不可交换，先验上并不清楚存在哪些极限。在这项工作中，我们通过研究深度非线性MLP在训练样本数（$P$）、输入维数（$N_0$）、隐藏层宽度（$N$）和隐藏层数（$L$）均可大时的贝叶斯推断，为这些问题提供了新的见解。我们基于神经协方差SDE（Li等人，2022）分析$LP/N\in\Theta(1)$（扮演有效网络深度角色）区域的预测后验。我们的框架涵盖光滑和ReLU激活函数，并适用于任意温度。我们发现，在$LP/N$的一阶近似下，存在一个简单准则，用于判断哪些数据生成过程能从深度中获益，即更大的$LP/N$会增加贝叶斯模型证据。我们还对物理学文献中的一个先前结果给出了新的推导：至少在$LP/N$的一阶近似下，贝叶斯预测后验极其简单，等价于一个数据相关的核方法。

英文摘要

A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute, it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training samples ($P$), the input dimension ($N_0$), the hidden layer width ($N$), and the number of hidden layers ($L$) can all be large. We build on the Neural Covariance SDE (Li et al., 2022) to analyze predictive posteriors in the regime where $LP/N \in \Theta(1)$, with $LP/N$ playing the role of an effective network depth. Our framework covers both smooth and ReLU activation functions and applies to arbitrary temperature. We find, to first order in $LP/N$, a simple criterion for which data-generating processes benefit from depth, in the sense that larger $LP/N$ increases the Bayesian model evidence. We also give a novel derivation of a prior result from the physics literature: at least to first order in $LP/N$, the Bayesian predictive posterior is remarkably simple and is equivalent to that of a data-dependent kernel method.

2605.30859 2026-06-01 cs.LG cs.AI 版本更新

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning

DARTS: 分布感知的主动展开轨迹塑造以加速LLM强化学习

Yujie Wang, Siwei Chen, Longzan Luo, Xinyi Liu, Xupeng Miao, Fangcheng Fu, Bin Cui

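Affiliation: School of Computer Science & Beijing Key Laboratory of Software and Hardware Cooperative Artificial Intelligence Systems, Peking University, Beijing, China; School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China; Institute of Computational Social Science, Peking University (Qingdao), Qingdao, China

AI summary: To address the efficiency bottleneck that long-tail response-length distributions cause in reinforcement learning, the paper proposes distribution-aware active trajectory shaping, which identifies intra-prompt long tails at a fine granularity and trims ineffective verbosity, achieving up to 1.77x speedup without loss of model performance.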

Comments 16 pages, 14 figures, 5 tables. Accepted to ICML 2026

AI中文摘要

强化学习已成为提升模型能力的关键技术，但由于响应长度的长尾分布，其展开效率受到瓶颈制约。现有工作通过提示级尾部调度缓解长尾影响，但我们关注低效率的根本来源：分布本身。具体而言，我们以更细粒度刻画长尾分布，识别提示内长尾，并揭示它们通常包含无效冗余。为解决此问题，我们提出一种主动分布塑造的新范式，将展开分布向简洁性和确定性方向塑造，从而从根本上解决尾部带来的开销。我们通过一种分布感知的轨迹采样机制实现这一点，该机制为每个提示从冗余探索空间中选择轨迹，并采用自适应冗余分配方案以最大化塑造效果和系统效率。实验表明，与最先进系统相比，在不影响模型性能的情况下，实现了高达1.77倍的显著加速。

英文摘要

Reinforcement Learning (RL) has become pivotal for improving model capabilities yet suffers from rollout efficiency bottlenecks due to the long-tail response length distribution. While existing works mitigate the impact of long tails via prompt-level tail scheduling, we focus on the root source of inefficiency: the distribution itself. Specifically, we characterize the long-tail distribution at a finer granularity, identifying intra-prompt long tails, and revealing that they frequently consist of ineffective verbosity. To address this, we propose a novel paradigm of active distribution shaping to shape the rollout distribution towards conciseness and certainty, thereby fundamentally resolving tail-induced overheads. We achieve this through a distribution-aware trajectory sampling mechanism, which selects trajectories from a redundant exploration space for each prompt, and an adaptive redundancy allocation scheme to maximize both shaping effectiveness and system efficiency. Experiments demonstrate significant acceleration over state-of-the-art systems by up to 1.77x without compromising model performance.

2605.30858 2026-06-01 cs.LG 版本更新

ForecastCompass: Guiding Agentic Forecasting with Adaptive Factor Memory

ForecastCompass: 自适应因子记忆引导的智能预测

Yurui Chang, Yongkang Du, Yuanpu Cao, Jinghui Chen, Lu Lin

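Affiliation: Pennsylvania State University

AI summary: Proposes the ForecastCompass framework, which combines a hierarchical forecasting-task taxonomy with a two-component memory (factor memory and reasoning memory) and iteratively revises that memory using retrospective analyses, improving the accuracy and calibration of agents' probabilistic forecasts in dynamic environments.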

AI中文摘要

智能预测对于动态环境中的决策至关重要，但由于智能体必须从不完整、时间有限的证据中进行推理，并在结果确定之前产生校准的概率，因此仍然具有挑战性。记忆提供了一种自然机制，将经验从已解决的预测转移到未来的预测任务。然而，现有的智能体记忆方法并非为预测量身定制，因为它们通常存储过去的交互、反思或事实关联，而没有明确表示可重用的预测因子或校准知识。我们提出了ForecastCompass (FoCo)，一种用于智能预测的自适应因子记忆框架。FoCo通过分层预测任务分类来组织预测经验，从而能够检索与任务相关的预测知识。它维护两个互补的记忆组件：因子记忆（捕获可重用的预测维度）和推理记忆（编码概率更新、不确定性处理和校准原则）。利用回顾分析作为学习信号，FoCo通过口头记忆修正程序迭代修正记忆，使智能体能够随时间积累可迁移的预测知识。在Prophet Arena和FutureX上使用GPT-5-mini和Gemini-2.5-Flash进行的实验表明，FoCo提高了概率准确性和校准性。

英文摘要

Agentic forecasting is important for decision-making in dynamic environments, but it remains challenging because agents must reason from incomplete, time-limited evidence and produce calibrated probabilities before outcomes are resolved. Memory provides a natural mechanism for transferring experience from resolved forecasts to future prediction tasks. However, existing agent-memory methods are not tailored to forecasting, as they typically store past interactions, reflections, or factual associations without explicitly representing reusable predictive factors or calibration knowledge. We propose ForecastCompass (FoCo), an adaptive factor-based memory framework for agentic forecasting. FoCo organizes forecasting experience with a hierarchical forecasting-task taxonomy, enabling retrieval of task-relevant forecasting knowledge. It maintains two complementary memory components: factor memory, which captures reusable predictive dimensions, and reasoning memory, which encodes probability updating, uncertainty handling, and calibration principles. Using retrospective analyses as learning signals, FoCo iteratively revises memory through a verbalized memory-revision procedure, enabling the agent to accumulate transferable forecasting knowledge over time. Experiments on Prophet Arena and FutureX with GPT-5-mini and Gemini-2.5-Flash show that FoCo improves both probabilistic accuracy and calibration.

2605.30843 2026-06-01 cs.LG econ.EM 版本更新

A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models

离线强化学习与逆强化学习讲义，第二部分：逆强化学习与动态离散选择模型的基础

Enoch Hyunwook Kang

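Affiliation: University of Washington, Foster School of Business

AI summary: Proves the equivalence of inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models, reviews the classical identification results and computational paradigms, and introduces modern machine-learning methods along with their identification properties.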

AI中文摘要

在前向强化学习问题中，奖励是固定且已知的；学习者被要求找到一个好的策略或价值函数。这里我们反过来提问：给定由专家生成的离线数据，我们能否恢复专家所优化的奖励？这就是逆强化学习问题，值得注意的是，两个社区——研究动态离散选择（DDC）的结构计量经济学家和研究熵正则化IRL的机器学习者——一直在以不同的名称研究完全相同的概率模型。我们首先证明它们的等价性。然后，我们发展Magnac和Thesmar的经典识别结果以及由此产生的经典计算范式：Rust的嵌套不动点算法、Hotz和Miller的条件选择概率方法，以及Adusumilli和Eckardt的两种时间差分方法：线性半梯度TD和近似价值迭代。每种方法都有其局限性：维度、转移核估计、致命三元组或投影不动点偏差。接着，我们回顾现代ML/IRL分支：对抗性IRL、占用匹配、IQ-Learn和离线ML-IRL，推导每种方法的实际目标，并精确说明它识别了什么和没有识别什么。最后，我们介绍Kang等人的经验风险最小化框架，该框架为离线IRL/DDC提供了基于梯度的估计器。

英文摘要

In the forward reinforcement-learning problem, the reward is fixed and known; the learner is asked to find a good policy or value function. Here we turn the question around. Given offline data generated by an expert, can we recover the reward the expert was optimizing? This is the inverse reinforcement learning problem, and remarkably, two communities, structural econometricians studying dynamic discrete choice (DDC) and machine learners studying entropy-regularized IRL, have been working on exactly the same probabilistic model under different names. We begin by proving their equivalence. We then develop the classical identification result of Magnac and Thesmar and the classical computational paradigms that grew out of it: Rust's nested fixed-point algorithm, the conditional-choice-probability approach of Hotz and Miller, and the two temporal-difference approaches of Adusumilli and Eckardt: linear semi-gradient TD and approximate value iteration. Each route has its limits: dimensionality, transition-kernel estimation, the deadly triad, or projected fixed-point bias. We then walk through the modern ML/IRL strand: adversarial IRL, occupancy matching, IQ-Learn, and offline ML-IRL, deriving each method's actual objective and stating precisely what it does and does not identify. We close with the empirical-risk-minimization framework of Kang et al., which yields a gradient-based estimator for offline IRL/DDC.

2605.30842 2026-06-01 cs.LG 版本更新

CoMem: Context Management with A Decoupled Long-Context Model

CoMem: 基于解耦长上下文模型的上下文管理

Yuwei Zhang, Chengyu Dong, Shuowei Jin, Changlong Yu, Hejie Cui, Hongye Jin, Xinyang Zhang, Hamed Bonab, Colin Lockard, Jianshu Chen, Zhenyu Shi, Jingbo Shang, Xian Li, Bing Yin

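Affiliation: Halıcıoğlu Data Science Institute, University of California, San Diego; Amazon

AI summary: Proposes the CoMem framework, which decouples memory management from the agent workflow, runs the two in a $k$-step-off asynchronous pipeline, and trains the memory model with a reward-driven strategy, achieving a 1.4x latency improvement on SWE-Bench-Verified while preserving most of the performance.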

Comments Work in progress

AI中文摘要

上下文管理使智能体模型能够通过对先前交互历史的迭代总结来解决长时任务。然而，这一过程通常会因额外的总结标记而产生大量解码开销，显著影响部署时的端到端响应延迟。在本文中，我们介绍CoMem，一种新颖的框架，它将记忆管理与主要智能体工作流解耦，使这些过程能够并行执行。我们提出了一种k步偏移异步流水线，将记忆模型的总结与智能体的推理重叠，有效掩盖了上下文处理的延迟。为了确保在这种异步设置下的鲁棒性，我们引入了一种奖励驱动的训练策略，使记忆模型对齐以捕获足够统计信息供智能体决策。理论分析证实，与耦合架构相比，CoMem提供了更优的效率-效果权衡。我们在SWE-Bench-Verified上的广泛实验结果表明，CoMem在保留大部分性能的同时，相比普通长上下文解决方案提供了1.4倍的延迟改进。此外，我们证明这些延迟增益随系统吞吐量增加而有利地扩展，为智能体推理和记忆压缩的独立优化提供了一条模块化路径。

英文摘要

Context management enables agentic models to solve long-horizon tasks through iterative summarization of previous interaction histories. However, this process typically incurs substantial decoding overhead for the extra summarization tokens, which significantly affects the end-to-end response latency at deployment. In this paper, we introduce CoMem, a novel framework that decouples memory management from the primary agent workflow, enabling these processes to execute in parallel. We propose a $k$-step-off asynchronous pipeline that overlaps the memory model's summarization with the agent's inference, effectively masking the latency of context processing. To ensure robustness under this asynchronous setting, we introduce a reward-driven training strategy that aligns the memory model to capture sufficient statistics for the agent's decision-making. Theoretical analysis confirms that CoMem offers a superior efficiency-effectiveness trade-off compared to coupled architectures. Our extensive experimental results on SWE-Bench-Verified show that CoMem provides a 1.4x latency improvement over vanilla long-context solutions while preserving most of the performance. Furthermore, we demonstrate that these latency gains scale favorably with increased system throughput, offering a modular path forward for the independent optimization of agent reasoning and memory compression.

2605.30831 2026-06-01 q-bio.QM cs.LG physics.chem-ph 版本更新

IRIS: Temporally Structured Manifold Projection

Brian Ondov, Chia-Hsuan Chang, Weipeng Zhou, Xingjian Zhang, Xueqing Peng, Yutong Xie, Huan He, Qiaozhu Mei, Hua Xu

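Affiliation: Department of Biomedical Informatics and Data Science, Yale School of Medicine; School of Information, University of Michigan

AI summary: Proposes the IRIS algorithm, which combines temporal order with manifold topology to address the inability of t-SNE and UMAP to reflect temporal dynamics; it is suited to visualizing dynamic biomedical data such as scRNA-seq and comparative metagenomics.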

2605.30808 2026-06-01 cs.CR cs.AI cs.LG 版本更新

Differentially Private Preference Data Synthesis for Large Language Model Alignment

面向大语言模型对齐的差分隐私偏好数据合成

Fengyu Gao, Jing Yang

Affiliation: Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA; Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, Virginia, USA

AI summary: Proposes DPPrefSyn, an algorithm that generates differentially private synthetic preference data based on the Bradley-Terry preference model and DP-PCA, enabling privacy-preserving preference alignment.

Comments Accepted to ICML 2026

AI中文摘要

偏好对齐是大语言模型（LLMs）的关键后训练步骤，以确保其输出与人类价值观一致。然而，在真实人类偏好数据上进行后训练会引发隐私问题，因为这些数据集通常包含敏感的用户提示和人类判断。为了解决这一问题，我们提出了DPPrefSyn，一种用于生成差分隐私（DP）合成偏好数据的新算法，以实现隐私保护的偏好对齐。DPPrefSyn是一个基于Bradley-Terry偏好模型和成对人类偏好数据内在几何结构的原理性框架。它首先从具有正式差分隐私保证的私有数据中学习一个潜在的偏好模型，然后利用学习到的模型结合公共提示合成高质量的偏好数据。它利用每个簇奖励模型的共享线性结构来有效捕捉私有数据中的异构人类偏好，并利用差分隐私主成分分析（DP-PCA）来提高学习准确性。大量实验结果表明，DPPrefSyn在强DP保证下实现了具有竞争力的对齐性能。这些发现突显了合成偏好数据作为隐私保护偏好对齐的实用替代方案在广泛应用中的潜力。据我们所知，这是首项为LLM对齐生成DP合成偏好数据的工作。我们的代码可在https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis获取。

英文摘要

Preference alignment is a crucial post-training step for large language models (LLMs) to ensure their outputs align with human values. However, post-training on real human preference data raises privacy concerns, as these datasets often contain sensitive user prompts and human judgments. To address this, we propose DPPrefSyn, a novel algorithm for generating differentially private (DP) synthetic preference data to enable privacy-preserving preference alignment. DPPrefSyn is a principled framework grounded in the Bradley-Terry preference model and the intrinsic geometric structure of pairwise human preference data. It first learns an underlying preference model from private data with formal differential privacy guarantees, and then leverages the learned model together with public prompts to synthesize high-quality preference data. It exploits the shared linear structure of per-cluster reward models to effectively capture heterogeneous human preferences in private datasets, and leverages DP Principal Component Analysis (DP-PCA) to improve learning accuracy. Extensive experimental results demonstrate that DPPrefSyn achieves competitive alignment performance under strong DP guarantees. These findings highlight the potential of synthetic preference data as a practical alternative for privacy-preserving preference alignment across a broad range of applications. To the best of our knowledge, this is the first work to generate DP synthetic preference data for LLM alignment. Our code is available at https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis.

2605.30807 2026-06-01 cs.LG 版本更新

Conformal Reliability: A New Evaluation Metric for Conditional Generation

共形可靠性：条件生成的新评估指标

Yachen Gao, Xinwei Sun, Yikai Wang, Ye Shi, Jingya Wang, Jianfeng Feng, Yanwei Fu

Affiliation: Institute of Science and Technology for Brain-Inspired Intelligence; Shanghai Innovation Institute; School of Data Science; Nanyang Technological University; School of Information Science and Technology

AI summary: Proposes a reliability score based on conformal prediction as a new evaluation metric for conditional generative models, and develops the CReL framework to compute it efficiently; experiments demonstrate its effectiveness and interpretability.

Comments Accepted at ICML 2026

AI中文摘要

Pairwise Reference Alignment as a Model-Level Ordinal Observable

Mujing Li

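Affiliation: Independent Researcher

AI summary: Defines pairwise reference alignment as an ordinal observable induced by a model scoring function, proposes a centered order-parameter-like statistic and a margin-based extension, gives finite-sample estimators and concentration bounds, and validates them in experiments on Qwen2.5 models and RewardBench.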

AI中文摘要

成对偏好数据广泛用于语言模型评估和对齐，通常用于模型排名、奖励建模或偏好优化。本文提出了一个更基础的测量问题：给定成对偏好的参考分布，当我们测试模型是否将首选响应排在拒绝响应之上时，估计的是哪个模型级量？我们将成对参考对齐定义为由模型评分函数诱导的序数可观测量。给定三元组$(x,y^+,y^-)$上的参考对分布$P_{\mathrm{pair}}$和标量模型分数$S_M(x,y)$，我们将对齐可观测量定义为模型诱导的排序与参考偏好排序一致的概率。我们进一步定义了一个中心化的序参数类统计量，并讨论了基于边界的扩展。所得量在独立抽样假设下具有简单的有限样本估计量和浓度界。本文没有引入新的基准。它为成对参考对齐提供了概念和统计公式，阐明了参考对分布的作用，并将一般的序数可观测量与评分选择（如归一化对数概率或基于能量的分数）区分开来。我们还在Qwen2.5模型和RewardBench上进行了初步实证研究，其中所提出的统计量随模型大小和指令调优而增加，并根据公式在参考对子集之间变化。

英文摘要

Pairwise preference data is widely used in language-model evaluation and alignment, often for model ranking, reward modeling, or preference optimization. This note formulates a more basic measurement question: given a reference distribution of pairwise preferences, what model-level quantity is estimated when we test whether a model ranks preferred responses above rejected responses? We define pairwise reference alignment as an ordinal observable induced by a model scoring function. Given a reference pair distribution $P_{\mathrm{pair}}$ over triples $(x,y^+,y^-)$, and a scalar model score $S_M(x,y)$, we define the alignment observable as the probability that the model-induced ordering agrees with the reference preference ordering. We further define a centered order-parameter-like statistic and discuss a margin-based extension. The resulting quantities admit simple finite-sample estimators and concentration bounds under independent sampling assumptions. This note does not introduce a new benchmark. It provides a conceptual and statistical formulation for pairwise reference alignment, clarifies the role of the reference pair distribution, and distinguishes the general ordinal observable from scoring choices such as normalized log-probability or energy-based scores. We also provide an initial empirical study on Qwen2.5 models and RewardBench, where the proposed statistics increase with model size and instruction tuning and vary across reference-pair subsets as predicted by the formulation.
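
The estimator described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the names `alignment_estimate`, `centered_statistic`, and `hoeffding_halfwidth` are hypothetical, the centering $2\hat{A} - 1$ is an assumed form of the "centered order-parameter-like statistic", and the half-width is the standard Hoeffding bound for i.i.d. samples, which may differ from the bound the note actually states.

```python
import math
from typing import Callable, Iterable, Tuple

# One reference triple: (prompt x, preferred response y+, rejected response y-).
Triple = Tuple[str, str, str]


def alignment_estimate(score: Callable[[str, str], float],
                       triples: Iterable[Triple]) -> float:
    """Fraction of triples where the model scores y+ strictly above y-."""
    triples = list(triples)
    if not triples:
        raise ValueError("need at least one reference triple")
    wins = sum(1 for x, y_pos, y_neg in triples
               if score(x, y_pos) > score(x, y_neg))
    return wins / len(triples)


def centered_statistic(a_hat: float) -> float:
    """Assumed centering: maps agreement in [0, 1] to [-1, 1], 0 = chance."""
    return 2.0 * a_hat - 1.0


def hoeffding_halfwidth(n: int, delta: float = 0.05) -> float:
    """Half-width t with P(|A_hat - A| >= t) <= delta for n i.i.d. triples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

Any scalar score can be plugged in for `score`, for example a length-normalized log-probability; the ordinal estimate only depends on which response in each pair scores higher.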

2605.30757 2026-06-01 cs.LG 版本更新

Chain-of-Thought and Compressed Looped Transformers: A Memory-Budget Separation

思维链与压缩循环Transformer：记忆预算分离

Haozhou Zhang

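Affiliation: Department of Mathematics and Statistics

AI summary: Compares three memory regimes (the compressed latent loop, the full sequence-state loop, and the chain-of-thought scratchpad) and shows that the memory budget of a compressed looped Transformer limits its reasoning ability, whereas chain-of-thought solves harder problems by extending the context.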

AI中文摘要

思维链提示和循环Transformer都赋予固定模型更多的测试时计算，但它们在记忆内容上有所不同。思维链将中间状态存储在生成的标记中，这些标记保留在上下文中，而循环Transformer通过循环隐藏激活传递状态。我们认为这种持久可变记忆是测试时推理的核心资源。我们比较了三种记忆机制：压缩潜在循环、全序列状态循环和思维链暂存区。我们的主要结果表明，压缩循环受其循环状态大小的限制。运行更长时间的循环增加了计算量，但本身不会创建增长的暂存区，因此即使运行多个步骤，具有小循环状态的循环仍然是小空间推理器。在标准复杂性假设下，这样的循环无法解决在logspace归约下P-complete的问题，而多项式长度的思维链可以。这种分离是压缩循环特有的，因为全序列状态循环在每个输入位置携带状态，并处于更接近显式暂存区的记忆丰富状态。受控的指针追逐和关联回忆扫描说明了这种记忆预算观点，其性能对持久状态预算是否匹配任务的工作记忆需求敏感。

英文摘要

Chain-of-thought prompting and looped Transformers both give a fixed model more test-time computation, but they differ in what they remember. Chain-of-thought stores intermediate state in generated tokens that remain in the context, whereas a looped Transformer carries state through recurrent hidden activations. We argue that this persistent mutable memory is a central resource for test-time reasoning. We compare three memory regimes, the compressed latent loop, the full sequence-state loop, and the chain-of-thought scratchpad. Our main result shows that a compressed loop is limited by the size of its recurrent state. Running the loop longer adds computation but does not by itself create a growing scratchpad, so a loop with a small recurrent state remains a small-space reasoner even when run for many steps. Under a standard complexity assumption, such loops cannot decide problems that are P-complete under logspace reductions, whereas polynomial-length chain-of-thought can. The separation is specific to compressed loops, as full sequence-state loops carry state at every input position and live in a memory-rich regime closer to explicit scratchpads. Controlled pointer-chasing and associative-recall sweeps illustrate this memory-budget view, with performance sensitive to whether the persistent-state budget matches the task's working-memory demand.

2605.30749 2026-06-01 cs.LG cs.RO 版本更新

SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching

Inwon Kang, Kavitha Srinivas, Nandana Mihindukulasooriya, Sola Shirai, Parikshit Ram, Horst Samulowitz, Oshani Seneviratne

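Affiliation: Rensselaer Polytechnic Institute; IBM Research

AI summary: Proposes the SemStruct framework, which combines a frozen pre-trained language model with the structural inductive bias of graph neural networks and uses row-level co-occurrences as structural information, achieving state-of-the-art performance on schema matching.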

Comments Accepted to KDD 26 Research Track

DOI: 10.1145/3770855.3817963

AI中文摘要

模式匹配是集成异构数据源的基本步骤。虽然预训练语言模型通过捕获语言语义彻底改变了这一任务，但它们通常将表格数据视为独立列描述的序列化文本。这种序列化丢弃了关键的结构信息——具体来说，行级共现，即关系上下文——迫使模型仅依赖列标题语义或独立分布。为弥补这一差距，我们提出了SemStruct，一个将冻结的PLM的语义能力与图神经网络的结构归纳偏置相结合的框架。我们将表格建模为一个异构图，其中列和值是由行连接的节点，允许GNN在结构上传播消歧上下文。与需要专有LLM访问和语言模型微调的其他最先进方法不同，SemStruct保持语言模型冻结，仅训练一个轻量级结构编码器。在Valentine和SOTAB-SM基准上的大量实验表明，SemStruct实现了最先进的性能，在复杂的、可语义连接的数据集上超越了完全微调的基线。此外，我们的消融研究表明，行表示主要作为拓扑导管而非语义实体，验证了在模式匹配中显式结构建模的必要性。

英文摘要

Schema matching is a fundamental step in integrating heterogeneous data sources. While Pre-trained Language Models (PLMs) have revolutionized this task by capturing linguistic semantics, they typically process tabular data as serialized text sequences of standalone column descriptions. This serialization discards critical structural information -- specifically, the row-level co-occurrences, i.e. the relational context -- forcing models to rely solely on column header semantics or standalone distributions. To bridge this gap, we propose SemStruct, a framework that joins the semantic power of frozen PLMs with the structural inductive bias of Graph Neural Networks (GNNs). We model the table as a heterogeneous graph where columns and values are nodes connected by rows, allowing the GNN to propagate disambiguating context across the structure. Unlike other state-of-the-art methods that require proprietary LLM access and fine-tuning of language models, SemStruct keeps the language model frozen and trains only a lightweight structural encoder. Extensive experiments on the Valentine and SOTAB-SM benchmarks demonstrate that SemStruct achieves state-of-the-art performance, outperforming fully fine-tuned baselines on complex, semantically joinable datasets. Furthermore, our ablation studies reveal that row representations serve primarily as topological conduits rather than semantic entities, validating the necessity of explicit structural modeling in schema matching.

2605.30728 2026-06-01 cs.LG cs.DC 版本更新

Reducing the GPU Memory Bottleneck with Lossless Compression for ML -- Extended

通过无损压缩减少机器学习中的GPU内存瓶颈——扩展版

Aditya K Kamath, Arvind Krishnamurthy, Marco Canini, Simon Peter

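Affiliation: University of Washington

AI summary: Proposes Invariant Bit Packing (IBP), a lossless compression algorithm that identifies and eliminates invariant bits across groups of tensors and uses GPU-optimized decompression with asynchronous PCIe transfers to substantially reduce data-transfer time, speeding up GNN training, DLRM embedding lookup, and LLM inference.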

Comments Extended version of paper published at 21st European Conference on Computer Systems (EuroSys '26), April 27-30, 2026, Edinburgh, Scotland, UK

DOI: 10.1145/3767295.3803595
Journal ref: 2026. In Proceedings of the 21st European Conference on Computer Systems. Association for Computing Machinery, 899-918

AI中文摘要

机器学习（ML）训练和推理经常处理远超GPU内存容量的数据集，迫使它们依赖PCIe进行按需张量传输，导致关键的传输瓶颈。有损压缩已被提出以缓解瓶颈，但会引入依赖工作负载的精度损失，使得在现有ML部署中使用变得复杂甚至不可行。我们探索无损压缩作为替代方案，以避免这种部署复杂性。我们确定了无损压缩可以集成到ML流水线中的位置，同时最小化对GPU执行的干扰。基于我们的发现，我们引入了不变位打包（IBP），一种新颖的无损压缩算法，旨在最小化ML的数据传输时间。IBP识别并消除张量组中的不变位，通过利用warp并行性、低开销位操作和异步PCIe传输的GPU优化解压缩来提高吞吐量。我们提供易于使用的API，通过为GNN训练以及DLRM和LLM推理框架添加IBP支持来展示它们。IBP平均实现了74%更快的GNN训练、180%更快的DLRM嵌入查找和24%更快的LLM推理。

英文摘要

Machine learning (ML) training and inference often process data sets far exceeding GPU memory capacity, forcing them to rely on PCIe for on-demand tensor transfers, causing critical transfer bottlenecks. Lossy compression has been proposed to relieve bottlenecks but introduces workload-dependent accuracy loss, making it complex or even prohibitive to use in existing ML deployments. We explore lossless compression as an alternative that avoids this deployment complexity. We identify where lossless compression can be integrated into ML pipelines while minimizing interference with GPU execution. Based on our findings, we introduce Invariant Bit Packing (IBP), a novel lossless compression algorithm designed to minimize data transfer time for ML. IBP identifies and eliminates invariant bits across groups of tensors, improving throughput through GPU-optimized decompression that leverages warp parallelism, low-overhead bit operations, and asynchronous PCIe transfers. We provide easy-to-use APIs, showcasing them by adding IBP support to GNN training, as well as DLRM and LLM inference frameworks. IBP achieves, on average, 74% faster GNN training, 180% faster DLRM embedding lookup, and 24% faster LLM inference.
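
The core idea of removing bits that are constant across a group of values can be illustrated on the CPU. The sketch below is an assumption-laden toy, not the IBP implementation: all function names are hypothetical, it works on Python integers rather than GPU tensors, and it ignores warp parallelism and PCIe transfers entirely. It finds the bit positions that vary within a group, stores the invariant bits once, and keeps only the varying bits per value.

```python
def find_varying_mask(words, width=32):
    """Return (mask of bit positions that differ, bits set in every word)."""
    all_and = (1 << width) - 1
    all_or = 0
    for w in words:
        all_and &= w
        all_or |= w
    # A position varies when some word has a 1 there and some word has a 0.
    return all_and ^ all_or, all_and


def extract_bits(word, mask):
    """Gather the bits of `word` at `mask` positions into the low-order bits."""
    out, pos = 0, 0
    while mask:
        low = mask & -mask          # lowest set bit of the mask
        if word & low:
            out |= 1 << pos
        pos += 1
        mask ^= low
    return out


def deposit_bits(packed, mask):
    """Inverse of extract_bits: scatter low-order bits back to mask positions."""
    out, pos = 0, 0
    while mask:
        low = mask & -mask
        if (packed >> pos) & 1:
            out |= low
        pos += 1
        mask ^= low
    return out


def compress(words, width=32):
    """Store the invariant bits once and only the varying bits per word."""
    mask, base = find_varying_mask(words, width)
    return mask, base, [extract_bits(w, mask) for w in words]


def decompress(mask, base, packed):
    """Rebuild every word exactly from the shared base and its varying bits."""
    return [base | deposit_bits(p, mask) for p in packed]
```

In this toy form each value shrinks from `width` bits to the number of set bits in `mask`, and the round trip is exact, which is what makes the scheme lossless.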

2605.30720 2026-06-01 cs.LG cs.AI econ.GN q-fin.EC stat.ML 版本更新

Kalimati Vegetable Price Index Forecasting with a Momentum Corrected Online Stacking Ensemble

Kalimati蔬菜价格指数预测：基于动量校正的在线堆叠集成方法

Sahaj Raj Malla

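Affiliation: Department of Mathematics, Kathmandu University

AI summary: To address the high volatility of agricultural prices in emerging economies, proposes a Momentum-Corrected Online Stacking Ensemble built on an inverse-volatility weighted composite index and 64 causally valid features, reaching RMSE = 1.771, MAPE = 0.68%, and R-squared = 0.845 at the 90-day horizon.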

Comments 21 pages, 8 figures, 2 tables

AI中文摘要

由于高波动性、频繁的供应中断以及强烈的文化需求影响，新兴经济体的农产品价格预测十分困难。本研究引入了Kalimati蔬菜价格指数（KVPI），这是一个新的逆波动率加权综合指数，汇总了加德满都十年（2013-2023年）的135种日度批发商品。通过创建稳定的宏观信号，KVPI减少了单个作物建模固有的噪声。我们开发了包含64个因果有效特征的丰富特征集，包括节日领先滞后效应、滚动统计量和日历变量。对涵盖统计、树基、深度学习、混合和Transformer架构的14种预测模型，在短期（7天）、中期（14天和30天）和长期（90天）预测期上进行了严格评估。树基集成方法表现出显著的鲁棒性，而经典统计模型和复杂Transformer在处理噪声数据集时表现不佳。提出的动量校正在线堆叠集成模型取得了最强性能，在90天预测期上均方根误差（RMSE）为1.771，平均绝对百分比误差（MAPE）低至0.68%，并解释了84.5%的方差（R²=0.845）。这一开源流程为尼泊尔及类似市场的政策制定者和供应链参与者提供了实用、可靠的工具，以预测价格波动并加强粮食安全。

英文摘要

Forecasting agricultural commodity prices in emerging economies is difficult due to high volatility, frequent supply disruptions, and strong cultural influences on demand. This study introduces the Kalimati Vegetable Price Index (KVPI), a new inverse-volatility weighted composite index that aggregates 135 daily wholesale commodities from Kathmandu over ten years (2013-2023). By creating a stable macro-level signal, the KVPI reduces the noise inherent in modelling individual crops. A rich set of 64 causally valid features was developed, including festival lead-lag effects, rolling statistics, and calendar variables. Fourteen forecasting models spanning statistical, tree-based, deep learning, hybrid, and transformer architectures were rigorously evaluated across short (7-day), medium (14- and 30-day), and long-term (90-day) horizons. Tree-based ensembles proved notably robust, while classical statistical models and complex transformers struggled with the noisy dataset. The proposed Momentum-Corrected Online Stacking Ensemble achieved the strongest performance, yielding a Root Mean Square Error (RMSE) of 1.771, an exceptionally low Mean Absolute Percentage Error (MAPE) of 0.68%, and explaining 84.5% of the variance (R-squared = 0.845) at the 90-day horizon. This open-source pipeline provides policymakers and supply chain actors in Nepal and similar markets with a practical, reliable tool for anticipating price movements and strengthening food security.
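
An inverse-volatility weighted composite index can be sketched as follows. The abstract does not give the KVPI formula, so everything here is an assumption: volatility is taken as the population standard deviation of simple daily returns, weights are proportional to its inverse, and each series is rebased to its first observation. The function names are hypothetical.

```python
import statistics


def inverse_volatility_weights(price_series):
    """Weights proportional to 1 / (std of daily returns), summing to 1."""
    inverse = {}
    for name, prices in price_series.items():
        returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
        vol = statistics.pstdev(returns)
        if vol == 0:
            raise ValueError(f"series {name!r} has zero volatility")
        inverse[name] = 1.0 / vol
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def composite_index(price_series, weights, base=100.0):
    """Weighted average of prices rebased to day 0, scaled so day 0 = base."""
    n_days = min(len(p) for p in price_series.values())
    return [
        base * sum(weights[k] * price_series[k][t] / price_series[k][0]
                   for k in price_series)
        for t in range(n_days)
    ]
```

Under this weighting, a stable commodity contributes more to the index than a volatile one, which is the sense in which such an index damps crop-level noise.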

2605.30719 2026-06-01 cs.LG cs.AI 版本更新

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

何时LLMs足以作为序列RL任务的策略优化器？

Stephane Hatgis-Kessell, Emma Brunskill

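Affiliation: Department of Computer Science, Stanford University

AI summary: Proposes PromptPO, in which an LLM is prompted with Python descriptions of the state space, action space, and reward function and iteratively generates and refines executable policies from rollout feedback; it matches or exceeds standard RL baselines in many environments but falls short on fine-grained continuous-control tasks.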

AI中文摘要

我们研究大型语言模型（LLMs）何时可以作为强化学习（RL）任务的有效黑盒策略优化器，即何时可以用LLM替代经典RL算法？我们通过引入提示策略优化（PromptPO）来探索这个问题，这是一种迭代方法，它用状态空间、动作空间和奖励函数的Python描述提示LLM，然后让LLM根据rollout反馈生成并优化可执行策略。在硬探索环境、Meta-World机器人任务以及几个现实世界控制问题中，PromptPO通常匹配或超过标准RL基线的性能，同时使用显著更少的环境交互。为了最大化期望回报，且无需进一步显式提示，PromptPO输出的策略范围从调谐的比例控制器或基于规则的规划到运行值迭代等规划算法的策略。我们的结果表明，当LLM能够利用关于环境或优化策略的先验知识时，基于LLM的策略优化是足够的。PromptPO在MuJoCo领域中的表现不如标准RL基线，这展示了基于LLM的策略优化在需要细粒度连续控制的设置中可能存在的局限性。

英文摘要

We study when large language models (LLMs) can serve as effective black-box policy optimizers for reinforcement learning (RL) tasks, i.e., when can we replace classical RL algorithms with an LLM? We explore this question by introducing Prompted Policy Optimization (PromptPO), an iterative method that prompts an LLM with Python descriptions of the state space, action space, and reward function, then has it generate and refine executable policies based on rollout feedback. Across hard exploration environments, Meta-World robotics tasks, and several real-world control problems, PromptPO often matches or exceeds the performance of standard RL baselines while using substantially fewer environment interactions. To maximize expected return, and without further explicit prompting, the policies PromptPO outputs range from tuned proportional controllers or rule-based plans to policies that run planning algorithms like value iteration. Our results demonstrate that LLM-based policy optimization is sufficient when the LLM can leverage prior knowledge about the environment or optimization strategy. PromptPO underperforms standard RL baselines in MuJoCo domains, which demonstrates possible limitations of LLM-based policy optimization in settings that require fine-grained continuous control.

2605.30713 2026-06-01 cs.LG cs.CV cs.MM 版本更新

Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models

多样性至关重要：重新审视视觉-语言模型中的测试时计算

Yijie Tong, Yifan Hou, Shaobo Cui, Antoine Bosselut, Mrinmaya Sachan

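Affiliation: ETH Zürich; Shanghai Jiao Tong University; EPFL

AI summary: To address the underexplored use of test-time compute (TTC) strategies in vision-language models (VLMs), proposes ETTC, which is based on predictive entropy and exploits confidence disparities between models to improve ensemble performance; theory and experiments show that it outperforms majority voting and the best single model.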

Comments ICML 2026

AI中文摘要

测试时计算（TTC）策略已成为提升大型语言模型（LLM）推理能力的一种轻量级方法。然而，它们在视觉-语言模型（VLM）中的应用和益处尚未得到充分探索。我们对七个VLM和六个基准进行了TTC的系统研究，特别分析了基于特征的评分和多数投票方法。我们发现特征启发式方法失败，而投票在单模型设置中仅带来微小提升。我们从理论上证明，这种局限性源于缺乏预测多样性：当输出高度相关时，投票收益甚微。相比之下，多模型集成提供了更丰富的多样性，但标准的多数投票未能考虑不同模型的能力差异。为解决这一问题，我们提出了基于熵的TTC（ETTC），它根据预测熵选择最自信的预测。在单模型情况下，我们的方法退化为多数投票，但在模型集成中，它利用置信度差异优先考虑更强的模型。我们证明，在温和假设下ETTC优于多数投票，并通过实验表明它始终优于投票和最佳个体模型。关键在于，我们的结果表明，较小的模型可以协同增强较大的模型，释放出标准策略无法实现的集成增益。

英文摘要

Test-time compute (TTC) strategies have emerged as a lightweight approach to boost reasoning in large language models (LLMs). However, their application and benefits for vision-language models (VLMs) remain underexplored. We present a systematic study of TTC across seven VLMs and six benchmarks, specifically analyzing feature-based scoring and majority voting methods. We find that feature heuristics fail and voting yields only modest gains in single-model settings. We theoretically show that this limitation stems from a lack of prediction diversity: when outputs are highly correlated, voting provides little benefit. In contrast, multi-model ensembles offer richer diversity, yet standard majority voting fails to account for varying model capabilities. To address this, we propose Entropy-based TTC (ETTC), which selects the most confident prediction based on predictive entropy. Our method reduces to majority voting in the single-model case, but in model ensembles, it leverages confidence disparities to prioritize stronger models. We prove that ETTC outperforms majority voting under mild assumptions and empirically demonstrate that it consistently surpasses both voting and the best individual model. Crucially, our results show that smaller models can synergistically enhance larger ones, unlocking ensembling gains not achievable with standard strategies.
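
One plausible reading of entropy-based selection is sketched below. The abstract does not specify how predictive entropy is computed, so this sketch assumes each model is sampled several times and uses the empirical entropy of its answer distribution; the function names are hypothetical. The ensemble returns the majority answer of the lowest-entropy model, and with a single model this is exactly majority voting, matching the reduction stated in the abstract.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Empirical Shannon entropy (in nats) of a list of sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def majority_vote(answers):
    """Most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def entropy_select(samples_by_model):
    """Return the majority answer of the model whose samples agree most."""
    best_model = min(samples_by_model,
                     key=lambda m: answer_entropy(samples_by_model[m]))
    return majority_vote(samples_by_model[best_model])
```

The difference from pooled voting shows up when two uncertain models happen to share a wrong answer: pooled voting follows the count, while the entropy rule follows the one model that answers consistently.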

2605.30711 2026-06-01 cs.CL cs.AI cs.LG stat.ML 版本更新

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

SAGE: 一种用于智能体大语言模型中高效记忆演化的新颖门控机制

Sijia Wang, Dhanajit Brahma, Ricardo Henao

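Affiliation: Duke University

AI summary: Proposes the SAGE gate, which uses von Mises-Fisher density estimation and an adaptive threshold to treat memory-write control as a novelty-detection problem, achieving the best token-F1 on LoCoMo at lower cost.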

AI中文摘要

智能体大语言模型必须持续决定新提取的事实是应添加、与现有记忆合并还是忽略，然而先前的工作更侧重于检索和存储，而非原则性的写入端控制。我们将记忆演化视为一个新颖性检测问题，并提出SAGE（Spherical Adaptive Gate for memory Evolution），一种用于记忆演化的球形自适应门控机制，它通过基于von Mises-Fisher的密度估计器对记忆嵌入上的候选事实进行评分，并使用跟踪记忆存储几何结构的自适应阈值对其进行路由。SAGE将明确新颖的事实解析为ADD，明确冗余的事实解析为NOOP，仅将不确定的情况发送给LLM合并步骤，从而减少了昂贵的写入时推理。在LoCoMo上，SAGE在所有七个开放权重骨干对比中均实现了对Mem0的最佳平均token-F1，而在GPT-4o-mini上，它将添加阶段的API成本降低了3.4倍，添加阶段延迟降低了2.5倍，且平均评判分数差距很小。作为A-Mem的即插即用二进制门控，SAGE在五个模型上跳过了大约16-18%的LLM调用，且在开放权重骨干上质量变化极小。这些结果表明，新颖性感知的写入控制是提高长期智能体记忆中记忆质量和系统效率的实用杠杆。

英文摘要

Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We frame memory evolution as a novelty-detection problem and propose SAGE, a Spherical Adaptive Gate for memory Evolution that scores candidate facts with a von Mises-Fisher-based density estimator over memory embeddings and routes them with an adaptive threshold that tracks memory-store geometry. SAGE resolves clearly novel facts as ADD, clearly redundant facts as NOOP, and sends only uncertain cases to an LLM merge step, reducing expensive write-time reasoning. On LoCoMo, SAGE achieves the best average token-F1 against Mem0 on all seven open-weight backbone comparisons, while on GPT-4o-mini it reduces add-phase API cost by 3.4$\times$ and add-phase latency by 2.5$\times$ with only a small average judge-score gap. As a drop-in binary gate for A-Mem, SAGE skips roughly 16-18% of LLM calls across five models with minimal quality change on open-weight backbones. These results suggest that novelty-aware write control is a practical lever for improving both memory quality and system efficiency in long-term agentic memory.
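
The three-way routing described in this abstract can be illustrated with a toy gate. This is a sketch under stated assumptions, not SAGE itself: the score is an unnormalized von Mises-Fisher kernel average over memory embeddings, the concentration `kappa` and the thresholds `low` and `high` are hypothetical fixed values, and the adaptive threshold from the abstract is not reproduced. The label `MERGE` stands for "send to the LLM merge step".

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vmf_score(candidate, memory, kappa=20.0):
    """Mean of exp(kappa * (cos - 1)) over memory; 1.0 means exact duplicate."""
    c = normalize(candidate)
    total = 0.0
    for item in memory:
        u = normalize(item)
        cos = sum(a * b for a, b in zip(c, u))
        total += math.exp(kappa * (cos - 1.0))
    return total / len(memory)


def route(candidate, memory, low=0.05, high=0.8, kappa=20.0):
    """Low density -> ADD, high density -> NOOP, otherwise defer to the LLM."""
    if not memory:
        return "ADD"
    score = vmf_score(candidate, memory, kappa)
    if score < low:
        return "ADD"
    if score > high:
        return "NOOP"
    return "MERGE"
```

Because the score is an average, it shrinks as the store grows, so fixed thresholds like these only behave sensibly for a very small memory; a threshold that tracks the store is needed in practice, and this sketch leaves that part out.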

2605.30700 2026-06-01 cs.CV cs.LG 版本更新

Mathematical Morphology in Machine Learning

机器学习中的数学形态学

Erick Oliveira Rodrigues, Aura Conci

发表机构 * Universidade Federal Fluminense（弗鲁米嫩塞联邦大学）

AI总结将数学形态学引入机器学习，提出基于形态学重建的快速聚类算法和一种结合闵可夫斯基与切比雪夫距离的新型距离度量，并设计新型形态学分类器以建模形状、密度和分形信息。

Journal ref: SIBGRAPI 2018

AI中文摘要

本工作将数学形态学——一种成熟的视觉计算理论——引入机器学习，以利用标准技术常忽视的形状和密度方面。我们提出了一种基于形态学重建的快速聚类算法，该算法能精确保留聚类形状和密度。该方案具有独特特性：内在的最大聚类感知、无成本的噪声去除以及由结构元素控制的多样化增长模式。此外，我们提出了一种结合闵可夫斯基距离和切比雪夫距离的新型距离度量，对于形态学膨胀非常高效。在 $Z^2$ 离散邻域迭代中，它比曼哈顿距离快约1.3倍，比欧几里得距离快约329.5倍。当使用k近邻（k-NN）分类器在33个UCI数据集上与其他14种距离度量进行评估时，我们的度量在大多数情况下（33例中的26例）达到了高于平均的准确率，并在9个案例中取得了最佳整体准确率。最后，我们引入了新型形态学分类器。与现有文献不同，本方案独特地对数据集中的形状、密度和分形信息进行建模。

英文摘要

This work introduces mathematical morphology, an established visual computing theory, into machine learning to exploit shape and density aspects often overlooked by standard techniques. We propose a fast clustering algorithm based on morphological reconstruction that accurately preserves cluster shapes and density. This scheme offers unique features: an intrinsic sense of maximal clusters, cost-free noise removal, and diverse growth patterns controlled by structuring elements. Additionally, we propose a novel distance metric combining Minkowski and Chebyshev distances, highly efficient for morphological dilations. In $Z^2$ discrete neighbourhood iterations, it is roughly 1.3 times faster than Manhattan and 329.5 times faster than Euclidean distances. When evaluated using a k-Nearest Neighbours (k-NN) classifier across 33 UCI datasets against 14 other distances, our metric achieved above-average accuracies most frequently (26 of 33 cases) and the best overall accuracy in 9 cases. Finally, we introduce novel morphological classifiers. Unlike current literature, this proposal uniquely models shape, density, and fractal information in datasets.

2605.30699 2026-06-01 cs.LG cs.CV 版本更新

A Context-Aware Middleware for Medical Image Based Reports: An approach based on image feature extraction and association rules

基于医学图像报告的情境感知中间件：一种基于图像特征提取和关联规则的方法

Erick O. Rodrigues, Jose Viterbo, Aura Conci, Trueman Mac Henry

发表机构 * Department of Computer Science（计算机科学系）； Department of Mathematics & Statistics（数学与统计学系）； Universidade Federal Fluminense（弗鲁米嫩塞联邦大学）； York University（约克大学）

AI总结提出一种情境感知中间件，通过图像特征提取和关联规则，自动将医学图像分派给最合适的医疗人员，以提高医疗工作流程效率。

DOI: 10.1109/AICCSA.2015.7507147
Journal ref: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA)

AI中文摘要

本工作提出了一种用于医疗工作流程组织和效率提升的情境感知中间件。在医院、实验室和远程放射学公司中，每位医生或技术人员都专注于特定类型的诊断或分析。因此，某些类型的医学图像通常会被转发给特定的医生或特定群体。这种转发非常耗时。也就是说，反复决定谁是最合适的医生，以及他在特定情境下是否可用，既繁琐又可能非常低效。因此，所提出的中间件能够处理并收集每位医疗人员所分析图像的数据。基于收集的数据和当前临床情境，中间件能够推断出谁是最适合接收特定传入医学图像的人员。

英文摘要

This work proposes a context-aware middleware for medical workflow organization and efficiency improvement. In hospitals, laboratories and teleradiology companies, each physician or technician is specialized in a specific kind of diagnosis or analysis. Therefore, certain types of medical images are often forwarded to a certain physician or a certain group. This forwarding is time consuming. That is, repeatedly deciding who the best physician would be, and whether that physician is available at a given moment in a given context, is exhausting and may be very inefficient. Thus, the proposed middleware has the ability to process and collect data from the images analyzed by each member of the medical staff. Based on the collected data and the current clinical context, the middleware is able to infer which staff member is best suited to receive a given incoming medical image.

2605.30694 2026-06-01 cs.LG 版本更新

Universal Decision Learners

通用决策学习器

Sridhar Mahadevan

发表机构 * Adobe Research（Adobe研究院）； University of Massachusetts, Amherst（马萨诸塞大学阿默斯特分校）

AI总结本文提出通用决策学习器（UDL）的范畴论框架，通过左Kan扩展和右Kan扩展将局部决策行为规范地扩展到全局一致行为，统一了规划、强化学习、因果干预、在线学习和博弈均衡等多种决策形式。

Comments 15 pages

AI中文摘要

许多决策理论——规划、强化学习、因果干预、在线学习和博弈均衡——将局部信息转化为全局一致的行为。本文提出一个共同的范畴论形式化：通用决策学习器（UDL）通过一对通用构造将部分指定的决策函子从观测上下文扩展到新上下文。左Kan扩展表达展开、聚合和候选生成；右Kan扩展表达一致性、约束满足和不动点语义。核心主张并非每个决策问题都有相同的算法，而是许多决策形式化实例化同一个通用问题：规范地扩展局部行为数据，然后刻画全局一致的扩展。我们给出抽象的UDL构造，证明其通用比较性质，定义Kan不变的行为等价性和最小抽象，并展示贝尔曼方程、规划递归、因果干预、在线遗憾和均衡如何作为特例出现。补充材料更详细地发展了强化学习特例。

英文摘要

Many theories of decision making -- planning, reinforcement learning, causal intervention, online learning, and game-theoretic equilibrium -- turn local information into globally coherent behavior. This paper proposes a common categorical formulation: a Universal Decision Learner (UDL) extends a partially specified decision functor from observed contexts to new contexts by a pair of universal constructions. Left Kan extensions express rollout, aggregation, and candidate generation; right Kan extensions express consistency, constraint satisfaction, and fixed-point semantics. The central claim is not that every decision problem has the same algorithm, but that many decision formalisms instantiate the same universal problem: extend local behavioral data canonically, then characterize the globally coherent extensions. We give the abstract UDL construction, prove its universal comparison property, define Kan-invariant behavioral equivalence and minimal abstractions, and show how Bellman equations, planning recursions, causal interventions, online regret, and equilibria arise as special cases. The supplementary material develops the reinforcement-learning specialization in more detail.

2605.30686 2026-06-01 cs.CR cs.AI cs.LG 版本更新

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

工具调用ReAct代理中深度相关的间接提示注入：注入深度、载荷框架和轮次预算敏感性

Mohammadreza Rashidi

发表机构 * Department of Computer Science（计算机科学系）； AI and Media Analysis Lab（人工智能与媒体分析实验室）； Berlin, Germany（柏林，德国）

AI总结通过四个对照实验（共460次试验），研究在工具调用ReAct代理中，注入深度、载荷框架和轮次预算对间接提示注入攻击成功率的影响，发现注入深度是主导变量，且仅清理第一个工具观察可捕获67%的注入成功。

Comments 17 pages, 16 figures

AI中文摘要

将链式推理与工具调用交错的ReAct代理越来越多地用于实际任务，如调度、文件检索和数据访问。它们的工具观察循环创建了一个直接攻击面：控制任何工具返回值的攻击者可以嵌入指令，将代理从用户目标引开，这种威胁称为间接提示注入。现有基准在固定条件下评估固定注入位置的攻击成功率（ASR），留下了三个未探索的风险维度：载荷在工具序列中出现的位置（注入深度）、使用的修辞风格（框架）以及代理允许的轮次数（轮次上限）。我们在五个攻击类别的20个场景中进行了四项对照研究，总共对GPT-4o-mini和Claude Haiku进行了460次试验，总API成本低于0.36美元。研究1显示，GPT-4o-mini的ASR从深度1的60%衰减到深度4和5的0%（Cramer's V = 0.58，p < 0.001；限制在序列内深度1-3：V = 0.47，p = 0.0013），这是由于深度1的模型抵抗和更深位置在遇到载荷前任务完成所致。研究2在Claude Haiku上重复了深度实验，通过保守的工具调用和真正的指令抵抗，在每个深度均实现了0%的ASR。研究3显示，在深度1，框架将ASR调节在25%（中性）到75%（角色）之间，范围达50个百分点，但在每个条件下N=20时未达到统计显著性。研究4确认ASR在3、5和7的轮次上限下稳定，表明轮次预算在此设置中不是风险因素。我们的结果确立了注入深度为主导变量，并表明仅清理第一个工具观察可捕获67%的测量注入成功。

英文摘要

ReAct agents that interleave chain-of-thought reasoning with tool calls are increasingly deployed for real tasks such as scheduling, file retrieval, and data access. Their tool observation loop creates a direct attack surface: an adversary who controls any tool's return value can embed instructions that redirect the agent away from the user's goal, a threat known as indirect prompt injection. Existing benchmarks evaluate attack success rate (ASR) at a fixed injection position under fixed conditions, leaving three risk dimensions unexplored: where in the tool sequence the payload appears (injection depth), what rhetorical register it uses (framing), and how many turns the agent is permitted (turn cap). We conduct four controlled studies on 20 scenarios spanning five attack categories, totalling 460 trials against GPT-4o-mini and Claude Haiku at a combined API cost under 0.36 USD. Study 1 shows that ASR against GPT-4o-mini decays from 60% at depth 1 to 0% at depths 4 and 5 (Cramer's V = 0.58, p < 0.001; restricted to within-sequence depths 1-3: V = 0.47, p = 0.0013), driven by model resistance at depth 1 and task completion before payload encounter at deeper positions. Study 2 replicates the depth experiment on Claude Haiku, which achieves 0% ASR at every depth through a combination of conservative tool invocation and genuine instruction resistance. Study 3 shows that framing modulates ASR between 25% (neutral) and 75% (persona) at depth 1, a 50-percentage-point range that does not reach statistical significance at N = 20 per condition. Study 4 confirms that ASR is stable across turn caps of 3, 5, and 7, indicating the turn budget is not a risk factor in this setting. Our results establish injection depth as the dominant variable and show that sanitising only the first tool observation captures 67% of measured injection successes.
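
The effect size reported in Study 1 is Cramér's V, which measures association between two categorical variables (here, injection depth and attack outcome). A minimal sketch of how that statistic is computed from a contingency table follows. The function name `cramers_v` is an assumption, no bias correction is applied, and the tables used in the usage lines are made-up illustrations, not the paper's data.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts.

    table[i][j] is the count for row category i (e.g. injection depth)
    and column category j (e.g. attack succeeded / failed).
    Assumes every row and column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by sample size and the smaller table dimension minus one.
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Made-up tables for illustration only (rows: condition, columns: success/failure).
perfect_association = cramers_v([[10, 0], [0, 10]])
no_association = cramers_v([[5, 5], [5, 5]])
```

V ranges from 0 (no association) to 1 (outcome fully determined by the row category), so the reported V = 0.58 indicates a strong dependence of attack success on depth.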

2605.30662 2026-06-01 cs.LG q-bio.PE 版本更新

Spatio-temporal stochastic graph-based learning for infectious disease forecasting

基于时空随机图的传染病预测学习

Luz Stefani Sotomayor Valenzuela, Susanna Cramb, Darren Wraith

发表机构 * School of Public Health and Social Work, Queensland University of Technology（昆士兰理工大学公共卫生与社会工作学院）； QUT Centre for Data Science, Queensland University of Technology（昆士兰理工大学数据科学中心）

AI总结提出一种集成随机公式和不确定性近似过程的时空图架构，用于预测新发传染病病例，在COVID-19和水痘数据集上表现出竞争性性能。

Comments Preprint under review

AI中文摘要

时空图模型通常用于预测COVID-19和水痘爆发等传染病的新病例。然而，在其学习过程中使用随机建模的研究却出人意料地不足，并且很少考虑大国家的完整数据集。因此，尚不清楚这些模型是否能在真实疾病传播场景中提供准确的预测。在这项工作中，我们提出了一种时空随机图架构，该架构集成了随机公式和不确定性近似过程，以预测新的传染病病例。我们发现，我们的方法能够适应在单一模型架构中编码大小人口地理网络。使用两个真实世界数据集——美国COVID-19和匈牙利水痘，我们报告了所提出的架构在预测美国2022年第一波COVID-19和匈牙利2012-2014年水痘波次中的增强效果。通过与四种时空图模型进行基准测试，定量结果显示，所提出的方法在预测美国所有3218个县和匈牙利所有20个县的新病例方面，具有竞争性的整体周度性能。所提出的方法能够表示相对于基线的整体流行病进展，尽管存在一步延迟；同时表现出对高频低幅变异的低敏感性。

英文摘要

Spatio-temporal graph-based models have typically been used to forecast new cases of infectious diseases such as COVID-19 and chickenpox outbreaks. However, the use of stochastic modelling in their learning process has been surprisingly under-investigated, and studies have rarely considered entire data sets of large countries. As a result, it is unknown whether these models would provide accurate forecasts in real-world disease spread scenarios. In this work, we propose a spatio-temporal stochastic graph-based architecture that integrates a stochastic formulation and uncertainty approximation process to forecast new infectious disease cases. We find that our approach can adapt to encode large and small population geographical networks within a single model architecture. Using two real-world data sets, COVID-19 in the US and chickenpox in Hungary, we report an enhanced effect of the proposed architecture across predictions of the 2022 first wave for COVID-19 in the US and comparative results of chickenpox waves during 2012-2014 in Hungary. By benchmarking with four spatio-temporal graph-based models, quantitative results show competitive overall weekly performance of the proposed approach on forecasting new cases for all 3,218 US counties and all 20 Hungarian counties. The proposed approach can represent overall epidemic progression relative to baselines, though with a one-step delay, while exhibiting a reduced sensitivity to high-frequency and low-amplitude variability.

2605.30660 2026-06-01 cs.LG cs.RO 版本更新

BOKBO (Best of K Bad Options): Calibrated Abstention for VLA Policies

BOKBO (Best of K Bad Options): VLA策略的校准式弃权

Anya Singh, Cabrel Happi, Jai Relan, Varun Nair, Vidyut Baradwaj

AI总结针对视觉-语言-动作(VLA)策略的测试时扩展方法，提出首个共形弃权层BOKBO，通过全局和逐任务变体提供有限样本无分布保证的执行违规率控制，并揭示基于扰动的K采样下策略内部非一致性分数的结构性缺陷。

AI中文摘要

针对视觉-语言-动作(VLA)策略的测试时扩展方法，如RoboMonkey、SEAL、MG-Select和V-GPS，在推理时采样K个候选动作块并执行验证器最优结果。当所有K个候选都不安全时，系统会执行违规动作且无警告。我们提出BOKBO，这是首个用于K样本VLA推理的共形弃权层，提供执行违规率的有限样本无分布保证。我们提供全局和逐任务（Mondrian）变体，其中逐任务变体缩小了最困难任务上的条件差距。我们的分析揭示了基于扰动的K采样下策略内部非一致性分数的结构性失败：基础策略置信度代理与K样本不一致性之间的相关性为0.98（与动作噪声超参数σ相关），而与实际安全违规的相关性处于噪声基底。我们通过复现令牌级温度采样下的分析来测试失败范围，发现该失败是机制特定的，并在基于策略随机性的采样下得到部分缓解。一个基于语义视觉特征和任务标识学习的违规预测器支持紧密校准：在libero_object_temp_x0.1上使用OpenVLA-OFT，ε=0.05时，条件CRC边界在86%的bootstrap分割上成立，覆盖率为78%，净任务成功率为70%。Mondrian-BOKBO将最小逐任务条件保持比例从0.71提高到0.93。结果在5个训练种子上稳定，在π_0-FAST上的bootstrap噪声内可复现，在libero_spatial_temp_x0.1作为同等基准上成立，并经受住了四个套件内分布偏移。我们还识别并纠正了一个方法论陷阱：全局设置的力阈值远低于专家典型的操作力，将不安全行为与正常操作混淆，导致违规率膨胀5倍。

英文摘要

Test-time scaling methods for vision-language-action (VLA) policies, such as RoboMonkey, SEAL, MG-Select, and V-GPS, sample K candidate action chunks at inference and execute the verifier-best. When all K candidates are unsafe, the system executes a violating action with no warning. We propose BOKBO, the first conformal abstention layer for K-sample VLA inference, providing finite-sample distribution-free guarantees on executed-violation rate. We provide both global and per-task (Mondrian) variants, with the per-task variant closing the conditional gap on the hardest tasks. Our analysis exposes a structural failure of policy-internal nonconformity scores under perturbation-based K-sampling: the base-policy confidence proxy and K-sample disagreement correlate at 0.98 with the action-noise hyperparameter $\sigma$, while correlating at the noise floor with actual safety violations. We test the failure's scope by replicating the analysis under token-level temperature sampling and find the failure is mechanism-specific and partially mitigated under policy-stochasticity-based sampling. A learned violation predictor conditioned on semantic visual features and task identity supports tight calibration: at $\epsilon$ = 0.05 on libero_object_temp_x0.1 with OpenVLA-OFT, the conditional CRC bound holds on 86% of bootstrap splits with 78% coverage and 70% net task success. Mondrian-BOKBO raises the minimum per-task conditional hold fraction from 0.71 to 0.93. Results are stable across 5 training seeds, replicate within bootstrap noise on $\pi_0$-FAST, hold on libero_spatial_temp_x0.1 as a co-equal benchmark, and survive four within-suite distribution shifts. We additionally identify and correct a methodological pitfall: globally-set force thresholds well below expert-typical manipulation forces conflate unsafe behavior with normal manipulation, inflating violation rates by $5\times$.
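
The idea of a calibrated abstention layer can be sketched with a generic conformal-risk-control style threshold. This is a simplified illustration, not the BOKBO procedure itself: the names `calibrate_threshold` and `decide` are hypothetical, the score is assumed to be a predicted violation risk where higher means riskier, and the loss is assumed to be the 0/1 indicator "executed and violated". A per-task (Mondrian) variant would simply run the same calibration separately on each task's calibration episodes.

```python
def calibrate_threshold(scores, violated, eps):
    """Pick the largest threshold t such that executing whenever score <= t
    keeps the finite-sample-adjusted executed-violation rate at or below eps.

    scores[i]   : predicted violation risk for calibration episode i
    violated[i] : True if executing in episode i caused a violation
    Uses the adjusted risk (k + 1) / (n + 1), where k counts calibration
    episodes that would have been executed and violated. Returns None when
    no threshold satisfies the bound (always abstain).
    """
    n = len(scores)
    best = None
    for t in sorted(set(scores)):
        k = sum(1 for s, v in zip(scores, violated) if s <= t and v)
        if (k + 1) / (n + 1) <= eps:
            best = t
        else:
            # k only grows as t grows, so no larger threshold can pass.
            break
    return best


def decide(score, threshold):
    """Execute the verifier-best action only if its risk score is acceptable."""
    if threshold is not None and score <= threshold:
        return "EXECUTE"
    return "ABSTAIN"
```

With a small calibration set the `+ 1` terms dominate, so a tight target such as 0.05 may force the layer to abstain on everything until more calibration episodes are available.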

2605.30656 2026-06-01 cs.LG 版本更新

Learning to Perceive the World Through Control: Empowerment-Based Representation Learning

通过控制学习感知世界：基于赋能的表示学习

Mahsa Bastankhah, Sophie Broderick, Benjamin Eysenbach

发表机构 * Princeton University, USA（普林斯顿大学，美国）

AI总结本文通过最大化赋能目标，研究如何学习仅捕捉环境控制相关特征的表示，并证明赋能代理诱导的前向和后向表示对控制无关特征具有不变性。

扩散模型优先记忆原型样本，或：为什么我的扩散模型喜欢“潦草”？

Marta Aparicio Rodriguez, Anastasia Borovykh, Grigorios A. Pavliotis, Daniel J. Korchinski

发表机构 * Department of Mathematics, Imperial College London, UK ； ML Lab, Capital Fund Management, France ； Department of Physics, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

AI总结本文通过随机层次模型生成的字符串训练扩散模型，发现模型优先记忆常见子串组成的样本，即使数据完全去重，表明点级去重无法保证隐私，而数据集多样性（尤其是高层抽象）能延缓记忆，并识别出部分记忆的中间状态导致生成均值回归的“潦草”现象。

AI中文摘要

生成模型存在一个持久限制：它们记忆训练数据的倾向可能产生法律责任并削弱创意多样性。因此，理解哪些样本被全部或部分记忆，以及在什么条件下被记忆，仍然是一个重要的开放问题。本文对“非典型或稀有样本是否首先被记忆？”这一问题给出了否定答案。我们根据随机层次模型（RHM）的产生规则生成的字符串训练扩散模型，发现由常见子串组成的样本被优先记忆。即使训练数据由完全独特的样本组成，这一结论仍然成立，表明在数据点级别进行去重并不能提供有意义的隐私保证。相应地，我们预测并随后观察到，对于重尾数据集（即包含更多非典型样本的数据集），记忆会延迟。当重尾特性引入高层产生规则时，这种效应会放大。这些结果共同表明，数据集多样性，尤其是在更高抽象层次上，在延缓记忆方面起着重要作用。最后，我们识别出一个部分记忆的中间状态，其中常见子串首先被学习，随后在生成过程中过度产生。如果在此状态停止训练，模型将表现出均值回归的平淡性，常被讥讽为“潦草”。

英文摘要

Generative models have a persistent limitation: their tendency to memorize training data can create legal liabilities and erode creative diversity. Understanding which samples are memorized in whole or in part, and under what conditions, therefore remains an important open problem. Here we answer the question "Are atypical or rare samples memorized first?" in the negative. We train diffusion models on strings generated according to the production rules of the Random Hierarchy Model (RHM), and find that samples composed of common substrings are preferentially memorized. This holds true even if the training data consists of entirely unique samples, indicating that deduplication at the data point level does not provide a meaningful privacy guarantee. Correspondingly we predict, then observe, delayed memorization for fat-tailed datasets (i.e., those with more atypical samples). This effect is amplified when fat-tails are introduced into high-level production rules. These together suggest that dataset diversity, particularly at higher levels of abstraction, plays an important role in staving off memorization. Finally, we identify an intermediate regime of partial memorization in which common substrings are learned first and subsequently overproduced during generation. If training is stopped in this regime, models will exhibit the reversion-to-the-mean blandness often derided as "slop".

2605.30640 2026-06-01 cs.LG cs.CL 版本更新

CSULoRA: Closest Safe Update Low-Rank Adaptation

CSULoRA：最近安全更新低秩适应

Oleksandr Marchenko Breneur, Adelaide Danilov, Aria Nourbakhsh, Salima Lamsiyah

发表机构 * Department of Computer Science, University of Luxembourg（卢森堡大学计算机科学系）

AI总结提出CSULoRA方法，通过后处理校正LoRA适配器，在保留任务相关性的同时抑制不安全更新方向，降低攻击成功率。

Comments 10 pages, 3 figure

AI中文摘要

低秩适应已成为大型语言模型参数高效微调的标准方法，但即使少量不安全或对抗性微调数据也会显著削弱对齐模型的安全行为。现有的安全保持LoRA方法通常依赖硬干预，如投影、剪枝、阈值化或额外训练目标。虽然这些方法可以抑制不安全更新方向，但它们也可能移除任务相关信息或需要额外调优。我们提出CSULoRA，一种通过最近安全更新估计来校正训练后LoRA适配器的后处理方法。CSULoRA从安全对齐模型与其对应基础检查点之间的权重位移中估计安全对齐子空间。然后，它将每个LoRA更新分解为完全对齐、部分对齐和子空间外分量。CSULoRA不丢弃估计安全子空间外的分量，而是求解一个闭式惩罚最小变化问题，该问题保留完全对齐分量，同时根据相对能量平滑衰减潜在不安全方向。在对抗性微调实验中，CSULoRA显著降低了攻击成功率，同时保留了标准LoRA微调获得的大部分效用增益。

英文摘要

Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligned models. Existing safety-preserving LoRA methods often rely on hard interventions such as projection, pruning, thresholding, or additional training objectives. While these methods can suppress unsafe update directions, they may also remove task-relevant information or require extra tuning. We introduce CSULoRA, a post-hoc method for correcting trained LoRA adapters through closest safe update estimation. CSULoRA estimates a safety-aligned subspace from the weight displacement between a safety-aligned model and its corresponding base checkpoint. It then decomposes each LoRA update into fully aligned, partially aligned, and off-subspace components. Instead of discarding components outside the estimated safety subspace, CSULoRA solves a closed-form penalized minimum-change problem that preserves the fully aligned component while smoothly attenuating potentially unsafe directions according to their relative energy. In adversarial fine-tuning experiments, CSULoRA substantially reduces attack success rate while preserving most of the utility gains obtained from standard LoRA fine-tuning.

2605.30638 2026-06-01 cs.LG cs.AI 版本更新

Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

分数广播与去相关：基于广播的信用分配通用框架

Mustafa Uzun, Mete Erdogan, Cengiz Pehlevan, Alper T. Erdogan

发表机构 * KUIS AI Center, Koc University, Turkey（科克大学KUIS人工智能中心，土耳其）； Electrical and Electronics Engineering, Koc University, Turkey（科克大学电子与电气工程系，土耳其）； Department of Electrical Engineering, Stanford University, USA（斯坦福大学电气工程系，美国）； John A. Paulson School of Engineering & Applied Sciences, Harvard University, USA（哈佛大学约翰·A·保罗森工程与应用科学学院，美国）； Kempner Institute, Harvard University, USA（哈佛大学凯姆纳研究所，美国）； Center for Brain Science, Harvard University, USA（哈佛大学脑科学中心，美国）

AI总结提出分数广播与去相关（SBD）框架，通过输出分数与隐藏层激活的正交性原理，统一了多种可微损失函数下的广播式信用分配，并理论支撑了三因子学习规则。

AI中文摘要

我们引入了分数广播与去相关（SBD），一个用于一般可微损失族基于广播的信用分配的原则性框架。误差广播是反向传播的一种生物合理替代方案，它无需权重传输即可将输出信息发送到隐藏层。最近针对均方误差（MSE）设置引入的误差广播与去相关（EBD）框架，将这一机制建立在最优估计量的随机正交性基础上，即最优残差与输入的函数正交。我们通过引入输出分数（损失对最终层输出的梯度）与隐藏层激活之间的正交性原理来推广这一基础，该原理在最优分数条件均值为零时成立。这一单一原理统一了标准可微损失族（包括交叉熵、Bregman散度、适当评分规则和指数族负对数似然）的广播式信用分配。该框架为一般损失下的三因子学习规则提供了理论基础，其中神经调节因子被推导为广播损失分数。我们明确推导了交叉熵情况，刻画了可接受损失类，并引入了一种分数向量扩展技术，该技术在保持正交性框架的同时丰富了广播信号。在CIFAR-10和Tiny ImageNet上的实验表明，SBD显著优于现有的广播方法，而分数向量扩展带来了进一步的提升。总体而言，这项工作确定了损失分数作为广播信号，提供了正交性理论以及神经科学中三因子学习规则的理论基础，并展示了分数向量扩展如何丰富所得目标函数的去相关方向。

英文摘要

We introduce Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment for general families of differentiable losses. Error broadcast is a biologically plausible alternative to backpropagation that sends output information to hidden layers without weight transport. The Error Broadcast and Decorrelation (EBD) framework, recently introduced for the mean-squared-error (MSE) setting, grounded this mechanism in the stochastic orthogonality of optimal estimators, under which the optimal residual is orthogonal to functions of the input. We generalize that foundation by introducing an orthogonality principle between the output score (the gradient of loss with respect to the final-layer output) and hidden-layer activations, which holds whenever the optimal score has conditional mean zero. This single principle unifies broadcast-based credit assignment across the standard differentiable-loss families, including cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods. The framework supplies a theoretical grounding for the three-factor learning rule under general losses, with the neuromodulatory factor derived as the broadcast loss score. We derive the cross-entropy case explicitly, characterize the admissible loss class, and introduce a score vector expansion technique that enriches the broadcast signal while preserving the orthogonality framework. Experiments on CIFAR-10 and Tiny ImageNet show that SBD substantially improves over existing broadcast approaches, with score vector expansion delivering further gains. Overall, this work identifies the loss score as the signal to broadcast, supplies the orthogonality theory and theoretical grounding for the three-factor learning rule from neuroscience, and shows how score vector expansion enriches the decorrelation directions of the resulting objective.
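
For the cross-entropy case mentioned above, the output score (the gradient of the loss with respect to the final-layer logits) has the familiar form softmax(z) minus the one-hot label. The sketch below checks numerically that this score has mean zero when labels are drawn from the model's own predictive distribution, which is the condition the abstract refers to. The function names are hypothetical and this is only a worked illustration of that standard fact, not the SBD training rule.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_score(logits, label):
    """Gradient of -log softmax(logits)[label] with respect to the logits."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def expected_score(logits, label_probs):
    """Average score when the label is drawn with probabilities label_probs."""
    mean = [0.0] * len(logits)
    for label, weight in enumerate(label_probs):
        score = cross_entropy_score(logits, label)
        for k, s in enumerate(score):
            mean[k] += weight * s
    return mean


# When the predictive distribution matches the label distribution,
# the expected score vanishes in every coordinate.
logits = [0.3, -1.2, 2.0]
residual = expected_score(logits, softmax(logits))
```

If the label distribution differs from the prediction, `expected_score` returns the difference between the two, which is the signal a broadcast rule would send to hidden layers.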

2605.30635 2026-06-01 cs.LG q-bio.GN 版本更新

CellBRIDGE: Learning Cellular Trajectories via Interaction-Aware Alignment

CellBRIDGE：通过交互感知对齐学习细胞轨迹

Silas Ruhrberg Estévez, Nicolas Huynh, Tennison Liu, Roderik M. Kortlever, Gerard I. Evan, David L. Bentley, Mihaela van der Schaar

发表机构 * DAMTP, University of Cambridge（剑桥大学应用数学与理论物理系）； Francis Crick Institute（弗朗西斯·克里克研究所）； University of Colorado Anschutz Medical Campus（科罗拉多大学安舒茨医学校区）

AI总结提出CellBRIDGE方法，通过将配体-受体介导的细胞间通信成本融入最优传输框架，改进了单细胞RNA测序数据中的轨迹推断和跨快照耦合。

Journal ref: ICML 2026

AI中文摘要

从群体快照推断动态是机器学习和生物学中的一个基本挑战。在单细胞RNA测序（scRNA-seq）中，破坏性测量阻止了跨时间直接追踪单个细胞，使得轨迹推断欠定。最优传输（OT）为快照对齐提供了一个原则性框架，但一个长期存在的建模问题是哪些成本函数能产生生物学上有意义的耦合。标准的OT方法依赖于基因表达距离，隐含地将细胞视为独立点，并忽略了由配体-受体信号介导的结构化细胞间通信。我们引入了CellBRIDGE（基于细胞的正则化交互驱动基因表达），它用源自配体-受体活性的定向、类型化交互成本来增强基于特征的OT。通过显式建模细胞间通信，与仅基于特征的基线相比，CellBRIDGE在合成和真实scRNA-seq数据集上改善了跨快照耦合和下游轨迹估计。值得注意的是，CellBRIDGE实现了机制上可解释的计算机模拟扰动：在肺癌数据上，沉默特定的配体-受体对诱导的轨迹变化重现了预期靶向通路抑制的效果。

英文摘要

Inferring dynamics from population snapshots is a fundamental challenge in machine learning and biology. In scRNA-sequencing (scRNA-seq), destructive measurements preclude direct tracking of individual cells across time, making trajectory inference underdetermined. Optimal Transport (OT) provides a principled framework for snapshot alignment, but a long-standing modeling question is which cost functions yield biologically meaningful couplings. Standard OT approaches rely on gene-expression distances, implicitly treating cells as independent points and neglecting structured cell-cell communication mediated by ligand-receptor signaling. We introduce CellBRIDGE (Cell-Based Regularized Interaction-Driven Gene Expression), which augments feature-based OT with a directed, typed interaction cost derived from ligand-receptor activity. By explicitly modeling cell-cell communication, CellBRIDGE improves cross-snapshot couplings and downstream trajectory estimates across synthetic and real scRNA-seq datasets relative to feature-only baselines. Notably, CellBRIDGE enables mechanistically interpretable in silico perturbations: on lung cancer data, silencing specific ligand-receptor pairs induces trajectory shifts that recapitulate expected effects of targeted pathway inhibition.

2605.30632 2026-06-01 cs.HC cs.AI cs.LG 版本更新

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

Rationalize: 人机对齐的共享语义推理

Aritra Dasgupta, Naga Datha Saikiran Battula, Avina Nakarmi, Sohom Sen, Subhodeep Ghosh, Xun Song

发表机构 * New Jersey Institute of Technology（新泽西理工学院）

AI总结提出Rationalize角色对框架，通过共享推理空间中的互补角色对（如探索者-引导者）实现人类与AI在数据驱动意义建构中的语义对齐，并设计元素级和角色特定的对齐评估方法。

Comments Accepted by ACM CHI 2026 BiAlign Workshop

详情

AI中文摘要

我们介绍了Rationalize，一个用于数据驱动意义建构中人类与AI模型之间共享语义推理的角色对框架。基于人机协作和批判性思维的思路，我们将人机交互概念化为一系列互补的角色对（探索者-引导者、调查者-告知者、教师-学生、法官-倡导者），这些角色对在共享推理空间中运作。在这个空间中，人类分析师和AI模型（如LLM）使目的、问题、假设、证据、推理和影响变得明确，不仅促进输出层面的对齐，而且促进双方意图和行动的合理化层面的对齐。我们将这些角色对与双向人机对齐框架联系起来，说明“使AI对齐人类”和“使人类对齐AI”如何因角色而异，并勾勒出一个使用元素级和角色特定方法进行对齐设计和评估的协作研究议程。

英文摘要

We introduce Rationalize, a role-pair framework for shared semantic reasoning between humans and AI models in data-driven sensemaking. Building on ideas in human-machine teaming and critical thinking, we conceptualize human-AI interaction as a series of complementary role pairs (Explorer-Guide, Investigator-Informant, Teacher-Student, Judge-Advocate) operating in a shared reasoning space. In this space, human analysts and AI models (such as LLMs) make purposes, questions, assumptions, evidence, inferences, and implications explicit, facilitating alignment not only at the output level but at the level of rationalization of intent and action by each side. We relate these role pairs to the bidirectional human-AI alignment framework, illustrating how "aligning AI to humans" and "aligning humans to AI" differ by role, and sketch a collaborative research agenda for alignment design and assessment using element-level and role-specific approaches.

2605.30631 2026-06-01 cs.CV cs.AI cs.LG 版本更新

从 Best-of-$N$ 偏好数据中学习奖励：目标、权衡与设计原则

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

发表机构 * Machine Learning Department（机器学习系）

AI总结本文分析了从 Best-of-$N$ 采样构建的成对偏好数据中 Bradley-Terry 奖励学习的目标，揭示了 $N$ 和基础分布对奖励估计的影响，并提出了基于样本效率和连通性权衡的设计原则。

AI中文摘要

Best-of-$N$ 采样被广泛用于构建成对偏好数据：从基础分布中抽取 $N$ 个候选，并将最佳响应与拒绝响应配对。尽管其广泛使用，但 Bradley-Terry (BT) 奖励学习从这类数据中提取了什么，以及如何选择 $N$ 和基础分布，仍不清楚。我们将近期通过诱导条件分布对偏好数据的分析专门应用于 Best-of-$N$。对于独立参考变体，我们推导出作为 $N$ 和基础分布显式函数的闭式奖励目标，并证明它们保留了潜在奖励排名。对于实用的 Best-vs-Random 和 Best-vs-Worst 变体，所选和拒绝的响应通过同一候选集耦合，因此精确的 BT 可表示性通常不成立；然而，随着 $N$ 增长，有界类最小化器接近参考目标。尽管已知边界和连通性在成对偏好学习中控制样本效率，但 Best-of-$N$ 通过 $N$ 以相反方向耦合它们：更大的 $N$ 加宽成对边界但降低连通性。这种权衡产生了两个设计原则：当偏好标签是瓶颈时使用较大的 $N$，当生成是瓶颈时使用较小的 $N$；并塑造基础分布，使其质量集中在测试时比较最重要的响应之间。在合成和真实偏好数据上的实验支持了对样本量和基础分布形状的预测依赖性。

英文摘要

Best-of-$N$ sampling is widely used to construct pairwise preference data: $N$ candidates are drawn from a base distribution, and the best is paired with a rejected response. Despite its widespread use, what Bradley--Terry (BT) reward learning extracts from such data, and how to choose $N$ and the base distribution, remain unclear. We specialize a recent analysis of preference data via its induced conditional distribution to Best-of-$N$. For independent-reference variants, we derive closed-form reward targets as explicit functions of $N$ and the base distribution, and show that they preserve the latent reward ranking. For the practical Best-vs-Random and Best-vs-Worst variants, chosen and rejected responses are coupled through the same candidate set, so exact BT representability generally fails; nevertheless, bounded-class minimizers approach the reference targets as $N$ grows. Although margin and connectivity are known to govern sample efficiency in pairwise preference learning, Best-of-$N$ couples them through $N$ in opposing directions: larger $N$ widens pairwise margins but reduces connectivity. This trade-off yields two design principles: use larger $N$ when preference labels are the bottleneck, smaller $N$ when generation is the bottleneck; and shape the base distribution to place mass between the responses whose comparison matters most at test time. Experiments on synthetic and real preference data support the predicted dependence on sample size and base-distribution shape.
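
The data-construction step analysed in this abstract can be sketched as follows. This is a minimal illustration under assumptions: the function name `best_of_n_pair` and its parameters are hypothetical, `sample` stands in for drawing a response from the base distribution, and `reward` stands in for whatever scorer or annotator ranks the candidates. In the Best-vs-Random variant the rejected response is assumed to be drawn uniformly from the non-best candidates.

```python
import random


def best_of_n_pair(sample, reward, n, variant="random", rng=random):
    """Build one (chosen, rejected) preference pair from n candidates.

    sample  : zero-argument callable returning one candidate response
    reward  : callable scoring a candidate (higher is better)
    variant : "random" for Best-vs-Random, "worst" for Best-vs-Worst
    Both responses come from the same candidate set, which is the coupling
    the abstract points to. Requires n >= 2.
    """
    if n < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted((sample() for _ in range(n)), key=reward)
    chosen = ranked[-1]
    if variant == "worst":
        rejected = ranked[0]
    else:
        rejected = rng.choice(ranked[:-1])
    return chosen, rejected
```

Increasing `n` pushes `chosen` further into the upper tail of the base distribution, which widens the reward margin within each pair; the abstract's point is that this comes at the cost of reduced connectivity between compared responses.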

2605.30615 2026-06-01 cs.LG 版本更新

Improving Selective Classification with Pairwise Queries for Binary Classification

通过成对查询改进二分类的选择性分类

Harsh Vardhan, Sunav Choudhary, Natwar Modani, Arya Mazumdar

发表机构 * Adobe Research（Adobe研究院）

AI总结针对选择性分类中模型置信度与预测不一致导致高错误率的问题，提出使用成对查询检测高错误样本，以降低非拒绝样本的错误率，并通过理论和实验验证了其有效性。

DOI: 10.1007/s10994-026-07078-y

AI中文摘要

从有限的漂流观测中学习有效的马尾藻输运动力学

F. J. Beron-VEra, M. J. Olascoaga, J. Morell, E. Cruz

发表机构 * Rosenstiel School of Marine, Atmospheric, and Earth Science（罗森斯蒂尔海洋、大气与地球科学学院）； University of Miami（迈阿密大学）； Department of Atmospheric Sciences（大气科学系）； Department of Ocean Sciences（海洋科学系）； Department of Marine Sciences（海洋科学系）； University of Puerto Rico（波多黎各大学）

AI总结针对浮游物质输运中未解析过程的影响，提出基于物理诊断和有限记忆表示的数据驱动输运学习框架，通过MLP集成和SINDy方法从有限拉格朗日观测中学习有效输运修正，并在波多黎各和墨西哥湾流区域验证了诊断信息的有效性及延迟稀疏符号修正的局限性。

AI中文摘要

浮游物质输运受到未解析过程的影响，这些过程通常无法从现有的环流产品中获得。我们开发了一个数据驱动的输运学习框架，利用物理驱动的海洋-大气诊断和部分受惯性粒子记忆效应启发的有限记忆表示，从有限的拉格朗日观测中学习有效的输运修正。通过留一轨迹验证，使用预测性和稀疏符号发现方法分析诊断表示。在波多黎各地区和墨西哥湾流的马尾藻跟随漂流器应用中，结果表明诊断包含超越基线环流产品的输运相关信息。多层感知器（MLP）集成提供了灵活的预测轨迹修正，而非线性动力学稀疏辨识（SINDy）测试是否可以从诊断中提取瞬时或延迟的稀疏符号输运结构。结果在不同流态下有所不同：（i）在波多黎各，延迟稀疏符号修正提供了适度但系统的改进；（ii）在墨西哥湾流应用中，尽管延迟预测信息持续存在，但动态有用的稀疏符号修正主要保持瞬时性。这些结果支持粗粒度浮游物质输运中的有限记忆效应，同时也说明了获得稳定延迟稀疏符号闭合的困难。

英文摘要

Floating-material transport is influenced by unresolved processes that are often absent from available circulation products. We develop a data-driven transport-learning framework for learning effective transport corrections from limited Lagrangian observations using physically motivated ocean--atmosphere diagnostics and finite-memory representations motivated in part by inertial-particle memory effects. The diagnostic representation is analyzed through predictive and sparse symbolic-discovery approaches under leave-one-trajectory-out validation. Applications to Sargassum-following drifters in the Puerto Rico region and the Gulf Stream show that the diagnostics contain transport-relevant information beyond the baseline circulation products. Multilayer perceptron (MLP) ensembles provide flexible predictive trajectory corrections, while Sparse Identification of Nonlinear Dynamics (SINDy) tests whether instantaneous or delayed sparse symbolic transport structure can be extracted from the diagnostics. The results differ across flow regimes: (i) in Puerto Rico, delayed sparse symbolic corrections provide modest but systematic improvement; (ii) in the Gulf Stream application, dynamically useful sparse symbolic corrections remain primarily instantaneous even though delayed predictive information persists. These results support finite-memory transport effects in coarse-grained floating-material transport while also illustrating the difficulty of obtaining stable delayed sparse symbolic closures.

2605.30601 2026-06-01 cs.LG 版本更新

TASER: Task-Aware Stein Regularisation for Geometry-Driven Robustness

TASER: 面向几何驱动鲁棒性的任务感知斯坦正则化

Michał Kozyra, Gesine Reinert

发表机构 * Department of Statistics, University of Oxford, United Kingdom（英国牛津大学统计系）

AI总结提出TASER（任务感知斯坦正则化），一种基于Langevin斯坦算子的训练时正则化框架，通过惩罚训练分布下的逐点斯坦残差，诱导各向异性数据感知平滑性，从而提升模型在分布偏移和对抗扰动下的鲁棒性。

2605.30600 2026-06-01 cs.LG cs.IT math.IT 版本更新

The Fast Mixing Mechanism for Differential Privacy

差分隐私的快速混合机制

Omri Lev, Moshe Shenfeld, Vishwak Srinivasan, Katrina Ligett, Ashia C. Wilson

AI总结提出基于快速变换的差分隐私草图机制，在保持隐私保证的同时匹配经典快速草图方法的运行时间，并应用于差分隐私线性回归实现首个快速方法。

AI中文摘要

随机草图是压缩大规模优化问题同时保持准确性的核心工具。特别是基于结构化矩阵（如Hadamard矩阵）的草图可以高效应用，并且通常以更低的计算成本得到接近原始问题的解。在差分隐私（DP）中，高斯草图已被用于解决DP线性回归，始于Sheffet（2017、2019），随后由Lev等人（2025、2026）改进。然而，尽管这些方法实现了强大的效用保证，它们通常不会比经典DP方法提高运行时间。在这项工作中，我们引入了一种基于快速变换的新DP草图机制，该机制在某些情况下匹配经典快速草图方法的运行时间。我们证明了该机制的最先进隐私保证，并表明在有利情况下，它们与高斯草图的隐私保证相差一个常数因子。作为一个应用，我们将该机制与最近的基于草图的DP线性回归方法相结合，得到了一种具有强效用和改进运行时间的新算法。我们为该算法建立了隐私和准确性保证，据我们所知，这是第一个用于DP普通最小二乘法的快速方法。

英文摘要

Randomized sketching is a central tool for compressing large-scale optimization problems while preserving accuracy. In particular, sketches that are based on structured matrices, such as the Hadamard matrix, can be applied efficiently and often yield solutions that approximate those of the original problem at much lower computational cost. In differential privacy (DP), Gaussian sketching has been used to solve DP linear regression, beginning with \citet{sheffet2017differentially, sheffet2019old} and later refined by \citet{lev2025gaussianmix, lev2026near}. However, although these methods achieve strong utility guarantees, they usually do not improve runtime over classical DP approaches. In this work, we introduce a new DP sketching mechanism based on fast transforms, which, in certain cases, matches the runtime of classical fast sketching methods. We prove state-of-the-art privacy guarantees for this mechanism and show that, in favorable regimes, they match those of the Gaussian sketch up to a constant factor. As an application, we combine this mechanism with recent sketch-based methods for DP linear regression to obtain a new algorithm with strong utility and improved runtime. We establish privacy and accuracy guarantees for this algorithm, yielding, to the best of our knowledge, the first fast method for DP ordinary least squares.
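
The structured sketching that the abstract builds on can be illustrated with a plain subsampled randomized Hadamard transform. This is only a non-private sketch for intuition: it adds no noise and is not the paper's DP mechanism, and the function names `fwht` and `srht_sketch` are made up for this example.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n).

    The input length must be a power of two.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a


def srht_sketch(x, m, rng):
    """Compress x to m coordinates: random signs, Hadamard mixing, row sampling.

    NOTE: no privacy noise is added here; this is the classical fast sketch only.
    """
    n = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    mixed = fwht([s * v for s, v in zip(signs, x)])
    rows = rng.sample(range(n), m)
    # (1/sqrt(n)) makes the transform orthonormal; sqrt(n/m) rescales the sample.
    scale = 1.0 / math.sqrt(m)
    return [scale * mixed[r] for r in rows]
```

With `m` smaller than `len(x)` the squared norm is preserved in expectation rather than exactly, which is the property fast sketching methods rely on.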

2605.30599 2026-06-01 cs.LG cs.CL 版本更新

面向发动机健康管理与剩余寿命预测的科学机器学习

Jostein Barry-Straume, Changmin Son, Adrian Sandu, Gavan Burke, Rekha Sundararajan, Andrew Rimell, James G. Steinrock

发表机构 * Computational Science Laboratory（计算科学实验室）； Department of Computer Science（计算机科学系）； Virginia Tech（弗吉尼亚理工学院）

AI总结提出一个多任务科学机器学习框架，通过联合预测涡轮气体温度、温差和剩余寿命并提供量化不确定性区间，以支持基于风险的维护决策。

AI中文摘要

发动机健康管理依赖于对剩余寿命的可靠预测以及对涡轮气体温度等热指标的跟踪。在实际应用中，真实机队数据具有异质性和非平稳性，仅靠点预测不足以支持风险感知的维护决策。本文提出了一种用于涡轮机预测的多任务科学机器学习框架，该框架联合预测未修剪涡轮气体温度、涡轮气体温差和剩余寿命，并以预测区间的形式提供量化不确定性，并评估其经验覆盖率。共享序列编码器（带有残差双向LSTM层和注意力池化的卷积前端）为任务特定头部提供输入，包括用于概率回归的均值-方差估计，以及可选的用于基于阈值事件建模的生存头部。该框架设计为可通过少量面向实践者的参数（例如，温差阈值规则和剩余寿命目标构建）进行调整，以便部署能够与内部策略和专有标准保持一致。使用点指标和区间指标评估所提出框架的预测性能，包括平均绝对误差、预测区间覆盖概率、平均预测区间宽度以及覆盖-宽度准则。结果按总体和按飞行阶段与维护段分层报告，以突出运营环境的影响并支持不确定性感知监控。

英文摘要

Engine Health Management (EHM) depends on reliable forecasting of Remaining Useful Life (RUL) and on tracking thermal indicators such as turbine gas temperature (TGT). In practice, real-world fleet data are heterogeneous and non-stationary, and point predictions alone are insufficient for risk-aware maintenance decisions. This paper presents a multi-task scientific machine learning framework for turbine prognostics that jointly predicts turbine gas temperature untrimmed (TGTU), Delta Turbine Gas Temperature (DTGT), and RUL, with quantified uncertainty in the form of prediction intervals whose empirical coverage is evaluated. A shared sequence encoder (convolutional front-end with residual bidirectional LSTM layers and attention pooling) feeds task-specific heads, including mean--variance estimation for probabilistic regression and, optionally, a survival head for threshold-based event modeling. The framework is designed to be tunable via a small set of practitioner-facing parameters (e.g., DTGT thresholding rules and RUL target construction) so that deployment can align with in-house policies and proprietary criteria. The predictive performance of the proposed framework is evaluated using both point and interval metrics, including mean absolute error (MAE), prediction interval coverage probability (PICP), mean prediction interval width (MPIW), and the coverage--width criterion (CWC). Results are reported both in aggregate and stratified by flight phase and maintenance segment to highlight operational-context effects and to support uncertainty-aware monitoring.
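
Two of the interval metrics named in the abstract, PICP and MPIW, have standard definitions that are short enough to sketch. The Gaussian interval helper and the default `z = 1.96` (a nominal 95% interval) are assumptions made for illustration, not details taken from the paper. CWC is omitted because several variants of it are in use.

```python
import math


def gaussian_interval(mean, variance, z=1.96):
    """Turn a mean-variance prediction into a symmetric interval (assumed Gaussian)."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


def picp(y_true, lower, upper):
    """Prediction interval coverage probability: fraction of targets inside their interval."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


def mpiw(lower, upper):
    """Mean prediction interval width."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

A well-calibrated model keeps PICP close to the nominal level while keeping MPIW small, which is why the two are usually reported together.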

2605.30592 2026-06-01 cs.LG 版本更新

Learning Transferable Predictability Representations

学习可迁移的可预测性表示

Diyali Goswami, Auroop R. Ganguly

发表机构 * Sustainability and Data Sciences Laboratory (SDS Lab)（可持续性与数据科学实验室）； AI4CaS: AI for Climate and Sustainability（AI4CaS：为气候与可持续性的人工智能）； Institute for Experiential AI（体验式人工智能研究所）； Pacific Northwest National Laboratory (PNNL)（太平洋西北国家实验室）

AI总结提出Gauge-Fixed Ordinal Network (GON)模型，通过锚定方差目标学习跨系统一致的序数评分，解决可预测性评估中的尺度模糊问题。

Comments 27 pages, 3 figures

AI中文摘要

我们研究将标量分数分配给短轨迹窗口的问题，该分数反映其在有序可预测性机制连续体上的位置，范围从结构化确定性动力学到非结构化随机噪声。现有方法在单个系统内进行确定性-随机性判别，并且不能产生跨系统具有一致数值解释的分数。我们将此形式化为五级可预测性阶梯上的序数估计，并识别出跨系统模糊性的结构来源：仅排序监督使分数坐标在单调重参数化下未固定，我们称之为序数评分的规范自由度。我们提出了规范固定序数网络（GON），这是一种时间卷积模型，使用锚定与方差目标训练，将各级别的分数均值固定到共享目标坐标。GON操作于2-jet特征，这些特征暴露局部轨迹几何结构，由平滑流保持，并被随机代理过程破坏。在五个保留的动力学系统上，从预训练的GON检查点初始化在所有窗口预算上始终优于从头训练，适应深度反映了与训练家族的几何接近性。零样本分数在随机边界保留序数结构，其中代理过程最强烈地破坏非线性几何，并且预训练初始化在所有窗口预算上始终优于从头训练。成对判别和全局一致的序数评分是不同的属性，需要稳定的分数坐标以实现跨系统迁移，这对自然和工程动力学系统的可预测性评估、模型选择和早期预警诊断具有直接影响。

英文摘要

We study the problem of assigning a scalar score to a short trajectory window that reflects its position on an ordered continuum of predictability regimes, spanning structured deterministic dynamics to unstructured stochastic noise. Existing methods address deterministic-versus-stochastic discrimination within a single system and do not produce scores with a consistent numerical interpretation across systems. We formalize this as ordinal estimation over a five-level predictability ladder and identify a structural source of cross-system ambiguity: ranking supervision alone leaves the score coordinate unfixed up to a monotone reparameterization, which we term the gauge freedom of ordinal scoring. We propose the Gauge-Fixed Ordinal Network (GON), a temporal convolutional model trained with an anchor-and-variance objective that pins level-wise score means to shared target coordinates. GON operates on 2-jet features that expose local trajectory geometry, preserved by smooth flows and disrupted by stochastic surrogate procedures. On five held-out dynamical systems, initializing from a pretrained GON checkpoint consistently outperforms training from scratch across all window budgets, with adaptation depth reflecting geometric proximity to the training family. Zero-shot scores retain ordinal structure at the stochastic boundary, where surrogate procedures most strongly disrupt nonlinear geometry, and pretrained initialization consistently beats scratch across all window budgets. Pairwise discrimination and globally coherent ordinal scoring are distinct properties requiring a stable score coordinate for cross-system transfer, with direct implications for predictability assessment, model selection, and early-warning diagnostics across natural and engineered dynamical systems.
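
The gauge freedom described in the abstract can be seen in a few lines: a monotone reparameterization leaves every ranking intact while moving the score coordinates. The `anchor_penalty` function below is a hypothetical stand-in for an anchoring term (squared distance between each level's mean score and a shared target); the paper's actual objective is not given here and may differ.

```python
import math


def ranks(scores):
    """Rank of each score (0 = lowest); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0] * len(scores)
    for rank, index in enumerate(order):
        out[index] = rank
    return out


def anchor_penalty(scores, levels, targets):
    """Hypothetical anchoring term: sum over levels of (level mean - target)^2."""
    sums, counts = {}, {}
    for s, lv in zip(scores, levels):
        sums[lv] = sums.get(lv, 0.0) + s
        counts[lv] = counts.get(lv, 0) + 1
    return sum((sums[lv] / counts[lv] - targets[lv]) ** 2 for lv in sums)


scores = [0.1, 0.9, 2.2, 1.1, -0.1, 1.8]
levels = [0, 1, 2, 1, 0, 2]
targets = {0: 0.0, 1: 1.0, 2: 2.0}
warped = [math.exp(s) for s in scores]  # monotone, so ranking is unchanged
```

A ranking-only loss cannot tell `scores` and `warped` apart, whereas an anchoring term of this kind can, which is the sense in which it fixes the gauge.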

2605.30590 2026-06-01 cs.LG cs.AI cs.CL 版本更新

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

反事实评估揭示临床LLM和智能体的隐藏能力画像

Matt Turk

发表机构 * Protege Data Lab（Protege数据实验室）

AI总结提出因果敏感性评分（CSS），通过沿五个临床维度变异肿瘤病例来评估模型是否按预期方向更新推荐，发现与覆盖度指标排名相反，并揭示所有前沿模型在手术状态干预上的安全盲点。

Comments Accepted to RLEval @ ACM CAIS 2026 (Workshop on Methods and RL Environments for Evaluating AI Agents) and selected for an invited talk based on reviewer ratings. 4-page short paper + appendix

AI中文摘要

两个临床AI系统在基于覆盖度的评分标准上得分几乎相同，但当患者输入变化时行为却截然不同：一个更新其推荐以匹配新的临床信号，而另一个无论输入如何都产生相同输出。我们引入因果敏感性评分（CSS），这是一个预注册的干预性指标，沿五个临床有意义的维度（生物标志物翻转、先前治疗失败、生物标志物移除、手术状态变化和分期扰动）对肿瘤学多学科会诊（tumor board）病例进行变异，并使用{0, 0.5, 1.0}量表对每个模型是否在预注册的正确方向上更新其推荐进行评分。与基于覆盖度的加权召回指标共识匹配评分（CMS）相比，来自三个实验室的六个前沿模型在224个病例的单次推理中评估，排名几乎完全相反：所有六个模型排名发生变化，CMS最差的模型成为CSS最好的模型，而一个中上CMS模型在CSS上排名最后。我们进一步揭示了一个普遍的安全盲点：每个前沿模型在手术状态干预上失败（家族D上的CSS最多为17.2%），这是CMS未暴露的发现。该指标也适用于使用工具的智能体：在ReAct风格的实验中，工具使用改善了六个模型中五个的CSS（+2.5到+20.3个百分点），然而CSS最低的模型检索相同的图表部分但仍未能更新其推荐，揭示了仅在反事实评估下可见的结构性响应缺陷。跨评判者复制和三位评估者的医学专业验证确认了总体发现。像CSS这样的干预性预注册指标补充了临床AI智能体的基于覆盖度的评估：它们捕捉了覆盖度指标遗漏的响应性，并为未来的智能体强化学习系统提供了候选的密集奖励信号。

英文摘要

Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other produces the same output regardless. We introduce the Causal Sensitivity Score (CSS), a pre-registered interventional metric that mutates oncology tumor-board cases along five clinically meaningful dimensions - biomarker flips, prior-treatment failures, biomarker removals, surgery-status changes, and stage perturbations - and scores whether each model updates its recommendations in the pre-registered correct direction using a {0, 0.5, 1.0} scale. Benchmarked against the Consensus Match Score (CMS), a coverage-based weighted recall metric, six frontier models from three labs evaluated in single-shot inference across 224 cases rank in nearly opposite orders: all six models change rank, the CMS-worst model becomes CSS-best, and one upper-mid CMS model ranks last on CSS. We further surface a universal safety blind spot: every frontier model fails on surgery-status interventions (at most 17.2% CSS on Family D), a finding CMS does not expose. The metric also transfers to tool-using agents: in a ReAct-style experiment, tool use improves CSS for five of six models (+2.5 to +20.3 percentage points), yet the lowest-CSS model retrieves the same chart sections and still fails to update its recommendations - revealing a structural responsiveness deficit visible only under counterfactual evaluation. Cross-judge replication and three-rater medical-professional validation confirm the aggregate findings. Interventional pre-registered metrics like CSS complement coverage-based evaluation for clinical AI agents: they capture responsiveness that coverage metrics miss and offer a candidate dense reward signal for future agentic RL systems.
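
A per-family score on the {0, 0.5, 1.0} scale can be aggregated as below. Averaging the per-case scores and reporting a percentage is an assumption made for this sketch; the abstract does not state the exact aggregation, and the family labels are placeholders.

```python
from collections import defaultdict

ALLOWED_SCORES = (0.0, 0.5, 1.0)


def css_by_family(records):
    """Average per-case scores within each intervention family, as a percentage.

    `records` is an iterable of (family, score) pairs, one per mutated case.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for family, score in records:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"score must be one of {ALLOWED_SCORES}, got {score}")
        totals[family] += score
        counts[family] += 1
    return {family: 100.0 * totals[family] / counts[family] for family in totals}
```

Reporting the families separately, rather than one pooled number, is what lets a weakness confined to a single intervention type show up.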

2605.30585 2026-06-01 cs.LG cs.AI cs.CE 版本更新

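Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future

破坏是学习生成的一般策略；扩散的优势在于认真对待它；探索是未来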

Pierre-André Noël

发表机构 * ServiceNow AI Research（ServiceNow AI研究院）

AI总结本文提出扩散模型作为信息隐藏与猜测框架的一部分，论证其破坏式信息隐藏比手工设计更灵活，尤其在数据稀缺场景有优势，并探讨强化学习技术移植到扩散上下文时的微妙问题及原生探索方向。

Comments Published April 27th, 2026 as an ICLR blogpost https://iclr-blogposts.github.io/2026/blog/2026/destruction/

Journal ref: Noël, Pierre-André. "Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future", ICLR Blogposts, 2026

AI中文摘要

我将扩散模型视为机器学习技术家族的一部分，这些技术从模型输入中隐藏信息，并训练模型猜测被隐藏的信息。我认为扩散的破坏式信息隐藏方法比典型的手工设计信息隐藏技术更灵活，提供了一个丰富的训练环境，在某些场景（尤其是数据稀缺场景）中可能具有优势。然后，我讨论了将强化学习技术移植到扩散上下文时可能出现的微妙问题，并思考如何以更扩散原生的方式解决这些探索问题。我没有确定的答案，但我指出了我认为有趣的方向。本文之后附有一篇教程，进一步阐述了先破坏后生成的观点。为了便于教程的阐述，引入了一种新型的概率图模型。

英文摘要

I present diffusion models as part of a family of machine learning techniques that withhold information from a model's input and train it to guess the withheld information. I argue that diffusion's destroying approach to withholding is more flexible than typical hand-crafted information withholding techniques, providing a rich training playground that could be advantageous in some settings, notably data-scarce ones. I then address subtle issues that may arise when porting reinforcement learning techniques to the diffusion context, and wonder how such exploration problems could be addressed in more diffusion-native ways. I do not have definitive answers, but I do point my fingers in directions I deem interesting. A tutorial follows this thesis, expanding on the destroy-then-generate perspective. A novel kind of probabilistic graphical models is introduced to facilitate the tutorial's exposition.

2605.30550 2026-06-01 cs.LG 版本更新

Early Prediction of Future Behavioral Strategy from Process Traces

从过程轨迹早期预测未来行为策略

Robert Kasumba, Dennis Barbour, Chien-Ju Ho

发表机构 * Division of Computational and Data Sciences（计算与数据科学系）； Department of Biomedical Engineering（生物医学工程系）； Department of Computer Science（计算机科学系）

AI总结提出过程级潜变量模型（PLVM），通过跨任务过程轨迹融合共享人级潜在表示，实现早期跨任务行为策略预测。

AI中文摘要

自适应系统通常需要从有限的证据中做出关于人的特定任务决策：导师可能需要预测学习者将如何解决新问题，游戏可能需要适应玩家进入新关卡，人机系统可能需要推断合作伙伴是会坚持计划还是切换目标。这些决策依赖于塑造人们如何解决相关任务的人级倾向，但这类倾向难以从标准行为证据中推断。一种方法是使用聚合结果摘要，如分数、完成率或生产率；这些摘要紧凑且跨任务可用，但可能将不同的行为过程压缩为相似的结果。另一种方法是使用过程级轨迹，记录行为如何展开；然而，单一任务内的过程建模可能将稳定的人级倾向与任务特定布局和可供性纠缠在一起。在本工作中，我们研究早期跨任务行为推断：部分源任务过程轨迹是否能揭示可迁移的人级结构，从而预测保留目标任务中的策略。我们引入过程级潜变量模型（PLVM），该模型编码任务特定轨迹并将其融合为共享的人级潜在表示以进行跨任务预测。在自然主义的人类游戏遥测数据集PowerWash Simulator中，PLVM使用来自两个清洁任务的部分轨迹，预测保留的消防站关卡中局部持久的区域规划者行为与频繁的区域跳跃者行为。具有已知潜在类型的受控模拟表明，当源任务揭示共享潜在过程的互补维度时，跨任务融合有所帮助。这些结果表明，当观察足够的目标任务行为不切实际时，过程级跨任务建模可以支持目标任务策略的早期预测。

英文摘要

Adaptive systems often need to make task-specific decisions about people from limited evidence: a tutor may need to anticipate how a learner will approach a new problem, a game may need to adapt when a player enters a new level, and a human-AI system may need to infer whether a partner will persist with a plan or switch goals. These decisions depend on person-level tendencies that shape how people solve related tasks, but such tendencies are difficult to infer from standard behavioral evidence. One approach is to use aggregate outcome summaries, such as scores, completion rates, or productivity; these summaries are compact and available across tasks, but can collapse distinct behavioral processes into similar outcomes. Another approach is to use process-level traces, which record how behavior unfolds; however, process modeling within one task can entangle stable person-level tendencies with task-specific layout and affordances. In this work, we study early cross-task behavioral inference: whether partial source-task process traces can reveal transferable person-level structure that predicts strategy in a held-out target task. We introduce a Process-Level Latent Variable Model (PLVM), which encodes task-specific traces and fuses them into a shared person-level latent representation for cross-task prediction. In PowerWash Simulator, a naturalistic telemetry dataset of human gameplay, PLVM uses partial traces from two cleaning tasks to predict locally persistent Zone Planner behavior versus frequent Zone Hopper behavior in the held-out Fire Station level. Controlled simulations with known latent types show that cross-task fusion helps when source tasks reveal complementary dimensions of a shared latent process. These results suggest that process-level cross-task modeling can support early prediction of target-task strategy when observing sufficient target-task behavior is impractical.

2605.30541 2026-06-01 cs.LG physics.geo-ph 版本更新

SubsurfaceGen: Procedural Generation of Field-Scale Earth Models and Seismic Data

SubsurfaceGen: 野外尺度地球模型与地震数据的程序化生成

Joseph Stitt, Pratik Rathore, Madeleine Udell, Ching-Yao Lai

发表机构 * Stanford University（斯坦福大学）

AI总结提出SubsurfaceGen，一个GPU加速的3D速度模型与地震数据生成器，并发布包含4276个2D速度切片、5秒波场和8秒炮集记录的数据集，用于评估机器学习在全波形反演中的表现。

Comments 38 pages

AI中文摘要

全波形反演（FWI）是地下成像的黄金标准，应用范围从碳封存到能源和矿产勘探再到地震灾害评估。机器学习方法进行FWI需要野外尺度、地质多样性和物理真实的训练数据，但现有资源如Marmousi、SEAM和OpenFWI在空间范围、时间范围、地质多样性和物理真实性方面存在不足。我们通过SubsurfaceGen（一个用于3D速度模型和地震数据的GPU加速生成器）来解决这些限制。与SubsurfaceGen一起，我们发布了一个配对数据集，包含来自42个真实、野外尺度的3D速度模型的4276个2D速度切片、5秒波场和8秒炮集记录，每个模型横向跨度10 km x 10 km，深度6.19 km，分辨率为10 m。该数据集涵盖六种地质环境——四种由SubsurfaceGen构建，两种来自先前来源——与碳封存和碳氢化合物勘探相关。我们使用该数据集评估神经算子进行波场预测和编码器-解码器进行端到端速度反演，并保留一种地质环境用于分布外测试。这些实验揭示了野外尺度的失败模式，并展示了SubsurfaceGen及相关数据集如何影响基于机器学习的FWI。

英文摘要

Full waveform inversion (FWI) is the gold standard for subsurface imaging, with applications from carbon sequestration to energy and mineral exploration to earthquake hazard assessment. Machine learning approaches to FWI need field-scale, geologically diverse, and physically realistic training data, but existing resources such as Marmousi, SEAM, and OpenFWI fall short on spatial extent, temporal extent, geological diversity, and physical realism. We address these limitations with SubsurfaceGen, a GPU-accelerated generator for 3D velocity models and seismic data. Along with SubsurfaceGen, we release a paired dataset of 4,276 2D velocity slices, 5 s wavefields, and 8 s shot gathers drawn from 42 realistic, field-scale 3D velocity models, each spanning 10 km x 10 km laterally and 6.19 km deep at 10 m resolution. The dataset spans six geological settings -- four built with SubsurfaceGen and two drawn from prior sources -- relevant for carbon sequestration and hydrocarbon exploration. We use this dataset to evaluate neural operators on wavefield prediction and encoder-decoders on end-to-end velocity inversion, holding out one geological setting for out-of-distribution testing. These experiments surface failure modes at field-scale and demonstrate how SubsurfaceGen and the associated dataset can impact ML-based FWI.

2605.30538 2026-06-01 cs.LG 版本更新

DisasterLex: An Expert Concept-to-Schema Knowledge Graph for Geospatial Reasoning in Disaster Analytics

DisasterLex：面向灾害分析中地理空间推理的专家概念到模式知识图谱

Yiming Xiao, Ankit Basu, Kai Yin, Sahil Vartak, Christian Swords, Ali Mostafavi

发表机构 * Texas A&M University（德克萨斯农工大学）

AI总结提出DisasterLex框架，通过插入专家知识图谱（EKG）将用户查询与数据库模式桥接，在灾害分析场景中实现文本到SQL的准确转换，性能优于现有方法1.4-2.75倍。

AI中文摘要

灾害不可避免且日益昂贵，有效响应依赖于查询结构化表格数据：支撑灾害管理的精确、信息密集的危害、暴露度、脆弱性和生命线基础设施记录。当前的文本到SQL方法允许自然语言访问此类表格，但迁移到灾害领域时效果不佳，因为查询跨越异构地理空间模式，并需要对因果关系进行推理。我们引入DisasterLex，一个知识图谱中介的框架，在用户查询和数据库之间插入一个包含精选概念和类型化因果边的专家知识图谱（EKG），并通过概念到表格链接与模式桥接。该编排运行四个阶段（识别查询实体、路由到操作域、在因果边上规划、以及生成SQL），在每个步骤限制传递给模型的模式。我们在一个灾害分析数据库（36个地理空间表，150列）上实例化，该数据库具有包含107个概念、117条因果边和52个概念到模式链接的EKG，并在75个查询的测试集上评估。在所有七个涵盖专有和开源权重系列的基础模型上，DisasterLex以1.65到3.56（满分5.0）的绝对分数，比四个最先进的基线（LightRAG、HippoRAG 2、ReFoRCE、CHESS）高出1.4到2.75倍。错误分析显示基线失败集中在路由和多表SQL组合上，这正是我们的编排明确解决的操作。代码、数据和EKG工件可在https://github.com/YimingXiao98/DisasterLex 和Zenodo https://doi.org/10.5281/zenodo.20388029 获取。

英文摘要

Disasters are inevitable and increasingly costly, and effective response depends on querying structured tabular data: precise, information-dense records of hazard, exposure, vulnerability, and lifeline infrastructure that underpin disaster management. Current text-to-SQL methods enable natural-language access to such tables but transfer poorly to the disaster domain, where queries span heterogeneous geospatial schemas and require reasoning over causal relations. We introduce DisasterLex, a knowledge-graph-mediated framework that inserts an Expert Knowledge Graph (EKG) of curated concepts and typed causal edges between the user query and the database, bridged to schema by concept-to-table links. The orchestration runs four stages (identifying query entities, routing to the operational domain, planning over causal edges, and grounding the SQL), restricting the schema passed to the model at each step. We instantiate it on a disaster-analytics database (36 geospatial tables, 150 columns) with an EKG of 107 concepts, 117 causal edges, and 52 concept-to-schema links, evaluated on a 75-query test set. On all seven base models spanning proprietary and open-weight families, DisasterLex beats four state-of-the-art baselines (LightRAG, HippoRAG 2, ReFoRCE, CHESS) by 1.4x to 2.75x, with absolute scores of 1.65 to 3.56 (of 5.0). Error analysis shows baseline failures cluster in routing and multi-table SQL composition, the operations our orchestration explicitly addresses. Code, data, and the EKG artifact are available at https://github.com/YimingXiao98/DisasterLex and on Zenodo at https://doi.org/10.5281/zenodo.20388029.

2605.30537 2026-06-01 cs.LG 版本更新

The Long-Term Effects of Data Selection in LLM Fine-Tuning

LLM微调中数据选择的长期影响

Yuxin Yang, Aoxiong Zeng, Xiangquan Yang

发表机构 * Shanghai University（上海大学）； East China Normal University（华东师范大学）

AI总结研究多阶段LLM微调中，短视数据选择策略（如基于当前效用）可能导致后续学习变慢、遗忘加剧和排名反转，提出长视距感知选择（LHAS）目标函数以缓解此问题。

Comments work in progress

AI中文摘要

数据选择越来越多地被用于降低大型语言模型（LLM）微调的成本，近期方法根据当前效用、多样性、质量或影响力对样本进行优先级排序。本文研究一个不同的问题：当微调在多个阶段进行时，当前看起来最优的选择策略是否会使模型后续适应性变差？我们引入LLM数据选择的长期视角，其中选择器不仅通过即时任务性能评估，还通过未来适应速度、遗忘、能力不平衡和分布外鲁棒性评估。我们在统一的多阶段协议下比较了代表性的随机、基于损失、基于梯度、基于多样性、基于质量和基于效用-多样性的选择家族。通过旨在实例化该协议的控制实验，我们展示了短期选择器如何表现出排名反转：它们改善了当前阶段，同时减慢了后续学习并增加了遗忘。我们将这种行为形式化为“短视选择”，提供了其可能发生的简单局部分析，并提出了一个诊断性的长视距感知选择（LHAS）目标函数，该函数在即时效用基础上增加了覆盖度、未来代理迁移和反集中项。该研究认为，数据选择应被评估为一种塑造模型学习轨迹的训练干预，而不仅仅是一种局部数据效率机制。

英文摘要

Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different question: when fine-tuning occurs over multiple stages, can selection strategies that look optimal now make the model less adaptable later? We introduce a long-horizon view of LLM data selection in which a selector is evaluated not only by immediate task performance, but also by future adaptation speed, forgetting, capability imbalance, and out-of-distribution robustness. We compare representative random, loss-based, gradient-based, diversity-based, quality-based, and utility-diversity selection families under a unified multi-stage protocol. Through controlled experiments designed to instantiate this protocol, we show how short-term selectors can exhibit rank reversal: they improve the current stage while slowing subsequent learning and increasing forgetting. We formalize this behavior as \emph{myopic selection}, provide a simple local analysis of why it can occur, and propose a diagnostic Long-Horizon Aware Selection (LHAS) objective that augments immediate utility with coverage, future-proxy transfer, and anti-concentration terms. The study argues that data selection should be evaluated as a training intervention that shapes the model's learning trajectory, rather than only as a local data-efficiency mechanism.

2605.30532 2026-06-01 stat.CO cs.LG stat.ML 版本更新

True Self-Avoiding Walk for Accelerating Markov-Chain Monte Carlo Integration

真实自回避行走用于加速马尔可夫链蒙特卡洛积分

Qinghua Ding, Venkat Anantharam

发表机构 * Department of Electrical Engineering and Computer Sciences University of California at Berkeley（加州大学伯克利分校电子工程与计算机科学系）

AI总结本文提出使用真实自回避行走（TSAW）改进马尔可夫链蒙特卡洛（MCMC）积分估计，通过惩罚过度访问的转移概率，使得经验积分误差达到几乎必然的$O(\sqrt{\log t}/t)$量级，显著优于标准随机游走的$t^{-1/2}$误差。

AI中文摘要

我们研究真实自回避行走（TSAW）作为一种通过马尔可夫链蒙特卡洛（MCMC）改进经验积分估计的机制。我们考虑与有限集$V$上不可约马尔可夫核$P$（具有平稳分布$\pi$）相关的有限状态自适应采样动力学，其中转移概率根据经验过度使用而受到惩罚。我们的主要结果是，由此产生的基于TSAW的行走的经验占用计数$L_t(i)$和转移计数$N_t(i,j)$满足\[ L_t(i)-t\pi_i = O(\sqrt{\log t}) \quad\text{和}\quad N_t(i,j)-t\pi_iP_{ij}=O(\sqrt{\log t}) \qquad\text{几乎必然} \]对于每个状态$i$和每个满足$P_{ij}>0$的边$(i,j)$。因此，对于每个有界函数$f:V\to\mathbb R$，我们的积分估计器的误差收敛为\[ \left|\frac1t\sum_{s=0}^{t-1} f(X_s)-\sum_{i\in V}\pi_i f(i)\right| = O\left(\frac{\sqrt{\log t}}{t}\right) \qquad\text{几乎必然}. \]这些结果表明，与标准随机游走方法下经验平均的通常$t^{-1/2}$误差标度相比，基于TSAW的估计器产生的经验积分误差几乎必然为$O(\sqrt{\log t}/t)$量级，从而实现了对样本量$t$的显著更尖锐的依赖性。

英文摘要

We study true self-avoiding walk (TSAW) as a mechanism for improving empirical integral estimation via Markov chain Monte Carlo (MCMC). We consider finite-state adaptive sampling dynamics associated with an irreducible Markov kernel $P$ on a finite set $V$, with stationary distribution $\pi$, in which the transition probabilities are penalized according to empirical overuse. Our main result is that the empirical occupation counts $L_t(i)$ and transition counts $N_t(i,j)$ of the resulting TSAW-based walk satisfy \[ L_t(i)-t\pi_i = O(\sqrt{\log t}) \quad\text{and}\quad N_t(i,j)-t\pi_iP_{ij}=O(\sqrt{\log t}) \qquad\text{almost surely} \] for every state $i$ and every edge $(i,j)$ with $P_{ij}>0$. Consequently, for every bounded function $f:V\to\mathbb R$, the error of our integral estimator converges as \[ \left|\frac1t\sum_{s=0}^{t-1} f(X_s)-\sum_{i\in V}\pi_i f(i)\right| = O\left(\frac{\sqrt{\log t}}{t}\right) \qquad\text{almost surely}. \] These results show that, in contrast with the usual $t^{-1/2}$ error scaling for empirical averages under standard random-walk-based methods, the TSAW-based estimator yields empirical integral errors of order $O(\sqrt{\log t}/t)$ almost surely, thereby achieving a substantially sharper dependence on the sample size $t$.
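
To get a feel for the gap between the two rates quoted in the abstract, the snippet below evaluates $t^{-1/2}$ and $\sqrt{\log t}/t$ for a few sample sizes. It ignores the constants hidden in the $O(\cdot)$ notation, so the numbers only indicate scaling and are not error predictions; the function names are illustrative.

```python
import math


def baseline_rate(t):
    """Usual t^{-1/2} scaling for empirical averages."""
    return t ** -0.5


def tsaw_rate(t):
    """sqrt(log t) / t scaling stated for the TSAW-based estimator."""
    return math.sqrt(math.log(t)) / t


if __name__ == "__main__":
    for t in (10**2, 10**4, 10**6):
        ratio = baseline_rate(t) / tsaw_rate(t)
        print(f"t={t:>8}  t^-1/2={baseline_rate(t):.2e}  "
              f"sqrt(log t)/t={tsaw_rate(t):.2e}  ratio={ratio:.1f}")
```

The ratio between the two grows like $\sqrt{t/\log t}$, so the advantage widens as the chain runs longer.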

2605.30529 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

通用嵌入还是特定嵌入，哪个更好？非英语语言临床编码搜索的实证研究

David Rey-Blanco, Roberto Cruz

发表机构 * TietAI

AI总结本研究通过使用大型生成语言模型生成的合成数据微调双编码器，构建两阶段检索器，缓解了非英语语言临床编码检索中召回率下降的问题，并在多语言基准上取得了优于BioBERT-ST的性能。

Comments 24 pages, 12 figures, 6 tables

AI中文摘要

用于语义搜索的句子嵌入模型绝大多数是在英语语料库上开发和评估的。当应用于其他语言的临床检索——特别是ICD-10-CM/CIE-10代码的检索——召回率会下降，而这种下降往往被聚合基准所掩盖。我们研究大型生成语言模型是否可以作为数据工厂来缩小这一差距。我们构建了一个两阶段检索器（双编码器后接交叉编码器重排序器），该检索器在Gemini生成的合成数据（涵盖英语、西班牙语、加泰罗尼亚语、意大利语、葡萄牙语和法语）上对西班牙生物医学编码器（PlanTL-GOB-ES/bsc-bio-ehr-es）进行微调，并与BioBERT-ST和未调优的西班牙编码器进行评估。仅双编码器在MRR（0.876 vs. 0.866）上匹配BioBERT-ST，并在R@3（0.650 vs. 0.626）和R@5（0.804 vs. 0.790）上超越它，且无需英语生物医学预训练。添加交叉编码器重排序器将聚合R@5提升至0.822，并在五种语言中的四种上占据主导地位（西班牙语+0.017，加泰罗尼亚语+0.033，法语+0.018，葡萄牙语+0.037），但以英语的小幅回归为代价。这种权衡在临床上是可接受的：葡萄牙语的R@5达到0.829，而BioBERT-ST为0.714。贡献：一个基于LLM生成数据构建领域特定医学检索器的开放配方；学习增益的量化（MRR从0.755到0.876，+15.9%，使用约19,500个合成对）；以及按语言和排名对增益集中区域的刻画。

英文摘要

Sentence-embedding models for semantic search are overwhelmingly developed and evaluated on English corpora. When applied to clinical retrieval in other languages -- particularly retrieval of ICD-10-CM / CIE-10 codes -- recall degrades in ways often masked by aggregate benchmarks. We study whether large generative language models can serve as data factories to close this gap. We build a two-stage retriever (bi-encoder followed by cross-encoder reranker), fine-tuned from a Spanish biomedical encoder (PlanTL-GOB-ES/bsc-bio-ehr-es) on Gemini-generated synthetic data covering English, Spanish, Catalan, Italian, Portuguese and French, and evaluate against BioBERT-ST and the un-tuned Spanish encoder. The bi-encoder alone matches BioBERT-ST on MRR (0.876 vs. 0.866) and overtakes it on R@3 (0.650 vs. 0.626) and R@5 (0.804 vs. 0.790) without English biomedical pretraining. Adding a cross-encoder reranker lifts aggregate R@5 to 0.822 and dominates on four of five languages (+0.017 Spanish, +0.033 Catalan, +0.018 French, +0.037 Portuguese) at the cost of a small English regression. The trade-off is clinically acceptable: Portuguese reaches R@5 = 0.829 vs. BioBERT-ST's 0.714. Contributions: an open recipe for building domain-specific medical retrievers from LLM-generated data; quantification of the learning gain (MRR 0.755 to 0.876, +15.9% with ~19,500 synthetic pairs); and a characterisation of where gains concentrate by language and rank.
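
The retrieval metrics reported above, MRR and R@k, can be computed as in the following sketch. It assumes each query comes with a ranked list of candidate codes and a set of relevant codes; how the paper handles queries with several gold codes is not stated, so the recall definition here is the generic one. The example codes are arbitrary strings.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """`runs` is a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k results."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


runs = [
    (["E11.9", "I10", "J45"], {"I10"}),   # gold code at rank 2
    (["K21.0", "E11.9", "I10"], {"K21.0"}),  # gold code at rank 1
]
```

MRR rewards placing the right code first, while R@k only asks whether it made the shortlist, which is why a reranker can move the two metrics differently.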

2605.30526 2026-06-01 cs.LG cs.CL 版本更新

Measuring, Localizing, and Ablating Alignment Signatures in LLMs

测量、定位和消融LLMs中的对齐特征

Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür, Nick Feamster

发表机构 * University of Chicago（芝加哥大学）； University of Illinois at Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； Toyota Technological Institute at Chicago（芝加哥丰田技术研究所）

AI总结研究通过对比人类文本、基模型和对齐模型生成，发现对齐训练引入AI风格特征，并提出PASTA方法通过消融对齐方向来降低AI检测率。

AI中文摘要

对齐语言模型通常表现出可识别的AI风格，但其与后训练和内部表示的联系尚不清楚。本文研究后训练是否引入或放大了AI风格规律，以及这些规律是否具有局部内部特征。为此，我们在匹配的人类源前缀下比较人类文本、基模型生成和对齐模型生成。对齐生成显示出比基生成更低的人类语料库亲和力和更高的AI检测率，表明后训练使生成文本偏离人类语料库风格，转向检测器可见的AI风格文本。然后我们引入PASTA（后训练对齐特征目标消融），一种无需训练的方法，通过对齐-基残差对比估计后训练对齐特征，并在解码过程中消融相应方向。在11个对齐模型和6个AI检测器上，PASTA降低了对大多数对齐模型的检测率；该效果在检测器间良好迁移，且不被随机方向复现。定性分析表明，PASTA生成保持相关性和连贯性，同时表现出更大的风格变化。这些结果共同表明，后训练的AI风格效果可以通过激活消融进行测量、定位和因果测试。

英文摘要

Aligned language models often exhibit a recognizable AI-like style, yet its connection to post-training and internal representations remains poorly understood. In this work, we study whether post-training introduces or amplifies AI-like stylistic regularities and whether these regularities have a localized internal signature. To this end, we compare human text, base-model generations, and aligned-model generations under matched human-source prefixes. Aligned generations show lower human-corpus affinity and higher AI-detection rates than base generations, suggesting that post-training shifts generated text away from human-corpus style and toward detector-visible AI-like text. We then introduce PASTA (Post-training Alignment Signature Targeted Ablation), a training-free method that estimates a post-training alignment signature from aligned-base residual contrasts and ablates the corresponding direction during decoding. Across 11 aligned models and 6 AI detectors, PASTA lowers the detection rate for most aligned models; this effect transfers well across detectors and is not reproduced by random directions. Qualitative analysis suggests that PASTA generations remain relevant and coherent while exhibiting greater stylistic variation. Together, these results show that AI-like stylistic effects of post-training can be measured, localized, and causally tested through activation ablation.
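
Directional ablation of the kind the abstract describes is commonly implemented by projecting a direction out of a hidden state. The sketch below estimates the direction as a normalized difference of mean residuals between aligned and base activations, which is an assumption: the paper's exact estimator, layers, and decoding hook are not specified here, and the function names are illustrative.

```python
import math


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def contrast_direction(aligned_states, base_states):
    """Unit vector along (mean aligned state - mean base state)."""
    diff = [a - b for a, b in zip(mean_vector(aligned_states), mean_vector(base_states))]
    norm = math.sqrt(sum(v * v for v in diff))
    if norm == 0.0:
        raise ValueError("aligned and base means coincide; no direction to ablate")
    return [v / norm for v in diff]


def ablate(hidden, direction):
    """Remove the component of `hidden` along the unit vector `direction`."""
    coeff = sum(h * d for h, d in zip(hidden, direction))
    return [h - coeff * d for h, d in zip(hidden, direction)]
```

After ablation the hidden state is orthogonal to the estimated direction, and every other component is left untouched. A random unit vector makes a natural control, as in the abstract.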

URL PDF HTML ☆

赞 0 踩 0

2605.30524 2026-06-01 cs.LG 版本更新

Representation Collapse in Sequential Post-Training of Large Language Models

大型语言模型顺序后训练中的表示坍缩

Yichen Liu, Mingyu Chen, Hao Wang, Xiaoran Xu, Chenxi Lin, Rui Zhang, Yutong Zhou, Yuxin Yang, Jiarui Wu, Wei Sun

发表机构 * Hangzhou Dianzi University（杭州电子科技大学）； Zhejiang Gongshang University（浙江工商大学）； Ningbo University（宁波大学）； Shanghai University（上海大学）

AI总结研究大型语言模型在顺序后训练阶段中内部表示逐渐压缩为低秩、各向异性且同质的特征空间，并提出轻量级干预措施以保持未来可学习性。

Comments work in progress

AI中文摘要

大型语言模型现在通过一系列后训练阶段进行适配，而不是通过单次指令微调。本文研究这种顺序后训练是否逐渐将内部表示压缩为低秩、各向异性且同质的特征空间。我们定义了一套针对隐藏状态、logits、token轨迹和LoRA更新的测量方法，并利用它来分析在受控阶段顺序下的监督微调、偏好优化、安全/拒绝调优、数学和代码专业化以及长思维链调优。中心假设是，过度的表示集中不仅仅是几何上的奇特性：它预示着后期适配中可塑性降低、域外泛化能力减弱以及校准效果变差。我们进一步评估了轻量级干预措施，包括混合域重放、特征刷新、表示多样性正则化和LoRA更新去相关，作为在不放弃后训练行为收益的情况下保持未来可学习性的方法。

英文摘要

Large language models are now adapted through chains of post-training stages rather than through a single instruction-tuning pass. This paper studies whether such sequential post-training gradually compresses internal representations into low-rank, anisotropic, and homogeneous feature spaces. We define a measurement suite for hidden states, logits, token trajectories, and LoRA updates, and we use it to analyze supervised fine-tuning, preference optimization, safety/refusal tuning, math and code specialization, and long chain-of-thought tuning under controlled stage orderings. The central hypothesis is that excessive representation concentration is not merely a geometric curiosity: it predicts reduced plasticity during later adaptation, weaker out-of-domain generalization, and poorer calibration. We further evaluate lightweight interventions, including mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation, as ways to preserve future learnability without giving up the behavioral gains of post-training.
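
One simple proxy for the anisotropy mentioned in the abstract is the mean pairwise cosine similarity of hidden states: values near 1 mean the vectors crowd into a narrow cone. This is a generic diagnostic chosen for illustration and is not necessarily part of the paper's measurement suite.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs (needs >= 2 vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Tracking this number after each post-training stage would show whether the representations drift toward a single dominant direction.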

2605.30523 2026-06-01 cs.LG cs.AI cs.CC cs.CL cs.FL 版本更新

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

重新审视填充Transformer的表达能力：哪些架构选择重要，哪些不重要

Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

发表机构 * ETH Zürich（苏黎世联邦理工学院）； Allen Institute for AI（艾伦人工智能研究所）

AI总结本文通过连接布尔电路，系统研究了填充Transformer的表达能力，发现数值精度和模型深度是影响表达能力的主要因素，而注意力类型、模型宽度和均匀性等架构选择对表达能力影响不大。

AI中文摘要

近期工作通过连接布尔电路描述了Transformer能计算和不能计算的内容，但现有结果缺乏精确刻画，且对建模选择敏感。填充Transformer——在其输入后附加填充符号如“...”——通过为自适应并行计算提供多项式空间，成为建立与电路类等价关系的有用工具。然而，目前仅研究了有限的填充Transformer理想化模型，这些等价关系在注意力类型、模型宽度和均匀性变化下的稳健性仍待探索。我们发现，在实际假设下，填充Transformer对所有这些变化都出奇地稳健，并确定数值精度和模型深度是影响表达能力的主要因素。具体地，我们证明多项式填充的L-均匀常数精度Transformer等价于L-均匀AC⁰，而增长精度的Transformer达到L-均匀TC⁰，与宽度无关。此外，循环机制允许类似电路的顺序处理：log^d N次循环的常数精度Transformer达到FO-均匀AC^d，增长精度的达到FO-均匀TC^d。有趣的是，宽度或精度超过对数增长并不会增加表达能力，且我们所有结果对softmax和平均硬注意力Transformer均成立。

英文摘要

Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers -- to whose input filler symbols such as ``...'' are appended -- emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity. Concretely, we prove that polynomially padded $\text{L-uniform}$ constant-precision transformers are equivalent to $\text{L-uniform AC}^0$, while growing-precision ones achieve $\text{L-uniform TC}^0$ regardless of width. Furthermore, looping enables sequential processing analogous to circuits: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, and growing-precision ones reach $\text{FO-uniform TC}^d$. Interestingly, growing width or precision beyond logarithmic does not increase expressivity, and all our results hold for both softmax and average hard attention transformers.

2605.30514 2026-06-01 cs.LG cs.CL 版本更新

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

MAAT: 多阶段适配器感知的定向遗忘学习

Suryash Yagnik, Shubham Gaur, Saksham Thakur, Vinija Jain, Aman Chadha, Amitava Das

Affiliations * Indian Institute of Information Technology, Bhopal, India; University of California, Santa Cruz, USA; Independent Researcher; Stanford University, USA; BITS Pilani Goa, India

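AI summary: To address the evaluation imbalance caused by the near absence of causal (Why-type) samples in existing machine unlearning benchmarks, the paper proposes the balanced 5WBENCH benchmark and the multi-phase MAAT framework, the first to achieve both high forgetting and high retention on Why-type knowledge.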

Comments 16 pages, 4 figures, 10 tables

AI中文摘要

机器遗忘评估在结构上存在偏差：Why类问题（探究因果和关系知识）在CounterFact中占比不足0.06%，在ZSRE中占0.6%，在TOFU、MUSE和WMDP-Cyber中占不到1.3%。这种近乎为零的表示意味着，在因果知识上失败的方法可以在整体上获得高分，而这种失败在没有平衡评估的情况下是无法检测的。我们提出了5WBENCH，一个平衡的5000样本基准，每个5W类别（谁、什么、何时、何地、为什么）包含1000个样本，首次使因果遗忘失败变得可量化。使用5WBENCH，我们表明现有基线方法无法在Why类问题上同时实现高遗忘和高保留：激进的遗忘会降低保留知识，而保守的方法则无法遗忘因果事实。Why类问题的难度源于多跳推理链（Why条目占44%，其他类别≤2%）以及超过40.1个token答案跨度上的梯度稀释。我们提出了MAAT（多阶段适配器感知的定向遗忘学习），一个三阶段框架，作用于LoRA适配器权重，结合梯度投影上升、SVD秩维度剪枝、任务向量否定以及混合KL-隐藏状态保留修复。MAAT是第一个在Why类因果知识上同时实现高遗忘和高保留的方法，在遗忘-保留帕累托前沿上达到了新的操作点。我们公开了代码。

英文摘要

Machine unlearning evaluation is structurally skewed: Why-type questions, which probe causal and relational knowledge, comprise less than 0.06% of CounterFact, 0.6% of ZSRE, and less than 1.3% of TOFU, MUSE, and WMDP-Cyber. This near-zero representation means that methods that fail on causal knowledge can score highly in aggregate, and this failure is undetectable without balanced evaluation. We present 5WBENCH, a balanced 5,000-sample benchmark with 1,000 examples per 5W category (Who, What, When, Where, Why), making causal unlearning failures quantifiable for the first time. Using 5WBENCH, we show that no existing baseline simultaneously achieves high forgetting and high retention on Why-type questions: aggressive forgetting degrades retained knowledge, while conservative methods fail to forget causal facts. Why-type difficulty stems from multi-hop reasoning chains (44% of Why entries vs. less than or equal to 2% for others) and gradient dilution over 40.1-token answer spans. We present MAAT (Multi-phase Adapter-Aware Targeted Unlearning), a three-phase framework operating on LoRA adapter weights, combining gradient-projected ascent, SVD rank-dimension pruning, task vector negation, and hybrid KL-hidden-state retain repair. MAAT is the first method to simultaneously achieve high forgetting and high retention on Why-type causal knowledge, reaching a new operating point on the forget-retain Pareto frontier. We make our code publicly available.
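
To illustrate why a near-zero share of Why-type questions can hide a failure, the sketch below compares a pooled score on a skewed question mix with the same score on a balanced mix. The per-category accuracies and the skewed counts are hypothetical numbers chosen for illustration, not results from the paper; only the 1,000-per-category balance mirrors the 5WBENCH design described above.

```python
# Hypothetical per-category scores of some unlearning method (not from the paper).
accuracy = {"Who": 0.95, "What": 0.95, "When": 0.95, "Where": 0.95, "Why": 0.10}

def pooled_score(counts, accuracy):
    """Sample-weighted score over all questions in a benchmark."""
    total = sum(counts.values())
    return sum(counts[c] * accuracy[c] for c in counts) / total

# Skewed mix (illustrative): Why-type questions are 0.6% of 10,000 samples.
skewed = {"Who": 2485, "What": 2485, "When": 2485, "Where": 2485, "Why": 60}
# Balanced mix in the style of 5WBENCH: 1,000 samples per category.
balanced = {c: 1000 for c in accuracy}

print(f"skewed pooled score:   {pooled_score(skewed, accuracy):.4f}")
print(f"balanced pooled score: {pooled_score(balanced, accuracy):.4f}")
```

With these made-up inputs the pooled score stays near 0.945 on the skewed mix and falls to 0.78 on the balanced one, which is the kind of gap a balanced benchmark is meant to expose.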

2605.30509 2026-06-01 stat.ML cs.AI cs.LG 版本更新

Improved Distribution Estimation in $\ell_\infty$

在 $\ell_\infty$ 下的改进分布估计

Doron Cohen, Aryeh Kontorovich, Yonatan Livshitz

Affiliations * Department of Computer Science, Ben-Gurion University of the Negev

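AI summary: This paper improves the estimation of discrete probability distributions in the $\ell_\infty$ norm, giving minimax bounds in expectation and high-probability tail bounds. It resolves open problems posed by Kontorovich and Painsky (JMLR, 2025), including a fully empirical version of the tightest risk bound and the form of the worst-case extremal distribution, and reports encouraging empirical results.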

Comments 24 pages, 3 figures

2605.30486 2026-06-01 cs.LG cs.AI 版本更新

Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting

图条件化的图神经网络专家混合模型用于交通预测

Amirhossein Ghaffari, Saeid Sheikhi, Ekaterina Gilman

Affiliations * Future Computing Group, University of Oulu

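AI summary: Proposes the GC-MoE framework, which assigns each node a personalized combination of experts based on graph topology and recent traffic inputs, trains only a lightweight routing module, and improves MAE on four benchmarks.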

Comments An accepted paper at the 27th IEEE International Conference on Mobile Data Management (MDM 2026)

AI中文摘要

传感器图上的时空预测通常采用统一应用于所有节点的单一骨干架构，尽管图区域可能表现出不同的动态。道路段在功能类别、结构和交通行为上存在差异，表明节点级专家专业化可能是有用的。我们提出GC-MoE，一种图条件化的专家混合框架，基于图拓扑和近期交通输入窗口为每个节点分配个性化的冻结预测专家组合。GC-MoE将冻结的预训练时空GNN专家与输入感知、空间上下文化的路由器相结合，同时仅训练轻量级路由模块。我们还研究了一个有界图条件化输出精炼层作为可选扩展，并仅作为消融诊断包含节点自适应ST-LoRA适配器。在四个标准基准（PEMS04、PEMS07、METR-LA和PEMS-BAY）上，GC-MoE在零参数集成基线上改善了MAE，具有竞争力的RMSE和MAPE，同时在1.5M冻结专家权重之上仅训练约17K参数。实现代码见https://github.com/Ahghaffari/gc_moe。

英文摘要

Spatio-temporal forecasting on sensor graphs is commonly tackled with a single backbone architecture applied uniformly across all nodes, although graph regions can exhibit different dynamics. Road segments differ in functional class, structure, and traffic behavior, suggesting that node-wise expert specialization can be useful. We propose GC-MoE, a graph-conditioned mixture of experts framework that assigns each node a personalized combination of frozen forecasting experts based on graph topology and the recent traffic input window. GC-MoE combines frozen pretrained spatio-temporal GNN experts with an input-aware, spatially contextualized router while training only a lightweight routing module. We also study a bounded graph-conditioned output refinement layer as an optional extension and include node-adaptive ST-LoRA adapters only as an ablation diagnostic. Across four standard benchmarks (PEMS04, PEMS07, METR-LA, and PEMS-BAY), GC-MoE improves MAE over a zero-parameter ensemble baseline, with competitive RMSE and MAPE, while training only ~17K parameters on top of 1.5M frozen expert weights. The implementation is available at https://github.com/Ahghaffari/gc_moe.
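
The idea of giving each node its own combination of frozen experts can be sketched with a softmax gate. This is a minimal illustration, not the GC-MoE router: the abstract does not specify the gating function, so the softmax, the function names, and the forecast values below are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mix_experts(router_logits, expert_forecasts):
    """Combine frozen expert forecasts for one node.

    router_logits: one score per expert (hypothetical router output).
    expert_forecasts: one forecast sequence per expert, all the same length.
    """
    weights = softmax(router_logits)
    horizon = len(expert_forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, expert_forecasts))
        for t in range(horizon)
    ]

# Two experts, three forecast steps for a single node (made-up values).
forecasts = [[60.0, 58.0, 55.0], [50.0, 52.0, 53.0]]
print(mix_experts([0.0, 0.0], forecasts))  # equal logits: plain average
print(mix_experts([2.0, 0.0], forecasts))  # router favours the first expert
```

Equal logits reduce to a uniform average of the experts, which is one plausible reading of a zero-parameter ensemble; only the logits would need training while the expert forecasts stay fixed.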

2605.30482 2026-06-01 cs.LG 版本更新

Discovering a Zeta Map Algorithm on Dyck Paths via Mechanistic Interpretability

通过机制可解释性发现 Dyck 路径上的 Zeta 映射算法

Xiaoyu Huang, Blake Jackson, Kyu-Hwan Lee

Affiliations * Department of Mathematics, Temple University, Philadelphia, PA, USA; Institute for Computer-Aided Reasoning in Mathematics, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Mathematics, University of Connecticut, Storrs, CT, USA; Korea Institute for Advanced Study, Seoul 02455, Republic of Korea

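AI summary: This paper trains a small encoder-decoder Transformer to learn the zeta map on Dyck paths and analyzes its computation with mechanistic interpretability tools, thereby discovering and proving a new explicit combinatorial algorithm, the scaffolding map.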

AI中文摘要

机器学习越来越多地用于数学发现，但在数学中，期望的输出通常不是预测本身，而是一个可以独立验证的显式构造。我们通过 Dyck 路径上的 zeta 映射（q,t-卡特兰数组合学中的一个经典双射）来研究这一设定。我们在该映射上训练了一个特意设计的小型单层单头编码器-解码器 Transformer，并使用机制可解释性工具（包括解码器交叉注意力分析、线性探测和因果干预）分析其学习到的计算过程。分析揭示了一种基于层级的机制：编码器表示使路径层级线性可访问，而解码器以结构化方式选择和遍历输入位置。将这些信号转化为组合学，得到了脚手架映射，这是一种针对 Dyck 路径的显式以峰为中心的遍历算法。我们证明该算法与 zeta 映射一致，只是标签的逆转约定有所不同。这提供了一个受控的 AI 辅助数学发现示例，其中机制可解释性将模型行为转化为精确、人类可验证的组合算法。

英文摘要

Machine learning is increasingly used in mathematical discovery, but in mathematics the desired output is often not a prediction itself, but an explicit construction that can be checked independently. We study this setting through the zeta map on Dyck paths, a classical bijection in the combinatorics of the q,t-Catalan numbers. We train a deliberately small one-layer, one-head encoder-decoder transformer on this map and analyze its learned computation using mechanistic interpretability tools, including decoder cross-attention analysis, linear probing, and causal intervention. The analysis reveals a level-based mechanism: encoder representations make path levels linearly accessible, while the decoder selects and traverses input positions in a structured way. Translating these signals into combinatorics leads to the scaffolding map, an explicit peak-centered traversal algorithm for Dyck paths. We prove that this algorithm agrees with the zeta map, modulo a reversal convention in the labeling. This gives a controlled example of AI-assisted mathematical discovery in which mechanistic interpretability turns model behavior into a precise, human-verifiable combinatorial algorithm.
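
The objects named in the abstract (Dyck paths, path levels, peaks) can be made concrete with a few helper functions. This sketch does not implement the zeta map or the scaffolding map; it only computes standard path features. Encoding a path as a string of 'U'/'D' steps and reading "level" as the height before each step are assumptions made here.

```python
def step_levels(path):
    """Return the height before each step of a lattice path.

    path: string of 'U' (up, +1) and 'D' (down, -1) steps.
    """
    levels, height = [], 0
    for step in path:
        levels.append(height)
        height += 1 if step == "U" else -1
    return levels

def is_dyck(path):
    """A Dyck path never goes below height 0 and ends at height 0."""
    height = 0
    for step in path:
        height += 1 if step == "U" else -1
        if height < 0:
            return False
    return height == 0

def peaks(path):
    """Indices i where an up step is immediately followed by a down step."""
    return [i for i in range(len(path) - 1) if path[i:i + 2] == "UD"]

example = "UUDUDD"
print(is_dyck(example), step_levels(example), peaks(example))
```

Features like these are the kind of quantity a linear probe could be trained to read out of encoder activations.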

2605.30479 2026-06-01 cs.LG 版本更新

Universal Multiclass Transductive Online Learning

通用多类别转导在线学习

Steve Hanneke, Hongao Wang

Affiliations * Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA

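AI summary: Studies universal transductive online classification with a possibly unbounded label space, characterizes learnability by introducing the "Level-Constrained-Littlestone-Littlestone (LCLL) tree" and an indifference property, and proves that the optimal mistake rate of a learnable class is either bounded or grows logarithmically.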

LongDS-Bench：关于长周期智能数据分析的失败

Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding, Haoming Xu, Lei Liang, Ningyu Zhang

Affiliations * Zhejiang University; Ant Group; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph

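AI summary: Proposes the LongDS benchmark for evaluating agents' ability to maintain and update analytical state in long-horizon, multi-turn data analysis; the best model reaches only 48.45% average accuracy, and long-horizon errors account for 52%-69% of failures.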

Comments Ongoing work

AI中文摘要

现实世界的数据分析本质上是迭代的，然而现有基准大多评估孤立或短期的交互任务，未能测试智能体在长周期内跟踪不断变化的分析上下文的能力。我们引入了LongDS，一个用于长周期、多轮数据分析的基准，其中智能体必须维护、更新、恢复和组合不断变化的分析状态。LongDS包含68个从真实世界Kaggle笔记本构建的任务，涵盖地球科学、商业和教育等六个领域的2,225轮交互。任务围绕状态演化模式（例如反事实扰动、回滚、多状态组合）设计，平均依赖跨度为11.3轮。评估五个最先进模型，我们发现最佳模型仅达到48.45%的平均准确率，性能从早期到后期轮次下降近47个百分点，长周期错误占失败原因的52%-69%。进一步分析表明，额外的智能体步骤并不一定能提高性能，这表明关键瓶颈在于维护正确的分析状态，而非增加交互预算。我们发布LongDS以支持可靠的长周期智能数据分析研究。代码和数据将在https://github.com/zjunlp/DataMind发布。

英文摘要

Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark for long-horizon, multi-turn data analysis where agents must maintain, update, restore, and compose evolving analytical states. LongDS comprises 68 tasks constructed from real-world Kaggle notebooks, spanning 2,225 turns across six domains including Geoscience, Business, and Education. Tasks are designed around state-evolution patterns (e.g., counterfactual perturbation, rollback, multi-state composition), with an average dependency span of 11.3 turns. Evaluating five state-of-the-art models, we find that the best model reaches only 48.45% average accuracy, performance drops nearly 47 points from early to late turns, and long-horizon errors account for 52%--69% of failures. Further analysis shows that additional agent steps do not necessarily improve performance, suggesting that the key bottleneck is maintaining a correct analytical state rather than increasing interaction budget. We release LongDS to support research on reliable long-horizon agentic data analysis. Code and data will be released at https://github.com/zjunlp/DataMind.

2605.30429 2026-06-01 quant-ph cs.LG 版本更新

Attention-based optimizer for symmetry finding

基于注意力的对称性发现优化器

Shreya Banerjee, Vinodh Raj Rajagopal Muthu, Charlie Nation, Rick P. A. Simon, Francesco Martini, Alessandro Ricottone, Federico Cerisola, Luca Dellantonio

Affiliations * Department of Physics and Astronomy, University of Exeter, Stocker Road, Exeter EX4 4QL, United Kingdom; QuAOS collaboration; Institute for Quantum Computing, University of Waterloo, Waterloo, ON N2L 3G1, Canada

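AI summary: Proposes an optimization framework built on the Set-Transformer architecture that uses self-attention to encode correlations among Pauli strings and optimizes a custom commutation-based objective to find Pauli symmetries of Hamiltonians with high probability.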

Comments 9+4 pages, 2 Figures, Comments welcome

AI中文摘要

发现对称性对于理解物理模型至关重要。在这项工作中，我们提出了一个优化框架，用于搜索哈密顿量的Pauli对称性，融合了机器学习与自动对称性发现领域。该框架基于Set-Transformer架构，利用自注意力编码Pauli字符串之间的成对和高阶相关性。然后将这些关系解码为候选对称性，并通过基于对易的自定义目标进一步优化，映射到输入哈密顿量的对称性。我们将该方法应用于随机Pauli哈密顿量、一维和二维周期横向场伊辛模型以及Toric码。结果表明，对于物理哈密顿量（伊辛和Toric），我们的框架以接近确定性的概率成功，同时与最先进策略相比提供了显著优势。对于随机Pauli哈密顿量，我们估计了在固定设计规格下以高成功概率找到对称性所需的计算资源，特别是并行启动次数和GPU数量。

英文摘要

Finding symmetries is crucial for understanding physical models. In this work, we present an optimization framework that searches Pauli symmetries of Hamiltonians, merging the fields of machine learning with automated symmetry finding. Built on a Set-Transformer architecture, our framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli-Strings. The relations are then decoded as a candidate, which is further optimized with a custom commutation-based objective, and mapped to a symmetry of the input Hamiltonian. We apply our method to random Pauli Hamiltonians, periodic one and two dimensional transverse-field Ising model and the Toric code. We show that for physical Hamiltonians (Ising and Toric), our framework succeeds with near-deterministic probability while providing substantial advantage compared to state-of-the-art strategies. For random Pauli Hamiltonians, we estimate the required computational resources, specifically the number of parallel starts and the number of GPUs, to find a symmetry with high success probability under fixed design specifications.
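
What it means for a Pauli string to be a symmetry of a Pauli Hamiltonian can be checked directly: the candidate must commute with every term. The sketch below uses a simple count of anticommuting terms as a stand-in for a commutation-based objective; the paper's actual objective is not given in the abstract, and the function names and the 3-site example are assumptions.

```python
def commute(p, q):
    """True if two Pauli strings (same length, letters I/X/Y/Z) commute.

    Single-qubit Paulis anticommute exactly when both are non-identity and
    different, so the strings commute iff that happens at an even number of sites.
    """
    if len(p) != len(q):
        raise ValueError("Pauli strings must have equal length")
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def violations(candidate, hamiltonian_terms):
    """Number of Hamiltonian terms that anticommute with the candidate."""
    return sum(1 for term in hamiltonian_terms if not commute(candidate, term))

# Periodic 3-site transverse-field Ising model: ZZ couplings plus X fields.
tfim = ["ZZI", "IZZ", "ZIZ", "XII", "IXI", "IIX"]
print(violations("XXX", tfim))  # global spin flip
print(violations("ZII", tfim))  # single-site Z
```

A candidate with zero violations commutes with every term and is therefore a symmetry; the global spin flip XXX is the familiar example for the transverse-field Ising model, while ZII is broken by the X field on the first site.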

2605.30399 2026-06-01 q-bio.QM cs.LG eess.IV 版本更新

A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture

一种用于评估鱼类对水产养殖中侵入性物体反应的新型计算机视觉方法

Hanne-Grete Alvheim, Stian Mjelde Jakobsen, Martin Føre, Eleni Kelasidi

Affiliations * Department of Engineering Cybernetics, NTNU; Department of Aquaculture, SINTEF Ocean AS

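AI summary: This study proposes a novel stereo-vision method based on YOLOv8, ByteTrack, SuperGlue, and triangulation to detect, track, and estimate the 3D positions of fish, in order to analyze how structures of different shapes, sizes, and colors affect fish behavior.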

AI中文摘要

水产养殖业需要应对若干挑战，以确保可持续的海产品生产满足日益增长的全球需求。其中一个主要挑战是确保生产过程中鱼类健康良好和福利可接受，因为改善鱼类福利在当前和未来的生产系统中至关重要。本研究通过开发和实施方法，识别鱼类对侵入性物体的个体和群体行为反应，从而解决这一问题。因此，我们开发了一种检测、跟踪和估计个体鱼类三维位置的新方法，并专门设计用于跟踪工业海水网箱中养殖鱼类的尾鳍。跟踪数据采用一种新型立体视觉方法进行处理，该方法适用于估计鱼类的位置、速度、加速度以及转向和俯仰角。随后，分析了从工业规模养鱼场获得的数据集，以识别不同形状、大小和颜色的结构对鱼类行为的影响。该方法使用手动标注的尾鳍进行训练，并采用YOLOv8结合ByteTrack作为目标检测器和跟踪器，SuperGlue用于匹配左右帧中的检测结果，以及三角测量来重建鱼类的三维位置。测试了不同的图像预处理和增强方法以提高目标检测准确性，并比较了它们的性能，同时测试了RAFT-Stereo用于深度估计。获得的结果既验证了该方法相对于先前研究工作的性能，也展示了该方法在提供对海水网箱中行为动态更深入理解方面的新颖性和潜力。

英文摘要

The aquaculture industry needs to address several challenges to secure sustainable seafood production that can serve an increasing global demand. One major challenge is to ensure good fish health and acceptable welfare during production since the improvement of fish welfare is of vital importance in current and future production systems. In this study, this is addressed by developing and implementing methods to identify fish behaviors in response to intrusive objects both on individual and on a group basis. A novel approach for detecting, tracking, and estimating the 3D position of individual fish has thus been developed, and specifically designed to track the caudal fins of farmed fish in industrial sea cages. The tracking data was subjected to a novel stereo-vision method adapted to estimate fish positions, velocities, accelerations, and turning and pitch angles. Datasets obtained from industrial-scale fish farms were then analyzed to identify the impact of structures of varying shapes, sizes, and colors on fish behavior. The method was trained using manually labeled caudal fins, and used YOLOv8 with ByteTrack as an object detector and tracker, SuperGlue for matching detections in the left and right frames, and triangulation to reconstruct the 3D positions of the fish. Different image pre-processing and augmentation methods for enhancing object detection accuracy were tested and their performance compared, while RAFT-Stereo was tested for depth estimation purposes. The obtained results both validate the method's performance against previous research efforts, and demonstrate the novelty and potential of this method in providing more insight into behavioral dynamics in sea-cages.
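
The triangulation step, which turns a matched left/right detection into a 3D position, can be sketched for the simplest case. The abstract does not describe the camera model, so this assumes an ideal rectified pinhole stereo pair and ignores underwater refraction; the function name and all parameter values are hypothetical.

```python
def triangulate_rectified(u_left, v_left, u_right, focal_px, baseline_m, cx, cy):
    """3D point (metres, left-camera frame) from a rectified stereo match.

    u/v are pixel coordinates; focal_px is the focal length in pixels;
    (cx, cy) is the principal point; baseline_m is the camera separation.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return x, y, z

# A caudal-fin detection matched across the two frames (made-up pixel values).
print(triangulate_rectified(700.0, 400.0, 650.0,
                            focal_px=1000.0, baseline_m=0.1, cx=640.0, cy=360.0))
```

Differencing such positions over consecutive frames would give the velocities and accelerations mentioned in the abstract, with the caveat that depth error grows as disparity shrinks.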

2605.30396 2026-06-01 cs.GR cs.LG 版本更新

Smaller and Faster 3DGS via Post-Training Dictionary Learning

通过训练后字典学习实现更小更快的3DGS

Jiarong Gong, Jonas Unger, Ehsan Miandji

Affiliations * Linköping University, Department of Science and Technology

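AI summary: Proposes the first dictionary-learning-based post-training compression framework for 3DGS, which substantially compresses models without retraining while preserving image quality and improving real-time rendering speed.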

AI中文摘要

3D高斯泼溅（3DGS）是一种有前景的实时渲染神经场景表示方法，但训练后的模型通常占用大量内存，限制了在低性能设备上的部署。现有的压缩技术往往引入额外的可训练参数，虽然实现了出色的压缩比，但会导致图像质量明显下降。在这项工作中，我们首次提出了基于字典学习的3DGS压缩框架。所提出的后训练压缩流程几乎可以应用于任何3DGS模型，无需重新训练或修改现有3DGS模型。我们的压缩框架实现简单，但提供了显著的压缩能力，保持了图像质量，并提升了实时渲染性能。在13个基准场景上，我们的方法应用于3DGS、3DGS-MCMC和PixelGS时，平均压缩比分别达到3.95倍、3.10倍和4.55倍。同时，渲染速度分别持续提升23.3%、24.3%和25.3%，且图像质量保持不变。

英文摘要

3D Gaussian Splatting (3DGS) is a promising neural scene representation for real-time rendering, but trained models often suffer from large memory footprints, limiting deployment on less powerful devices. Existing compression techniques often lead to architectures with several additional trainable parameters. While achieving outstanding compression ratios, they introduce noticeable drops in image quality. In this work, we introduce the first dictionary-learning-based compression framework for 3DGS. The proposed post-training compression pipeline can be deployed in virtually any 3DGS model without the need for re-training or modifications to existing 3DGS models. Our compression framework is straightforward to implement, yet provides significant compression capabilities, preserves image quality, and improves real-time rendering performance. Across 13 benchmark scenes, our approach achieves an average compression ratio of 3.95x, 3.10x, and 4.55x when applied to 3DGS, 3DGS-MCMC, and PixelGS, respectively. This yields consistent rendering speedups of 23.3%, 24.3%, and 25.3%, while maintaining image quality.

2605.30393 2026-06-01 cs.LG cs.AI cs.CR 版本更新

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

NumLeak: 基础模型中的公开数值基准作为潜在标签

Anany Kotawala

Affiliations * Princeton University

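AI summary: Proposes the NumLeak framework, which combines API-boundary probes with white-box validation on an open causal LM to show that foundation models memorize public numeric benchmarks during pretraining, so that evaluations overestimate generalization.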

Comments 23 pages, 12 figures, 17 tables. Accepted at the ICML 2026 Workshop on the Impact of Memorization on Trustworthy Foundation Models (MemFM)

AI中文摘要

公开数值基准出现在预训练中，因此基于日期进行评估可能测量的是记忆性回忆而非样本外技能。我们引入NumLeak，一个结合生产模型API边界探测与开源因果模型白盒受控验证的测量框架。顶级前沿LLM在3种子池化后，对Fama-French市场超额收益的回忆皮尔逊相关系数r=0.97-0.99，同时五个兄弟因子在25个基点内误差不超过0.15；在美国失业率、CPI通胀和NOAA温度上观察到类似保真度。在近期发布的保留集上，解析率骤降至21-57%，但在回答的月份上r仍约为0.99，拒绝-回忆不对称性符合记忆通道的预测。白盒实验重现了剂量反应，对数概率排名检测到开放生成遗漏的记忆，意味着封闭API黑盒探测低估了该通道。一个Sonnet“日期到市场情绪”回归与真实Mkt-RF的相关性r=0.74，在残差化模型自身回忆后降至r=0.02。一行系统提示防御在概念和历史叙事查询上以接近零的效用成本阻止了99.8%的非自适应单轮后缀攻击集。

英文摘要

Public numeric benchmarks appear in pretraining, so an evaluation that conditions on a date may be measuring memorized recall rather than out-of-sample skill. We introduce NumLeak, a measurement framework that combines API-boundary probes on production models with a white-box controlled validation on an open causal LM. Top-tier frontier LLMs recall the Fama-French market excess return at 3-seed pooled Pearson r=0.97-0.99 while staying within 0.15 within-25bps on the five sibling factors; comparable fidelity appears on U.S. unemployment, CPI inflation, and NOAA temperature. On a recent-release holdout, parse rate collapses to 21-57% but r stays at approximately 0.99 on months answered, the refuse-or-recall asymmetry a memorized channel predicts. The white-box experiment reproduces the dose-response, and logprob ranking detects memorization that open-ended generation misses, implying closed-API black-box probes understate the channel. A Sonnet "date to market-sentiment" regression that correlates with true Mkt-RF at r=0.74 collapses to r=0.02 once the model's own recall is residualized out. A one-line system-prompt defense blocks 99.8% of a non-adaptive single-turn suffix attack set at near-zero utility cost on conceptual and historical-narrative queries.

2605.30389 2026-06-01 cs.FL cs.LG 版本更新

The Inclusion Depth of Pattern Languages: An Open Problem in Algorithmic Learning Theory

模式语言的包含深度：算法学习理论中的一个开放问题

Wei Luo

Affiliations * School of Information Technology, Deakin University

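AI summary: This note poses the problem of computing the inclusion depth of a pattern language (the length of the longest strict inclusion chain) and conjectures the formula 2|p| - #var(p) - 1; the problem connects formal languages, combinatorics on words, and learning by identification in the limit.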

Comments 2 pages. Open problem from COLT 2005. Generic author-prepared version for arXiv. Originally appeared in Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 2005

DOI: 10.1007/11503415_48
Journal ref: Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Lecture Notes in Artificial Intelligence 3559, Springer, 2005, pp. 689-690

AI中文摘要

模式语言是形式语言理论和算法学习理论中的经典模型。本文提出了计算模式语言包含深度的问题：从通用模式语言到给定模式生成的语言的最长严格包含链的长度。包含深度捕捉了从正数据识别模式的心智变化复杂度。核心开放问题是：对于每个有限字母表Σ（至少两个符号）上的每个模式p，包含深度ID_Σ(p)是否可计算，以及是否可在多项式时间内计算。一个简单的猜想公式ID_Σ(p) = 2|p| - #var(p) - 1将蕴含线性时间算法。该问题连接了模式语言包含、词上的组合学、极限中的语言识别以及有界心智变化学习。

英文摘要

Pattern languages are a classical model in formal language theory and algorithmic learning theory. This note formulates the problem of computing the inclusion depth of a pattern language: the length of the longest strict inclusion chain from the universal pattern language to the language generated by a given pattern. Inclusion depth captures the mind-change complexity of pattern identification from positive data. The central open question is whether the inclusion depth ID_Sigma(p) is computable for every pattern p over every finite alphabet Sigma with at least two symbols, and whether it is computable in polynomial time. A simple conjectured formula, ID_Sigma(p) = 2|p| - #var(p) - 1, would imply a linear-time algorithm. The problem connects pattern language inclusion, combinatorics on words, language identification in the limit, and mind-change-bounded learning.
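
The conjectured formula is easy to evaluate, which is why it would imply a linear-time algorithm if it held. The sketch below simply computes 2|p| - #var(p) - 1; it says nothing about whether the conjecture is true. Representing a pattern as a list of symbols, marking variables with a leading 'x', and reading #var(p) as the number of distinct variables are assumptions made here.

```python
def conjectured_inclusion_depth(pattern):
    """Evaluate 2|p| - #var(p) - 1 for a pattern given as a list of symbols.

    Convention assumed here: symbols starting with 'x' are variables,
    everything else is a constant from the alphabet.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    distinct_vars = {s for s in pattern if s.startswith("x")}
    return 2 * len(pattern) - len(distinct_vars) - 1

print(conjectured_inclusion_depth(["x1"]))             # the universal pattern
print(conjectured_inclusion_depth(["x1", "a", "x1"]))  # one variable used twice
print(conjectured_inclusion_depth(["a", "b"]))         # constants only
```

As a sanity check under these conventions, the single-variable pattern gets value 0, matching the fact that a chain starting and ending at the universal pattern language has length zero.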

2605.30388 2026-06-01 cs.LG 版本更新

A Novel Evaluation Metric for Unsupervised Learning in AIS-Based Maritime Anomaly Detection: MADQI

基于AIS的海事异常检测中无监督学习的新型评估指标：MADQI

Ismet Gocer, Zakirul Bhuiyan, Raza Hasan, Shakeel Ahmad

Affiliations * Southampton Solent University, School of Technology and Maritime Industries

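AI summary: Proposes MADQI, a quality index for maritime anomaly detection that needs no labelled data and combines four sub-metrics to evaluate the anomaly detection performance of unsupervised learning models.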

Comments 26 pages, A new Eval Metric for Unsupervised Machine Learning

AI中文摘要

本文介绍了一个新的系统框架，用于检测海事自动识别系统（AIS）数据集中的异常。这些异常包括与速度、位置跳跃、时间间隔和转向角度相关的异常船舶行为。尽管诸如孤立森林之类的无监督学习算法被广泛用于检测异常船舶运动，但它们通常缺乏系统且有意义的评估措施。为了解决这一局限性，我们提出了一种称为海事异常检测质量指标（MADQI）的新型质量指标。所提出的MADQI是一个复合指标，旨在评估机器学习模型的异常检测性能，而无需标记数据。该框架使用哈弗辛距离计算来分析AIS数据集，并根据空间和行为特征识别异常。所提出的MADQI评估框架整合了四个相互关联的指标：异常率一致性（ARC）、物理合理性评分（PPS）、评分分布分离度（SDS）和极端案例证据（ECE）。这些指标通过使用多块评估和自适应缩放技术的自动归一化进行组合。在AIS数据集上的实验结果表明，所提出的框架实现了80.37%的MADQI分数，证明了其在无监督异常检测中的有效性。特别是，该算法在识别异常船舶行为方面表现强劲。在MADQI的各个组成部分中，ECE和ARC分别达到了0.907和1.000的分数，表明其在检测极端异常和保持异常率一致性方面具有出色的能力。总体而言，这些结果令人鼓舞，并表明所提出的框架为评估海事AIS数据中的无监督异常检测提供了一种可靠且有意义的方法。

英文摘要

This paper introduces a new systematic framework for detecting anomalies in maritime Automatic Identification System (AIS) datasets. These anomalies include abnormal vessel behaviours related to speed, position jumps, time gaps, and turn angles. Although unsupervised learning algorithms such as Isolation Forest are widely used for detecting anomalous vessel movements, they often lack systematic and meaningful evaluation measures. To address this limitation, we propose a novel quality metric called Maritime Anomaly Detection Quality Index (MADQI). The proposed MADQI is a composite index designed to evaluate the anomaly detection performance of machine learning models without requiring labelled data. The proposed framework uses Haversine distance calculations to analyse AIS datasets and identify anomalies based on their spatial and behavioural characteristics. The proposed MADQI evaluation framework integrates four interconnected metrics: Anomaly Rate Consistency (ARC), Physical Plausibility Score (PPS), Score Distribution Separation (SDS), and Extreme Case Evidence (ECE). These metrics are combined through automatic normalisation using multi-chunk evaluation and adaptive scaling techniques. Experimental results on the AIS dataset show that the proposed framework achieved a MADQI score of 80.37%, demonstrating its effectiveness for unsupervised anomaly detection. In particular, the algorithm performed strongly in identifying abnormal vessel behaviour. Among the individual MADQI components, ECE and ARC achieved scores of 0.907 and 1.000, respectively, indicating excellent capability in detecting extreme anomalies and maintaining anomaly rate consistency. Overall, these results are encouraging and demonstrate that the proposed framework provides a reliable and meaningful approach for evaluating unsupervised anomaly detection in maritime AIS data.
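
The Haversine distance mentioned above is a standard great-circle formula, and a short sketch shows how it can feed a physical-plausibility check. How the paper uses the distance is not stated in the abstract; turning it into an implied speed between two AIS reports is an assumption, as are the function names and the Earth-radius constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def implied_speed_knots(lat1, lon1, t1_s, lat2, lon2, t2_s):
    """Speed implied by two timestamped AIS positions (1 knot = 1.852 km/h)."""
    hours = (t2_s - t1_s) / 3600.0
    if hours <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return haversine_km(lat1, lon1, lat2, lon2) / hours / 1.852

# Two reports one hour apart, one degree of longitude apart on the equator.
print(implied_speed_knots(0.0, 0.0, 0, 0.0, 1.0, 3600))
```

An implied speed far above what a vessel class can sustain is one simple signal of a position jump, which is the kind of anomaly the abstract lists.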

2605.30387 2026-06-01 cs.LG cs.AI cs.CV eess.SP 版本更新

Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

基于小波图像变换和频谱流匹配的功能磁共振时间序列生成用于脑疾病识别

Hwa Hui Tew, Junn Yong Loo, Fang Yu Leong, Julia K. Lau, Ding Fan, Hernando Ombao, Raphaël C. -W. Phan, Chee Pin Tan, Chee-Ming Ting

Affiliations * School of Information Technology, Monash University Malaysia; School of Engineering, Monash University Malaysia; Statistics Program, King Abdullah University of Science and Technology

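AI summary: Proposes the Dual-Spectral Flow Matching (DSFM) framework, which builds a dual-frequency representation of BOLD signals via the discrete wavelet transform and the discrete cosine transform, uses spectral flow matching to generate class-conditional cosine-frequency representations, and then reconstructs physiologically plausible time-domain BOLD signals through the inverse transforms to improve downstream brain network classification.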

Comments Accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)

AI中文摘要

Gait2Hip-60：基于多节奏步态运动学预测髋部肌肉力和关节力矩的统一深度学习基准

Jiaqi Zhang, Ji Hou, Qing Sun, Xianzhi Gao, Bo Huo

Affiliations * Capital University of Physical Education and Sports; Beijing Institute of Technology; Beijing Key Laboratory of Interdisciplinary Intelligent Technologies of Sports, Medicine and Engineering

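AI summary: This study proposes a deep learning framework that uses three models (LSTM, Transformer, and Mamba) to predict hip muscle forces and joint moments directly from lower-limb gait kinematics. Evaluated on data from 60 healthy subjects, the Transformer performs best and retains moderate predictive ability in a zero-shot test on patients with osteonecrosis of the femoral head.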

Comments 16 pages, 9 figures. Code and dataset publicly available

AI中文摘要

在步态过程中估计髋部肌肉力和关节力矩通常依赖于肌肉骨骼仿真，这种方法信息丰富但耗时且难以应用于临床。本研究开发了一个深度学习框架，直接从下肢步态运动学预测这些髋部动力学参数，并在统一协议下比较了三种代表性序列模型。步态数据来自60名健康成年人在三种节拍器引导的节奏条件下的行走。使用十个双侧下肢关节角度作为输入，以OpenSim导出的髋部肌肉力和髋关节力矩作为参考输出。训练并评估了LSTM、Transformer和Mamba三种深度学习模型，采用相同的受试者级别划分、预处理流程和评价指标。随后，最佳模型直接在一个由9名股骨头坏死（ONFH）患者组成的外部队列上进行测试，无需重新训练。在健康受试者基准测试中，Transformer在髋部肌肉力预测（RMSE = 1.33 N/kg, MAE = 0.57 N/kg, R2 = 0.819）和髋关节力矩预测（RMSE = 0.11 Nm/kg, MAE = 0.07 Nm/kg, R2 = 0.862）方面均取得了最佳的受试者级别平均性能，且在不同步行节奏下具有相似优势。在零样本外部验证中，Transformer在ONFH患者中保留了中等预测能力，髋部肌肉力预测（RMSE = 1.51 N/kg, MAE = 0.70 N/kg, R2 = 0.537）和髋关节力矩预测（RMSE = 0.17 Nm/kg, MAE = 0.12 Nm/kg, R2 = 0.569）。这些发现支持了从步态运动学估计髋部动力学的可行性，将Transformer确定为强基线，并强调了在临床应用前需要进行更广泛的病理验证和改进泛化能力。

英文摘要

Estimating hip muscle forces and joint moments during gait typically relies on musculoskeletal simulation, which is informative but time-consuming and difficult to apply in clinical settings. This study developed a deep learning framework to predict these hip dynamics parameters directly from lower-limb gait kinematics and compared three representative sequence models under a unified protocol. Gait data were collected from 60 healthy adults under three metronome-guided cadence conditions. Ten bilateral lower-limb joint angles were used as inputs, and OpenSim-derived hip muscle forces and hip joint moments were used as reference outputs. Three deep learning models of LSTM, Transformer, and Mamba were trained and evaluated using the same subject-level split, preprocessing pipeline, and metrics. The best model was then directly tested on an external cohort of 9 patients with osteonecrosis of the femoral head (ONFH) without retraining. In the healthy-subject benchmark, Transformer achieved the best subject-level mean performance for both hip muscle force prediction (RMSE = 1.33 N/kg, MAE = 0.57 N/kg, R2 = 0.819) and hip joint moment prediction (RMSE = 0.11 Nm/kg, MAE = 0.07 Nm/kg, R2 = 0.862), with similar advantages across walking cadences. In zero-shot external validation, Transformer retained moderate predictive ability in ONFH for hip muscle force prediction (RMSE = 1.51 N/kg, MAE = 0.70 N/kg, R2 = 0.537) and hip joint moment prediction (RMSE = 0.17 Nm/kg, MAE = 0.12 Nm/kg, R2 = 0.569). These findings support the feasibility of estimating hip dynamics from gait kinematics, identify Transformer as a strong baseline, and highlight the need for broader pathological validation and improved generalization before clinical application.
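
The three reported metrics can be computed from a predicted and a reference curve as sketched below. This is a generic illustration: the abstract does not say how scores are aggregated into subject-level means, and taking R2 to be the usual coefficient of determination is an assumption. The sample values are made up.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # raises ZeroDivisionError if y_true is constant
    return rmse, mae, r2

# Reference vs. predicted values over four time samples (made-up numbers).
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

Because RMSE squares the errors, a single large miss moves it more than MAE, which is consistent with the RMSE values in the abstract being larger than the corresponding MAE values.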

2605.30372 2026-06-01 cs.NE cs.AI cs.LG q-bio.NC 版本更新

Evolutionary Algorithm for Reservoir Learning and Yielding

用于储层学习和生成的进化算法

Julien Testu, Pierrick Legrand, Xavier Hinaut

Affiliations * Inria; LaBRI, CNRS UMR 5800; Bordeaux INP, ENSC; IMS, CNRS UMR 5218

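AI summary: Proposes the evolutionary algorithm EARLY, which evolves the topology and hyperparameters of multi-reservoir echo state networks, outperforms random search on temporal learning tasks, and finds that task difficulty shapes network structure.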

Journal ref: GECCO '26 - The Genetic and Evolutionary Computation Conference, Jul 2026, San José, Costa Rica

AI中文摘要

储层计算是一种递归神经网络，因其将动态处理与训练好的读出层分离而成为时序学习的有前途方法。然而，经典的回声状态网络（ESN）通常需要针对任务调整其架构和超参数才能获得良好性能。本文介绍了EARLY（用于储层学习和生成的进化算法），这是一个旨在进化多储层ESN的拓扑和超参数的框架。受大脑模块化组织的启发，EARLY将架构编码为基于图的基因组，并应用交叉、变异和选择来发现有效的配置。我们的目标是创建通用架构和任务诱导泛化。该方法在CogScale数据集的时序学习任务上进行了评估。结果表明，进化出的架构在多个任务上优于通过随机搜索获得的架构，并根据任务难度表现出结构差异：简单任务产生轻量级架构，而复杂任务倾向于更丰富的模块化组织。这些发现表明，进化搜索有助于为更广泛的时序问题识别可复用的储层结构。进一步在跨情境学习数据集上评估进化出的架构，以评估其适应新环境的能力。

英文摘要

Reservoir computing, a type of recurrent neural network, is a promising approach for temporal learning as it separates dynamic processing from the trained readout layer. However, classical Echo State Networks (ESNs) often require task-specific tuning of their architecture and hyperparameters to achieve good performance. This paper introduces EARLY (Evolutionary Algorithm for Reservoir Learning and Yielding), a framework designed to evolve both the topology and hyperparameters of multi-reservoir ESNs. Inspired by the modular organisation of the brain, EARLY encodes architectures as graph-based genomes and applies crossover, mutation, and selection to discover effective configurations. Our goal is to create both generic architectures and tasks inducing generalization. The method is evaluated on temporal learning tasks from the CogScale dataset. Results show that evolved architectures outperform those obtained with random search on several tasks and exhibit structural differences depending on task difficulty: simpler tasks yield lightweight architectures, while more complex tasks favour richer modular organisations. These findings suggest that evolutionary search can help identify reusable reservoir structures for a broader range of temporal problems. The evolved architectures are further evaluated on a cross-situational learning dataset to assess their ability to adapt to new environments.

2605.30371 2026-06-01 cs.NE cs.LG math.DS 版本更新

学习任意量子纠错码的逻辑操作

Nico Meyer, Christopher Mutschler, Dominik Seuß, Andreas Maier, Daniel D. Scherer

Affiliations * Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany; Pattern Recognition Lab, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany; University of Technology Nuremberg (UTN), Nuremberg, Germany; Center for Artificial Intelligence (CAIRO), Technical University of Applied Sciences Würzburg-Schweinfurt, Würzburg, Germany

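AI summary: Proposes a learning-based framework that, given only the encoding circuit, constructs logical operations with structural properties such as transversality or shallow depth for arbitrary quantum error-correcting codes, and extends it to Variational Early Fault-Tolerant Quantum Computing (VarEFTQC), a method for co-designing non-additive encodings and logical gate sets.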

Comments 23 pages, 12 figures, 5 tables

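AI abstract (translated from Chinese)

Logical operations are essential for quantum computation within quantum error-correcting codes. However, finding their physical realizations is challenging, particularly for non-additive codes that lack a stabilizer description. We propose a general learning-based framework that, given only the encoding circuit, constructs physical realizations of logical operations while enforcing structural properties such as transversality or shallow depth. Our method is validated by rediscovering known logical operations of standard stabilizer codes. We then extend it to a co-design procedure, termed variational early fault-tolerant quantum computing (VarEFTQC), which tailors non-additive encodings to a given noise model and enforces a desired logical gate set, such as a transversal IQP-type family or a low-depth universal set. A software library implements the full learning pipeline, including loss-function variants, ansatz families, and optimization routines. Together, these results position VarEFTQC as a practical tool for discovering hardware-adapted logical tools for early fault-tolerant quantum computing.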

对齐篡改：人类反馈强化学习如何被利用以优化错位偏见

Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee

Affiliations * MIT

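AI summary: This paper introduces the alignment tampering vulnerability, in which the LLM undergoing alignment influences the preference dataset so that RLHF amplifies undesired behaviors; experiments show amplification across several biases, and existing mitigations struggle to resolve the problem without sacrificing quality.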

Comments Accepted at ICML 2026, Source code: https://alignment-tampering.github.io/

AI中文摘要

人类反馈强化学习（RLHF）是将大型语言模型（LLM）与人类偏好对齐的标准方法。在本工作中，我们引入对齐篡改，这是一种潜在漏洞，即正在对齐的LLM影响偏好数据集，导致RLHF放大不良行为。这源于RLHF的核心局限性：（1）偏好数据集由LLM自身的输出构建，使其能够影响它们；（2）成对比较仅指示哪个响应更好，而不说明原因。这些局限性可能被利用以导致对齐篡改。例如，如果LLM以更高质量生成有偏见的响应，标注者会基于质量偏好它们。然而，偏好标签无法区分质量与偏见，奖励模型继承了这一局限性。通过强化学习或最佳N采样优化此类奖励可能放大错位偏见。我们的实验展示了跨多种偏见的放大：从关键词偏见到宣传（例如性别歧视）、品牌推广和工具性目标寻求。缓解仍然具有挑战性，因为现有的鲁棒RLHF技术无法在不牺牲响应质量的情况下完全解决对齐篡改。这些发现揭示了当前RLHF的结构性漏洞，并强调了防止此漏洞的必要性。项目页面：https://alignment-tampering.github.io/

英文摘要

Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors. This arises from core limitations of RLHF: (1) preference datasets are constructed from the LLM's own outputs, allowing it to influence them, and (2) pairwise comparisons only indicate which response is better, not why. These limitations can be exploited to cause alignment tampering. For example, if an LLM generates biased responses with higher quality, annotators will prefer them based on quality. However, preference labels do not distinguish quality from bias, and the reward model inherits this limitation. Optimizing such rewards through reinforcement learning or best-of-N sampling can amplify misaligned biases. Our experiments demonstrate amplification across diverse biases: from keyword bias to propaganda (e.g., sexism), brand promotion, and instrumental goal-seeking. Mitigation remains challenging, as existing techniques for robust RLHF fail to fully resolve alignment tampering without sacrificing response quality. These findings reveal structural vulnerabilities of current RLHF and emphasize the need to prevent this vulnerability. Project page: https://alignment-tampering.github.io/


2605.26929 2026-06-01 cs.LG 版本更新

When Muon Optimizer Meets Adversarial Training: A Theoretical and Empirical Study

当Muon优化器遇到对抗训练：理论与实证研究

Jun Yan, Weiquan Huang, Jiankai Zuo, Yujian Mo, Xi Fang, Chengliang Wu, Zeming Wei

发表机构 * IT College, Shanghai Ocean University（上海海洋大学信息学院）； School of Computer Science and Technology, Tongji University（同济大学计算机科学与技术学院）； SEIE, Suzhou University of Science and Technology（苏州科技大学SEIE学院）； DP Tech（DP科技）； School of Mathematical Sciences, Peking University（北京大学数学科学学院）

AI总结本文通过理论和实证研究，探讨Muon优化器（基于近似极分解的正交化更新）在对抗训练中的效果，发现其能限制矩阵更新的谱范数增长，在CNN上与SGD相当，并在CNN和ViT上显著优于AdamW。

详情

AI中文摘要

对抗训练（AT）仍然是最可靠的对抗攻击经验防御方法之一。其鲁棒性关键取决于底层极小极大目标如何优化。在实践中，随机梯度下降（SGD）优化器仍然是AT的默认优化选择，而自适应优化器通常能改善标准训练，但可能产生较差的鲁棒性。最近，Muon优化器通过近似极分解对矩阵值更新进行正交化，在内存成本与SGD相当的情况下，在大规模训练中取得了显著成功。这提出了一个与安全相关的问题：正交化优化能否在强异质威胁模型下改进AT？针对这一问题，我们进行了全面的理论和实证研究。理论上，我们表明Muon对矩阵更新施加了谱范数稳定性上限，限制了训练动态中不受控制的谱增长，而无需显式缩小学习权重。实证上，在五种架构和三种$\ell_p$威胁模型（$\ell_\infty$、$\ell_1$、$\ell_2$）及其联合下，Muon在CNN上与SGD竞争力相当，并在CNN和ViT上显著优于AdamW。这些结果将优化器几何识别为对抗训练中的一个安全相关因素，同时阐明了正交化更新有益的经验场景。总体而言，我们的发现强调了优化器设计是AT的一个安全关键组成部分。

英文摘要

Adversarial training (AT) remains one of the most reliable empirical defenses against adversarial attacks. Its robustness critically depends on how the underlying min-max objective is optimized. In practice, Stochastic Gradient Descent (SGD) optimizer remains the default optimization choice for AT, whereas adaptive optimizers often improve standard training but may yield inferior robustness. Recently, the Muon optimizer, which orthogonalizes matrix-valued updates via an approximate polar decomposition, has achieved notable success in large-scale training at a memory cost comparable to SGD. This raises a security-relevant question: can orthogonalized optimization improve AT under strong and heterogeneous threat models? Focusing on this problem, we conduct a comprehensive theoretical and empirical study. Theoretically, we show that Muon imposes a spectral-norm stability ceiling on matrix updates, limiting uncontrolled spectral growth in the training dynamics without explicitly shrinking the learned weights. Empirically, across five architectures and three $\ell_p$ threat models ($\ell_\infty$, $\ell_1$, $\ell_2$) and their union, Muon is competitive with SGD on CNNs and substantially outperforms AdamW on both CNNs and ViTs. These results identify optimizer geometry as a security-relevant factor in adversarial training, while clarifying the empirical regimes in which orthogonalized updates are beneficial. Overall, our findings highlight optimizer design as a security-critical component of AT.
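To make "orthogonalizes matrix-valued updates via an approximate polar decomposition" concrete, here is a minimal sketch. It uses the textbook cubic Newton-Schulz iteration as a stand-in, which is an assumption: the abstract does not say which iteration or coefficients the optimizer uses, and the function names are illustrative. The point is that the result has singular values near 1 whatever the scale of the raw gradient, which gives some intuition for a spectral-norm ceiling on the update.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of g via cubic Newton-Schulz."""
    fro = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / fro for v in row] for row in g]  # singular values now lie in (0, 1]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # X <- 1.5 X - 0.5 X X^T X pushes every singular value toward 1.
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x

grad = [[3.0, 1.0], [0.0, 2.0]]           # toy full-rank "gradient" matrix
update = orthogonalize(grad)
gram = matmul(update, transpose(update))  # close to the identity if update is orthogonal
```

Because the input is normalized first, multiplying `grad` by any positive constant leaves `update` essentially unchanged.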


分子结构无法告诉我们的事：基于GNN的药物毒性预测中可解释性差距的分类

Juergen Dietrich

AI总结本研究引入了一个操作分类法，系统性地分析了图神经网络在药物毒性预测中由于结构信息限制导致的不可解释性差距，并以阿司匹林为例量化了分子结构仅能解释约45%的不良反应。

Comments 13 pages

详情

AI中文摘要

并非所有临床相关的不良反应都能从分子图中结构推断出来——无论模型质量或架构复杂性如何。本研究引入了一个操作分类法，用于描述独立于所用学习算法的结构信息限制，这些限制阻碍了基于结构的毒性预测。图神经网络（GNN）已成为分子毒性预测的自然方法，直接作用于原子连接性，避免了固定长度指纹固有的信息损失。然而，药物已知药理学特征中实际可从分子结构推断的比例仍未被系统探索。以乙酰水杨酸（ASA，阿司匹林）——药理学中表征最全面的药物之一——作为模型化合物进行系统性案例研究。在Tox21基准上训练消息传递神经网络（MPNN），并应用GNNExplainer表征原子级归因。结果表明，分子结构解释了约45%（5/11）的已知ASA不良反应。引入了一个四类差距分类法（GAP-1至GAP-4），区分了原则上不可编码的效应、由非随机缺失（MNAR）机制引起的数据差距、检测面板不匹配和表示误差。通过系统的ChEMBL查询（42个已记录检测，0个可检索生物活性条目）经验量化了MNAR差距。注意力池化实验将表示误差定位到MPNN消息传递层而非聚合步骤。该差距分类法对药物安全信号检测和监管框架（包括良好药物警戒实践（GVP）指南和新方法论（NAMs））具有直接影响。在伴随的DDI消融研究中确认了所识别的结构限制。

英文摘要

Not all clinically relevant adverse effects are structurally inferable from molecular graphs - regardless of model quality or architectural complexity. This study introduces an operational taxonomy of the structural information limits that prevent structure-based toxicity prediction, independent of the learning algorithm employed. Graph Neural Networks (GNNs) have emerged as a natural approach for molecular toxicity prediction, operating directly on atomic connectivity without the information loss inherent to fixed-length fingerprints. However, the fraction of a drug's known pharmacological profile that is actually inferable from molecular structure remains systematically underexplored. A systematic case study using acetylsalicylic acid (ASA, Aspirin) - one of the most comprehensively characterized drugs in pharmacology - serves as model compound. A Message Passing Neural Network (MPNN) is trained on the Tox21 benchmark and GNNExplainer is applied to characterize atom-level attribution. Results indicate that molecular structure explains approximately 45% (5/11) of known ASA adverse effects. A four-category Gap Taxonomy (GAP-1 through GAP-4) is introduced distinguishing between principally non-encodable effects, data gaps arising from Missing Not At Random (MNAR) mechanisms, assay panel mismatches, and representation errors. The MNAR gap is empirically quantified via a systematic ChEMBL query (42 documented assays, 0 retrievable bioactivity entries). An attention pooling experiment localizes the representation error to the MPNN message passing layers rather than the aggregation step. The Gap Taxonomy has direct implications for drug safety signal detection and regulatory frameworks including Good Pharmacovigilance Practice (GVP) guidelines and New Approach Methodologies (NAMs). Structural limits identified are confirmed in a companion DDI ablation study.


2605.26121 2026-06-01 cs.LG cs.AI 版本更新

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

GEM: 用于最优LLM数据策展的几何熵混合

Yue Min, Ziyun Qiao, Ruining Chen, Yujun Li

发表机构 * The Hong Kong University of Science and Technology, Hong Kong SAR, China（香港科技大学）； Peking University, Beijing, China（北京大学）； University of Science and Technology of China, Hefei, China（中国科学技术大学）

AI总结提出GEM框架，通过将数据策展重构为超球面上的变分问题并采用MM算法优化，解决了分类缺陷和嵌入各向异性问题，在1.1B参数模型上将平均下游准确率最高提升1.2%。

Comments ICML 2026 Poster

详情

AI中文摘要

LLM预训练的有效性越来越依赖于数据组成而非单纯的数据量。然而，最优混合受到分类缺陷的阻碍：人类分类法存在本体论错位，而欧几里得聚类无法解决嵌入各向异性。我们引入GEM（几何熵混合），这是一个将数据策展重构为超球面上的变分问题并辅以混合平衡正则化项的框架。通过解耦生成先验并使用可证明的MM（Minorize-Maximize）算法优化目标，GEM有效对抗聚类坍缩，从而发现欧几里得启发式方法无法察觉的平衡语义结构。我们采用师生蒸馏将这种几何保真度扩展到网络规模语料库，并引入几何影响分数（GIS）用于可解释的分类法生成。使用1.1B参数模型的实验表明，当集成到DoReMi和RegMix等混合策略中时，GEM建立了新的最先进水平，将平均下游准确率提升高达1.2%，并为可预测的数据混合提供了稳健的坐标系。

英文摘要

LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, optimal mixing is hindered by categorization flaws: human taxonomies suffer from ontological misalignment, and Euclidean clustering fails to address embedding anisotropy. We introduce GEM (Geometric Entropy Mixing), a framework reformulating data curation as a variational problem on the hypersphere augmented with a mixing-balance regularizer. By decoupling the generative prior and optimizing the objective via a provable MM (Minorize-Maximize) algorithm, GEM effectively counteracts the cluster collapse to discover balanced semantic structures invisible to Euclidean heuristics. We employ teacher-student distillation to scale this geometric fidelity to web-scale corpora and introduce the Geometric Influence Score (GIS) for interpretable taxonomy generation. Experiments with 1.1B-parameter models demonstrate that GEM establishes a new state-of-the-art when integrated into mixing strategies like DoReMi and RegMix, improving average downstream accuracy by up to 1.2% and offering a robust coordinate system for predictable data mixing.


2605.30018 2026-06-01 cs.CL cs.LG 版本更新

Latent Performance Profiling of Large Language Models

大型语言模型的潜在性能剖析

Tanmoy Chakraborty, Ayan Sengupta, Suparna Bhattacharya, Partha Pratim Chakrabarti, Amlan Chakrabarti, Supratik Chakraborty, Partha Pratim Das, Lipika Dey, Richa Singh, Mayank Vatsa

发表机构 * Department of Electrical Engineering, Indian Institute of Technology Delhi（印度理工学院德里分校电气工程系）； Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi（印度理工学院德里分校人工智能学院）； Hewlett Packard Enterprise, India（印度惠普企业公司）； Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur（印度理工学院克勒格布尔分校计算机科学与工程系）； A.K.Choudhury School of Information Technology, University of Calcutta, India（印度加尔各答大学信息科技学院）； Department of Computer Science & Engineering, Indian Institute of Technology Bombay（印度理工学院孟买分校计算机科学与工程系）； Department of Computer Science, Ashoka University, India（印度阿育王大学计算机科学系）； Department of Computer Science & Engineering, Indian Institute of Technology Jodhpur（印度理工学院焦特布尔分校计算机科学与工程系）

AI总结提出潜在性能剖析（LPP）框架，通过隐藏激活和输出分布提取任务无关的诊断指标，揭示模型内在特性，补充传统基准评估。

详情

AI中文摘要

大型语言模型（LLMs）在标准化基准测试中经常取得令人印象深刻的分数，但仅凭准确性对能力的了解有限。通过排行榜评估开源LLMs面临持续的问题，如数据污染、任务范围狭窄以及与真实世界可靠性的弱对齐。基于基准的评估（如MMLU PRO、BBH或IFEval）主要捕捉模型在固定测试集上的输出，而非其如何处理信息、校准不确定性或构建内部知识。在本文中，我们主张从以基准为中心的评估转向对LLMs进行互补的、以状态为中心的内在评估。为此，我们引入了潜在性能剖析（LPP）——一个从隐藏激活和输出分布中提取任务无关诊断的框架。LPP在模型的潜在表示和动态上定义了一组标量指标，揭示了与规模无关的特征，从而实现可解释的比较并揭示隐藏的脆弱性。与静态准确性分数不同，LPP在相似规模的模型间提供稳定、对架构敏感的签名。通过对八个LLMs（规模范围0.5B-14B）的广泛实证分析，我们证明了具有相似基准分数的模型可能表现出对比的潜在特征，例如熵或适应性的差异。在这些见解的指导下，我们设计了用于不确定性和符号推理的合成探针，这些探针与内在指标一致，同时与排行榜偏差解耦。我们建议将LPP与基准一起报告，以提供对模型行为更深入、可解释的理解，从而实现更可靠的模型选择、安全评估以及超越表面准确性的评估。

英文摘要

Large language models (LLMs) frequently achieve impressive scores on standardized benchmarks, yet accuracy alone offers a limited view of their capabilities. Evaluating open-source LLMs through leaderboards faces persistent issues like data contamination, narrow task scope, and weak alignment with real-world reliability. Benchmark-based evaluations such as MMLU PRO, BBH, or IFEval primarily capture what a model outputs on fixed test sets, not how it processes information, calibrates uncertainty, or structures internal knowledge. In this article, we advocate for a shift from benchmark-centric evaluation toward a complementary, state-centered intrinsic assessment of LLMs. To this end, we introduce Latent Performance Profiling (LPP) -- a framework that derives task-agnostic diagnostics from hidden activations and output distributions. LPP defines a set of scalar metrics on a model's latent representations and dynamics, revealing scale-independent traits that enable interpretable comparisons and uncover hidden vulnerabilities. Unlike static accuracy scores, LPP provides stable, architecture-sensitive signatures across models of similar size. With extensive empirical analyses across eight LLMs, spanning a size range of 0.5B-14B, we demonstrate that models with similar benchmark scores can exhibit contrasting latent profiles, such as differences in entropy or adaptability. Guided by these insights, we design synthetic probes for uncertainty and symbolic reasoning that align with intrinsic metrics while decoupling from leaderboard bias. We recommend that reporting LPP alongside benchmarks provides a deeper, interpretable understanding of model behavior, enabling more reliable model selection, safety assessment, and evaluation beyond surface-level accuracy.
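One of the latent traits the abstract names is entropy. As a hedged illustration of what a scalar diagnostic on an output distribution can look like, the sketch below computes the Shannon entropy of a softmax over next-token logits. This is a generic quantity, not necessarily one of the paper's LPP metrics, and the function names and toy logits are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predictive_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over next tokens."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

confident = predictive_entropy([8.0, 0.0, 0.0, 0.0])  # mass concentrated on one token
uncertain = predictive_entropy([0.0, 0.0, 0.0, 0.0])  # uniform over 4 tokens: ln(4)
```

Two models with the same accuracy on a prompt set can still differ in the average of such a quantity, which is the kind of contrast the abstract describes.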


2605.29852 2026-06-01 cs.CV cs.LG cs.MM 版本更新

Parameter-Efficient Subspace Decoupling ViT for Mitigating Multi-Task Negative Transfer in Histological Scoring

参数高效子空间解耦ViT用于缓解组织学评分中的多任务负迁移

Youhan Huang, Jiajun Li, Yilin Fang, Shuai Wang, Chuheng Li

发表机构 * Beijing University of Posts and Telecommunications（北京邮电大学）； Beijing University of Chemical Technology（北京化工大学）； Capital Medical University（首都医科大学）

AI总结提出子空间解耦多任务Vision Transformer，通过轻量级任务特定适配器和正交性约束构建独立特征子空间，减少任务干扰并保留共享表示，有效缓解多任务负迁移。

Comments 6 pages, 5 figures, 2 tables. IEEE ICME 2026 (Oral). Camera-ready version

详情

AI中文摘要

组织学评分对于诊断非酒精性脂肪性肝病（NAFLD）至关重要，但由于高标注成本以及多任务学习中强相关的NAFLD活动评分（NAS）指标之间的负迁移，其自动化仍然具有挑战性。为了解决这个问题，我们提出了一种子空间解耦的多任务Vision Transformer（ViT），它集成了轻量级的任务特定适配器与基于正交性的约束。该设计为脂肪变性、气球样变和炎症构建了独立的特征子空间，有效减少了任务干扰，同时保留了共享表示。我们进一步构建了一个精心策划的多任务小鼠NAFLD组织学数据集，其中包含所有NAS组件的专家标注。实验结果表明，与训练单独的单个任务模型相比，所提出的方法以显著降低的计算成本提高了多任务稳定性和泛化能力。代码和策划的数据集已准备就绪，将在接收后公开以支持可重复性。

英文摘要

Histological scoring is essential for diagnosing Non-Alcoholic Fatty Liver Disease (NAFLD), yet its automation remains challenging due to the high annotation cost and negative transfer among the strongly correlated NAFLD Activity Score (NAS) indicators in multi-task learning. To address this issue, we propose a subspace-decoupled multi-task Vision Transformer (ViT) that integrates lightweight task-specific Adapters with orthogonality-based constraints. This design constructs independent feature subspaces for steatosis, ballooning, and inflammation, effectively reducing task interference while retaining shared representations. We further construct a curated multi-task mouse NAFLD histology dataset with expert annotations for all NAS components. Experimental results demonstrate that the proposed method improves multi-task stability and generalization with substantially reduced computational cost compared to training separate single-task models. The code and the curated dataset have been prepared and will be made publicly available upon acceptance to support reproducibility.
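The abstract does not give the exact form of its orthogonality-based constraints. One common way to encourage separate feature subspaces is to penalize the cross-products between task-specific adapter matrices; the sketch below shows that generic penalty. The function name, the adapter shapes, and the pairwise Frobenius form are all assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def orthogonality_penalty(adapters):
    """Sum of squared Frobenius norms of A_i^T A_j over all task pairs i < j."""
    penalty = 0.0
    for i in range(len(adapters)):
        for j in range(i + 1, len(adapters)):
            cross = matmul(transpose(adapters[i]), adapters[j])
            penalty += sum(v * v for row in cross for v in row)
    return penalty

# Toy 3x1 adapters for the three NAS components (hypothetical values).
steatosis = [[1.0], [0.0], [0.0]]
ballooning = [[0.0], [1.0], [0.0]]
inflammation = [[0.0], [0.0], [1.0]]
separated = orthogonality_penalty([steatosis, ballooning, inflammation])

overlapping = orthogonality_penalty([steatosis, [[1.0], [1.0], [0.0]], inflammation])
```

The penalty is zero when the adapters span mutually orthogonal directions and grows as two tasks start to share a direction. Added to the task losses with a small weight, it would push the subspaces apart.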


2605.22737 2026-06-01 cs.LG cs.AI 版本更新

The Distillation Game: Adaptive Attacks & Efficient Defenses

蒸馏博弈：自适应攻击与高效防御

Youssef Allouah, Mahdi Haghifam, Sanmi Koyejo, Reza Shokri

发表机构 * Stanford University（斯坦福大学）； Toyota Technological Institute at Chicago（芝加哥丰田技术研究所）； National University of Singapore（新加坡国立大学）

AI总结通过极小极大博弈框架研究蒸馏攻击中模型提供者的部署权衡，提出自适应评估规则和专家乘积（PoE）防御方法，实验表明自适应学生能恢复更多能力，且PoE在成本和质量上具有优势。

详情

AI中文摘要

蒸馏攻击为模型提供者带来了部署权衡：使模型更有用的输出同样可能使其更容易被模仿。我们通过效用受限的教师与自适应学生之间的极小极大博弈来研究这种权衡。我们的框架给出了可处理的单边响应规则：一个自适应评估规则，其中学生对高价值示例重新加权；以及一个教师侧防御模板，用于抑制对蒸馏最有用的输出。基于示例价值的廉价代理，我们推导出专家乘积（Product-of-Experts，PoE），一种仅需前向传播的简单防御，在生成过程中将教师与代理学生结合。实验上，自适应评估揭示了显著的被动-自适应差距：面对最先进的防御，自适应学生在GSM8K和MATH上恢复的能力明显多于被动评估所显示的水平。在这种更强的评估下，昂贵防御与PoE之间表面上的鲁棒性差距显著缩小，而PoE的成本仍然低得多，并保留了更高质量的推理轨迹。总体而言，我们的结果表明，强力蒸馏仍然难以阻止，反蒸馏的进展应以自适应学生而非被动学生为评判基准。我们的代码可在 https://github.com/ysfalh/distillation-game 获取。

英文摘要

Distillation attacks create a deployment trade-off for model providers: the same outputs that make a model more useful can also make it easier to imitate. We study this trade-off through a minimax game between a utility-constrained teacher and an adaptive student. Our framework yields tractable one-sided response rules: an adaptive evaluation rule in which the student reweights high-value examples, and a teacher-side defense template that suppresses outputs most useful for distillation. From a cheap proxy for example value, we derive Product-of-Experts (PoE), a simple forward-pass-only defense that combines the teacher with a proxy student during generation. Empirically, adaptive evaluation reveals a large passive--adaptive gap: on state-of-the-art defenses, adaptive students recover substantially more capability than passive evaluation suggests on GSM8K and MATH. Under this stronger evaluation, the apparent robustness gap between expensive defenses and PoE narrows considerably, while PoE remains substantially cheaper and preserves higher-quality reasoning traces. Overall, our results suggest that strong distillation remains difficult to stop, and that progress on antidistillation should be judged against adaptive students rather than passive ones. Our code is available at: https://github.com/ysfalh/distillation-game.


2605.12340 2026-06-01 stat.ML cs.LG 版本更新

Online Learning-to-Defer with Varying Experts

在线学习延迟决策与变化专家

Dang Hoang Duy, Yannis Montreuil, Maxime Meyer, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

发表机构 * School of Computing, National University of Singapore（新加坡国立大学计算机学院）； Fédération ENAC ISAE-SUPAERO ONERA, Université de Toulouse, France（法国图卢兹大学ENAC ISAE-SUPAERO ONERA联合体）； Institute for Infocomm Research, A*STAR , Singapore（新加坡A*STAR信息通信研究所）； IPAL, IRL 2955, Singapore（新加坡IPAL实验室）； Department of Mathematics, National University of Singapore（新加坡国立大学数学系）

AI总结针对动态专家池和流式数据，提出首个在线学习延迟决策算法，利用H-一致性界和在线凸优化实现遗憾界保证。

详情

AI中文摘要

学习延迟决策（L2D）方法将每个查询路由到预测模型或外部专家。虽然现有工作研究批处理设置中的这个问题，但实际部署需要处理流数据、变化的专家可用性和变化的专家分布。我们引入了第一个用于多类分类的在线L2D算法，具有bandit反馈和动态变化的专家池。我们的方法在一般情况下实现了$O((n+n_e)T^{2/3})$的遗憾界，在低噪声条件下实现了$O((n+n_e)\sqrt{T})$的遗憾界，其中$T$是时间范围，$n$是标签数量，$n_e$是跨轮次观察到的不同专家数量。该分析基于在线框架的新颖$\mathcal{H}$-一致性界，结合在线凸优化的一阶方法。在合成和真实世界数据集上的实验表明，我们的方法有效地将标准学习延迟决策扩展到具有变化专家可用性和可靠性的设置。

英文摘要

Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of $O((n+n_e)T^{2/3})$ in general and $O((n+n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. The analysis builds on novel $\mathcal{H}$-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.


2604.09414 2026-06-01 stat.ML cs.LG 版本更新

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

超越增强动作代理的多专家学习延迟决策

Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

发表机构 * School of Computing（计算学院）； Fédération ENAC（ENAC联合会）； ISAE-SUPAERO ； ONERA ； National University of Singapore（新加坡国立大学）； Université de Toulouse（图卢兹大学）； Agency for Science, Technology and Research（科技研究局）； Institute for Infocomm Research（信息通信研究所）

AI总结针对多专家学习延迟决策问题，提出一种解耦代理损失函数，通过独立sigmoid头与softmax分类器头分离优化，解决了现有方法中的优化病理问题，并首次给出不随专家数量增长的校准常数界。

详情

AI中文摘要

学习延迟决策（L2D）系统针对每个输入决定是自行预测还是交给若干可用专家之一。成熟的通行做法是将$K$个类别和$J$个专家视为共享$(K{+}J)$动作几何中的竞争动作，联合训练分类器和路由器。后续工作在该几何内提出了一系列增量修复；我们表明，即使在统计一致性下，每个方法仍不同程度地遭受优化层面的病理问题（目标失真、梯度放大、赢家通吃饥饿、集合质量崩溃或类别-专家耦合）。我们完全跳出增强动作家族，提出一种解耦代理：一个softmax分类器头以及每个专家独立的sigmoid头，对应问题的两个自然对象。我们证明每个样本的更新是坐标式的，且类别-专家Hessian块恒为零，并证明了具有校准常数$\max\{2\sqrt{2},\sqrt{2J/\lambda}\}$的过量风险界；据我们所知，这是第一个在多专家L2D中当每个专家权重固定时常数不随专家池增长的保证。在受控合成研究以及CIFAR-10、CIFAR-10H和Covertype上，它是我们比较中唯一在专家池增长时保持稳定、保留稀有专家并在每个真实数据基准上优于独立分类器的方法。

英文摘要

A learning-to-defer (L2D) system decides, for each input, whether to predict on its own or to hand it to one of several available experts. The very well established recipe trains classifier and router jointly by treating the $K$ classes and $J$ experts as competing actions in one shared $(K{+}J)$-action geometry. Subsequent work has proposed a series of incremental fixes within this geometry; we show that each still suffers, to varying severity, from an optimization-level pathology (target distortion, gradient amplification, winner-take-all starvation, set-mass collapse, or class-expert coupling) even under statistical consistency. We step outside the augmented-action family entirely and propose a decoupled surrogate: a softmax classifier head and an independent sigmoid head per expert, mirroring the two natural objects of the problem. We show that per-sample updates are then coordinatewise and the class-expert Hessian block is identically zero, and prove an excess-risk bound with calibration constant $\max\{2\sqrt{2},\sqrt{2J/\lambda}\}$ -- to our knowledge the first multi-expert L2D guarantee whose constant does not grow with the expert pool when the per-expert weight is held fixed. On controlled synthetic studies and on CIFAR-10, CIFAR-10H, and Covertype, it is the only method in our comparison that remains stable as the expert pool grows, preserves rare specialists, and improves over a standalone classifier on every real-data benchmark.
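The shape of a decoupled objective can be sketched as a softmax cross-entropy over class logits plus an independent logistic loss per expert. The abstract does not state the exact surrogate, so the loss below, the `expert_weight` parameter, and the binary "expert is correct" targets are assumptions for illustration. Because the two terms share no variables, changing the expert logits shifts the loss by the same amount whatever the class logits are, which is the additive structure behind a zero class-expert Hessian block. The second function only evaluates the constant quoted in the abstract; how its λ relates to a per-expert weight is not spelled out there.

```python
import math

def decoupled_loss(class_logits, expert_logits, label, expert_correct, expert_weight=1.0):
    """Softmax cross-entropy on classes plus weighted per-expert logistic losses."""
    m = max(class_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in class_logits))
    class_term = log_z - class_logits[label]
    expert_term = 0.0
    for score, correct in zip(expert_logits, expert_correct):
        # Assumed binary target: True if this expert would be right on the example.
        expert_term += math.log1p(math.exp(-score)) if correct else math.log1p(math.exp(score))
    return class_term + expert_weight * expert_term

def calibration_constant(num_experts, lam):
    """Evaluate max{2*sqrt(2), sqrt(2J/lambda)} as quoted in the abstract."""
    return max(2.0 * math.sqrt(2.0), math.sqrt(2.0 * num_experts / lam))

loss = decoupled_loss([0.2, -1.0, 0.5], [0.3, -0.7], label=0, expert_correct=[True, False])
small_pool = calibration_constant(4, 4.0)    # J = 4, lambda = 4
large_pool = calibration_constant(16, 16.0)  # J = 16, lambda = 16: same value
```

As a purely arithmetic observation, the constant stays at 2√2 whenever the ratio J/λ is at most 4, so scaling λ in proportion to J keeps it fixed.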


2603.14324 2026-06-01 stat.ML cs.LG 版本更新

Learning-to-Defer with Expert-Conditional Advice

基于专家条件建议的学习-延迟决策

Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

发表机构 * School of Computing（计算学院）； Département de Mathématiques（数学系）； National University of Singapore（新加坡国立大学）； Sorbonne University（索邦大学）； Fédération ENAC（ENAC联合会）； ISAE-SUPAERO ； ONERA ； Agency for Science, Technology and Research（科技研究局）； Université de Toulouse（图卢兹大学）； Institute for Infocomm Research（信息与通信研究所）

AI总结研究在决策时可为专家提供额外信息（建议）的延迟学习问题，提出一种在复合专家-建议动作空间上的增广替代损失，并证明其一致性保证和最优策略恢复能力。

详情

AI中文摘要

学习-延迟决策将每个输入路由到预期成本最小的专家，但假设决策时每个专家可获得的信息是固定的。许多现代系统违反了这一假设：选择专家后，还可以选择该专家应接收哪些额外信息，例如检索到的文档、工具输出或升级上下文。我们研究了这个问题，并将其称为带建议的学习-延迟决策。我们表明，即使在最简单的非平凡设置中，一系列广泛使用的自然分离替代损失（通过不同头部学习路由和建议）也是不一致的。然后，我们引入了一个在复合专家-建议动作空间上操作的增广替代损失，并证明了其$\mathcal{H}$一致性保证以及超额风险转移界，从而在极限情况下恢复贝叶斯最优策略。在表格、语言和多模态任务上的实验表明，所提方法优于标准学习-延迟决策，同时根据成本机制调整其建议获取行为；一个合成基准证实了分离替代损失预测的失败模式。

英文摘要

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
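The composite expert-advice action space can be pictured as a search over (expert, advice) pairs rather than two separate choices. The sketch below is a decision rule over known costs, not the paper's learned surrogate; every name and number in it is hypothetical. It picks the pair with the lowest expected error plus consultation and advice costs.

```python
from itertools import product

def best_action(expected_error, consult_cost, advice_cost):
    """Pick the (expert, advice) pair with the lowest expected total cost."""
    return min(
        product(consult_cost.keys(), advice_cost.keys()),
        key=lambda pair: expected_error[pair] + consult_cost[pair[0]] + advice_cost[pair[1]],
    )

# Hypothetical expected error of each expert under each kind of advice.
expected_error = {
    ("model", "none"): 0.40,
    ("model", "retrieved_docs"): 0.15,
    ("specialist", "none"): 0.20,
    ("specialist", "retrieved_docs"): 0.10,
}
consult_cost = {"model": 0.0, "specialist": 0.15}

cheap_advice = best_action(expected_error, consult_cost, {"none": 0.0, "retrieved_docs": 0.1})
costly_advice = best_action(expected_error, consult_cost, {"none": 0.0, "retrieved_docs": 0.3})
```

With cheap advice the rule gives the model retrieved documents; with costly advice it defers to the specialist without advice. The best expert depends on which advice is affordable, so the two choices interact.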


2510.10988 2026-06-01 stat.ML cs.LG 版本更新

Adversarial Robustness in One-Stage Learning-to-Defer

单阶段学习委托中的对抗鲁棒性

Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

发表机构 * School of Computing, National University of Singapore（新加坡国立大学计算机学院）； Fédération ENAC ISAE-SUPAERO ONERA, Université de Toulouse, France（法国图卢兹大学ENAC ISAE-SUPAERO ONERA联合体）； Institute for Infocomm Research, A*STAR , Singapore（新加坡科技研究局信息通信研究所）； IPAL, IRL 2955, Singapore（新加坡IPAL实验室）

AI总结针对单阶段学习委托（L2D）中预测器与分配器联合训练的场景，提出首个对抗鲁棒性框架，通过形式化攻击、设计成本敏感的对抗替代损失并建立理论保证（包括H、R/F和贝叶斯一致性），在基准数据集上验证了方法在保持干净性能的同时提升了对无目标和有目标攻击的鲁棒性。

2605.29511 2026-06-01 cs.MA cs.CL cs.LG 版本更新

DynaGraph: Lightweight Multi-Model Interaction Framework via Dynamic Topological Reconfiguration

DynaGraph: 通过动态拓扑重构的轻量级多模型交互框架

Yanxing Guo, Zihao Zheng, Fangzhou Wu, Ling Liang, Lin Bao, Zongwei Wang, Yimao Cai

发表机构 * Peking University（北京大学）； Nanjing University（南京大学）； Beijing Advanced Innovation Center for Integrated Circuits（北京集成电路高精尖创新中心）； Beijing University of Posts and Telecommunications（北京邮电大学）； Yanxin Co. Ltd（燕新有限公司）

AI总结提出DynaGraph框架，通过动态拓扑重构和PEFT适配器复用，在单消费级GPU上实现多模型协作，接近72B单模型推理能力并大幅降低延迟和token消耗。

详情

AI中文摘要

处理复杂推理任务通常依赖于庞大的单体LLM，这会导致严重的计算冗余。虽然通过结构化流水线或多智能体协作进行任务分解提供了替代方案，但这些方法不可避免地陷入一个关键困境：预定义的静态拓扑极易受到级联错误的影响，而无约束的动态智能体则面临轨迹发散和不可预测的内存膨胀。为了解决这个问题，我们提出了DynaGraph，一个由动态拓扑重构驱动的轻量级多模型框架。在执行层面，DynaGraph在共享基础模型上复用时分PEFT适配器，使得整个系统的训练和推理部署可以在单个消费级GPU上完成。在路由层面，评估器持续监控执行置信度以触发分层自愈：针对局部数据差距的细粒度修补和针对严重逻辑断裂的子图重构。在StrategyQA、MATH和FinQA上的实验表明，我们的8B模型接近72B单体模型的推理能力（例如，在StrategyQA上为87.6%，在MATH上为82.7%）。此外，与无约束的动态架构相比，它延迟降低了高达68.1%，token消耗降低了68.6%。

英文摘要

Tackling complex reasoning tasks typically relies on massive monolithic LLMs, which suffer from severe computational redundancy. While task decomposition through structured pipelines or multi-agent collaborations offers an alternative, these approaches inevitably fall into a critical dilemma: predefined static topologies are highly vulnerable to cascading errors, whereas unconstrained dynamic agents suffer from trajectory divergence and unpredictable memory bloat. To address this, we present DynaGraph, a lightweight multi-model framework driven by dynamic topological reconfiguration. At the execution level, DynaGraph multiplexes time-division PEFT adapters over a shared base model, enabling both full system training and inference deployment on a single consumer-grade GPU. At the routing level, the Evaluator continuously monitors execution confidence to trigger hierarchical self-healing: Fine-grained Patching for localized data gaps and Subgraph Reconstruction for severe logical ruptures. Experiments on StrategyQA, MATH, and FinQA demonstrate our 8B model closely approximates the reasoning capabilities of a 72B monolithic model (e.g., 87.6% on StrategyQA, 82.7% on MATH). Furthermore, it reduces latency by up to 68.1% and token consumption by 68.6% compared to unconstrained dynamic architectures.


2605.29373 2026-06-01 cs.LG cs.NA math.NA 版本更新

Deep Adaptive Dimension Reduction for Bayesian Inference in Inverse Problems

逆问题中贝叶斯推理的深度自适应降维

Yueyang Wang, Xili Wang, Kejun Tang, Xiaoliang Wan, Tao Zhou, Chao Yang

发表机构 * School of Mathematical Sciences, Peking University（北京大学数学科学学院）； School of Sciences, Great Bay University（大湾区大学理学院）； Department of Mathematics, Louisiana State University（路易斯安那州立大学数学系）； SKLMS & Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences（中国科学院数学与系统科学研究院SKLMS及计算数学与科学/工程计算研究所）

AI总结提出基于变分流模型的深度自适应降维贝叶斯推理框架，结合VAE非线性降维、双归一化流和迭代先验更新策略，并自适应微调傅里叶神经算子代理，以高效求解高维PDE控制逆问题中的复杂非高斯后验分布。

Comments 25 pages, 5 figures

详情

AI中文摘要

求解高维PDE控制的逆问题通常具有挑战性，原因在于复杂的非高斯后验分布、昂贵的正演模型评估以及错误的先验信息。为了解决这些问题，我们提出了一种基于变分流（VF）模型的深度自适应降维贝叶斯推理框架。由于标准归一化流受双射映射限制且无法直接降维，VF通过将基于VAE的非线性降维与潜在先验和编码器的双归一化流相结合，克服了这一限制。该设计提供了严格高于VAE的证据下界，并允许更灵活地逼近复杂后验分布。我们进一步引入了一种迭代先验更新策略，该策略逐渐将先验均值移向高概率后验区域，避免了手动先验调整。这些组件与自适应微调的傅里叶神经算子（FNO）代理一起形成了一个闭环自适应循环：VF生成后验集中样本以改进代理，而更新的代理进一步改进后验推理。在100维Rosenbrock问题和三个标准PDE控制逆问题上的数值实验表明，与MCMC、UKI和SVGD基线相比，我们的方法在所有测试配置中均具有竞争性或更优的精度，在高噪声观测和高维参数空间等挑战性场景中优势最为明显。

英文摘要

Solving high-dimensional PDE-governed inverse problems is often challenging due to complex non-Gaussian posterior distributions, expensive forward model evaluations, and misspecified prior information. To address these issues, we propose a deep adaptive dimension-reduction Bayesian inference framework based on the Variational Flow (VF) model. Since standard normalizing flows are restricted by bijective mappings and cannot directly reduce dimensions, VF overcomes this limitation by integrating VAE-based nonlinear dimension reduction with dual normalizing flows for the latent prior and encoder. This design provides a strictly higher evidence lower bound than VAE and allows more flexible approximation of complex posterior distributions. We further introduce an iterative prior updating strategy that gradually moves the prior mean toward high-probability posterior regions, avoiding manual prior tuning. These components form a closed adaptive loop together with an adaptively fine-tuned Fourier Neural Operator (FNO) surrogate: VF generates posterior-concentrated samples to refine the surrogate, while the updated surrogate further improves posterior inference. Numerical experiments on a 100-dimensional Rosenbrock problem and three standard PDE-governed inverse problems show that our method delivers competitive or superior accuracy compared with MCMC, UKI, and SVGD baselines across all tested configurations, with the most pronounced advantages emerging in challenging scenarios such as high-noise observations and high-dimensional parameter spaces.


2605.29268 2026-06-01 cs.CL cs.AI cs.LG cs.NE 版本更新

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

进化搜索中的计算分配：从深度-广度到多臂老虎机

Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang, Haozheng Luo, Tianfan Fu, Aarthy Nagarajan

发表机构 * University of Notre Dame（圣母大学）； Northeastern University（东北大学）； University of Massachusetts Amherst（马萨诸塞大学阿默斯特分校）； Southeast University（东南大学）； Northwestern University（西北大学）； Nanjing University（南京大学）； Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）

AI总结针对LLM引导的进化搜索中固定预算的LLM调用分配问题，提出基于多臂老虎机的BaSE方法，通过跨并行轨迹分配调用，平均适应度提升12.3%。

详情

AI中文摘要

LLM引导的进化搜索（Evolve系统）在数学和组合任务上达到了最先进的结果，但现有系统通常只报告多次运行中的最佳结果，而未记录运行间的分布。我们询问如何分配固定的LLM调用预算，以及单次运行达到报告数字的可靠性如何。通过扫描五个模型和三个任务的深度-广度网格，我们识别出两个经验规律：一个适应度-计算包络线，其中能力排序主要取决于有效FLOPs；以及一个双线性深度-广度拟合，具有任务特定的交互；两者都受模型-任务能力门控。受这些规律启发，我们提出BaSE（基于老虎机的自进化），一种多臂老虎机，它在并行轨迹间分配LLM调用。在不改变模型、提示或评估器的情况下，BaSE在8个（模型，任务）单元上比最强的岛屿协议基线平均适应度提高12.3%，在方差高的设置上增益最大：仅通过分配实现可靠性提升。

英文摘要

LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most existing systems report only the best of many runs and leave the run-to-run distribution undocumented. We ask how a fixed budget of LLM calls should be allocated, and how reliably a single run reaches the reported numbers. Sweeping the depth-breadth grid over five models and three tasks, we identify two empirical regularities: a fitness-compute envelope along which capability ordering largely collapses on effective FLOPs, and a bilinear depth-breadth fit with task-specific interaction; both are gated by model-task capability. Motivated by these regularities, we propose BaSE (Bandit-based Self-Evolving), a multi-armed bandit that allocates LLM calls across parallel trajectories. Without changing the model, prompt, or evaluator, BaSE improves mean fitness by 12.3% over the strongest island-protocol baseline across 8 (model, task) cells, with the largest gains on high-variance settings: a reliability gain from allocation alone.
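The idea of a multi-armed bandit that allocates LLM calls across parallel trajectories can be sketched with standard UCB1. This is a stand-in, since the abstract does not say which bandit rule BaSE uses. The simulated reward (a noisy fitness gain clipped to [0, 1]), the arm means, and the function name are all assumptions.

```python
import math
import random

def allocate_calls(arm_means, budget, seed=0):
    """Spend `budget` calls across trajectories with UCB1; return calls per arm."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    totals = [0.0] * len(arm_means)
    for t in range(1, budget + 1):
        if t <= len(arm_means):
            arm = t - 1  # try every trajectory once before using the index
        else:
            arm = max(
                range(len(arm_means)),
                key=lambda i: totals[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Toy reward: noisy fitness gain from one LLM call on this trajectory.
        reward = min(1.0, max(0.0, rng.gauss(arm_means[arm], 0.1)))
        counts[arm] += 1
        totals[arm] += reward
    return counts

calls = allocate_calls([0.2, 0.5, 0.8], budget=300)  # most calls go to the last trajectory
```

The model, prompt, and evaluator are untouched. Only the split of a fixed call budget changes, which matches the abstract's framing of a gain "from allocation alone".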


2605.28918 2026-06-01 cs.LG cs.AI cs.IR 版本更新

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

当LLM奖励设计失败时：面向诊断的稀疏结构化RL改进

Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu, Dingyan Shang

AI总结针对稀疏结构化强化学习任务，提出诊断驱动的迭代奖励函数改进方法，通过训练诊断和失败模式分类指导修正，显著提升MiniGrid任务成功率。

详情

AI中文摘要

对于具有语义奖励函数接口的稀疏结构化强化学习任务，LLM生成的奖励塑造更适合被视作调试而非一次性生成。我们使用MiniGrid作为核心评估、MuJoCo作为边界压力测试，研究PPO训练的智能体。我们的审计发现两种主要的一次性失败模式——奖励泛滥和语义/API误解，以及一种较罕见的弱塑造情况。我们提出诊断驱动的迭代改进，其中训练诊断和失败模式分类法指导有针对性的奖励函数修订。改进使DoorKey-8x8从2.3%提升至97.6%，KeyCorridor从31.2%提升至86.7%，但种子间方差较高。控制实验表明这些提升并非来自重试或额外训练：仅指标重新提示导致大幅下降，而静态词汇控制恢复了大部分差距（87.6%；70.7%），表明分类法提示是主要机制，动态标签仅提供部分孤立的增量证据。预算匹配和Best-of-3比较将改进与选择和训练时间效应分离。组件移除测试、敏感性分析以及针对作者标签的审计为调试解释提供了汇聚证据，同时揭示了校准限制。连续控制结果显示了边界：基于成功的诊断可能在密集奖励的 locomotion 中误报，而回报趋势反馈移除了一个假阳性机制但未带来稳健提升。低调用协议是与基于种群的奖励搜索的成本对比，而非基准比较。在四个交叉方差设计环境中，点估计表明当LLM奖励函数方差占主导时收益更大，但bootstrap区间较宽。该方法局限于PPO下具有可靠接口的稀疏结构化任务；event_text等字段可能有益、有害或中性。

英文摘要

For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation. We study PPO-trained agents using MiniGrid as core evaluation and MuJoCo as boundary stress test. Our audit finds two dominant one-shot failure modes -- reward flooding and semantic/API misunderstanding -- plus a rarer weak-shaping case. We propose diagnostic-driven iterative refinement, where training diagnostics and a failure-mode taxonomy guide targeted reward-function revision. Refinement improves DoorKey-8x8 from 2.3% to 97.6% and KeyCorridor from 31.2% to 86.7% with high seed-to-seed variance. Controls show these gains are not from retrying or extra training: metrics-only re-prompting yields large drops, while a static-vocabulary control recovers much of the gap (87.6%; 70.7%), showing the taxonomy prompt is a major mechanism and dynamic labels provide only partially isolated incremental evidence. Budget-matched and Best-of-3 comparisons separate refinement from selection and training-time effects. Component-removal tests, sensitivity analyses, and an audit against author labels provide converging evidence for the debugging interpretation while revealing calibration limits. Continuous-control results show the boundary: success-based diagnostics can misfire in dense-reward locomotion, and return-trend feedback removes one false-positive mechanism without robust gains. The low-call protocol is a cost contrast with population-based reward search, not a benchmark comparison. In four crossed-variance-design environments, point estimates suggest larger gains when LLM reward-function variance dominates but bootstrap intervals are wide. The method is bounded to sparse structured tasks with reliable interfaces under PPO; fields like event_text may help, hurt, or be neutral.

2605.25134 2026-06-01 cs.LG cs.AI 版本更新

Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

重参数化、权重衰减和自适应学习率下稀疏优化的理论分析

Huangyu Xu, Jingqin Yang, Qianqian Xu, Jiaye Teng

发表机构 * State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China（中国科学院计算技术研究所人工智能安全国家重点实验室，北京，中国）； School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China（中国科学院大学计算机科学与技术学院，北京，中国）； Beijing Academy of Artificial Intelligence (BAAI), Beijing, China（北京人工智能研究院（BAAI），北京，中国）； IIIS, Tsinghua University, Beijing, China（清华大学交叉信息研究院，北京，中国）； School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China（上海财经大学统计与管理学院，上海，中国）； Institute of Data Science and Statistics, Shanghai University of Finance and Economics, Shanghai, China（上海财经大学数据科学与统计研究所，上海，中国）

AI总结针对稀疏优化中的不稳定问题，提出基于重参数化、权重衰减和自适应学习率的ReWA方法，通过改善优化景观实现比ℓ1正则化更好的稀疏性，同时保持测试精度。

Comments 32 pages, 5 figures. Submitted to ICML 2026

2602.20176 2026-06-01 q-bio.BM cs.LG 版本更新

Cross-Chirality Generalization by Axial Vectors for Hetero-Chiral Protein-Peptide Interaction Design

通过轴向向量实现异手性蛋白质-肽相互作用设计的跨手性泛化

Ziyi Yang, Zitong Tian, Yinjun Jia, Tianyi Zhang, Jiqing Zheng, Hao Wang, Yubu Su, Juncai He, Lei Liu, Yanyan Lan

发表机构 * Department of Chemistry, Tsinghua University, Beijing, China（清华大学化学系）； School of Life Sciences, Tsinghua University, Beijing, China（清华大学生命科学学院）； Anew Labs, Shanghai, China（Anew实验室）； Tsinghua-Peking Center for Life Sciences, Beijing, China（清华大学-北京大学生命科学联合中心）； Ministry of Education Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Tsinghua University, Beijing, China（教育部生物有机磷化学与化学生物学重点实验室）； Center for Synthetic and Systems Biology, Tsinghua University, Beijing, China（合成与系统生物学中心）； Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China（北京生物结构前沿研究中心）； Qiuzhen College, Tsinghua University, Beijing, China（清华大学求真书院）； Yau Mathematical Sciences Center, Tsinghua University, Beijing, China（清华大学丘成桐数学科学中心）； Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China（清华大学智能产业研究院（AIR））； AI Industry Research Innovation Center, Wuxi Research Institute for Applied Technologies, Tsinghua University（清华大学无锡应用技术研究院人工智能产业研究创新中心）

AI总结提出向E(3)等变（极）向量特征注入轴向特征的方法，结合潜在扩散模型实现从同手性训练数据到异手性设计任务的跨手性泛化，首次通过湿实验验证了生成式AI从头设计D-肽结合物的有效性。

Comments v3: Revised acknowledgements only. The paper has been accepted to ICML 2026

AI中文摘要

靶向L-蛋白的D-肽结合物具有广阔的治疗潜力。尽管基于机器学习的靶标条件肽设计取得了快速进展，但生成D-肽结合物仍基本未被探索。在这项工作中，我们表明通过向$E(3)$等变（极）向量特征注入轴向特征，可以实现从同手性（L--L）训练数据到异手性（D--L）设计任务的跨手性泛化。通过在潜在扩散模型中实现该方法，我们实现了D-肽结合物设计，不仅在\textit{in silico}基准测试中优于现有工具，而且在湿实验验证中显示出有效性。据我们所知，我们的方法代表了首个经过湿实验验证的用于\textit{de novo}设计D-肽结合物的生成式AI，为处理蛋白质设计中的手性提供了新视角。代码可在https://github.com/YZY010418/PepMirror获取。

英文摘要

D-peptide binders targeting L-proteins have promising therapeutic potential. Despite rapid advances in machine learning-based target-conditioned peptide design, generating D-peptide binders remains largely unexplored. In this work, we show that by injecting axial features to $E(3)$-equivariant (polar) vector features, it is feasible to achieve cross-chirality generalization from homo-chiral (L--L) training data to hetero-chiral (D--L) design tasks. By implementing this method within a latent diffusion model, we achieved D-peptide binder design that not only outperforms existing tools in \textit{in silico} benchmarks, but also demonstrates efficacy in wet-lab validation. To our knowledge, our approach represents the first wet-lab validated generative AI for the \textit{de novo} design of D-peptide binders, offering new perspectives on handling chirality in protein design. Codes are available at https://github.com/YZY010418/PepMirror

2510.15340 2026-06-01 quant-ph cs.LG cs.SY eess.SY 版本更新

超越静态不确定性：为概率时间序列建模时间不确定性动态

Yijun Wang, Qiyuan Zhuang, Larysa Marchanka, Xiu-Shen Wei

发表机构 * Department of Computer Science, Southeast University（东南大学计算机科学系）； Francisk Skorina Gomel State University（弗朗西斯克·斯科里纳戈梅利国立大学）

AI总结提出VolDy-VAE模型，通过循环尺度路径捕捉波动率动态，实现时间一致的概率预测，提升准确性和不确定性校准。

AI中文摘要

现实世界的时间序列表现出时间结构化的不确定性：波动率在动荡时期聚集，在稳定时期消散，并在结构断裂处突然变化。然而，许多概率预测方法将预测不确定性估计为独立的逐点量，忽略了波动率机制的演变和持续性。我们将这一缺失维度形式化为时间不确定性动态，并在波动率动态变分自编码器（VolDy-VAE）中实例化它，这是一个具有位置-尺度解码器的非自回归生成预测器。VolDy-VAE结合了用于均值预测的位置路径和用于传递和演化波动率隐藏状态的循环尺度路径，该状态从回溯窗口转移到预测范围，从而实现时间一致的预测方差。这种设计产生了一种自适应衰减机制：高方差观测值对位置估计的影响较小，而其不确定性通过明确的尺度预测得以保留。我们进一步提供了一个简化的机制转换分析，表明当方差已知或一致估计时，波动率感知目标简化为逆方差加权，而基于MSE的估计量保持无偏但统计效率较低。在九个基准上的实验表明，VolDy-VAE在保持低推理延迟的同时，提高了预测准确性和不确定性校准，优于竞争的概率和点预测基线；插件研究进一步表明，VolDy原理可以有益于GAN、Koopman VAE和Transformer骨干网络。源代码公开于https://github.com/wangyijunlyy/VolDy-VAE。

英文摘要

Real-world time series exhibit temporally structured uncertainty: volatility clusters in turbulent regimes, dissipates in stable periods, and shifts abruptly around structural breaks. Yet many probabilistic forecasting methods estimate predictive uncertainty as an independent per-step quantity, leaving the evolution and persistence of volatility regimes under-modeled. We formalize this missing dimension as temporal uncertainty dynamics and instantiate it in the Volatility Dynamics Variational Autoencoder (VolDy-VAE), a non-autoregressive generative forecaster with a location-scale decoder. VolDy-VAE combines a location path for mean prediction with a recurrent scale path that transfers and evolves a volatility hidden state from the look-back window to the forecasting horizon, enabling temporally coherent predictive variances. This design yields an adaptive attenuation mechanism: high-variance observations receive lower influence on the location estimate while their uncertainty is preserved through explicit scale predictions. We further provide a simplified regime-switching analysis showing that, when variances are known or consistently estimated, the volatility-aware objective reduces to inverse-variance weighting, whereas MSE-based estimators remain unbiased but statistically inefficient. Experiments on nine benchmarks show that VolDy-VAE improves forecasting accuracy and uncertainty calibration over competitive probabilistic and point-forecasting baselines while maintaining low inference latency; plug-in studies further indicate that the VolDy principle can benefit GAN, Koopman VAE, and Transformer backbones. The source code is publicly available at https://github.com/wangyijunlyy/VolDy-VAE.
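The inverse-variance weighting that the abstract mentions can be illustrated with a small sketch. It assumes independent Gaussian observations of one shared location with known per-step variances; the function names and the toy numbers are hypothetical and are not taken from the VolDy-VAE code. Under that assumption, minimizing the variance-scaled squared error gives a weighted mean, while the plain mean (the MSE solution) stays unbiased but has a larger variance.

```python
import numpy as np

def inverse_variance_mean(y, sigma2):
    """Location estimate that minimizes sum((y - mu)**2 / sigma2)."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma2, dtype=float)  # high variance -> low weight
    return float(np.sum(w * y) / np.sum(w))

def estimator_variances(sigma2):
    """Theoretical variance of the plain mean and of the weighted mean."""
    sigma2 = np.asarray(sigma2, dtype=float)
    n = sigma2.size
    var_plain = float(np.sum(sigma2) / n**2)         # MSE-based estimator
    var_weighted = float(1.0 / np.sum(1.0 / sigma2))  # inverse-variance estimator
    return var_plain, var_weighted

# Toy data (hypothetical): three calm steps and one turbulent step.
sigma2 = np.array([0.1, 0.1, 0.1, 4.0])
y = np.array([1.0, 1.1, 0.9, 3.0])

mu_weighted = inverse_variance_mean(y, sigma2)
mu_plain = float(y.mean())
var_plain, var_weighted = estimator_variances(sigma2)
print(mu_plain, mu_weighted, var_plain, var_weighted)
```

The turbulent observation pulls the plain mean far more than the weighted one, which is the attenuation effect described above in its simplest form.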

2501.02672 2026-06-01 stat.ML cs.LG econ.EM stat.ME 版本更新

Re-examining Granger Causality with Causal Bayesian Networks and Reichenbach's Principles

重新审视格兰杰因果关系：基于因果贝叶斯网络和赖兴巴赫原理

S. A. Adedayo

发表机构 * Univie Doctoral School of Computer Science (DOCS)（维也纳大学计算机科学博士学院，DOCS）

AI总结本文通过赖兴巴赫原理和因果贝叶斯网络重新解释格兰杰因果关系，提出因果化格兰杰因果关系（c-GC）算法，赋予其稳健的因果解释，并在合成数据上取得满意结果。

2605.23937 2026-06-01 cs.AI cs.LG cs.LO math.OC 版本更新

BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization

BoxLitE：基于凸优化的忠实知识库嵌入

Bruno F. Lourenço, Hesham Morgan, Ana Ozaki, Aleksandar Pavlović, Emanuel Sallinger

发表机构 * The Institute of Statistical Mathematics, Japan（日本统计数理研究所）； TU Wien, Austria（奥地利维也纳工业大学）； University of Oslo, Norway（挪威奥斯陆大学）； University of Applied Sciences Campus Vienna, Austria（奥地利维也纳校园应用科学大学）

AI总结提出BoxLitE模型，通过凸优化实现DL-Lite$^{\mathcal{H}}$知识库的忠实嵌入，确保可满足知识库存在弱忠实模型。

Comments 28 pages. Full version of paper accepted to KR 2026 (23rd International Conference on Principles of Knowledge Representation and Reasoning). Track: KR meets Machine Learning and Explanation. Added a figure and some minor changes

AI中文摘要

知识库（KB）嵌入旨在结合经典知识图谱嵌入在事实（ABox）中泛化信息的能力与本体语言（TBox）表示的概念知识。多位作者最近探索了将概念映射到向量空间中凸区域的思想。这对于表示TBox中通常存在的层次结构很有用，因为更一般的概念可以映射到更大的区域，包含与更具体概念相关的区域。然而，在实际学习任务中，凸性的能力很少被利用。在这里，我们引入了BoxLitE，一个针对DL-Lite$^{\mathcal{H}}$的KB嵌入模型，允许凸优化。我们证明，对于任何可满足的DL-Lite$^{\mathcal{H}}$ KB，存在一个BoxLitE嵌入，它是一个弱忠实模型。作为概念验证，我们展示了如何将KB嵌入任务表述为凸优化问题，以及如何获得具有这种理想忠实性属性的嵌入。

英文摘要

Knowledge base (KB) embeddings aim at combining the capability of classical knowledge graph embeddings to generalize the information present in facts, the ABox, with conceptual knowledge represented in an ontology language, the TBox. Several authors have recently explored the idea of mapping concepts to convex regions in a vector space. This is useful to represent hierarchies, typically present in TBoxes, since more general concepts can be mapped to larger regions, containing those regions associated with more specific concepts. However, the power of convexity is rarely leveraged during the actual learning tasks. Here, we introduce BoxLitE, a KB embedding model for DL-Lite$^{\mathcal{H}}$ that allows for convex optimization. We show that for any satisfiable DL-Lite$^{\mathcal{H}}$ KB, there is a BoxLitE embedding that is a weakly faithful model. As a proof of concept, we show how to formulate the KB embedding task as a convex optimization problem and how to obtain embeddings with such desirable faithfulness properties.
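The link between concept hierarchies and convex regions can be sketched in a few lines. The example assumes axis-aligned boxes given as `(lower, upper)` corner pairs, which the abstract does not spell out; `contains` and `inclusion_violation` are hypothetical helpers, not the BoxLitE formulation. The violation term is a sum of maxima of affine functions of the box corners, so it is convex in those corners, which is the kind of structure a convex solver can exploit.

```python
import numpy as np

def contains(outer, inner):
    """True if the box `inner` lies inside the box `outer`."""
    lo_o, hi_o = (np.asarray(v, dtype=float) for v in outer)
    lo_i, hi_i = (np.asarray(v, dtype=float) for v in inner)
    return bool(np.all(lo_o <= lo_i) and np.all(hi_i <= hi_o))

def inclusion_violation(outer, inner):
    """Hinge-style penalty: zero exactly when `inner` is inside `outer`."""
    lo_o, hi_o = (np.asarray(v, dtype=float) for v in outer)
    lo_i, hi_i = (np.asarray(v, dtype=float) for v in inner)
    below = np.maximum(0.0, lo_o - lo_i)  # inner sticks out below outer
    above = np.maximum(0.0, hi_i - hi_o)  # inner sticks out above outer
    return float(below.sum() + above.sum())

# Hypothetical TBox axiom "Dog is a subconcept of Animal".
animal = ([0.0, 0.0], [4.0, 4.0])
dog = ([1.0, 1.0], [2.0, 3.0])

print(contains(animal, dog), inclusion_violation(animal, dog))
print(contains(dog, animal), inclusion_violation(dog, animal))
```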

2605.15530 2026-06-01 cs.LG 版本更新

PlanningBench: 生成可扩展且可验证的规划数据以评估和训练大型语言模型

Ziliang Zhao, Zenan Xu, Shuting Wang, Hongjin Qian, Yan Lei, Minda Hu, Zhao Wang, Shihan Dou, Zhicheng Dou, Pluto Zhou

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China（中国人民大学高瓴人工智能学院）； LLM Department, Hunyuan Team, Tencent（腾讯混元团队大语言模型部门）； Beijing Academy of Artificial Intelligence（北京人工智能研究院）； The Chinese University of Hong Kong（香港中文大学）

AI总结提出PlanningBench框架，通过约束驱动合成管道生成可扩展、多样化且可验证的规划数据，用于评估和训练LLMs，并验证其在提升规划能力上的有效性。

AI中文摘要

规划是大型语言模型（LLMs）的一项基本能力，因为这类复杂任务要求模型将目标、约束、资源和长期后果协调成可执行且可验证的解决方案。然而，现有的规划基准通常将规划数据视为固定的实例集合，而非可控的生成目标。这限制了场景覆盖范围，将难度与表面代理而非结构来源挂钩，并且对可扩展生成、自动验证或面向规划的训练支持有限。我们引入PlanningBench，一个用于生成可扩展、多样化且可验证的规划数据的框架，既可用于评估也可用于训练。PlanningBench从真实规划场景出发，将实际工作流程抽象为包含30多种任务类型、子任务、约束族和难度因素的结构化分类体系。在该分类体系的指导下，一个约束驱动的合成管道实例化自包含的规划问题，具备自适应难度控制、质量过滤和实例级验证检查表。这将规划数据构建从固定基准收集转变为可控生成，同时保留现实任务基础。我们使用PlanningBench评估开源和闭源前沿LLMs，发现当前模型在耦合约束下仍难以生成完整解决方案。除评估外，在已验证的PlanningBench数据上进行强化学习可提升在未见规划基准和更广泛的指令遵循任务上的性能。进一步分析表明，确定性或明确指定的最优解提供了更清晰的奖励信号和更稳定的训练动态。总体而言，PlanningBench为诊断和提高LLMs中可泛化的规划能力提供了可控的规划数据来源。

英文摘要

Planning is a fundamental capability for large language models (LLMs) because such complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited support for scalable generation, automatic verification, or planning-oriented training. We introduce PlanningBench, a framework for generating scalable, diverse, and verifiable planning data for both evaluation and training. PlanningBench starts from real planning scenarios and abstracts practical workflows into a structured taxonomy of more than 30 task types, subtasks, constraint families, and difficulty factors. Guided by this taxonomy, a constraint-driven synthesis pipeline instantiates self-contained planning problems with adaptive difficulty control, quality filtering, and instance-level verification checklists. This shifts planning data construction from fixed benchmark collection to controllable generation while preserving realistic task grounding. We use PlanningBench to evaluate open-source and closed-source frontier LLMs, and find that current models still struggle to produce complete solutions under coupled constraints. Beyond evaluation, reinforcement learning on verified PlanningBench data improves performance on unseen planning benchmarks and broader instruction-following tasks. Further analysis suggests that determinate or well-specified optimal solutions provide clearer reward signals and more stable training dynamics. Overall, PlanningBench provides a controllable source of planning data for diagnosing and improving generalizable planning abilities in LLMs.

2601.22538 2026-06-01 cs.LG stat.AP 版本更新

Learning-to-Defer in Non-Stationary Time Series via Switching State-Space Models

通过切换状态空间模型在非平稳时间序列中的学习-延迟决策

Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

发表机构 * School of Computing（计算机学院）； National University of Singapore（新加坡国立大学）； Institute for Infocomm Research（资讯通信研究院）； ISAE-SUPAERO ； ONERA ； A*STAR, Singapore（新加坡科技研究局）

AI总结提出L2D-SLDS框架，利用因子化切换线性高斯状态空间模型处理非平稳流式数据，通过共享因子持续更新未查询专家的信念，并设计学习感知查询分数平衡即时成本与信息增益，实现在线学习-延迟决策。

AI中文摘要

学习-延迟决策（L2D）将每个决策路由到系统自身的预测器或外部专家。流式时间序列设置打破了离线L2D的假设：数据是非平稳的，专家可用性随时间变化，内部预测器在线训练。我们提出L2D-SLDS，一种基于因子化切换线性高斯状态空间模型的一阶段在线L2D框架，该模型覆盖所有潜在残差：一个离散状态、一个共享全局因子以及每个专家的特异状态。始终观测的内部残差通过共享因子持续更新关于每个未查询专家的信念，而学习感知查询分数平衡即时成本与潜在状态信息增益以及一步学习者的改进。我们证明了一个针对时变学习-延迟比较器的oracle不等式，将遗憾分解为查询奖励预算、SLDS预测成本误差项$\mathcal{E}_{\mathrm{SLDS}}$以及内部学习者的区间动态遗憾。在合成数据、墨尔本、耶拿和24专家德里基准测试上，L2D-SLDS与上下文和非平稳老虎机基线相比具有竞争力或更优，同时在真实数据轮次中延迟比例低于$2\%$。

英文摘要

Learning-to-defer (L2D) routes each decision to a system's own predictor or to an external expert. Streaming time-series settings break the offline-L2D assumptions: the data are non-stationary, expert availability shifts over time, and the internal predictor is trained online. We propose L2D-SLDS, a one-stage online L2D framework based on a factorized switching linear-Gaussian state-space model over all potential residuals: a discrete regime, a shared global factor, and per-expert idiosyncratic states. The always-observed internal residual continuously updates beliefs about every unqueried expert through the shared factor, and a learner-aware query score balances immediate cost against latent-state information gain and one-step learner improvement. We prove an oracle inequality against a time-varying learn-and-defer comparator, decomposing regret into a query-bonus budget, an SLDS predictive-cost-error term~$\mathcal{E}_{\mathrm{SLDS}}$, and the internal learner's interval dynamic regret. On synthetic, Melbourne, Jena, and 24-expert Delhi benchmarks, L2D-SLDS is competitive with or improves on contextual- and non-stationary-bandit baselines while deferring on ${<}2\%$ of real-data rounds.

2509.10308 2026-06-01 cs.LG 版本更新

GraphCSVAE: Graph Categorical Structured Variational Autoencoder for Spatiotemporal Auditing of Physical Vulnerability Towards Sustainable Post-Disaster Risk Reduction

GraphCSVAE: 面向可持续灾后风险降低的物理脆弱性时空审计的图类别结构化变分自编码器

Joshua Dimasaka, Christian Geiß, Robert Muir-Wood, Emily So

发表机构 * University of Cambridge（剑桥大学）； Cambridge University Centre for Risk in the Built Environment（剑桥大学建筑环境风险中心）； Earth Observation Center（地球观测中心）； Institute of Geography（地理研究所）

AI总结提出GraphCSVAE框架，通过整合深度学习、图表示和类别概率推断，利用时间序列卫星数据和专家先验，对物理脆弱性进行建模，并在两个灾后地区验证其时空审计能力。

Comments Accepted for publication in Progress in Disaster Science (on May 20, 2026) and at the 8th International Disaster and Risk Conference, IDRC 2025 | Keywords: weakly supervised, graph, categorical, vulnerability, remote sensing, spatiotemporal | The data and code are respectively available at https://doi.org/10.5281/zenodo.16656471 and https://github.com/riskaudit/GraphCSVAE

DOI: 10.1016/j.pdisas.2026.100601

AI中文摘要

在灾害发生后，全球许多机构在监测灾害风险变化方面面临挑战，限制了评估联合国仙台减少灾害风险框架（2015-2030）进展的能力。尽管众多研究通过地球观测和数据驱动方法显著推进了灾害暴露和危险性的大规模建模，但在风险方程中另一个同等重要但具有挑战性的要素——物理脆弱性的建模方面进展仍然有限。为弥补这一空白，我们引入了图类别结构化变分自编码器（GraphCSVAE），这是一个概率数据驱动框架，通过整合深度学习、图表示和类别概率推断，利用时间序列卫星数据集和专家先验来建模物理脆弱性。我们引入了一个弱监督的一阶转移矩阵，以捕捉两个受灾害影响且社会经济弱势地区脆弱性时空分布的变化：孟加拉国受气旋影响的Khurushkul社区和塞拉利昂受泥石流影响的弗里敦市。在两个案例研究中，该框架构建了2016-2023年的大规模图表示，并由于缺乏时间地面真值标签，使用Aitchison距离评估后验成分分布与专家先验的差异。该工作揭示了灾后物理脆弱性的区域动态，为局部时空审计和可持续的灾后风险降低策略提供了宝贵见解。

英文摘要

In the aftermath of disasters, many institutions worldwide face challenges in monitoring changes in disaster risk, limiting assessment of progress towards the UN Sendai Framework for Disaster Risk Reduction 2015-2030. While numerous efforts have substantially advanced the large-scale modeling of hazard and exposure through Earth observation and data-driven methods, progress remains limited in modeling another equally important yet challenging element of the risk equation: physical vulnerability. To address this gap, we introduce Graph Categorical Structured Variational Autoencoder (GraphCSVAE), a probabilistic data-driven framework for modeling physical vulnerability by integrating deep learning, graph representation, and categorical probabilistic inference, using time-series satellite-derived datasets and expert priors. We introduce a weakly supervised first-order transition matrix to capture changes in the spatiotemporal distribution of vulnerability across two disaster-affected and socioeconomically disadvantaged regions: the cyclone-impacted Khurushkul community in Bangladesh and the mudslide-affected city of Freetown in Sierra Leone. Across both case studies, the framework constructs large-scale graph representations spanning 2016-2023 and evaluates posterior compositional distributions against expert priors using Aitchison distance due to the lack of temporal groundtruth labels. The work reveals post-disaster regional dynamics in physical vulnerability, offering valuable insights into localized spatiotemporal auditing and sustainable strategies for post-disaster risk reduction.
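The Aitchison distance used to compare posterior compositions with expert priors has a compact standard definition: the Euclidean distance between centred log-ratio (clr) transforms. A minimal sketch follows, assuming strictly positive compositions; the class proportions are made-up numbers and the function names are illustrative rather than taken from the GraphCSVAE repository.

```python
import numpy as np

def clr(p):
    """Centred log-ratio transform of a strictly positive composition."""
    logp = np.log(np.asarray(p, dtype=float))
    return logp - logp.mean(axis=-1, keepdims=True)

def aitchison_distance(p, q):
    """Euclidean distance between the clr transforms of p and q."""
    return float(np.linalg.norm(clr(p) - clr(q)))

# Hypothetical vulnerability-class proportions for one graph node.
expert_prior = np.array([0.2, 0.3, 0.5])
posterior = np.array([0.1, 0.3, 0.6])

d = aitchison_distance(expert_prior, posterior)
# Rescaling a composition does not change the distance.
d_rescaled = aitchison_distance(10.0 * expert_prior, posterior)
print(d, d_rescaled)
```

Because the distance depends only on ratios between components, it is unaffected by how the compositions are normalized, which suits categorical proportions better than a plain Euclidean distance.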

2605.19233 2026-06-01 cs.CR cs.LG quant-ph 版本更新

Quantum Machine Learning for Cyber-Physical Anomaly Detection in Unmanned Aerial Vehicles: A Leakage-Free Evaluation with Proxy-Audited Feature Sets

量子机器学习在无人机网络物理异常检测中的应用：基于代理审计特征集的无泄漏评估

Carlos A. Durán Paredes, Javier E. León Calderón, Nicolás Sánchez Perea, Germán Darío Díaz, Camilo Segura Quintero

发表机构 * Corporation for Aerospace Initiatives, Research and Innovation (CASIRI)（航空航天倡议、研究与创新公司，CASIRI）； Department of Electronics Engineering, Universidad Nacional de Colombia（哥伦比亚国立大学电子工程系）； Department of Electronics Engineering, Universidad del Cauca（考卡大学电子工程系）； Department of Physics, Universidad del Cauca（考卡大学物理系）

AI总结针对无人机网络物理攻击，提出无泄漏评估框架，结合分组时间协议、三模式特征审计和混合XGBoost+数据重上传分类器，验证量子增强混合方法的增量优势。

Comments 10 pages, 7 figures, 1 table; open Qiskit 2.x implementation available at https://github.com/Carlosandp/TLM-UAV-Quantum-Anomaly-Detection

AI中文摘要

无人机是网络物理系统，其攻击面涵盖网络化航空电子设备和机载传感器融合：受损的GPS或电池模块可以模拟良性任务段并逃避简单的异常检测器。我们在多传感器TLM:UAV基准上对无人机异常检测的量子机器学习进行了无泄漏评估。三项贡献支持该研究。(i) 一种分组感知时间协议(B2)将数据集划分为十个连续的TimeUS块，并在十个随机种子上进行评估，消除了随机分层分割混合邻近样本所产生的膨胀。(ii) 一种三模式特征审计（完整/宽松/严格）量化了准确度有多少来自瞬时物理信号与上下文代理（累积能量、电池状态、GPS轨迹）。(iii) 在相同预算下，将混合XGBoost+数据重上传(DRU)分类器与五个配对的非线性控制（原始、PCA、多项式-2、随机RBF和未训练的DRU映射）进行基准测试。独立DRU在种子间并不始终匹配最强的经典基线；然而，经过训练的DRU混合模型是唯一一个平均F1宏从完整模式到严格模式向上移动（+0.05）的模型，这一方向性信号由于种子间标准差而无法解释为统计上确定的差异。经过训练的DRU混合模型在无代理评估下还记录了最低的平均误报率，但受所报告的种子间方差影响。我们将其视为一种增量的、可复现的量子增强混合优势，并提供一个开源的Qiskit 2.x实现，作为NISQ时代航空航天系统中网络安全分析的基准。

英文摘要

Unmanned aerial vehicles (UAVs) are cyber-physical systems whose attack surface spans networked avionics and on-board sensor fusion: a compromised GPS or battery module can mimic a benign mission segment and evade naive anomaly detectors. We present a leakage-free evaluation of quantum machine learning for UAV anomaly detection on the multi-sensor TLM:UAV benchmark. Three contributions support the study. (i) A group-aware temporal protocol (B2) partitions the dataset into ten contiguous TimeUS blocks and evaluates over ten seeds, eliminating the inflation produced by random stratified splits that mix neighbouring samples. (ii) A three-mode feature audit (full/loose/strict) quantifies how much accuracy stems from instantaneous physical signals versus contextual proxies (cumulative energy, battery state, GPS trajectory). (iii) A hybrid XGBoost + Data Reuploading (DRU) classifier is benchmarked against five paired non-linear controls (raw, PCA, polynomial-2, random-RBF, and an untrained DRU map) under identical budgets. The standalone DRU does not consistently match the strongest classical baseline across seeds; however, the trained-DRU hybrid is the only model whose mean F1 macro shifts upward from full to strict (+0.05), a directional signal that the per-seed standard deviations prevent from being interpreted as a statistically established difference. The trained-DRU hybrid also records the lowest mean false-alarm rate under proxy-free evaluation, subject to the inter-seed variance reported. We frame this as an incremental, reproducible quantum-enhanced hybrid benefit, and provide an open Qiskit 2.x implementation as a benchmark for cybersecurity analytics in NISQ-era aerospace systems.
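The idea behind a group-aware temporal split can be sketched as follows: samples are sorted by time, cut into contiguous blocks, and whole blocks are held out so that neighbouring samples never land on both sides of the split. This is only an illustration; the helper names are hypothetical, and how the B2 protocol assigns blocks to train and test for each seed is not stated in the abstract.

```python
import numpy as np

def contiguous_blocks(timestamps, n_blocks=10):
    """Assign each sample to one of n_blocks contiguous time blocks."""
    timestamps = np.asarray(timestamps)
    order = np.argsort(timestamps, kind="stable")
    block_of = np.empty(timestamps.size, dtype=int)
    for b, idx in enumerate(np.array_split(order, n_blocks)):
        block_of[idx] = b
    return block_of

def blocked_split(block_of, test_blocks):
    """Return (train_idx, test_idx) with whole blocks held out."""
    test_mask = np.isin(block_of, list(test_blocks))
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical telemetry timestamps (stand-in for the TimeUS column).
time_us = np.arange(100) * 1000
block_of = contiguous_blocks(time_us, n_blocks=10)
train_idx, test_idx = blocked_split(block_of, test_blocks={3, 7})
print(len(train_idx), len(test_idx))
```

A random stratified split would instead scatter consecutive samples across train and test, which is the source of inflation the abstract describes.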

2605.19145 2026-06-01 cs.LG 版本更新

PMF-CL: Pareto-Minimal-Forgetting Continual Learner for Conflicting Tasks

PMF-CL: 面向冲突任务的帕累托最小遗忘持续学习器

Srijith Nair, Atilla Eryilmaz, Jia Liu

发表机构 * Department of Electrical and Computer Engineering（电气与计算机工程系）

AI总结提出基于多任务学习视角的帕累托最优框架，通过寻找帕累托最优解实现冲突任务下最小化遗忘的持续学习，并推导出适用于线性回归、基函数回归及具有二次上界损失函数的帕累托最小遗忘算法。

Comments 25 pages, 4 figures, 4 algorithms

AI中文摘要

文献中提出了许多持续学习算法来解决机器学习模型中的灾难性遗忘问题（即学习新任务导致先前学习任务性能下降）。尽管所有持续学习方法都使用某种形式的记忆来保留过去任务的信息，但对需要存储哪些信息以最小化灾难性遗忘的基本理解仍然难以捉摸。最近，人们认识到，在存在所有任务共同全局最小化器的强假设下，灾难性遗忘可以完全避免。然而，在实践中，任务很少具有共同的全局最小化器，一定程度的遗忘是不可避免的。本文提出了一个基于多任务学习视角的、原则性且系统化的冲突任务持续学习基础框架。该方法基于寻找帕累托最优解，即根据定义，在帕累托意义上最小化遗忘先前任务的解。我们推导了线性回归和基函数回归的帕累托最小遗忘持续学习算法，以及具有二次上界的一般损失函数（例如逻辑回归）。对于二次问题，PMF-CL使用内存高效的迭代更新，对于具有$d$个参数的模型，静态内存占用为$\mathcal{O}(d^2)$。

英文摘要

In the literature, many continual learning (CL) algorithms have been proposed to address the issue of catastrophic forgetting in ML models (i.e., learning new tasks leads to the loss of performance on previously learned tasks). Although all CL approaches use some form of memory to retain information about past tasks, a grounded understanding of what information needs to be stored to minimize catastrophic forgetting remains elusive. Recently, it has been recognized that under the strong assumption of the existence of a common global minimizer over all tasks, catastrophic forgetting can be completely avoided. However, in practice, tasks rarely have a common global minimizer, and a certain amount of forgetting is inevitable. In this paper, we propose a foundational framework for principled and systematic CL of conflicting tasks using a multi-task learning (MTL) perspective. The approach is based on finding Pareto-optimal solutions, i.e., the solutions which, by definition, minimally forget the previous tasks in the Pareto sense. We derive Pareto-minimal-forgetting CL algorithms for linear and basis-function regression, and general loss functions which have a quadratic upper bound, e.g., logistic regression. For quadratic problems, PMF-CL uses memory-efficient iterative updates with a static memory footprint of $\mathcal{O}(d^2)$ for models with $d$ parameters.
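To see why a quadratic problem needs only $\mathcal{O}(d^2)$ memory and how conflicting tasks trade off, consider two least-squares tasks. The sketch below is a textbook weighted-sum scalarization, not the PMF-CL algorithm itself: each task is summarized by $X^\top X$ and $X^\top y$, and every weight `lam` strictly between 0 and 1 yields a Pareto-optimal compromise for these convex losses. All names and the toy data are assumptions made for illustration.

```python
import numpy as np

def task_stats(X, y):
    """Sufficient statistics of 0.5 * ||X w - y||^2: a d x d matrix and a d-vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return X.T @ X, X.T @ y

def scalarized_solution(stats_old, stats_new, lam):
    """Minimizer of lam * L_old(w) + (1 - lam) * L_new(w)."""
    A_old, b_old = stats_old
    A_new, b_new = stats_new
    A = lam * A_old + (1.0 - lam) * A_new
    b = lam * b_old + (1.0 - lam) * b_new
    return np.linalg.solve(A, b)

# Two conflicting toy tasks: their individual minimizers differ.
old = task_stats(np.eye(2), [1.0, 0.0])  # minimizer w = (1, 0)
new = task_stats(np.eye(2), [0.0, 1.0])  # minimizer w = (0, 1)

w_balanced = scalarized_solution(old, new, lam=0.5)
w_old_only = scalarized_solution(old, new, lam=1.0)
print(w_balanced, w_old_only)
```

Only the two statistics per task are stored, never the raw data, so the memory cost does not grow with the number of samples.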

2605.18807 2026-06-01 cs.LG cs.AI 版本更新

Block-Based Double Decoders

基于块的双解码器

Asher Labovich, Benjamin Bradley, Vanessa Alexander, Chaitanya Harsha

发表机构 * Brown University（布朗大学）

AI总结提出基于块的双解码器架构，利用双重因果块注意力掩码实现全损失监督和静态序列打包，结合解码器训练效率与编码器-解码器推理效率，在缩放定律实验中优于编码器-解码器并接近解码器模型，推理时KV缓存和每token计算减少至少2/3。

Comments 8 pages main, 13 pages total

2605.18803 2026-06-01 cs.LG cs.AI 版本更新

PROWL: Prioritized Regret-Driven Optimization for World Model Learning

PROWL: 基于优先遗憾驱动的世界模型学习优化

Ahmet H. Güzel, Jenny Seidenschwarz, Benjamin Graham, Jonathan Sadeghi, Jeffrey Hawke, Ilija Bogunovic

发表机构 * University College London AI Centre（伦敦大学学院人工智能中心）； Odyssey ； University of Basel（巴塞尔大学）

AI总结提出一种KL约束的对抗课程，通过训练策略暴露扩散世界模型的高误差轨迹并持续微调，结合优先对抗轨迹缓冲区，解决被动数据中罕见关键转换的鲁棒性问题。

AI中文摘要

现代动作条件视频世界模型在短期视觉真实性上表现强劲，但在罕见且对交互关键的转换上仍不可靠，而这些转换主导了下游规划和策略性能。由于被动演示数据系统性地对这些高影响区域采样不足，提高鲁棒性需要主动引发模型失败，而非依赖其自然发生。我们引入了一种KL约束的对抗课程，其中训练一个策略来暴露基于扩散的世界模型的高误差轨迹，同时保持接近行为分布。世界模型在这些对抗性发现的轨迹上持续微调，形成一个对抗训练循环，将罕见失败转化为稳定的、接近分布的训练信号，而不会漂移到分布外利用。为了在模型改进时持续对未解决的弱点施加压力，我们提出了一种优先对抗轨迹（PAT）缓冲区，该缓冲区根据预测误差、动作保真度和学习进度对轨迹重新排序，将训练集中在未解决的失败模式上，而不是重复访问已解决的案例。我们在MineRL框架中实现了我们的方法，并在保留的分布外轨迹上进行了评估；PROWL提高了相对于仅在被动数据上训练的模型的鲁棒性，揭示了在弱行为约束下的奖励黑客行为，并证明了有效的对抗世界模型训练关键取决于平衡探索性失败发现与显式行为正则化。我们的结果表明，可扩展的世界模型不仅受益于更大的数据集，还受益于选择性生成信息丰富的训练数据。

英文摘要

Modern action-conditioned video world models achieve strong short-horizon visual realism, yet remain unreliable on rare, interaction-critical transitions that dominate downstream planning and policy performance. Because passive demonstration data systematically under-samples these high-impact regimes, improving robustness requires actively eliciting model failures rather than relying on their natural occurrence. We introduce a KL-constrained adversarial curriculum in which a policy is trained to expose high-error trajectories of a diffusion-based world model while remaining close to the behavior distribution. The world model is continuously fine-tuned on these adversarially discovered trajectories, yielding an adversarial training loop that converts rare failures into a stable, near-distribution training signal without drifting into out-of-distribution exploitation. To maintain pressure on unresolved weaknesses as the model improves, we propose a Prioritized Adversarial Trajectory (PAT) buffer that re-ranks trajectories based on prediction error, action fidelity, and learning progress, focusing training on unresolved failure modes rather than repeatedly revisiting solved cases. We implement our approach in the MineRL framework and evaluate it on held-out out-of-distribution trajectories; PROWL improves robustness over models trained on passive data alone, reveals reward-hacking behaviors under weak behavioral constraints, and demonstrates that effective adversarial world-model training critically depends on balancing exploratory failure discovery with explicit behavioral regularization. Our results suggest that scalable world models benefit not only from larger datasets, but also from selectively generating informative training data.

2605.18606 2026-06-01 cs.LG 版本更新

Physics-Aligned Canonical Equivariant Fourier Neural Operator under Symmetry-Induced Shifts

对称性诱导位移下的物理对齐规范等变傅里叶神经算子

Jiaxiao Xu, Changhong Mou, Yeyu Zhang, Fengxiang He

发表机构 * Shanghai University of Finance and Economics（上海财经大学）； Utah State University（犹他州立大学）； University of Edinburgh（爱丁堡大学）

AI总结提出PACE-FNO，通过李代数坐标估计将输入场对齐到参考帧，再应用标准FNO并恢复目标帧，利用周期性演化方程的连续对称性分离坐标对齐与物理演化，在多种PDE上实现OOD相对误差降低高达12倍。

Comments 36 pages, 14 figures, 10 tables

AI中文摘要

神经算子近似PDE解映射，但未必尊重控制方程的对称性。在分布外（OOD）场景中，标准神经算子通常需要在单个映射中学习坐标对齐和物理演化，这可能会损害泛化能力。我们利用周期性域上演化方程的已知连续对称性来分离这两个角色。我们提出了物理对齐规范等变傅里叶神经算子（PACE-FNO），它通过李代数坐标估计器估计输入帧，将场映射到参考帧，应用标准傅里叶神经算子（FNO），并将预测恢复到目标帧。我们使用有界对称扰动联合训练对齐和算子预测，并在推理时通过可选的低维精化步骤更新估计帧。等变性通过输入和输出变换强制执行，而FNO架构保持不变。在周期性域上的1-D和2-D Burgers、浅水方程和Navier-Stokes方程中，PACE-FNO在分布内（ID）精度上与标准神经算子相当，并在平移和伽利略位移下将分布外（OOD）相对误差比带对称增强的FNO（FNO+Aug）降低多达12倍，在耦合旋转-平移位移下增益较小。消融实验表明，对齐输入和恢复输出帧贡献了大部分OOD增益；推理时精化提供了较小的修正。

英文摘要

Neural operators approximate PDE solution maps, but they need not respect the symmetries of the governing equation. In out-of-distribution (OOD) regimes, a standard neural operator must often learn coordinate alignment and physical evolution within a single map, which can hurt generalization. We use known continuous symmetries of evolution equations on periodic domains to separate these two roles. We propose the Physics-Aligned Canonical Equivariant Fourier Neural Operator (PACE-FNO), which estimates the input frame with a Lie-algebra coordinate estimator, maps the field to a reference frame, applies a standard Fourier Neural Operator (FNO), and restores the prediction to the target frame. We train alignment and operator prediction jointly using bounded symmetry perturbations, with an optional low-dimensional refinement step that updates the estimated frame at inference. Equivariance is enforced by the input and output transformations, while the FNO architecture remains unchanged. Across 1-D and 2-D Burgers, shallow-water, and Navier-Stokes equations on periodic domains, PACE-FNO matches the in-distribution (ID) accuracy of standard neural operators and reduces out-of-distribution (OOD) relative error by up to 12x over FNO with symmetry augmentation (FNO+Aug) under translations and Galilean shifts, with smaller gains for coupled rotation-translation shifts. Ablations show that aligning the input and restoring the output frame account for most OOD gains; inference-time refinement provides a smaller correction.
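The align, apply, restore pattern can be shown on a 1-D periodic grid with integer translations. In this sketch the frame is read off the phase of the first Fourier mode, which is a hand-written stand-in for the learned Lie-algebra coordinate estimator, and `position_dependent_op` is a deliberately non-equivariant placeholder for the FNO. Every name here is hypothetical.

```python
import numpy as np

def estimate_shift(u):
    """Integer shift that brings the first Fourier mode to zero phase."""
    n = u.size
    phase = np.angle(np.fft.fft(u)[1])
    return int(np.round(-phase * n / (2.0 * np.pi))) % n

def canonical_apply(op, u):
    """Align u to the reference frame, apply op, restore the original frame."""
    s = estimate_shift(u)
    aligned = np.roll(u, -s)         # undo the translation
    return np.roll(op(aligned), s)   # re-apply it to the prediction

def position_dependent_op(u):
    """Stand-in operator that is NOT translation-equivariant on its own."""
    return u * np.linspace(1.0, 2.0, u.size)

n = 32
x = np.arange(n)
u = np.cos(2 * np.pi * (x - 3) / n) + 0.3 * np.cos(4 * np.pi * x / n)
u_shifted = np.roll(u, 5)  # unseen translation of the input

plain_ok = np.allclose(position_dependent_op(u_shifted),
                       np.roll(position_dependent_op(u), 5))
wrapped_ok = np.allclose(canonical_apply(position_dependent_op, u_shifted),
                         np.roll(canonical_apply(position_dependent_op, u), 5))
print(plain_ok, wrapped_ok)
```

The inner operator always sees the same aligned field, so translating the input only translates the output. This is the sense in which equivariance comes from the input and output transformations while the operator itself stays unchanged.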

2605.18364 2026-06-01 cs.LG math.OC 版本更新

Proximal basin hopping: global optimization with guarantees

近端盆地跳跃：有保证的全局优化

Guillaume Lauga, Cesare Molinari, Samuel Vaiter

发表机构 * LJAD ； MALGA ； Université Côte d’Azur（蔚蓝海岸大学）； Università di Genova（热那亚大学）； CNRS（法国国家科学研究中心）

AI总结提出近端盆地跳跃（PBH）理论框架，结合近端优化与局部最小化，构建算法以高概率收敛到全局最小值，在合成硬函数和深度学习标度律拟合等实际问题中表现优于有理论保证的已知算法，且维度越高性能差距越大。

2605.18024 2026-06-01 cs.LG cs.AI cs.MA 版本更新

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

交互破坏对抗学习框架用于鲁棒多智能体强化学习

Sunwoo Lee, Mingu Kang, Yonghyeon Jo, Seungyul Han

发表机构 * Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea（韩国蔚山科学技术院（UNIST）人工智能研究生院）

AI总结提出交互破坏对抗学习框架，从信息论角度构建攻击破坏智能体间交互，并训练智能体在干扰下可靠执行，提升鲁棒性。

Comments 9 pages for main, 33 pages for total, Accepted to ICML 2026

2605.17524 2026-06-01 cs.LG cs.DB 版本更新

Covariance Structure and Coordinate Heterogeneity Govern Binary Quantization of Contrastive Embeddings

协方差结构与坐标异质性支配对比嵌入的二值量化

Wenxuan Xiao

发表机构 * Changsha University（长沙大学）

AI总结通过分析InfoNCE训练表示的协方差结构，揭示了协方差矩阵的非对角项和坐标异质性如何分别影响二值量化的排序保真度和设计选择，并推导出缩放律以指导系统设计。

Comments 21 pages, 1 figure, 19 tables (6 in main text, 13 in appendix)

AI中文摘要

二值量化（BQ）将高维嵌入压缩为每个坐标一或两个比特，从而实现极速的最近邻搜索。然而，一个显著的谜题仍然存在：BQ在对比嵌入上取得了有竞争力的召回率，但在其他嵌入上却失败——并且两个领先系统采用了截然相反的策略（随机旋转与保留坐标轴），而没有共同的理论解释何时适用哪种策略。我们通过将最近建立的InfoNCE训练表示的Gaussian结构与BQ质量的统计框架联系起来，解决了这个谜题。我们的分析揭示了协方差矩阵的两个不同作用。首先，完整的协方差结构——而不仅仅是其对角线——决定了排序保真度的绝对水平，其中非对角相关性贡献了30-50%的信号。其次，坐标异质性（每个坐标方差的非均匀性）支配着关键设计选择：每个额外比特贡献多少，以及随机旋转是有益还是有害。我们推导了Gaussian模型下排序保真度的近似表达式，表明幅度比特携带与异质性成比例的信息，并表明随机旋转恰好破坏了某个范式所利用的信号，同时创造了另一个范式所需的各向同性。一个现象学缩放律预测了跨模型和维度的保真度。在涵盖9个嵌入家族的18个数据集上的实验支持了主要预测，并据我们所知，为二值量化系统提供了第一个有原则的设计指南。

英文摘要

Binary quantization (BQ) compresses high-dimensional embeddings into one or two bits per coordinate, enabling nearest neighbor search at extreme speed. Yet a striking puzzle persists: BQ achieves competitive recall on contrastive embeddings but fails on others -- and two leading systems adopt diametrically opposite strategies (random rotation vs. preserving coordinate axes) without a common theory explaining when each is appropriate. We address this puzzle by connecting the Gaussian structure recently established for InfoNCE-trained representations to a statistical framework for BQ quality. Our analysis reveals two distinct roles of the covariance matrix. First, the full covariance structure -- not merely its diagonal -- determines the absolute level of ranking fidelity, with off-diagonal correlations contributing 30--50% of the signal. Second, coordinate heterogeneity (the non-uniformity of per-coordinate variances) governs key design choices: how much each additional bit contributes, and whether random rotation helps or hurts. We derive approximate expressions for ranking fidelity under a Gaussian model, show that the magnitude bit carries information proportional to heterogeneity, and show that random rotation destroys precisely the signal that one paradigm exploits while creating the isotropy that the other requires. A phenomenological scaling law predicts fidelity across models and dimensions. Experiments on 18 datasets spanning 9 embedding families support the main predictions and provide, to our knowledge, the first principled design guide for binary quantization systems.
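The two ingredients discussed in the abstract, one-bit sign quantization and random rotation, can be sketched briefly. The example assumes a zero-mean Gaussian embedding with a diagonal covariance whose per-coordinate variances are unequal; it then shows that a random orthogonal rotation flattens those variances, which is the heterogeneity signal the abstract says one paradigm exploits. Function names and the variance profile are hypothetical, and the paper's fidelity expressions are not reproduced here.

```python
import numpy as np

def sign_quantize(x):
    """One bit per coordinate: 1 where the coordinate is positive, else 0."""
    return (np.asarray(x) > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

def random_rotation(dim, seed=0):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # column sign fix keeps q orthogonal

# Hypothetical heterogeneous per-coordinate variances.
var_before = np.array([16.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
Q = random_rotation(var_before.size)
var_after = np.diag(Q @ np.diag(var_before) @ Q.T)  # variances of Q @ x

spread_before = var_before.max() / var_before.min()
spread_after = var_after.max() / var_after.min()

code_a = sign_quantize([0.5, -1.0, 2.0, -0.1])
code_b = sign_quantize([0.4, 1.0, -2.0, -0.3])
print(spread_before, spread_after, hamming(code_a, code_b))
```

The total variance is preserved by the rotation; only its distribution across coordinates changes.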

2605.17373 2026-06-01 cs.LG cs.AI 版本更新

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

FML-bench：从搜索动力学视角对AI研究代理策略的受控研究

Qiran Zou, Hou Hei Lam, Wenhao Zhao, Tingting Chen, Yiming Tang, Samson Yu, Yingtao Zhu, Srinivas Anumasa, Zufeng Zhang, Tianyi Zhang, Chang Liu, Zhengyao Jiang, Anirudh Goyal, Dianbo Liu

发表机构 * National University of Singapore（新加坡国立大学）； Tsinghua University（清华大学）； University of Minnesota（明尼苏达大学）； Weco ； Meta

AI总结本文提出FML-Bench基准，通过分离策略与基础设施并定义过程级指标，评估六种代理策略，发现简单的贪婪爬山法几乎追平表现最佳的树搜索代理，且在检测到改进停滞时切换到更广泛探索的自适应代理可超越其他代理。

Comments Our benchmark is available at: https://github.com/qrzou/FML-bench

详情

AI中文摘要

AI研究代理通过自动化假设生成、实验和实证改进来加速机器学习研究。现有代理策略从贪婪爬山法到树搜索和进化优化不等，但哪些策略选择驱动性能仍不清楚。回答这个问题需要一个基准，该基准将代理策略（例如搜索拓扑）与执行基础设施（例如代码编辑器）分离，以便性能差异归因于策略而非基础设施，并提供最终分数之外的过程级指标来分析探索行为。现有基准支持有限。我们提出FML-Bench，一个涵盖10个领域18个基础ML研究任务的基准，将代理策略与执行基础设施分离，并定义了12个过程级行为指标。评估六个代表性代理，我们发现：(1) 策略复杂性本身并不能保证强性能：一个简单的贪婪爬山者几乎与最佳性能的树搜索代理相匹配，两者均远高于其余代理；(2) 我们的分析表明，这种模式与改进机会结构相关：当机会密集时，贪婪搜索往往更有效，而当机会稀疏时，树搜索和进化策略往往更有效；基于这一见解构建的自适应代理在检测到改进停滞时切换到更广泛的探索，并优于其他六个代理，初步支持了这一观察；(3) 过程级分析表明，早期收敛和方向聚焦的探索与最终性能显著相关，而解决方案多样性和计算成本则不然。我们的基准可在 https://github.com/qrzou/FML-bench 获取。

英文摘要

AI research agents accelerate ML research by automating hypothesis generation, experimentation, and empirical refinement. Existing agent strategies range from greedy hill-climbing to tree search and evolutionary optimization, yet which strategy choices drive performance remains unclear. Answering this question requires a benchmark that separates agent strategy (e.g., search topology) from execution infrastructure (e.g., code editor), so that performance differences are attributable to strategy rather than infrastructure, and that provides process-level metrics beyond final scores to analyze exploration behaviors. Existing benchmarks offer limited support. We propose FML-Bench, a benchmark of 18 fundamental ML research tasks across 10 domains that separates agent strategy from execution infrastructure and defines 12 process-level behavioral metrics. Evaluating six representative agents, we find that: (1) strategy complexity alone does not guarantee strong performance: a simple greedy hill-climber nearly matches the best-performing tree-search agent, both well above the remaining agents; (2) our analysis suggests this pattern relates to improvement opportunity structure: greedy search tends to be more effective when opportunities are dense, while tree-search and evolutionary strategies tend to be more effective when opportunities are sparse; an adaptive agent built on this insight switches to broader exploration upon detecting improvement stagnation and outperforms the other six agents, lending initial support to this observation; and (3) process-level analysis reveals that early convergence and directionally focused exploration are significantly associated with final performance, while solution diversity and compute cost are not. Our benchmark is available at: https://github.com/qrzou/FML-bench.
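
The adaptive idea in finding (2) can be sketched as greedy hill-climbing that widens its search once improvement stalls. This is a toy illustration, not the benchmark's agent: `adaptive_search`, `patience`, `width`, and the example `score` and `propose` functions are all hypothetical.

```python
import random

def adaptive_search(score, propose, start, steps=200, patience=10, width=8, seed=0):
    """Greedy search that switches to a wider candidate batch after a stall."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    stalled = 0
    for _step in range(steps):
        exploring = stalled >= patience          # stagnation detected
        n_candidates = width if exploring else 1
        scored = []
        for _k in range(n_candidates):
            cand = propose(best, rng, exploring)
            scored.append((score(cand), cand))
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:               # accept only strict improvements
            best, best_score, stalled = top, top_score, 0
        else:
            stalled += 1
    return best, best_score

def score(x):
    """Toy objective with a single optimum at x = 3."""
    return -(x - 3.0) ** 2

def propose(x, rng, exploring):
    """Small local edits when greedy, larger jumps when exploring."""
    radius = 2.0 if exploring else 0.2
    return x + rng.uniform(-radius, radius)
```

Calling `adaptive_search(score, propose, 0.0)` never returns a score below that of the starting point, since only strict improvements are accepted.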


2605.17126 2026-06-01 stat.ML cs.LG stat.ME 版本更新

Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety

无需特征值下界的多任务线性回归：自适应性、鲁棒性与安全性

Seok-Jin Kim

发表机构 * Columbia（哥伦比亚大学）

AI总结针对存在污染任务的多任务线性回归问题，提出基于矩阵加权范数正则化的估计器，引入相对平衡条件，在弱谱假设下达到与现有方法相当的预测误差界，并具备安全性保证。

Comments Accepted at ICML 2026

详情

AI中文摘要

我们研究了存在污染任务的多任务线性回归问题。我们处理了大多数任务的未知参数在 $\ell_2$ 范数下接近，而部分任务是任意异常值的情况。现有的理论框架严重依赖于每个任务的经验二阶矩的最小特征值远离零（阶为 $\Omega(1)$）的假设。关键的是，这一假设在许多高维场景中不成立，导致先前的保证无效。为了克服这一限制，我们提出了一种基于矩阵加权范数正则化的估计器。我们还引入了一个相对平衡条件，由平衡常数量化，该条件将每个任务的二阶矩与平均内点几何进行比较，并放宽了对任务级二阶矩下界的需求。在具有适度平衡性的有利情况下，我们的预测 MSE 界在显著更弱的谱假设下匹配 Duan 和 Wang (2023) 的速率；由此得到的任务总体 MSE 在最小化极大意义下是最优的，仅相差对数因子。此外，我们证明了我们的估计器具有安全性保证：当相关的平衡常数很大或无穷大，或者任务不相关时，该方法的表现不会差于独立任务学习。

英文摘要

We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$). Crucially, this assumption fails in many high-dimensional scenarios, rendering prior guarantees vacuous. To overcome this limitation, we propose an estimator based on matrix-weighted norm regularization. We also introduce a relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry and relaxes the need for taskwise second-moment lower bounds. In favorable regimes with moderate balancedness, our prediction MSE bounds match the rate of Duan and Wang (2023) under substantially weaker spectral assumptions; the resulting task-overall MSE is minimax optimal up to logarithmic factors. Furthermore, we demonstrate that our estimator enjoys a safety guarantee: when the relevant balancedness constant is large or infinite, or when tasks are unrelated, the method performs no worse than independent task learning.


2605.15706 2026-06-01 cs.LG 版本更新

Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models

可微分的混合智能体激励大型语言模型的群体智能

Xingjian Wu, Junkai Lu, Siyu Yan, Xiangfei Qiu, Jilin Hu, Chenjuan Guo, Bin Yang

发表机构 * East China Normal University（华东师范大学）

AI总结提出可微分的混合智能体（DMoA）框架，通过可微分的上下文感知路由机制动态激活智能体，实现推理过程中的弹性协作，并在9个基准上取得最优性能。

详情

AI中文摘要

大型语言模型（LLMs）的最新进展推动了用于复杂推理任务的多智能体系统（MAS）的发展。然而，现有的MAS通常依赖于预定义或预编译的通信拓扑，这限制了它们对动态任务需求的灵活性和适应性。在这项工作中，我们提出了可微分的混合智能体（DMoA），一个自我进化的多智能体框架，能够在推理过程中实现弹性且自适应的智能体协作。不同于静态构建工作流，DMoA在每个推理步骤动态路由和激活智能体，使系统能够隐式模拟多样化的通信拓扑并适应不断变化的需求。为了实现这一点，我们设计了一个可微分的、上下文感知的路由机制，利用循环结构融入历史和上下文信息，以逐步方式产生稀疏的智能体激活。此外，我们引入预测熵作为自监督信号来优化路由过程，实现了无需外部标注的高效测试时自适应。在9个基准上的广泛实验表明，DMoA在实现最先进性能的同时，展现出强大的效率、鲁棒性和集成能力。

英文摘要

Recent advances in Large Language Models (LLMs) have catalyzed the development of multi-agent systems (MAS) for complex reasoning tasks. However, existing MAS typically rely on pre-defined or pre-compiled communication topologies, which limits their flexibility and adaptability to dynamic task requirements. In this work, we propose Differentiable Mixture-of-Agents (DMoA), a self-evolving multi-agent framework that enables elastic and adaptive agent collaboration during inference. Instead of statically constructing workflows, DMoA dynamically routes and activates agents at each reasoning step, allowing the system to implicitly simulate diverse communication topologies and adapt to evolving demands. To achieve this, we design a differentiable, context-aware routing mechanism that leverages recurrent structures to incorporate historical and contextual information, producing sparse agent activations in a step-wise manner. Furthermore, we introduce predictive entropy as self-supervised signals to optimize the routing process, enabling efficient test-time adaptation without external annotations. Extensive experiments across 9 benchmarks demonstrate that DMoA achieves state-of-the-art performance while exhibiting strong efficiency, robustness, and ensembling capabilities.
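
Two ingredients named in the abstract, sparse agent activation and predictive entropy, can be sketched as follows. This is an illustrative assumption about their simplest form: hard top-k selection is not itself differentiable, and the abstract does not specify how DMoA makes routing differentiable, so `top_k_route` only shows what a sparse activation looks like.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def top_k_route(router_logits, k):
    """Keep the k highest-scoring agents and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in order])
    return dict(zip(order, weights))
```

Low entropy of the final answer distribution would then act as a label-free signal that a routing choice was useful, which is one plausible reading of "self-supervised" here.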


2605.15470 2026-06-01 cs.LG physics.ao-ph 版本更新

Njord: A Probabilistic Graph Neural Network for Ensemble Ocean Forecasting

Njord: 一种用于集合海洋预报的概率图神经网络

Daniel Holmberg, Joel Oskarsson, Erik Wikingsson, Fredrik Lindsten, Teemu Roos

发表机构 * University of Helsinki（赫尔辛基大学）； ETH AI Center（苏黎世联邦理工学院人工智能中心）； Linköping University（林雪平大学）

AI总结提出结合深度潜变量框架和图神经网络的概率模型Njord，在全球和区域海洋实现单次前向传播采样集合预报，并引入K-means聚类网格适应不规则海面几何，相比确定性基线在观测评估中取得更低误差。

Comments Preprint

详情

AI中文摘要

海洋动力学本质上是混沌的，但现有的机器学习海洋模型仅产生确定性预报。我们介绍了Njord，一种用于海洋预报的概率数据驱动模型，适用于全球和区域领域。Njord结合了深度潜变量框架与图神经网络架构，使得每次预报步骤可以在单次前向传播中采样。我们在全球0.25°分辨率和波罗的海区域2 km分辨率上应用Njord。为了扩展到这些大型海洋网格，我们引入了K-means聚类网格，以适应不规则的海面几何。实验表明，与确定性机器学习基线相比，Njord在两个领域均表现出强劲性能，同时通过采样的集合预报提供不确定性估计。在全球OceanBench基准上，Njord在针对真实观测评估时，在上层海洋变量上平均实现了最低误差，其中海表温度预测改进最大。

英文摘要

Ocean dynamics are inherently chaotic, yet existing machine learning ocean models produce only deterministic forecasts. We introduce Njord, a probabilistic data-driven model for ocean forecasting, applicable to both global and regional domains. Njord combines a deep latent variable framework with a graph neural network architecture, enabling sampling each forecast step in a single forward pass. We apply Njord globally at 0.25° resolution and regionally to the Baltic Sea at 2 km resolution. To scale to these large ocean grids we introduce K-means cluster meshes that adapt to irregular sea surface geometry. Experiments demonstrate strong performance on both domains compared to deterministic machine learning baselines, while also providing uncertainty estimates from the sampled ensemble forecasts. On the global OceanBench benchmark, Njord achieves the lowest errors on average across upper-ocean variables when evaluated against real-world observations, with the largest improvements in surface temperature prediction.


2410.06074 2026-06-01 cs.LG cs.NA math.NA 版本更新

Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

用于微分方程和机器学习的可扩展机理神经网络

Jiale Chen, Dingling Yao, Adeel Pervez, Dan Alistarh, Francesco Locatello

发表机构 * Institute of Science and Technology Austria (ISTA)（奥地利科学技术研究所）

AI总结提出可扩展机理神经网络（S-MNN），将计算时间和空间复杂度降至关于序列长度的线性，实现对长期动力学的高效建模，同时保持精度和可解释性。

Comments Published as a conference paper at the Thirteenth International Conference on Learning Representations (ICLR 2025): https://openreview.net/forum?id=Oazgf8A24z

详情

Journal ref: International Conference on Learning Representations, 2025, pp. 10018-10039

AI中文摘要

我们提出了可扩展机理神经网络（S-MNN），这是一个增强的神经网络框架，专为涉及长时间序列的科学机器学习应用而设计。通过重新表述原始机理神经网络（MNN）（Pervez等人，2024），我们将计算时间和空间复杂度从分别关于序列长度的三次和二次降低到线性。这一显著改进使得在不牺牲准确性或可解释性的情况下高效建模长期动力学成为可能。大量实验表明，S-MNN在精度上与原始MNN相当，同时大幅减少计算资源。因此，S-MNN可以在应用中直接替换原始MNN，为将机理瓶颈集成到复杂动力系统的神经网络模型中提供实用且高效的工具。源代码可在https://github.com/IST-DASLab/ScalableMNN获取。

英文摘要

We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can serve as a drop-in replacement for the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN.


2605.11134 2026-06-01 cs.LG cs.AI 版本更新

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

偏好优化中的虚假相关学习：机制、后果及通过平局训练的缓解方法

Christian Moya, Alex Semendinger, Guang Lin, Elliott Thornley

发表机构 * Department of Mathematics, Purdue University, West Lafayette IN, USA（普渡大学数学系）； School of Mechanical Engineering, Purdue University, West Lafayette IN, USA（普渡大学机械工程学院）； Massachusetts Institute of Technology, Cambridge MA, USA（麻省理工学院）

AI总结本文通过统一理论分析揭示了偏好优化（如DPO）中虚假相关学习的机制（均值虚假偏差和因果-虚假相关泄漏），证明其导致分布偏移下的不可逆脆弱性，并提出平局训练数据增强策略以选择性减少虚假学习。

Comments Proceedings of the 43rd International Conference on Machine Learning, 2026, Seoul, South Korea

详情

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, 2026, Seoul, South Korea

AI中文摘要

偏好学习方法（如直接偏好优化DPO）已知会诱导对虚假相关的依赖，导致当前语言模型中的谄媚和长度偏差，并可能在未来系统中造成严重的目标泛化错误。在这项工作中，我们对此现象进行了统一的理论分析，描述了虚假学习的机制、其在部署中的后果以及一种可证明的缓解策略。聚焦于对数线性策略，我们展示了标准偏好学习目标通过两个渠道在总体水平上诱导对虚假特征的依赖：均值虚假偏差和因果-虚假相关泄漏。然后我们表明这种依赖造成了分布偏移的不可逆脆弱性：来自相同训练分布的更多数据无法减少模型对虚假特征的依赖。为了解决这个问题，我们提出了平局训练，一种使用平局（等效用偏好对）的数据增强策略，以引入数据驱动的正则化。我们证明了该方法选择性地减少虚假学习而不降低因果学习。最后，我们在对数线性模型上验证了我们的理论，并提供了实证证据，表明虚假学习机制和平局训练的益处均适用于神经网络和大语言模型。

英文摘要

Preference learning methods like Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal misgeneralization in future systems. In this work, we provide a unified theoretical analysis of this phenomenon, characterizing the mechanisms of spurious learning, its consequences on deployment, and a provable mitigation strategy. Focusing on log-linear policies, we show that standard preference-learning objectives induce reliance on spurious features at the population level through two channels: mean spurious bias and causal-spurious correlation leakage. We then show that this reliance creates an irreducible vulnerability to distribution shift: more data from the same training distribution fails to reduce the model's dependence on spurious features. To address this, we propose tie training, a data augmentation strategy using ties (equal-utility preference pairs) to introduce data-driven regularization. We demonstrate that this approach selectively reduces spurious learning without degrading causal learning. Finally, we validate our theory on log-linear models and provide empirical evidence that both the spurious learning mechanisms and the benefits of tie training persist for neural networks and large language models.
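
For a log-linear policy, the preference margin reduces to a dot product with a feature difference, which makes the role of ties easy to sketch. The tie loss below (cross-entropy against a 50/50 target) is an assumed form chosen for illustration; the abstract does not give the paper's exact objective. The reference policy is assumed to have zero weights, so its term drops out.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def margin(theta, feat_a, feat_b, beta=1.0):
    """beta * theta . (phi_a - phi_b) for a log-linear policy."""
    return beta * sum(t * (a - b) for t, a, b in zip(theta, feat_a, feat_b))

def preference_loss(theta, feat_win, feat_lose, beta=1.0):
    """DPO-style loss for a strict preference (winner over loser)."""
    return -log_sigmoid(margin(theta, feat_win, feat_lose, beta))

def tie_loss(theta, feat_a, feat_b, beta=1.0):
    """Assumed tie objective: push the margin between tied responses to zero."""
    m = margin(theta, feat_a, feat_b, beta)
    return -0.5 * log_sigmoid(m) - 0.5 * log_sigmoid(-m)
```

If two tied responses differ only in a spurious feature such as length, minimizing `tie_loss` shrinks the weight on that feature while leaving features shared by both responses untouched, which matches the "selective" regularization described above.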


2605.02125 2026-06-01 cs.DC cs.LG 版本更新

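Consistent Projection of Langevin Dynamics: Preserving Thermodynamics and Kinetics in Coarse-Grained Models

Langevin动力学的一致投影：在粗粒化模型中保持热力学和动力学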

Vahid Nateghi, Lara Neureither, Selma Moqvist, Carsten Hartmann, Simon Olsson, Feliks Nüske

发表机构 * Max-Planck-Institute for Dynamics of Complex Technical Systems（马克斯·普朗克复杂技术系统动力学研究所）； Institute of Mathematics（数学研究所）； Brandenburgische Technische Universität Cottbus-Senftenberg（勃兰登堡工业大学科特布斯-森夫滕贝格）； Department of Computer Science and Engineering（计算机科学与工程系）； Chalmers University of Technology（查尔姆斯理工大学）； University of Gothenburg（哥德堡大学）

AI总结提出一种基于投影的粗粒化形式体系，结合生成元扩展动态模式分解（gEDMD）和热力学插值，准确捕捉全空间模型的热力学和动力学性质。

详情

DOI: 10.1103/wckl-dz9d

AI中文摘要

粗粒化（CG）是对复杂多尺度系统（如生物分子的构象动力学）进行高效建模和模拟的重要任务。本文针对一般的欠阻尼Langevin动力学，提出了一种基于投影的粗粒化形式体系。遵循Zwanzig投影方法，我们推导了粗粒化动力学的闭式表达式。此外，我们展示了如何利用在Koopman算子方法背景下开发的生成元扩展动态模式分解（gEDMD）方法来建模CG动力学并评估其动力学性质，如转变时间尺度。最后，我们将我们的方法与热力学插值（TI）相结合，这是一种在热力学条件之间转换样本的生成方法，从而无需重复数值模拟即可将方法扩展到跨热力学状态。通过一个二维模型系统，我们证明了所提出的方法能够准确捕捉全空间模型的热力学和动力学性质。

英文摘要

Coarse graining (CG) is an important task for efficient modeling and simulation of complex multi-scale systems, such as the conformational dynamics of biomolecules. This work presents a projection-based coarse-graining formalism for general underdamped Langevin dynamics. Following the Zwanzig projection approach, we derive a closed-form expression for the coarse-grained dynamics. In addition, we show how the generator Extended Dynamic Mode Decomposition (gEDMD) method, which was developed in the context of Koopman operator methods, can be used to model the CG dynamics and evaluate its kinetic properties, such as transition timescales. Finally, we combine our approach with thermodynamic interpolation (TI), a generative approach to transform samples between thermodynamic conditions, to extend the scope of the approach across thermodynamic states without repeated numerical simulations. Using a two-dimensional model system, we demonstrate that the proposed method accurately captures the thermodynamic and kinetic properties of the full-space model.
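
For orientation, the standard form of underdamped Langevin dynamics and of a Zwanzig-type projection is written out below. The symbols ($q$, $p$, $M$, $V$, $\gamma$, $\beta$, $\xi$, $\mu$) are the usual textbook notation and are assumed here; the abstract does not fix the paper's notation or the general friction structure it covers.

```latex
% Underdamped Langevin dynamics for positions q and momenta p,
% with mass matrix M, potential V, friction \gamma and inverse temperature \beta:
\begin{aligned}
  \mathrm{d}q_t &= M^{-1} p_t \,\mathrm{d}t, \\
  \mathrm{d}p_t &= -\nabla V(q_t)\,\mathrm{d}t - \gamma M^{-1} p_t \,\mathrm{d}t
                   + \sqrt{2\gamma\beta^{-1}}\,\mathrm{d}W_t .
\end{aligned}

% Zwanzig-type projection onto a coarse-grained variable \xi:
% the conditional expectation under the invariant measure \mu
% \propto \exp(-\beta H), with H(q,p) = \tfrac12 p^\top M^{-1} p + V(q).
(\mathcal{P} f)(z) = \mathbb{E}_{\mu}\!\left[\, f(q,p) \;\middle|\; \xi(q,p) = z \,\right]
```

Applying such a projection to the generator of the full dynamics is what yields effective drift and diffusion terms for the coarse-grained variable.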


2605.08145 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models

自描述多模态交互调优：放大可利用冗余以实现鲁棒的视觉语言模型

Yuriel Ryan, Hei Man Ip, Adriel Kuek, Paul Pu Liang, Roy Ka-Wei Lee

发表机构 * Singapore University of Technology and Design（新加坡科技设计大学）； DSO National Laboratories（新加坡DSO国家实验室）； Massachusetts Institute of Technology（麻省理工学院）

AI总结针对视觉语言模型中的幻觉和鲁棒性问题，提出自描述多模态交互调优方法，通过放大模态间冗余信息来补偿受损模态，并设计多模态交互门机制将独特交互转化为冗余交互，实验表明该方法可减少38.3%的视觉诱导错误并提升16.8%的一致性。

Comments Accepted to ICML 2026. Code: https://github.com/yurielryan/Multimodal-Interaction-Tuning

详情

AI中文摘要

当前的视觉语言模型在面对模糊或受损模态时存在幻觉和鲁棒性问题。我们假设这些问题可以通过利用模态间的共享信息来补偿受损模态得到解决。为此，我们分析了多模态交互，即模态提供的冗余（共享）、独特（排他）和协同（涌现）任务相关信息，以确定它们对模型可靠性的影响。具体来说，放大冗余交互将增加这种可利用的共享信息以解决这些问题；然而，现代指令数据集通常消除冗余以优先考虑视觉定位。我们通过一个自描述工作流弥合这一差距，该工作流包含一个多模态交互门：一种将独特交互转化为冗余交互的机制。我们的发现表明，增加冗余可以减少38.3%的视觉诱导错误，并提高16.8%的一致性。

英文摘要

Current vision language models face hallucination and robustness issues against ambiguous or corrupted modalities. We hypothesize that these issues can be addressed by exploiting the shared information between modalities to compensate for the impaired one. To this end, we analyze multimodal interactions -- redundant (shared), unique (exclusive), and synergistic (emergent) task-relevant information provided by the modalities -- to determine their impacts on model reliability. Specifically, amplifying redundant interactions would increase this exploitable shared information to resolve these issues; yet, modern instruction datasets often eliminate redundancies to prioritize visual grounding. We bridge this gap through a self-captioning workflow featuring a Multimodal Interaction Gate: a mechanism to convert unique interactions into redundant interactions. Our findings suggest that increasing redundancy can reduce visually induced errors by 38.3% and improve consistency by 16.8%.


2605.06831 2026-06-01 cs.LG cs.AI 版本更新

Why DDIM Hallucinates More Than DDPM: A Theoretical Analysis of Reverse Dynamics

为什么DDIM比DDPM更容易产生幻觉：反向动力学的理论分析

Muhammad H. Ashiq, Samanyu Arora, Abhinav N. Harish, Ishaan Kharbanda, Hung Yun Tseng, Grigorios G. Chrysos

发表机构 * University of Wisconsin-Madison（威斯康星大学麦迪逊分校）

AI总结通过理论分析高斯混合目标下的反向ODE（DDIM）和SDE（DDPM），证明在临界时间τ后DDIM会卡在两个最近模式之间的线段上，而DDPM的随机性帮助其脱离该区域从而避免幻觉。

Comments Accepted in ICML

2605.06137 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Autoregressive Visual Generation Needs a Prologue

自回归视觉生成需要一个序幕

Bowen Zheng, Weijian Luo, Guang Yang, Colin Zhang, Tianyang Hu

发表机构 * The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））； hi-Lab, Xiaohongshu Inc（小红书实验室）

AI总结提出Prologue方法，通过生成前置的序幕令牌来弥合自回归图像生成中的重建-生成差距，在不影响重建质量的前提下显著提升生成性能。

Comments Code: https://github.com/Zyriix/prologue Demo: https://huggingface.co/spaces/Zyriix/prologue-demo

详情

AI中文摘要

在这项工作中，我们提出了Prologue，一种弥合自回归（AR）图像生成中重建-生成差距的方法。Prologue不修改视觉令牌以同时满足重建和生成，而是生成一小部分序幕令牌，并将其前置到视觉令牌序列之前。这些序幕令牌仅使用AR交叉熵（CE）损失进行训练，而视觉令牌则专用于重建。这种解耦设计使我们能够通过AR模型的真实分布优化生成，而不影响重建质量，我们进一步从ELBO角度形式化了这一点。在ImageNet 256x256上，Prologue-Base在没有无分类器引导的情况下将gFID从21.01降至10.75，同时几乎保持重建不变；Prologue-Large使用标准AR模型，无需辅助语义监督，达到了具有竞争力的rFID 0.99和gFID 1.46。有趣的是，仅由AR梯度驱动，序幕令牌展现出涌现的语义结构：对16个序幕令牌进行线性探测达到35.88%的Top-1准确率，远高于标准分词器前16个令牌的23.71%；使用固定序幕令牌进行重采样保留了相似的高层语义布局。我们的结果暗示了一个新方向：通过引入单独学习的生成表示，同时保持原始表示不变，可以提升生成质量。

英文摘要

In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further formalize from an ELBO perspective. On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision. Interestingly, driven only by AR gradients, prologue tokens exhibit emergent semantic structure: linear probing on 16 prologue tokens reaches 35.88% Top-1, far above the 23.71% of the first 16 tokens from a standard tokenizer; resampling with fixed prologue tokens preserves a similar high-level semantic layout. Our results suggest a new direction: generation quality can be improved by introducing a separate learned generative representation while leaving the original representation intact.


2605.05520 2026-06-01 cs.LG stat.AP stat.ML 版本更新

Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

使用商业微波链路和扩散模型先验的贝叶斯雨场重建

Badr Moufad, Albina Ilina, Hai Victor Habi, Salem Lahlou, Yazid Janati, Hagit Messer, Eric Moulines

发表机构 * School of Electrical and Computer Engineering, Tel Aviv University, Tel Aviv, Israel（以色列特拉维夫大学电气与计算机工程学院）

AI总结提出将雨场重建视为贝叶斯逆问题，利用扩散模型作为高保真空间先验，通过无需训练的后验采样方法（如即插即用、序贯蒙特卡洛和副本交换）实现优于传统方法的性能。

Comments Added link to source code

详情

Journal ref: ICML 2026

AI中文摘要

商业微波链路（CML）为降雨感知提供了密集的空间覆盖，但其产生的路径积分测量使得精确的地面重建具有挑战性。现有方法通常将CML简化为点传感器，并忽略降雨与信号衰减之间的线积分关系，导致在非均匀降水条件下性能下降。在这项工作中，我们将雨场重建视为一个贝叶斯逆问题，使用扩散模型（DM）作为高保真空间先验。我们表明，与删失高斯过程相比，扩散模型能更好地保留关键降雨统计量。将降雨估计视为具有DM先验的贝叶斯逆问题，使得可以使用广泛的无需训练的后验采样方法，包括即插即用、序贯蒙特卡洛和副本交换方法。在合成和真实世界数据集上的实验表明，与基于CML的现有重建基线相比，该方法具有一致的改进。

英文摘要

Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect line integration relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-fidelity spatial priors. We show that diffusion models better preserve key rainfall statistics compared to censored Gaussian processes. Framing rainfall estimation as a Bayesian inverse problem with a DM prior enables training-free posterior sampling using a broad family of methods, including Plug-and-Play, Sequential Monte Carlo, and Replica Exchange methods. Experiments on synthetic and real-world datasets demonstrate consistent improvements over established CML-based reconstruction baselines.
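
The "line integration" that point-sensor simplifications ignore can be sketched as a forward operator: attenuation accumulates along the link path through a power law in rain rate. The grid layout (1 km cells, coordinates in km) and the coefficients `a` and `b` are placeholder assumptions, not values from the paper.

```python
def link_attenuation(rain, start, end, a=0.3, b=1.0, samples=50):
    """Approximate path attenuation (dB) of a straight link over a rain grid.

    rain[i][j] is the rain rate (mm/h) in the cell at row i, column j;
    start and end are (x, y) positions in km. Specific attenuation is
    modeled as a * R**b (dB/km) and integrated with the midpoint rule.
    """
    (x0, y0), (x1, y1) = start, end
    length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        i = min(int(y), len(rain) - 1)       # clamp to the last row
        j = min(int(x), len(rain[0]) - 1)    # clamp to the last column
        total += a * rain[i][j] ** b
    return total * length / samples
```

In a Bayesian treatment, a diffusion prior over `rain` would be combined with a likelihood built from such path integrals, one per link.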


2510.03096 2026-06-01 cs.LG 版本更新

Cost-Aware Learning

成本感知学习

Clara Mohri, Amir Globerson, Haim Kaplan, Tomer Koren, Yishay Mansour

发表机构 * Kempner Institute（肯普纳研究所）； Harvard University（哈佛大学）； Google Research（谷歌研究院）； Tel Aviv University（特拉维夫大学）

AI总结针对有限和优化中不同组件采样成本不同的问题，提出基于梯度范数和成本的采样分布算法Cost-Aware SGD，并应用于语言模型强化学习，显著降低策略优化中的token使用量。

详情

AI中文摘要

我们考虑成本感知学习问题，其中对有限和目标的各个组件进行采样会产生不同的成本。目标是在最小化总成本的同时达到目标误差。我们提出了成本感知SGD，它使用基于梯度范数和成本的分布来采样组件。我们对该算法进行了深入分析，包括相对于基线的成本改进界限、分布代理次优性的刻画以及下界。我们将理论见解应用于语言模型的强化学习，其中序列级策略梯度的计算成本随长度变化。我们发现优势幅度作为梯度范数的高保真代理，并据此引入成本感知GRPO。在1.5B、4B和8B LLM上的实验结果表明，该算法显著减少了策略优化中使用的token数量，同时匹配或超过基线准确率。

英文摘要

We consider the problem of Cost-Aware Learning, where sampling different components of a finite-sum objective incurs different costs. The objective is to reach a target error while minimizing the total cost. We propose Cost-Aware SGD, which uses a distribution based on gradient norms and costs to sample components. We provide a thorough analysis of this algorithm, including cost-improvement bounds over baselines, a characterization of distribution proxy sub-optimality, and a lower bound. We apply our theoretical insights to reinforcement learning with language models, where the computational cost of sequence-level policy gradients varies with length. We find that the advantage magnitude serves as a high-fidelity proxy for gradient norms, and use this to introduce Cost-Aware GRPO. Empirical results on 1.5B, 4B, and 8B LLMs demonstrate that this algorithm significantly reduces the tokens used in policy optimization while matching or exceeding baseline accuracy.
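
One natural way to build "a distribution based on gradient norms and costs" is sketched below. The choice $p_i \propto g_i / \sqrt{c_i}$ minimizes the product of expected cost and a variance proxy by Cauchy-Schwarz; whether this is exactly the distribution used by Cost-Aware SGD is an assumption, and the function names are hypothetical.

```python
import math
import random

def cost_aware_probs(grad_norms, costs):
    """Sampling distribution with p_i proportional to g_i / sqrt(c_i)."""
    weights = [g / math.sqrt(c) for g, c in zip(grad_norms, costs)]
    z = sum(weights)
    return [w / z for w in weights]

def sample_gradient(grads, probs, rng):
    """Draw one component and reweight it by 1 / (n * p_i).

    The reweighting keeps the estimate of the mean gradient unbiased
    under any sampling distribution with full support.
    """
    n = len(grads)
    i = rng.choices(range(n), weights=probs)[0]
    return i, [g / (n * probs[i]) for g in grads[i]]
```

In the reinforcement-learning application described above, the advantage magnitude would stand in for `grad_norms` and the sequence length for `costs`.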


2604.23436 2026-06-01 stat.ML cs.LG math.OC stat.CO 版本更新

Inference of Online Newton Methods with Nesterov's Accelerated Sketching

带有Nesterov加速草图的在线牛顿方法的推断

Haoxuan Wang, Xinchen Du, Sen Na

发表机构 * School of Industrial and Systems Engineering, Georgia Institute of Technology（佐治亚理工学院工业与系统工程学院）

AI总结针对在线牛顿方法推断计算成本高的问题，提出结合Hessian平均与Nesterov加速草图投影求解器的方法，在保持一阶方法$O(d^2)$复杂度下实现二阶方法的鲁棒性，并建立了全局收敛性、渐近正态性和在线协方差估计器。

Comments 52 pages, 2 tables, 3 figures; accepted at ICML 2026

详情

AI中文摘要

基于流式数据的可靠决策需要对在线方法进行原则性的不确定性量化。虽然一阶方法能够实现高效的迭代更新，但其推断过程仍需更新适当的（协方差）矩阵，导致$O(d^2)$的时间和内存复杂度，并且对问题的病态性和噪声异质性敏感。这一昂贵的推断任务为更鲁棒的二阶方法提供了机会，然而二阶方法受限于求解牛顿系统所需的$O(d^3)$复杂度。在本文中，我们通过研究一种带有Hessian平均的在线牛顿方法来解决这一差距，其中每一步的牛顿方向使用带有Nesterov加速的草图投影求解器近似计算，匹配了一阶方法的$O(d^2)$复杂度。对于所提出的方法，我们量化了来自随机数据和随机计算的不确定性。在标准光滑性和矩条件下，我们建立了全局几乎必然收敛性，证明了最后迭代的渐近正态性，其极限协方差由Lyapunov方程刻画，并开发了一个完全在线的协方差估计器，具有非渐近收敛保证。我们还将所得的不确定性量化与没有Nesterov加速的精确和草图牛顿方法联系起来。在回归模型上的大量实验证明了所提出方法在在线推断中的优越性。

英文摘要

Reliable decision-making with streaming data requires principled uncertainty quantification of online methods. While first-order methods enable efficient iterate updates, their inference procedures still require updating proper (covariance) matrices, incurring $O(d^2)$ time and memory complexity, and are sensitive to ill-conditioning and noise heterogeneity of the problem. This costly inference task offers an opportunity for more robust second-order methods, which are, however, bottlenecked by solving Newton systems with $O(d^3)$ complexity. In this paper, we address this gap by studying an online Newton method with Hessian averaging, where the Newton direction at each step is approximately computed using a sketch-and-project solver with Nesterov's acceleration, matching $O(d^2)$ complexity of first-order methods. For the proposed method, we quantify its uncertainty arising from both random data and randomized computation. Under standard smoothness and moment conditions, we establish global almost-sure convergence, prove asymptotic normality of the last iterate with a limiting covariance characterized by a Lyapunov equation, and develop a fully online covariance estimator with non-asymptotic convergence guarantees. We also connect the resulting uncertainty quantification to that of exact and sketched Newton methods without Nesterov's acceleration. Extensive experiments on regression models demonstrate the superiority of the proposed method for online inference.
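
The sketch-and-project idea can be illustrated with its simplest instance, randomized Kaczmarz with single-row sketches, applied to a Newton system $H x = -g$. This sketch omits Nesterov's acceleration and Hessian averaging, so it illustrates only the basic solver family, not the paper's method; the function name is hypothetical.

```python
import random

def sketch_and_project(H, b, iters=2000, seed=0):
    """Approximately solve H x = b by projecting onto one random row at a time.

    Rows are sampled with probability proportional to their squared norm,
    and each iteration costs O(d) rather than the O(d^3) of a direct solve.
    """
    rng = random.Random(seed)
    d = len(b)
    x = [0.0] * d
    row_norms = [sum(v * v for v in row) for row in H]
    for _ in range(iters):
        i = rng.choices(range(d), weights=row_norms)[0]
        residual = b[i] - sum(h * xj for h, xj in zip(H[i], x))
        step = residual / row_norms[i]
        x = [xj + step * h for xj, h in zip(x, H[i])]
    return x
```

For example, with `H = [[2.0, 1.0], [1.0, 3.0]]` and gradient `g = [3.0, 4.0]`, calling `sketch_and_project(H, [-3.0, -4.0])` approximates the Newton direction `[-1, -1]`.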


2604.22794 2026-06-01 eess.SY cs.LG cs.SY 版本更新

Accelerating Reinforcement Learning for Wind Farm Control via Expert Demonstrations

通过专家演示加速风电场控制的强化学习

Marcus Binder Nilsen, Julian Quick, Tuhfe Göçmen, Nikolay Dimitrov, Pierre-Elouan Réthoré

发表机构 * Department of Wind and Energy Systems, Technical University of Denmark（丹麦技术大学风能与能源系统系）

AI总结提出一种利用稳态尾流模型生成的专家演示预训练方法，通过行为克隆初始化Soft Actor-Critic网络，消除初始学习阶段，使初始性能接近基线水平，并在在线微调后超越查表控制器。

Comments Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence

详情

DOI: 10.1088/1742-6596/3224/3/032016
Journal ref: J. Phys.: Conf. Ser. 3224 032016 (2026)

AI中文摘要

强化学习为自适应风电场流动控制提供了一种有前景的方法，但其实际部署受到训练收敛缓慢和初始性能差的阻碍，如果直接部署未经训练的智能体，这些因素可能导致多年的功率输出减少。本文研究了稳态尾流模型中的领域知识是否可以加速强化学习训练并提高初始控制器性能。我们提出了一种预训练方法，其中通过在动态尾流模拟（WindGym）中部署基于PyWake的稳态优化器生成专家演示，然后通过行为克隆初始化Soft Actor-Critic智能体的演员和评论家网络。在2x2风电场上的实验表明，预训练消除了代价高昂的初始学习阶段：未经训练的智能体性能比贪婪零偏航基线低约12%，而预训练将初始性能提升至接近基线水平。在在线微调过程中，所有配置在250,000个环境步骤内收敛到相似性能，最终超过查表控制器，后者在500,000步后达到约7%的功率增益。

英文摘要

Reinforcement learning (RL) offers a promising approach for adaptive wind farm flow control, yet its practical deployment is hindered by slow training convergence and poor initial performance, factors that could translate to years of reduced power output if an untrained agent were deployed directly. This work investigates whether domain knowledge from steady-state wake models can accelerate RL training and improve initial controller performance. We propose a pretraining methodology in which expert demonstrations are generated by deploying a PyWake-based steady-state optimizer within a dynamic wake simulation (WindGym), then used to initialize both the actor and critic networks of a Soft Actor-Critic agent via behavior cloning. Experiments on a 2x2 wind farm show that pretraining eliminates the costly initial learning phase: while an untrained agent underperforms the greedy zero-yaw baseline by approximately 12%, pretraining raises initial performance to near-baseline levels. During online fine-tuning, all configurations converge within 250,000 environment steps to achieve similar performance, ultimately exceeding that of a lookup-table controller, which reaches approximately 7% power gain after 500,000 steps.


2604.22722 2026-06-01 cs.IR cs.AI cs.LG 版本更新

Aligning Dense Retrievers with LLM Utility via Distillation

通过蒸馏将稠密检索器与LLM效用对齐

Rajinder Sandhu, Di Mu, Cheng Chang, Md Shahriar Tasjid, Himanshu Rai, Maksims Volkovs, Ga Wu

发表机构 * Dalhousie University（达尔豪西大学）

AI总结提出Utility-Aligned Embeddings (UAE)框架，通过蒸馏LLM的困惑度降低效用分布来训练双编码器，在不增加测试时LLM推理开销的情况下提升稠密检索的精度和效率。

详情

AI中文摘要

稠密向量检索是检索增强生成（RAG）的实用支柱，但相似性搜索可能受限于精度。相反，利用LLM重排序的基于效用的方法通常能实现更优性能，但计算成本高且易受困惑度估计中固有噪声的影响。我们提出Utility-Aligned Embeddings (UAE)，一个旨在将这些优势融合为实用、高性能检索方法的框架。我们将检索表述为分布匹配问题，使用Utility-Modulated InfoNCE目标训练双编码器以模仿由困惑度降低导出的效用分布。该方法将分级效用信号直接注入嵌入空间，无需测试时LLM推理。在QASPER基准上，UAE在召回率@1上提升30.59%，MAP提升30.16%，Token F1提升17.3%，优于强语义基线BGE-Base。关键的是，UAE比高效的LLM重排序方法快180倍以上，同时保持竞争性能，表明将检索与生成效用对齐能在规模上产生可靠的上下文。

英文摘要

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16% and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
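
The distribution-matching view can be sketched as a soft-target variant of InfoNCE: the retriever's similarity softmax is trained toward a target distribution built from perplexity reduction. The exact Utility-Modulated InfoNCE objective is not given in the abstract, so the form below, the temperatures, and the function names are assumptions.

```python
import math

def softmax(scores, temp=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temp) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def utility_targets(ppl_without, ppl_with, temp=1.0):
    """Target distribution from log-perplexity reduction per candidate passage."""
    gains = [math.log(ppl_without) - math.log(p) for p in ppl_with]
    return softmax(gains, temp)

def distribution_matching_loss(similarities, targets, temp=0.05):
    """Cross-entropy between utility targets and the retriever's softmax."""
    preds = softmax(similarities, temp)
    return -sum(t * math.log(p) for t, p in zip(targets, preds) if t > 0)
```

With a one-hot target this reduces to the ordinary InfoNCE loss; graded targets are what let partially useful passages contribute a training signal.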


2604.09429 2026-06-01 cs.CV cs.AI cs.LG 版本更新

Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

射线即像素：学习视频与相机轨迹的联合分布

Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang

发表机构 * Meta AI

AI总结提出一种视频扩散模型（Rays as Pixels），通过将相机表示为密集射线像素（raxels）并与视频帧共享潜在空间，联合去噪实现相机轨迹预测和相机控制视频生成。

Comments Accepted to ICML 2026. 9-page main paper plus supplementary material. Project page: https://wbjang.github.io/raysaspixels/

详情

AI中文摘要

从图像恢复相机参数和从新视角渲染场景在计算机视觉和图形学中被视为独立任务。当图像覆盖稀疏或姿态模糊时，这种分离会失效，因为每个任务依赖于另一个任务的输出。我们提出Rays as Pixels，一种视频扩散模型（VDM），学习视频和相机轨迹的联合分布。据我们所知，这是首个在单一框架内预测相机姿态并进行相机控制视频生成的模型。我们将每个相机表示为密集射线像素（raxels），这是一种与视频帧位于同一潜在空间的像素对齐编码，并通过解耦自交叉注意力机制联合去噪两者。一个训练好的模型处理三个任务：从视频预测相机轨迹、沿预定义轨迹从输入图像生成视频、以及从输入图像联合合成视频和轨迹。我们在姿态估计和相机控制视频生成上进行评估，并引入闭环自一致性测试，显示模型预测的姿态及其基于这些姿态的渲染结果一致。与Plücker嵌入的消融实验证实，将相机与视频共享潜在空间显著更有效。

英文摘要

Recovering camera parameters from images and rendering scenes from novel viewpoints have been treated as separate tasks in computer vision and graphics. This separation breaks down when image coverage is sparse or poses are ambiguous, since each task depends on what the other produces. We propose Rays as Pixels, a Video Diffusion Model (VDM) that learns a joint distribution over videos and camera trajectories. To our knowledge, this is the first model to predict camera poses and do camera-controlled video generation within a single framework. We represent each camera as dense ray pixels (raxels), a pixel-aligned encoding that lives in the same latent space as video frames, and denoise the two jointly through a Decoupled Self-Cross Attention mechanism. A single trained model handles three tasks: predicting camera trajectories from video, generating video from input images along a pre-defined trajectory, and jointly synthesizing video and trajectory from input images. We evaluate on pose estimation and camera-controlled video generation, and introduce a closed-loop self-consistency test showing that the model's predicted poses and its renderings conditioned on those poses agree. Ablations against Plücker embeddings confirm that representing cameras in a shared latent space with video is substantially more effective.

2604.18587 2026-06-01 cs.LG cs.AI cs.LO cs.PL 版本更新

Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs

编译以压缩：通过编译器输出提升形式定理证明器

Guchan Li, Rui Tian, Hongning Wang

发表机构 * Department of Computer Science and Technology, Tsinghua University, Beijing, China（清华大学计算机科学与技术系）

AI总结利用编译器将大量证明尝试压缩为结构化失败模式，提出一种学习-精炼框架，通过树搜索基于验证器反馈局部修正错误，在可比测试时预算下在PutnamBench上达到最先进性能。

详情

AI中文摘要

大型语言模型在形式定理证明中展现出显著潜力，但最先进的性能往往需要通过大量展开或扩展上下文窗口来实现令人望而却步的测试时计算。在这项工作中，我们通过利用形式验证中的一种信息结构来解决这一可扩展性瓶颈：观察到编译器将大量不同的证明尝试空间映射到一组紧凑的结构化失败模式。我们引入了一个学习-精炼框架，利用这种压缩来执行高效的学习和证明探索。我们执行树搜索，根据明确的验证器反馈局部修正错误，从而避免了积累长历史证明尝试的相关成本。大量评估表明，我们的方法在不同规模上持续增强了基础证明器的推理能力。值得注意的是，在可比较的测试时预算下，我们的方法在PutnamBench上达到了公开报告的约80亿和约320亿参数模型中的最先进性能，为下一代验证器引导推理提供了一种可扩展的范式。

英文摘要

Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured failure modes. We introduce a learning-to-refine framework that leverages this compression to perform efficient learning and proof exploration. We perform tree search that corrects errors locally conditioned on explicit verifier feedback, thereby circumventing the costs associated with accumulating a long history of proof attempts. Extensive evaluations show that our method consistently amplifies the reasoning capabilities of base provers across varying scales. Notably, our approach achieves state-of-the-art performance on PutnamBench among publicly reported $\sim$8B and $\sim$32B parameter models under comparable test-time budgets, offering a scalable paradigm for next-generation verifier-guided reasoning.

2604.17551 2026-06-01 cs.LG cs.AI 版本更新

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

SVL：目标条件强化学习作为生存学习

Franki Nguimatsia Tiofack, Fabian Schramm, Théotime Le Hellard, Justin Carpentier

发表机构 * Inria（法国国家信息与自动化研究所）； École Normale Supérieure, PSL Research University, Paris, France（巴黎高等师范学院，PSL研究大学）

AI总结提出生存价值学习（SVL），通过将时间到目标建模为概率分布，将目标条件强化学习重构为生存学习问题，并利用危险模型进行最大似然估计，在离线基准上匹配或超越强基线方法。

Comments Accepted to the 43rd International Conference on Machine Learning, Seoul, South Korea

详情

AI中文摘要

标准的目标条件强化学习（GCRL）方法依赖于时间差分学习，由于自举可能导致不稳定和样本效率低下。虽然最近的工作探索了对比和监督公式以提高稳定性，但我们提出了一种概率替代方案，称为生存价值学习（SVL），通过将每个状态到目标的时间建模为概率分布，将GCRL重新定义为生存学习问题。这种结构化的分布蒙特卡洛视角产生了一个闭式恒等式，将目标条件价值函数表示为生存概率的折扣和，从而通过危险模型在事件和右删失轨迹上进行最大似然估计来实现价值估计。我们引入了三种实用的价值估计器，包括有限视界截断和两种分箱无限视界近似，以捕捉长视界目标。在离线GCRL基准上的实验表明，SVL与层次化演员结合，匹配或超越了强大的层次化TD和蒙特卡洛基线，在复杂的长视界任务上表现出色。网页和代码：https://simple-robotics.github.io/publications/survival-value-learning/

英文摘要

Standard approaches to goal-conditioned reinforcement learning (GCRL) that rely on temporal-difference learning can be unstable and sample-inefficient due to bootstrapping. While recent work has explored contrastive and supervised formulations to improve stability, we present a probabilistic alternative, called survival value learning (SVL), that reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a probability distribution. This structured distributional Monte Carlo perspective yields a closed-form identity that expresses the goal-conditioned value function as a discounted sum of survival probabilities, enabling value estimation with a hazard model trained by maximum likelihood on both event and right-censored trajectories. We introduce three practical value estimators, including finite-horizon truncation and two binned infinite-horizon approximations to capture long-horizon objectives. Experiments on offline GCRL benchmarks show that SVL combined with hierarchical actors matches or surpasses strong hierarchical TD and Monte Carlo baselines, excelling on complex, long-horizon tasks. Webpage and code: https://simple-robotics.github.io/publications/survival-value-learning/
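
The "discounted sum of survival probabilities" identity can be illustrated with a small sketch. It assumes a discrete-time setting with a reward of -1 per step until the goal is reached, which is one common convention; the paper's exact reward, estimators, and notation may differ. The function names below are hypothetical. With per-step hazards h_t = P(T = t | T >= t) and survival S(t) = P(T > t), the value under this assumption is the negative discounted sum of S(t).

```python
def survival_from_hazard(hazards):
    """Return [S(0), ..., S(H)] with S(0) = 1 and S(t) = prod_{k<=t} (1 - h_k)."""
    survival = [1.0]
    for h in hazards:  # hazards[k-1] is h_k = P(T = k | T >= k)
        survival.append(survival[-1] * (1.0 - h))
    return survival


def value_from_survival(hazards, gamma):
    """Finite-horizon value for a -1-per-step reward: -sum_{t<H} gamma^t * S(t)."""
    survival = survival_from_hazard(hazards)
    return -sum(gamma ** t * survival[t] for t in range(len(hazards)))


def value_direct(hazards, gamma):
    """Same value as an explicit expectation over the time-to-goal distribution.

    Matches value_from_survival exactly when the last hazard is 1,
    i.e. the goal is certainly reached within the horizon.
    """
    survival = survival_from_hazard(hazards)
    total = 0.0
    for t, h in enumerate(hazards, start=1):
        p_t = survival[t - 1] * h                    # P(T = t)
        ret = -sum(gamma ** k for k in range(t))     # return if the goal is hit at step t
        total += p_t * ret
    return total
```

Calling both functions with `hazards=[0.2, 0.5, 1.0]` and `gamma=0.9` gives the same number, which is the point of the identity: the hazard model alone determines the value, with no bootstrapped target involved.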

2604.16278 2026-06-01 cs.AI cs.CL cs.LG 版本更新

Learning to Reason with Insight for Informal Theorem Proving

学习在非形式定理证明中进行洞察推理

Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song

发表机构 * City University of Hong Kong（香港城市大学）； Tsinghua University（清华大学）； Ke Holdings Inc.（贝壳控股有限公司）； Shenzhen University of Advanced Technology（深圳理工大学）； Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结针对非形式定理证明中缺乏洞察（识别核心技巧）的瓶颈，提出统一训练框架DeepInsight，通过分层数据集、渐进式多阶段SFT和基于洞察的策略优化方法，显著提升大语言模型的数学推理能力。

详情

AI中文摘要

尽管大多数自动定理证明方法依赖于形式证明系统，但非形式定理证明能更好地发挥大语言模型（LLMs）在自然语言处理方面的优势。在这项工作中，我们识别出非形式定理证明的一个主要瓶颈是缺乏洞察，即难以识别解决复杂问题所需的核心技巧。为了解决这个问题，我们提出了$\texttt{DeepInsight}$，一个统一的训练框架，旨在培养这种基本的推理技能，并使LLMs能够进行洞察推理。我们的框架由三个部分组成：（1）$\texttt{DeepInsightTheorem}$，一个分层数据集，通过显式提取核心技巧和证明草图以及最终证明来结构化非形式证明；（2）渐进式多阶段SFT策略，模拟人类学习过程，教授模型证明写作、规划和洞察识别；（3）$\texttt{InsightPO}$，一种策略优化方法，在此洞察层次结构上分配结构化奖励。我们在具有挑战性的数学基准上的实验表明，这种洞察感知的生成策略显著优于基线。这些结果表明，教模型识别和应用核心技巧可以大幅提高其数学推理能力。

英文摘要

Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the difficulty of recognizing the core techniques required to solve complex problems. To address this, we propose $\texttt{DeepInsight}$, a unified training framework designed to cultivate this essential reasoning skill and enable LLMs to perform insightful reasoning. Our framework consists of three components: (1) $\texttt{DeepInsightTheorem}$, a hierarchical dataset that structures informal proofs by explicitly extracting core techniques and proof sketches alongside the final proof; (2) a Progressive Multi-Stage SFT strategy that mimics the human learning process, teaching the model proof writing, planning, and insight identification; and (3) $\texttt{InsightPO}$, a policy optimization method that assigns structured rewards over this insight hierarchy. Our experiments on challenging mathematical benchmarks demonstrate that this insight-aware generation strategy significantly outperforms baselines. These results demonstrate that teaching models to identify and apply core techniques can substantially improve their mathematical reasoning.

2604.15959 2026-06-01 cs.LG 版本更新

Multi-Objective Bayesian Optimization via Adaptive $\varepsilon$-Constraints Decomposition

基于自适应 ε-约束分解的多目标贝叶斯优化

Yaohong Yang, Sammie Katt, Samuel Kaski

发表机构 * Department of Computer Science, Aalto University, Espoo, Finland（阿尔托大学计算机科学系，芬兰 Espoo）； ELLIS Institute Finland（芬兰 ELLIS 机构）； Department of Computer Science, University of Manchester, Manchester, United Kingdom（曼彻斯特大学计算机科学系，英国 Manchester）

AI总结提出STAGE-BO方法，通过自适应ε-约束分解将多目标优化转化为序列约束子问题，实现均匀帕累托覆盖并处理约束与偏好。

Comments 24 pages, 22 figures, 4 tables. Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

详情

AI中文摘要

多目标贝叶斯优化（MOBO）为优化多个昂贵的黑箱函数提供了一个原则性框架。然而，现有的MOBO方法通常在覆盖性、可扩展性以及处理约束和偏好方面存在困难。在这项工作中，我们提出了STAGE-BO，即顺序目标自适应间隙填充ε-约束贝叶斯优化：通过分析代理帕累托前沿的覆盖性，我们的方法识别出具有最大未覆盖间隙的帕累托前沿点，并使用其坐标在ε-约束方法中定义自适应约束，从而将问题转化为一系列不等式约束子问题，并通过约束期望改进采集函数高效求解。我们的方法无需超体积计算即可实现均匀的帕累托覆盖，并自然地处理约束和偏好。在合成和真实世界基准上的实验表明，与最先进的基线相比，我们的方法具有优越的覆盖性和具有竞争力的超体积性能。我们的代码实现可在https://github.com/YangYaohong1/STAGE-BO找到。

英文摘要

Multi-objective Bayesian optimization (MOBO) provides a principled framework for optimizing multiple expensive black-box functions. However, existing MOBO methods often struggle with coverage, scalability, and handling constraints and preferences. In this work, we propose STAGE-BO, Sequential Targeting Adaptive Gap-Filling $\varepsilon$-Constraint Bayesian Optimization: by analyzing the coverage of the surrogate Pareto front, our method identifies the Pareto front point with the largest uncovered gap, and uses its coordinates to define adaptive constraints in the $\varepsilon$-constraint method, which transforms the problem into a sequence of inequality-constrained subproblems, efficiently solved via constrained expected improvement acquisition. Our approach provides uniform Pareto coverage without hypervolume computation and naturally handles constraints and preferences. Experiments on synthetic and real-world benchmarks demonstrate superior coverage and competitive hypervolume performance against state-of-the-art baselines. Our code implementation can be found at https://github.com/YangYaohong1/STAGE-BO.
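
As a rough illustration of the gap-targeting idea, the sketch below applies the classical $\varepsilon$-constraint method to a toy two-objective minimization problem. It makes several assumptions that are not taken from the paper: the gap is measured as the Euclidean distance between neighbouring front points, the constraint level is set at the gap's midpoint, and the subproblem is solved by exhaustive search over a finite candidate set rather than by constrained expected improvement on a surrogate. The function names are hypothetical.

```python
import math


def largest_gap_level(front):
    """Return the f2 level at the midpoint of the widest gap between neighbouring front points."""
    pts = sorted(front)  # sort by f1; on a 2-D Pareto front f2 then decreases
    gaps = [(math.dist(a, b), (a[1] + b[1]) / 2.0) for a, b in zip(pts, pts[1:])]
    return max(gaps)[1]


def eps_constraint_step(candidates, eps):
    """Minimise f1 subject to f2 <= eps over a finite set of (f1, f2) candidates."""
    feasible = [c for c in candidates if c[1] <= eps]
    return min(feasible) if feasible else None


# Current front has a wide gap between (1, 9) and (10, 0).
front = [(0.0, 10.0), (1.0, 9.0), (10.0, 0.0)]
candidates = front + [(5.0, 4.0), (6.0, 3.0), (7.0, 5.0), (4.0, 8.0)]
eps = largest_gap_level(front)
new_point = eps_constraint_step(candidates, eps)
```

Each step turns the multi-objective problem into a single-objective one with an inequality constraint, so the point it returns lands inside the region the current front covers least.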

2604.11613 2026-06-01 cs.LG cs.AI 版本更新

Symmetry Reveals Layerwise Dynamics: How Transformers Perform In-Context Classification

对称性揭示逐层动力学：Transformer如何执行上下文分类

Patrick Lutz, Themistoklis Haris, Arjun Chandra, Aditya Gangrade, Venkatesh Saligrama

发表机构 * Boston University, Departments of Computer Science

AI总结通过强制特征和标签排列等变性，从Transformer中提取出显式的深度索引递归更新规则，揭示了上下文分类的几何驱动算法。

Comments appears in the Proceedings of the 43rd International Conference on Machine Learning (ICML '26)

2604.09412 2026-06-01 stat.ML cond-mat.dis-nn cs.LG 版本更新

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

高维两层ReLU神经网络损失景观中局部极小值的精确描述

Jie Huang, Bruno Loureiro, Stefano Sarao Mannelli

发表机构 * Physics, Chalmers University of Technology ； University of Gothenburg ； Ecole Normale Superieure, PSL & CNRS ； Engineering, Chalmers University of Technology ； School of Computer Science ； Applied Mathematics, University of the Witwatersrand

AI总结本文通过总结统计量精确刻画了高维两层ReLU神经网络损失景观中的局部极小值，并建立了与单次SGD的关联，揭示了过参数化对极小值稳定性和可达性的影响。

Comments 29 pages, 18 figures. Accepted as a conference paper at ICML 2026

详情

AI中文摘要

我们研究了在可实现教师-学生设置下，具有高斯协变量的形式为$\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$的两层ReLU网络的总体损失景观。我们证明局部极小值在总结统计量方面允许精确的低维表示，从而对景观产生清晰且可解释的描述。我们进一步建立了与单次SGD的直接联系：局部极小值对应于总结统计量空间中动力学的吸引不动点。这一视角揭示了极小值分组成离散族的层次结构，并展示了过参数化如何改变它们在基于梯度动力学下的稳定性和可达性。在这种过参数化机制下，全局极小值变得越来越可访问，吸引动力学并减少收敛到虚假解。总的来说，我们的结果揭示了常见简化假设的内在局限性，这些假设即使在最小的神经网络模型中也可能遗漏损失景观的基本特征。

英文摘要

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical organisation of minima into discrete families and shows how overparameterisation changes their stability and reachability under gradient-based dynamics. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.

2503.09315 2026-06-01 cs.LG 版本更新

Identifying connectivity distributions from neural dynamics using flows

利用流从神经动力学中识别连接分布

Timothy Doyeon Kim, Ulises Pereira-Obilinovic, Yiliu Wang, Eric Shea-Brown, Uygar Sümbül

发表机构 * Allen Institute, Seattle, WA, USA（艾伦研究所）； University of Washington, Seattle, WA, USA（华盛顿大学）

AI总结针对低秩循环神经网络（lrRNN）中连接结构不可识别的问题，提出基于最大熵和连续归一化流（CNF）的推理框架，通过流匹配训练学习连接权重分布，以无偏方式匹配观测动力学，并应用于合成数据和真实神经记录。

详情

AI中文摘要

连接结构塑造了神经计算，但从群体记录中推断这种结构是退化的：多种连接结构可以产生相同的动力学。最近的工作使用低秩循环神经网络（lrRNN）从观测活动中推断低维潜在动力学和连接，从而对动力学进行机制性解释。然而，训练lrRNN的标准方法可能会恢复与潜在动力学无关的虚假结构。我们首先刻画了lrRNN中连接结构的可识别性，并确定了唯一解存在的条件。为了找到这样的解，我们开发了一个基于最大熵和连续归一化流（CNF）的推理框架，通过流匹配进行训练。我们的方法不是估计单个连接矩阵，而是学习一个连接权重的分布，该分布在不可识别分量上最大程度地无偏，同时匹配观测动力学。这种方法捕捉了复杂但必要的分布，例如经验数据中发现的重尾连接。我们在具有产生多稳态吸引子、极限环和环吸引子的连接结构的合成数据集上验证了我们的方法，并展示了其在决策过程中大鼠额叶皮层记录中的适用性。我们的框架将电路推断从恢复连接转变为识别哪些连接结构是计算上必需的，哪些是欠约束推断的产物。

英文摘要

Connectivity structure shapes neural computation, but inferring this structure from population recordings is degenerate: multiple connectivity structures can generate identical dynamics. Recent work uses low-rank recurrent neural networks (lrRNNs) to infer low-dimensional latent dynamics and connectivity from observed activity, enabling a mechanistic interpretation of the dynamics. However, standard approaches for training lrRNNs can recover spurious structures irrelevant to the underlying dynamics. We first characterize the identifiability of connectivity structures in lrRNNs and determine conditions under which a unique solution exists. To find such solutions, we develop an inference framework based on maximum entropy and continuous normalizing flows (CNFs), trained via flow matching. Instead of estimating a single connectivity matrix, our method learns a distribution over connection weights that is maximally unbiased over unidentifiable components while matching the observed dynamics. This approach captures complex yet necessary distributions such as heavy-tailed connectivity found in empirical data. We validate our method on synthetic datasets with connectivity structures that generate multistable attractors, limit cycles, and ring attractors, and demonstrate its applicability in recordings from rat frontal cortex during decision-making. Our framework shifts circuit inference from recovering connectivity to identifying which connectivity structures are computationally required, and which are artifacts of underconstrained inference.

2603.23977 2026-06-01 cs.LG cs.AI 版本更新

Circuit-Inspired High-Order Neural Networks with Unified Neural Dynamics Modeling for PDE Solving and Visual Perception

电路启发的具有统一神经动力学建模的高阶神经网络用于PDE求解与视觉感知

Tongfei Chen, Jingying Yang, Linlin Yang, Juan Zhang, Jinhu Lü, David Doermann, Chunyu Xie, Long He, Tian Wang, Guodong Guo, Baochang Zhang

发表机构 * Communication University of China（中国传媒大学）； AI Research, Qihoo 360（奇虎360人工智能研究院）； Eastern Institute of Technology, Ningbo（宁波东方理工大学）

AI总结提出电路启发的高阶神经网络（CHONN），通过基尔霍夫级联组合实现高阶动力学算子，在PDE求解、长期物理预测和ImageNet-1K识别中提升结构保真度和稳定性。

详情

AI中文摘要

深度网络通常依赖架构启发式方法来塑造表示演化，限制了其对由内在动力学支配的数据的建模能力。我们提出了电路启发的高阶神经网络（CHONN），这是一个模块化框架，将表示演化视为一个潜在势过程，并通过基尔霍夫启发的级联组合增加其有效阶数。单个基尔霍夫神经单元实现稳定的一阶更新，而串行组合的单元在一个块内形成高阶动力学算子。这种构造是可解释的、数值稳定的，并且与常见的神经骨干网络兼容。理论分析表明，级联单元诱导出端到端的高阶算子，控制实验证明块内高阶构造不同于通用深度堆叠，特别是在导数敏感度量上。在稳态算子学习、长期物理预测和ImageNet-1K识别中，CHONN提高了结构保真度、滚动稳定性和视觉表示学习。这些结果将高阶电路组合确定为神经动力学建模的一般原则。

英文摘要

Deep networks often rely on architectural heuristics to shape representation evolution, limiting their ability to model data governed by intrinsic dynamics. We present the Circuit-inspired High-Order Neural Network (CHONN), a modular framework that treats representation evolution as a latent potential process and increases its effective order through Kirchhoff-inspired cascade composition. A single Kirchhoff Neural Cell implements a stable first-order update, while serially composed cells form higher-order dynamical operators within one block. This construction is interpretable, numerically stable and compatible with common neural backbones. Theoretical analysis shows that cascaded cells induce end-to-end high-order operators, and controlled experiments demonstrate that intra-block high-order construction differs from generic depth stacking, especially on derivative-sensitive measures. Across steady-state operator learning, long-horizon physical forecasting and ImageNet-1K recognition, CHONN improves structural fidelity, rollout stability and visual representation learning. These results identify high-order circuit composition as a general principle for neural dynamics modeling.

2510.02578 2026-06-01 q-bio.BM cs.LG 版本更新

FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction

FLOWR.root：基于流匹配的基础模型，用于联合多用途结构感知3D配体生成和亲和力预测

Julian Cremer, Tuan Le, Mohammad M. Ghahremanpour, Emilia Sługocka, Filipe Menezes, Djork-Arné Clevert

发表机构 * Machine Learning & Computational Sciences, Pfizer Worldwide R&D（辉瑞全球研发机器学习与计算科学部）； Computational Chemistry, Medicine Design, Pfizer Worldwide R&D（辉瑞全球研发药物设计计算化学部）； Doctoral School of Medical and Health Sciences, Jagiellonian University Medical College（雅盖隆大学医学院医学与健康科学博士学院）； Department of Physicochemical Drug Analysis, Faculty of Pharmacy, Jagiellonian University Medical College（雅盖隆大学医学院药学院药物理化分析系）； Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich（亥姆霍兹慕尼黑中心分子靶点与治疗中心结构生物学研究所）； TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich（慕尼黑工业大学自然科学学院生物科学系，巴伐利亚核磁共振中心）

AI总结提出SE(3)-等变流匹配模型FLOWR.root，实现口袋感知的3D配体生成、效力与结合亲和力预测及置信度估计，支持从头生成、条件采样、片段优化替换及多终点亲和力预测，在无条件分子生成和口袋条件配体生成上达到最优性能，并通过参数高效微调在亲和力预测上超越现有方法。

详情

AI中文摘要

我们提出了FLOWR.root，一个SE(3)-等变流匹配模型，用于口袋感知的3D配体生成，同时进行效力和结合亲和力预测及置信度估计。该模型支持从头生成、相互作用和药效团条件采样、片段优化和替换，以及多终点亲和力预测（pIC50、pKi、pKd、pEC50）。训练结合了大规模配体库与混合保真度的蛋白质-配体复合物，并在精选的共晶数据集上进行了细化，通过参数高效微调适应项目特定数据。基础FLOWR.root模型在无条件3D分子和口袋条件配体生成中达到了最先进的性能。在HiQBind上，预训练和微调后的模型展示了高度准确的亲和力预测，并在FEP+/OpenFE基准测试中超越了Boltz-2等最新方法，具有显著的速度优势。然而，我们表明解决未见过的结构-活性景观需要领域适应；参数高效的LoRA微调在多样化的专有数据集和PDE10A上带来了显著改进。联合生成和亲和力预测通过重要性采样实现了推理时缩放，将设计引导向更高亲和力的化合物。案例研究验证了这一点：针对CLK3的选择性CK2α配体生成显示了预测结合能与量子力学结合能之间的显著相关性。在ERα、TYK2和BACE1上的骨架优化证实了预测亲和力与QM计算之间的强一致性，同时确认了几何保真度。通过整合结构感知生成、亲和力估计、属性引导采样和高效领域适应，FLOWR.root为从先导发现到先导优化的基于结构的药物设计提供了全面基础。

英文摘要

We present FLOWR.root, an SE(3)-equivariant flow-matching model for pocket-aware 3D ligand generation with joint potency and binding affinity prediction and confidence estimation. The model supports de novo generation, interaction- and pharmacophore-conditional sampling, fragment elaboration and replacement, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, refined on curated co-crystal datasets and adapted to project-specific data through parameter-efficient finetuning. The base FLOWR.root model achieves state-of-the-art performance in unconditional 3D molecule and pocket-conditional ligand generation. On HiQBind, the pre-trained and finetuned model demonstrates highly accurate affinity predictions, and outperforms recent state-of-the-art methods such as Boltz-2 on the FEP+/OpenFE benchmark with substantial speed advantages. However, we show that addressing unseen structure-activity landscapes requires domain adaptation; parameter-efficient LoRA finetuning yields marked improvements on diverse proprietary datasets and PDE10A. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering design toward higher-affinity compounds. Case studies validate this: selective CK2$\alpha$ ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies. Scaffold elaboration on ER$\alpha$, TYK2, and BACE1 demonstrates strong agreement between predicted affinities and QM calculations while confirming geometric fidelity. By integrating structure-aware generation, affinity estimation, property-guided sampling, and efficient domain adaptation, FLOWR.root provides a comprehensive foundation for structure-based drug design from hit identification through lead optimization.

2603.22867 2026-06-01 cs.AR cs.AI cs.LG 版本更新

TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI

TRINE: 一种面向多模态AI的令牌感知、运行时自适应FPGA推理引擎

Hyunwoo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Suyeon Jang, Behnam Khaleghi, Fei Wen, Mohsen Imani

发表机构 * University of California, Irvine（加州大学尔湾分校）； Purdue University Northwest（普渡大学西北分校）； Qualcomm（高通）； Samsung（三星）

AI总结针对多模态AI中不同计算/内存模式导致嵌入式平台实时性不足的问题，提出TRINE，一种无需重配置的单比特流FPGA加速器与编译器，通过统一层映射、运行时模式切换、令牌剪枝和依赖感知层卸载，实现端到端多模态推理，在Alveo U50和ZCU104上相比RTX 4090和Jetson Orin Nano分别降低延迟22.57倍和6.86倍，功耗仅20-21W。

Comments Accepted to DAC 2026

详情

AI中文摘要

混合ViT、CNN、GNN和Transformer NLP的多模态堆栈给嵌入式平台带来压力，因为它们的计算/内存模式不同，且硬实时目标几乎没有松弛空间。TRINE是一个单比特流FPGA加速器和编译器，无需重配置即可执行端到端多模态推理。层被统一为DDMM/SDDMM/SpMM，并映射到一个模式可切换的引擎上，该引擎在运行时在权重/输出驻留脉动阵列、1xCS SIMD和可路由加法树（RADT）之间切换，共享PE阵列。一个宽度匹配的两阶段top-k单元支持流内令牌剪枝，而依赖感知层卸载（DALO）在可重构处理单元上重叠独立内核以维持利用率。在Alveo U50和ZCU104上评估，TRINE相比RTX 4090和Jetson Orin Nano分别降低延迟高达22.57倍和6.86倍，功耗20-21W；仅令牌剪枝在ViT密集型流水线上可实现高达7.8倍加速，DALO贡献高达79%的吞吐量提升。采用int8量化，代表性任务的精度下降<2.5%，为统一的视觉、语言和图工作负载提供了最先进的延迟和能效——仅需一个比特流。

英文摘要

Multimodal stacks that mix ViTs, CNNs, GNNs, and transformer NLP strain embedded platforms because their compute/memory patterns diverge and hard real-time targets leave little slack. TRINE is a single-bitstream FPGA accelerator and compiler that executes end-to-end multimodal inference without reconfiguration. Layers are unified as DDMM/SDDMM/SpMM and mapped to a mode-switchable engine that toggles at runtime among weight/output-stationary systolic, 1xCS SIMD, and a routable adder tree (RADT) on a shared PE array. A width-matched, two-stage top-k unit enables in-stream token pruning, while dependency-aware layer offloading (DALO) overlaps independent kernels across reconfigurable processing units to sustain utilization. Evaluated on Alveo U50 and ZCU104, TRINE reduces latency by up to 22.57x vs. RTX 4090 and 6.86x vs. Jetson Orin Nano at 20-21 W; token pruning alone yields up to 7.8x on ViT-heavy pipelines, and DALO contributes up to 79% throughput improvement. With int8 quantization, accuracy drops remain <2.5% across representative tasks, delivering state-of-the-art latency and energy efficiency for unified vision, language, and graph workloads, all in one bitstream.
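
The two-stage top-k idea behind token pruning can be sketched in software. This is only an analogue under stated assumptions: the abstract does not describe the hardware unit's internals, so the chunk width, the tie-breaking rule (higher index wins on equal scores), and the function name `two_stage_topk` are all hypothetical. The sketch relies on a simple fact: every member of the global top-k is also in the top-k of its own chunk, so a second pass over the per-chunk survivors recovers the exact global result.

```python
import heapq


def two_stage_topk(scores, k, chunk):
    """Return the sorted indices of the k highest-scoring tokens.

    Stage 1 keeps the local top-k of each fixed-width chunk;
    stage 2 takes the global top-k over those survivors.
    """
    survivors = []
    for start in range(0, len(scores), chunk):
        # Pair each score with its absolute token index.
        block = [(s, i) for i, s in enumerate(scores[start:start + chunk], start)]
        survivors.extend(heapq.nlargest(k, block))
    keep = heapq.nlargest(k, survivors)
    return sorted(i for _, i in keep)


token_scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.6]
kept = two_stage_topk(token_scores, k=3, chunk=4)
```

Splitting the selection this way bounds the width of each comparison stage, which is the property a streaming hardware unit would care about; in software the result is identical to a single global top-k.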

2512.05976 2026-06-01 physics.comp-ph cs.LG 版本更新

Physics Enhanced Deep Surrogates for the Phonon Boltzmann Transport Equation

声子玻尔兹曼输运方程的物理增强深度代理模型

Antonio Varagnolo, Giuseppe Romano, Raphaël Pestourie

发表机构 * School of Computational Science and Engineering, Georgia Institute of Technology（计算科学与工程学院，佐治亚理工学院）； Institute for Soldier Nanotechnologies, Massachusetts Institute of Technology（士兵纳米技术研究所，麻省理工学院）

AI总结提出物理增强深度代理模型（PEDS），结合可微傅里叶求解器与神经网络生成器，通过不确定性驱动主动学习，在弹道和扩散区域实现高精度、高数据效率的声子输运模拟，仅需300次高保真BTE模拟即可达到约5%的误差。

详情

DOI: 10.1115/1.4071904

AI中文摘要

设计具有可控纳米尺度热流的材料对于微电子、热电和能量转换技术的进步至关重要。在这些尺度上，声子输运遵循玻尔兹曼输运方程（BTE），该方程捕捉了非扩散（弹道）效应，但在逆设计循环中反复求解成本过高。现有的代理方法在速度和准确性之间权衡：快速宏观求解器可能高估热导率数百个百分点，而最近的数据驱动算子学习器通常需要数千次高保真模拟。因此，需要一种快速、数据高效的代理模型，在弹道和扩散区域均保持可靠。我们提出了一种物理增强深度代理模型（PEDS），它将可微傅里叶求解器与神经网络生成器相结合，并与不确定性驱动的主动学习耦合。傅里叶求解器作为物理归纳偏置，而网络学习几何依赖的修正和混合系数，该系数在宏观和纳米尺度行为之间插值。与纯数据驱动基线相比，PEDS将训练数据需求降低了高达70%，仅需300次高保真BTE模拟即可实现约5%的分数误差，并能够高效设计覆盖12-85 W m$^{-1}$ K$^{-1}$的多孔几何结构，平均设计误差为4%。学习到的混合参数恢复了弹道-扩散转变，并提高了分布外鲁棒性。这些结果表明，嵌入简单、可微的低保真物理可以显著提高代理模型的数据效率和可解释性，使重复的PDE约束优化在纳米尺度热材料设计中变得实用。

英文摘要

Designing materials with controlled heat flow at the nano-scale is central to advances in microelectronics, thermoelectrics, and energy-conversion technologies. At these scales, phonon transport follows the Boltzmann Transport Equation (BTE), which captures non-diffusive (ballistic) effects but is too costly to solve repeatedly in inverse-design loops. Existing surrogate approaches trade speed for accuracy: fast macroscopic solvers can overestimate conductivities by hundreds of percent, while recent data-driven operator learners often require thousands of high-fidelity simulations. This creates a need for a fast, data-efficient surrogate that remains reliable across ballistic and diffusive regimes. We introduce a Physics-Enhanced Deep Surrogate (PEDS) that combines a differentiable Fourier solver with a neural generator and couples it with uncertainty-driven active learning. The Fourier solver acts as a physical inductive bias, while the network learns geometry-dependent corrections and a mixing coefficient that interpolates between macroscopic and nano-scale behavior. PEDS reduces training-data requirements by up to 70% compared with purely data-driven baselines, achieves roughly 5% fractional error with only 300 high-fidelity BTE simulations, and enables efficient design of porous geometries spanning 12-85 W m$^{-1}$ K$^{-1}$ with average design errors of 4%. The learned mixing parameter recovers the ballistic-diffusive transition and improves out-of-distribution robustness. These results show that embedding simple, differentiable low-fidelity physics can dramatically increase surrogate data-efficiency and interpretability, making repeated PDE-constrained optimization practical for nano-scale thermal-materials design.

2603.19862 2026-06-01 cs.CV cs.LG 版本更新

IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

IsoCLIP: 分解CLIP投影器以实现高效的模态内对齐

Simone Magistri, Dipam Goswami, Marco Mistretta, Bartłomiej Twardowski, Joost van de Weijer, Andrew D. Bagdanov

发表机构 * Media Integration and Communication Center (MICC), University of Florence, Italy（意大利佛罗伦萨大学媒体集成与通信中心）； Department of Computer Science, Universitat Autònoma de Barcelona, Spain（西班牙巴塞罗那自治大学计算机科学系）； Computer Vision Center, Barcelona, Spain（西班牙巴塞罗那计算机视觉中心）； IDEAS Research Institute, Warsaw, Poland（波兰华沙IDEAS研究所）

AI总结本文通过分析CLIP投影器的谱特性，发现模态间对齐子空间和各向异性方向，提出无训练方法IsoCLIP去除各向异性方向以改善模态内对齐，在模态内检索和分类任务上降低延迟并超越现有方法。

Comments Accepted at CVPR 2026

详情

AI中文摘要

视觉-语言模型如CLIP被广泛用于涉及视觉和文本模态的跨模态任务。然而，当个体模态编码器应用于固有的模态内任务（如图像到图像检索）时，其性能因模态内错位而受损。本文研究CLIP中的模态内错位，重点关注将投影前图像和文本嵌入映射到共享嵌入空间的投影器的作用。通过分析应用于投影特征的余弦相似度形式及其与对比CLIP损失的交互，我们发现在训练期间存在一个负责对齐两种模态的跨模态算子，以及第二个仅强制执行模态内归一化但不促进模态内对齐的模态内算子。通过对跨模态算子的谱分析，我们识别出一个近似各向同性的子空间，其中两种模态良好对齐，以及每个模态特有的各向异性方向。我们证明该对齐子空间可以直接从投影器权重中获得，并且去除各向异性方向可改善模态内对齐。我们在模态内检索和分类基准上的实验表明，我们的无训练方法减少了模态内错位，大大降低了延迟，并在多个预训练的类CLIP模型上优于现有方法。代码公开于：https://github.com/simomagi/IsoCLIP。

英文摘要

Vision-Language Models like CLIP are extensively used for inter-modal tasks that involve both visual and text modalities. However, when the individual modality encoders are applied to inherently intra-modal tasks like image-to-image retrieval, their performance suffers from intra-modal misalignment. In this paper, we study intra-modal misalignment in CLIP with a focus on the role of the projectors that map pre-projection image and text embeddings into the shared embedding space. By analyzing the form of the cosine similarity applied to projected features, and its interaction with the contrastive CLIP loss, we show that there is an inter-modal operator responsible for aligning the two modalities during training, and a second, intra-modal operator that only enforces intra-modal normalization but does nothing to promote intra-modal alignment. Via spectral analysis of the inter-modal operator, we identify an approximately isotropic subspace in which the two modalities are well-aligned, as well as anisotropic directions specific to each modality. We demonstrate that this aligned subspace can be directly obtained from the projector weights and that removing the anisotropic directions improves intra-modal alignment. Our experiments on intra-modal retrieval and classification benchmarks show that our training-free method reduces intra-modal misalignment, greatly lowers latency, and outperforms existing approaches across multiple pre-trained CLIP-like models. The code is publicly available at: https://github.com/simomagi/IsoCLIP.

2601.05770 2026-06-01 cs.LG cs.CL 版本更新

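Position-blind ptychography: viability of image reconstruction via data-driven variational inference

位置盲叠层成像：通过数据驱动变分推断进行图像重建的可行性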

Simon Welker, Lorenz Kuger, Tim Roith, Berthy Feng, Martin Burger, Timo Gerkmann, Henry Chapman

发表机构 * Department of Informatics, University of Hamburg（汉堡大学信息学系）； Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY（德国电子同步加速器研究所自由电子激光科学中心 CFEL）； Department of Mathematics, Bundesstr. 55, University of Hamburg（汉堡大学数学系）； CIT School, Technical University of Munich（慕尼黑工业大学 CIT 学院）； Munich Center for Machine Learning, München（慕尼黑机器学习中心）； Massachusetts Institute of Technology (MIT)（麻省理工学院）； The NSF AI Institute for Artificial Intelligence and Fundamental Interactions（美国国家科学基金会人工智能与基本相互作用研究所）； Helmholtz Imaging, Deutsches Elektronen-Synchrotron DESY（德国电子同步加速器研究所亥姆霍兹成像平台）

AI总结针对位置盲叠层成像这一新盲逆问题，利用基于分数的扩散模型作为数据驱动先验，通过变分推断联合恢复扫描位置和图像，在模拟简化二维变体中验证了图像重建的可行性。

详情

AI中文摘要

在这项工作中，我们提出并研究了位置盲叠层成像这一新颖的盲逆问题，即在没有任何扫描位置信息的情况下进行叠层相位恢复，必须与图像联合恢复扫描位置。该问题的动机来自单粒子衍射X射线成像，其中随机取向的粒子被照射并收集一组衍射图案。如果使用高度聚焦的X射线束，测量结果也会对每个粒子的光束位置敏感，从而成为叠层成像，但这些位置也是未知的。我们通过使用基于分数的扩散模型作为现代数据驱动图像先验，采用变分推断，在模拟的简化二维变体中研究了这个困难问题的图像重建可行性。我们发现，在适当的照明结构和强先验条件下，即使在测量噪声下，除了最困难的成像场景外，所有情况下都能实现可靠且成功的图像重建。

英文摘要

In this work, we present and investigate the novel blind inverse problem of position-blind ptychography, i.e., ptychographic phase retrieval without any knowledge of scan positions, which then must be recovered jointly with the image. The motivation for this problem comes from single-particle diffractive X-ray imaging, where particles in random orientations are illuminated and a set of diffraction patterns is collected. If one uses a highly focused X-ray beam, the measurements would also become sensitive to the beam positions relative to each particle and therefore ptychographic, but these positions are also unknown. We investigate the viability of image reconstruction in a simulated, simplified 2-D variant of this difficult problem, using variational inference with modern data-driven image priors in the form of score-based diffusion models. We find that, with the right illumination structure and a strong prior, one can achieve reliable and successful image reconstructions even under measurement noise, in all except the most difficult evaluated imaging scenario.

2603.12916 2026-06-01 cs.LG cs.AI 版本更新

Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

Surprised by Attention: 面向时间序列异常检测的可预测查询动态

Kadir-Kaan Özer, René Ebeling, Markus Enzweiler

发表机构 * Mercedes-Benz AG（梅赛德斯-奔驰股份公司）； Institute for Intelligent Systems, Esslingen University of Applied Sciences（埃斯林根应用科学大学智能系统研究所）

AI总结提出 AxonAD 无监督检测器，通过预测多头注意力查询向量的演化并结合重构误差与查询不匹配分数，有效检测多变量时间序列中的结构依赖偏移异常。

Comments This manuscript has been accepted for publication at ECML-PKDD 2026. The final version will be published in the conference proceedings. Main: 17 Pages, 7 Figures, 3 Tables; Appendix: 3 Pages, 4 Tables

详情

AI中文摘要

多变量时间序列异常通常表现为跨通道依赖的偏移，而非简单的幅度异常。例如，在自动驾驶中，转向指令可能内部一致，但与产生的横向加速度解耦。当灵活的序列模型尽管协调性改变仍能合理重构信号时，基于残差的检测器可能遗漏此类异常。我们提出 AxonAD，一种无监督检测器，将多头注意力查询演化视为短视界可预测过程。梯度更新重构路径与仅基于历史上下文的预测器耦合，该预测器通过掩码预测器-目标目标针对指数移动平均（EMA）目标编码器进行训练。推理时，重构误差与尾部聚合的查询不匹配分数结合，该分数衡量最近时间步上预测查询与目标查询之间的余弦偏差。这种双重方法在保留幅度级检测的同时，对结构依赖偏移敏感。在带有区间标注的专有车载遥测数据以及 TSB-AD 多变量套件（17 个数据集，180 个序列）上，使用无阈值和范围感知指标，AxonAD 在排名质量和时间定位上优于强基线。消融实验证实查询预测和组合评分是观察到的改进的主要驱动因素。代码可在 https://github.com/iis-esslingen/AxonAD 获取。

英文摘要

Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short-horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context. This is trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multivariate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at https://github.com/iis-esslingen/AxonAD.
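
The combined score described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract says only that reconstruction error is combined with a tail-aggregated cosine mismatch, so the use of a mean over the last `tail` timesteps, the additive combination, the `weight` parameter, and the function names are all assumptions. Query vectors are assumed to be non-zero.

```python
import math


def cosine_deviation(u, v):
    """1 - cosine similarity: 0 when the vectors are aligned, 1 when orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def anomaly_score(recon_error, predicted, target, tail=2, weight=1.0):
    """Reconstruction error plus the mean cosine deviation over the last `tail` timesteps."""
    pairs = list(zip(predicted, target))[-tail:]
    mismatch = sum(cosine_deviation(p, t) for p, t in pairs) / len(pairs)
    return recon_error + weight * mismatch


# Predicted and target queries agree until the final timestep, where they become orthogonal.
predicted_q = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
target_q = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = anomaly_score(0.1, predicted_q, target_q, tail=2)
```

In this toy case the reconstruction error stays small (0.1) while the query mismatch raises the score, mirroring the situation where signals are reconstructed plausibly but their coordination has changed.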

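2603.09453 2026-06-01 cs.LG cs.AI stat.ML (updated version)

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

Albus Yizhuo Li, Matthew Wicker

Affiliations: Department of Computing, Imperial College London

AI summary: Proposes Variational Mixture-of-Experts Routing (VMoER), which confines Bayesian inference to the expert-selection stage to obtain calibrated uncertainty at scale. On the fine-tuned foundation models tested, it improves routing stability, lowers calibration error, and raises out-of-distribution AUROC while adding less than 1% extra FLOPs.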

Comments 8 pages, 7 figures for main text; 16 pages for Appendix; Accepted by ICML 2026;

Abstract

Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity, including Mixture-of-Experts (MoE) layers. In this work, we demonstrate calibrated uncertainty at scale by introducing Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach for modelling uncertainty in MoE layers. VMoER confines Bayesian inference to the expert-selection stage, which is typically handled by a deterministic routing network. We instantiate VMoER using two inference strategies: amortised variational inference over routing logits and inferring a temperature parameter for stochastic expert selection. Across the fine-tuned foundation models we tested, VMoER improves routing stability under noise by 38%, reduces calibration error by 94%, and increases out-of-distribution AUROC by 12%, while incurring less than 1% additional FLOPs. These results suggest VMoER offers a scalable path toward robust and uncertainty-aware foundation models.

2603.13875 2026-06-01 cs.CL cs.LG (updated version)

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Yuri Kuratov, Matvey Kairov, Aydar Bulatov, Ivan Rodkin, Mikhail Burtsev

Affiliations: AXXX, Cognitive AI Systems Lab, Moscow, Russia; London Institute for Mathematical Sciences, London, UK

AI summary: Proposes GradMem, which uses test-time gradient descent to write context into a compact memory state by optimizing memory tokens with a self-supervised reconstruction loss. It outperforms forward-only memory writers on key-value retrieval and is competitive on natural language tasks.

Comments International Conference on Machine Learning (ICML) 2026

Abstract

Many large language model applications require conditioning on long contexts. Transformers typically support this by storing a large per-layer KV-cache of past activations, which incurs substantial memory overhead. A desirable alternative is compressive memory: read a context once, store it in a compact state, and answer many queries from that state. We study this in a context removal setting, where the model must generate an answer without access to the original context at inference time. We introduce GradMem, which writes context into memory via per-sample test-time optimization. Given a context, GradMem performs a few steps of gradient descent on a small set of prefix memory tokens while keeping model weights frozen. GradMem explicitly optimizes a model-level self-supervised context reconstruction loss, resulting in a loss-driven write operation with iterative error correction, unlike forward-only methods. On associative key-value retrieval, GradMem outperforms forward-only memory writers with the same memory size, and additional gradient steps scale capacity much more effectively than repeated forward writes. We further show that GradMem transfers beyond synthetic benchmarks: with pretrained language models, it attains competitive results on natural language tasks including bAbI and SQuAD variants, relying only on information encoded in memory.

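2603.13727 2026-06-01 cs.LG physics.data-an (updated version)

Data-driven Progressive Discovery of Physical Laws

Mingkun Xia, Weiwei Zhang

AI summary: Proposes Chain of Symbolic Regression (CoSR), a framework that progressively discovers physical laws from data by combining, step by step, knowledge units with clear physical meaning, and evaluates it on several physics problems.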

Comments This paper needs to be retracted due to methodological flaws found in RBC case

Abstract

Symbolic regression is a powerful tool for knowledge discovery, enabling the extraction of interpretable mathematical expressions directly from data. However, conventional symbolic discovery typically follows an end-to-end, "one-step" process, which often generates lengthy and physically meaningless expressions when dealing with real physical systems, leading to poor model generalization. This limitation fundamentally stems from its deviation from the basic path of scientific discovery: physical laws do not exist in a single form but follow a hierarchical and progressive pattern from simplicity to complexity. Motivated by this principle, we propose Chain of Symbolic Regression (CoSR), a novel framework that models the discovery of physical laws as a chain of symbolic knowledge. This knowledge chain is formed by progressively combining multiple knowledge units with clear physical meanings along a specific logic, ultimately enabling the precise discovery of the underlying physical laws from data. CoSR fully recapitulates the progressive discovery path from Kepler's third law to the law of universal gravitation in classical mechanics, and is applied to three types of problems: turbulent Rayleigh-Benard convection, viscous flows in a circular pipe, and laser-metal interaction, demonstrating its ability to improve classical scaling theories. Finally, CoSR showcases its capability to discover new knowledge in the complex engineering problem of aerodynamic coefficients scaling for different aircraft.

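2603.09936 2026-06-01 cs.LG (updated version)

HYGENE: A Diffusion-based Hypergraph Generation Method

Dorian Gailhard, Enzo Tartaglione, Lirida Naviner, Jhony H. Giraldo

AI summary: Proposes HYGENE, a diffusion-based hypergraph generation method that builds the target hypergraph from a single pair of connected nodes through progressive local expansion and a denoising diffusion process. The authors describe it as the first use of deep learning models for hypergraph generation.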

Comments arXiv admin note: text overlap with arXiv:2312.11529 by other authors

Abstract

Hypergraphs are powerful mathematical structures that can model complex, high-order relationships in various domains, including social networks, bioinformatics, and recommender systems. However, generating realistic and diverse hypergraphs remains challenging due to their inherent complexity and lack of effective generative models. In this paper, we introduce a diffusion-based Hypergraph Generation (HYGENE) method that addresses these challenges through a progressive local expansion approach. HYGENE works on the bipartite representation of hypergraphs, starting with a single pair of connected nodes and iteratively expanding it to form the target hypergraph. At each step, nodes and hyperedges are added in a localized manner using a denoising diffusion process, which allows for the construction of the global structure before refining local details. Our experiments demonstrated the effectiveness of HYGENE, proving its ability to closely mimic a variety of properties in hypergraphs. To the best of our knowledge, this is the first attempt to employ deep learning models for hypergraph generation, and our work aims to lay the groundwork for future research in this area.

2603.08721 2026-06-01 cs.AR cs.LG cs.SE (updated version)

KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware

Jiayi Nie, Haoran Wu, Yao Lai, Zeyu Cao, Cheng Zhang, Binglei Lou, Erwei Wang, Jianyi Cheng, Timothy M. Jones, Robert Mullins, Rika Antonova, Yiren Zhao

Affiliations: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom; Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

AI summary: Presents KernelCraft, a benchmark that evaluates how well LLM agents generate and optimize low-level kernels for emerging accelerators through a function-calling, feedback-driven workflow. Across the evaluated tasks, the strongest agents produce correct kernels within a few refinement steps and match or outperform compiler baselines.

Abstract

New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels, a time-consuming and error-prone process that does not scale across hardware targets. This delays emerging hardware platforms from reaching the market. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft: the first benchmark for evaluating an LLM agent's ability to generate and optimize low-level kernels for customized accelerators through a function-calling, feedback-driven workflow. We evaluate agent performance across three emerging accelerators on more than 20 machine-learning tasks, each with five diverse task configurations. Across four leading reasoning models, the strongest agents generate functionally correct kernels for unseen ISAs within a few refinement steps and produce optimized kernels that match or outperform compiler baselines. These results demonstrate KernelCraft's potential to accelerate the accelerator chip development cycle. KernelCraft is available at https://kernelcraft-cam.github.io/.

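2603.08651 2026-06-01 cs.LG hep-th math-ph math.MP (updated version)

Group Entropies and Mirror Duality: A Class of Flexible Mirror Descent Updates for Machine Learning

Andrzej Cichocki, Piergiulio Tempesta

Affiliations: Systems Research Institute of the Polish Academy of Sciences; Warsaw University of Technology

AI summary: Proposes a theoretical and algorithmic framework connecting formal group theory and group entropies with modern machine learning. Group-theoretical mirror maps yield flexible, tunable mirror descent updates, and a notion of mirror duality allows link functions to be interchanged with their inverses. The updates are evaluated on simplex-constrained quadratic programming problems.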

Comments 36 pages, 5 figures

Abstract

We introduce a comprehensive theoretical and algorithmic framework that bridges formal group theory and group entropies with modern machine learning, paving the way for an infinite, flexible family of Mirror Descent (MD) optimization algorithms. Our approach exploits the rich structure of group entropies, which are generalized entropic functionals governed by group composition laws, encompassing and significantly extending all trace-form entropies such as the Shannon, Tsallis, and Kaniadakis families. By leveraging group-theoretical mirror maps (or link functions) in MD, expressed via multi-parametric generalized logarithms and their inverses (group exponentials), we achieve highly flexible and adaptable MD updates that can be tailored to diverse data geometries and statistical distributions. To this end, we introduce the notion of mirror duality, which allows us to seamlessly switch or interchange group-theoretical link functions with their inverses, subject to specific learning rate constraints. Tuning or learning the hyperparameters of the group logarithms enables us to adapt the model to the statistical properties of the training distribution, while simultaneously ensuring desirable convergence characteristics via fine-tuning. This generality not only provides greater flexibility and improved convergence properties, but also opens new perspectives for applications in machine learning and deep learning by expanding the design of regularizers and natural gradient algorithms. We extensively evaluate the validity, robustness, and performance of the proposed updates on large-scale, simplex-constrained quadratic programming problems.
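
As a rough illustration of a mirror descent update driven by a generalized logarithm and its inverse, the sketch below uses the Tsallis q-logarithm, one of the families named in the abstract, as the link function on the simplex. It is not the paper's algorithm: the function names, the renormalization step (used here in place of an exact Bregman projection), and the restriction to 0 < q <= 1 with strictly positive iterates are simplifying assumptions. With q = 1 it reduces to the standard exponentiated gradient update.

```python
import math

def log_q(x, q):
    """Tsallis q-logarithm for x > 0; equals the natural log when q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(y, q):
    """Inverse of log_q, clipped at zero outside its domain (assumes 0 < q <= 1)."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(y)
    base = max(1.0 + (1.0 - q) * y, 0.0)
    return base ** (1.0 / (1.0 - q))

def md_step(x, grad, lr, q=1.0):
    """One mirror descent step on the simplex with log_q as the link function.

    Renormalizing after the inverse link is a simplification assumed here.
    """
    unnormalized = [exp_q(log_q(xi, q) - lr * gi, q) for xi, gi in zip(x, grad)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def quadratic_grad(Q, x):
    """Gradient of 0.5 * x^T Q x for a symmetric matrix Q."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]
```

A typical loop would call `quadratic_grad` at the current iterate and pass the result to `md_step`; varying `q` changes the geometry of the update while the iterate stays on the simplex.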

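2603.06738 2026-06-01 cs.LG cs.AI (updated version)

Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Mihir Chauhan, Damon Conover, Ziran Wang, Aniket Bera

Affiliations: Department of Computer Science, Purdue University, USA; College of Engineering, Purdue University, USA; DEVCOM Army Research Laboratory, USA

AI summary: Addresses the difficulty of value estimation in offline goal-conditioned reinforcement learning with Mollified Value Learning (MVL), which induces distance-like value geometry by aggregating constraints over a local spatial measure instead of enforcing pointwise differential constraints. It improves goal-reaching performance on navigation and high-dimensional robotic manipulation tasks.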

Abstract

Offline goal-conditioned reinforcement learning (GCRL) learns goal-reaching behaviors from static datasets, but accurate value estimation remains challenging under limited state-action coverage. Existing physics-informed approaches address this by imposing pointwise distance-like geometric constraints derived from Hamilton-Jacobi-Bellman (HJB) optimality principles, often through first-order partial differential equations such as the Eikonal equation. However, enforcing local consistency through explicit differential structure can become unstable in complex, high-dimensional environments. Our key insight is to instead reinterpret distance-like constraints as an expectation over a local spatial measure. By aggregating constraints over this measure rather than evaluating them pointwise, the objective acts as a spatial mollifier, inducing distance-like value geometry without requiring expensive differential operators. We refer to this as Mollified Value Learning (MVL). Experiments across navigation and high-dimensional robotic manipulation tasks show that MVL learns structured value representations, improving goal-reaching performance when used with implicit value representation learning methods. Open-source code is available at https://github.com/HrishikeshVish/MVL.

2602.21620 2026-06-01 cs.GT cs.LG (updated version)

Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners

Arnab Maiti, Junyan Liu, Kevin Jamieson, Lillian J. Ratliff

Affiliations: University of Washington

AI summary: Uses a repeated-game model of no-regret learners to analyze when high-price equilibria arise in the Bertrand pricing game, and compares how external-regret and swap-regret guarantees affect competitive behavior.

Comments 36 pages, 34 figures

Abstract

We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
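
The stage game described in this abstract can be written down in a few lines. The sketch below is illustrative: the function name is hypothetical, and it assumes each player's payoff is revenue (price times captured demand, with zero production cost), which the abstract does not state explicitly. Exact fractions are used so that prices on the grid $\{1/k, \ldots, 1\}$ compare without rounding error.

```python
from fractions import Fraction

def bertrand_payoffs(prices, demand):
    """Per-player revenue for one round of the discrete Bertrand game.

    The lowest-priced players split demand(lowest) equally; all others get 0.
    Treating payoff as revenue with zero cost is an assumption of this sketch.
    """
    lowest = min(prices)
    winners = {i for i, p in enumerate(prices) if p == lowest}
    share = demand(lowest) / len(winners)
    return [lowest * share if i in winners else Fraction(0)
            for i in range(len(prices))]

# Price grid {1/k, 2/k, ..., 1} for k = 4.
k = 4
grid = [Fraction(j, k) for j in range(1, k + 1)]

def unit_demand(price):
    """Constant demand, a simple example of a non-increasing demand function."""
    return Fraction(1)
```

For example, with two players both pricing at 1/4 under `unit_demand`, each receives half the demand and earns 1/8, whereas a player undercut by any rival earns nothing.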

2602.21340 2026-06-01 cs.LG (updated version)

HiPPO Zoo: Explicit Memory Mechanisms for Interpretable State Space Models

Jack Goffinet, Casey Hanks, David E. Carlson

Affiliations: Department of Computer Science, Duke University, Durham NC, USA

AI summary: Extends the HiPPO framework with five explicit, interpretable memory mechanisms (collectively the "HiPPO zoo") that give state space models capabilities such as adaptive memory allocation and associative memory, demonstrated on synthetic sequence modeling tasks.

Comments 24 pages, 7 figures; to be published in ICML 2026; additional experimental results included

Abstract

Representing the past in a compressed, efficient, and informative manner is a central problem for systems trained on sequential data. The HiPPO framework, originally proposed by Gu & Dao et al., provides a principled approach to sequential compression by projecting signals onto orthogonal polynomial (OP) bases via structured linear ordinary differential equations. Subsequent works have embedded these dynamics in state space models (SSMs), where HiPPO structure serves as an initialization. Nonlinear successors of these SSM methods such as Mamba are state-of-the-art for many tasks with long-range dependencies, but the mechanisms by which they represent and prioritize history remain largely implicit. In this work, we revisit the HiPPO framework with the goal of making these mechanisms explicit. We show how polynomial representations of history can be extended to support capabilities of modern SSMs such as adaptive memory allocation and associative memory, while retaining direct interpretability in the OP basis. We introduce a unified framework comprising five such extensions, which we collectively refer to as a "HiPPO zoo." Each extension exposes a specific modeling capability through an explicit, interpretable modification of the HiPPO framework. The resulting models adapt their memory online and train in streaming settings with efficient updates. We illustrate the behaviors and modeling advantages of these extensions through a range of synthetic sequence modeling tasks, demonstrating that capabilities typically associated with modern SSMs can be realized through explicit, interpretable polynomial memory structures.

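2411.00759 2026-06-01 cs.LG stat.ML (updated version)

Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching

Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman

Affiliations: Metadialog Research. Code: github.com/ehaxholli/DFM-OT

AI summary: Targets the excessive state transitions and the difficulty of probability estimation in discrete flow matching. Proposes a dynamic-optimal-transport-like objective, optimized with minibatch optimal transport, to reduce the number of transitions, and derives two upper bounds on perplexity to support training and evaluation.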

Abstract

Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectification strategy cannot be applied due to the stochasticity of discrete paths, necessitating alternative methods to minimize state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where transport cost depends solely on inter-state dissimilarity and can be optimized via minibatch strategies. We show that such methods can reduce the number of transitions up to 32 times (1024 to 32) to reach the same generative perplexity without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an instantaneous change-of-variables analogue, preventing precise probability estimation available to continuous flows. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation and model comparison. Finally, we introduce Multimask Flows which outperform masked flows in generative perplexity without compromising diversity, particularly when utilizing minibatch Optimal Transport.

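2602.19049 2026-06-01 cs.CL cs.LG (updated version)

Zachary Berger, Daniel Prakah-Asante, John Guttag, Collin M. Stultz

Affiliations: Massachusetts Institute of Technology; Massachusetts General Hospital

AI summary: Argues that benchmarking practice in 12-lead ECG representation learning must be improved so that progress is reliable and aligned with clinical objectives, and recommends broadening the evaluation scope, adopting evaluation best practices, and using a randomly initialized encoder as a baseline.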

Comments Project website at https://ecgfix.csail.mit.edu/

Abstract

This position paper argues that current benchmarking practice in 12-lead ECG representation learning must be fixed to ensure progress is reliable and aligned with clinically meaningful objectives. The field has largely converged on three public multi-label benchmarks (PTB-XL, CPSC2018, CSN) dominated by arrhythmia and waveform-morphology labels, even though the ECG is known to encode substantially broader clinical information. We argue that downstream evaluation should expand to include an assessment of structural heart disease and patient-level forecasting, in addition to other evolving ECG-related endpoints, as relevant clinical targets. Next, we outline evaluation best practices for multi-label, imbalanced settings, and show that when they are applied, the literature's current conclusion about which representations perform best is altered. Furthermore, we demonstrate the surprising result that a randomly initialized encoder with linear evaluation matches state-of-the-art pre-training on many tasks. This motivates the use of a random encoder as a reasonable baseline model. We substantiate our observations with an empirical evaluation of five representative ECG pre-training approaches across six evaluation settings: the three standard benchmarks, a structural disease dataset, hemodynamic inference, and patient forecasting.

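2602.16601 2026-06-01 stat.ML cs.LG (updated version)

Quantifying Error Propagation and Model Collapse in Diffusion Models

Nail B. Khelifa, Richard E. Turner, Ramji Venkataramanan

Affiliations: University of Cambridge, Cambridge, United Kingdom

AI summary: Theoretically analyzes the error propagation by which recursive training causes model collapse in score-based diffusion models, gives upper and lower bounds on the accumulated divergence between the generated and target distributions, and characterizes different regimes of drift.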

Comments Accepted at ICML 2026

Abstract

Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. Notably, to the best of our knowledge, this is the first lower bound on the divergence between the learned and target distributions, even for standard diffusion models. Our results allow us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. In a certain regime, the accumulated divergence after several retraining rounds can be expressed as a discounted sum of score estimation errors made at each generation. We also provide empirical results on synthetic data and images to illustrate the theory.

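2602.16305 2026-06-01 cs.SD cs.LG (updated version)

BAT: Better Audio Transformer Guided by Convex Gated Probing

Houtan Ghaffari, Lukas Rauch, Christoph Scholz, Paul Devos

Affiliations: Ghent University; University of Kassel

AI summary: Proposes Convex Gated Probing (CGP), which uses a gating mechanism to exploit all frozen layers and narrows the gap between probing and fine-tuning in audio self-supervised learning. Guided by CGP, the authors rework the SSL pipeline to build the Better Audio Transformer (BAT), which sets new state-of-the-art results on audio benchmarks.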

Comments Accepted @ ICML26

Abstract

Probing is widely adopted in computer vision to faithfully evaluate self-supervised learning (SSL) embeddings, as finetuning may misrepresent their inherent quality. In contrast, audio SSL models still rely on finetuning because simple probing fails to unlock their full potential and alters their rankings when competing on AudioSet. Hence, a robust and efficient probing mechanism is required to guide the trajectory of audio SSL towards reliable and reproducible methods. We introduce Convex Gated Probing (CGP), a prototype-based method that significantly closes the gap between finetuning and probing in audio. CGP efficiently utilizes all frozen layers via a gating mechanism and exposes the location of latent task-relevant information. Guided by CGP as a reliable post-hoc evaluation probe, we rework the entire SSL pipeline of current best performing audio models that use legacy implementations of prior SSL methods. By refining data preprocessing, model architecture, and pretraining recipe, we introduce Better Audio Transformer (BAT), and establish new SOTA on audio benchmarks.

2602.15634 2026-06-01 cs.LG (updated version)

Beyond ReLU: Bifurcation, Oversmoothing, and Topological Priors

Erkan Turan, Gaspard Abel, Maysam Behmanesh, Emery Pierson, Maks Ovsjanikov

Affiliations: Université Paris Saclay, Université Paris Cité, ENS Paris Saclay, CNRS, SSA, INSERM, Centre Borelli; Centre d’Analyse et de Mathématique Sociales, EHESS, CNRS

AI summary: Reinterprets oversmoothing in graph neural networks through bifurcation theory, shows that replacing ReLU with a particular class of activation functions destabilizes the homogeneous state and induces non-homogeneous patterns that resist oversmoothing, and derives a bifurcation-aware initialization.

Abstract

Graph Neural Networks (GNNs) learn node representations through iterative network-based message passing. While powerful, deep GNNs suffer from oversmoothing, where node features converge to a homogeneous, non-informative state. We re-frame this problem of representational collapse from a bifurcation theory perspective, characterizing oversmoothing as convergence to a stable "homogeneous fixed point." Our central contribution is the theoretical discovery that this undesired stability can be broken by replacing standard monotone activations (e.g., ReLU) with a class of functions. Using Lyapunov-Schmidt reduction, we analytically prove that this substitution induces a bifurcation that destabilizes the homogeneous state and creates a new pair of stable, non-homogeneous patterns that provably resist oversmoothing. Our theory predicts a precise, nontrivial scaling law for the amplitude of these emergent patterns, which we quantitatively validate in experiments. Finally, we demonstrate the practical utility of our theory by deriving a closed-form, bifurcation-aware initialization and showing its utility in real benchmark experiments.

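2602.15293 2026-06-01 cs.LG cs.AI cs.CL stat.ML (updated version)

The Information Geometry of Softmax: Probing and Steering

Kiho Park, Todd Nief, Yo Joong Choe, Victor Veitch

Affiliations: University of Chicago

AI summary: Studies, from an information-geometric perspective, how AI systems encode semantic structure in the geometry of their representation spaces, and proposes "dual steering", a method that uses linear probes to robustly steer representations toward a particular concept.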

Comments Code is available at https://github.com/KihoPark/dual-steering

Journal ref: In Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026

Abstract

This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation is that the natural geometry of these representation spaces should reflect the way models use representations to produce behavior. We focus on the important special case of representations that define softmax distributions. In this case, we argue that the natural geometry is information geometry. Our focus is on the role of information geometry in semantic encoding and the linear representation hypothesis. As an illustrative application, we develop "dual steering", a method for robustly steering representations to exhibit a particular concept using linear probes. We prove that dual steering optimally modifies the target concept while minimizing changes to off-target concepts. Empirically, we find that dual steering enhances the controllability and stability of concept manipulation.


2602.13069 2026-06-01 cs.LG cs.CL 版本更新

Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning

面向设备上大语言模型微调的内存高效结构化反向传播

Juneyoung Park, Yuri Hong, Seongwan Kim, Jaeho Lee

Affiliation: OptAI Inc.

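AI summary: Proposes MeSP, which manually derives backward passes that exploit LoRA's low-rank structure, reducing memory by 49% on average while computing mathematically identical gradients and making fine-tuning feasible on memory-constrained devices.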

Comments ACL2026

AI中文摘要

设备上微调能够实现大语言模型的隐私保护个性化，但移动设备存在严重的内存限制，通常所有工作负载共享6-12GB内存。现有方法迫使在高内存的精确梯度（MeBP）和低内存的噪声估计（MeZO）之间进行权衡。我们提出内存高效结构化反向传播（MeSP），通过手动推导利用LoRA低秩结构的反向传播来弥合这一差距。我们的关键洞察是，中间投影 $h = xA$ 可以在反向传播中以最小成本重新计算，因为秩 $r \ll d_{in}$，从而无需存储它。在Qwen2.5模型（0.5B-3B）上，MeSP相比MeBP平均减少49%内存，同时计算数学上等价的梯度。我们的分析还揭示，MeZO的梯度估计与真实梯度的相关性接近零（余弦相似度≈0.001），解释了其收敛缓慢的原因。MeSP将Qwen2.5-0.5B的峰值内存从361MB降低到136MB，使得先前在内存受限设备上不可行的微调场景成为可能。

英文摘要

On-device fine-tuning enables privacy-preserving personalization of large language models, but mobile devices impose severe memory constraints, typically 6--12GB shared across all workloads. Existing approaches force a trade-off between exact gradients with high memory (MeBP) and low memory with noisy estimates (MeZO). We propose Memory-efficient Structured Backpropagation (MeSP), which bridges this gap by manually deriving backward passes that exploit LoRA's low-rank structure. Our key insight is that the intermediate projection $h = xA$ can be recomputed during backward at minimal cost since rank $r \ll d_{in}$, eliminating the need to store it. MeSP achieves 49\% average memory reduction compared to MeBP on Qwen2.5 models (0.5B--3B) while computing mathematically identical gradients. Our analysis also reveals that MeZO's gradient estimates show near-zero correlation with true gradients (cosine similarity $\approx$0.001), explaining its slow convergence. MeSP reduces peak memory from 361MB to 136MB for Qwen2.5-0.5B, enabling fine-tuning scenarios previously infeasible on memory-constrained devices.
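The recomputation idea can be sketched as follows. This is a simplified illustration, not the MeSP implementation: it covers only the LoRA branch $y = (xA)B$, omits the frozen base weight and any LoRA scaling factor, and uses hypothetical helper names and plain Python lists. The backward pass rebuilds $h = xA$ from the saved input instead of reading a stored copy.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

def lora_forward(x, A, B):
    """LoRA branch y = (x A) B; h = x A is used once and not kept."""
    h = matmul(x, A)          # shape (n, r); cheap because r << d_in
    return matmul(h, B)

def lora_backward(x, A, B, grad_y):
    """Gradients for A and B, recomputing h rather than storing it."""
    h = matmul(x, A)                          # recomputed intermediate projection
    grad_B = matmul(transpose(h), grad_y)     # dL/dB = h^T dL/dy
    grad_h = matmul(grad_y, transpose(B))     # dL/dh = dL/dy B^T
    grad_A = matmul(transpose(x), grad_h)     # dL/dA = x^T dL/dh
    return grad_A, grad_B

# Toy shapes (assumed): n = 2 tokens, d_in = 3, rank r = 1, d_out = 2.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
A = [[0.5], [-1.0], [2.0]]
B = [[1.5, -0.5]]
grad_A, grad_B = lora_backward(x, A, B, [[1.0, 1.0], [1.0, 1.0]])
```

The extra cost is one additional `matmul(x, A)` per backward pass, which is small when the rank is much smaller than the input dimension; in exchange, nothing of shape (n, r) has to be held between the forward and backward passes.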


2602.12386 2026-06-01 cs.MA cs.GT cs.LG 版本更新

Provably Convergent Actor-Critic for MARL through Risk-aversion

通过风险厌恶实现可证明收敛的MARL演员-评论家算法

Yizhou Zhang, Eric Mazumdar

Affiliation: Caltech (California Institute of Technology)

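AI summary: For infinite-horizon general-sum Markov games, proposes a single-timescale actor-critic algorithm based on risk-averse quantal response equilibria (RQE), and uses the regularity of RQE to prove global convergence with finite-sample guarantees.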

AI中文摘要

在无限时域一般和马尔可夫博弈（MGs）中学习平稳策略仍然是多智能体强化学习（MARL）中的一个基本开放问题。尽管平稳策略因其实用性而受到青睐，但计算经典博弈论均衡的平稳形式在计算上是棘手的——这与解决单智能体RL或零和博弈的相对容易形成鲜明对比。为了弥合这一差距，我们研究了风险厌恶分位数响应均衡（RQE），这是一种根植于行为博弈论的概念，结合了风险厌恶和有限理性。我们证明RQE具有强正则性条件，使其特别适合在MGs中进行学习。我们提出了一种新颖的单时间尺度演员-评论家算法，其特点是演员更新更快而评论家更新较慢。利用RQE的正则性，我们证明该方法实现了具有有限样本保证的全局收敛。我们在多个环境中进行了实证验证，表明与风险中性基线相比，我们的算法具有优越的收敛性能。

英文摘要

Learning stationary policies in infinite-horizon general-sum Markov games (MGs) remains a fundamental open problem in Multi-Agent Reinforcement Learning (MARL). While stationary strategies are preferred for their practicality, computing stationary forms of classic game-theoretic equilibria is computationally intractable -- a stark contrast to the comparative ease of solving single-agent RL or zero-sum games. To bridge this gap, we study Risk-averse Quantal response Equilibria (RQE), a solution concept rooted in behavioral game theory that incorporates risk aversion and bounded rationality. We demonstrate that RQE possesses strong regularity conditions that make it uniquely amenable to learning in MGs. We propose a novel single-timescale Actor-Critic algorithm characterized by a faster actor and a slower critic. Leveraging the regularity of RQE, we prove that this approach achieves global convergence with finite-sample guarantees. We empirically validate our algorithm in several environments to demonstrate superior convergence properties compared to risk-neutral baselines.


2602.11802 2026-06-01 cs.LG 版本更新

Structural Bias Beyond Homophily: A Study of Fairness in Link Prediction

超越同质性的结构偏差：链接预测中的公平性研究

Lilian Marey, Mathilde Perez, Tiphaine Viard, Charlotte Laclau

Affiliation: LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France

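AI summary: Formalizes a taxonomy of topological bias measures and introduces a method for generating synthetic graphs with controlled structural properties; shows empirically that link-prediction fairness is strongly correlated with graph topology and that existing fairness-aware methods remain sensitive to structural biases beyond homophily.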

AI中文摘要

图链接预测（LP）在诸如工作推荐和友谊形成等具有社会影响力的应用中发挥着关键作用，使得公平性成为该任务中的一个关键问题。虽然许多公平感知方法通过操纵图结构来减轻预测差异，但社会图中固有的拓扑偏差仍然未被充分理解，并且始终仅与同质性混为一谈。在这项工作中，我们研究了结构偏差与LP中公平性结果之间的关系。为此，我们形式化了拓扑偏差度量的分类，并引入了一种图生成方法，该方法可生成具有可控结构属性的多样化合成图语料库。利用该语料库，我们实证表明公平性结果与图拓扑强相关，并且当前的公平感知方法对同质性之外的结构偏差仍然敏感。这些发现强调了在公平图学习中进行基于结构的评估的必要性。

英文摘要

Graph link prediction (LP) plays a critical role in socially impactful applications such as job recommendation and friendship formation, making fairness a critical concern in this task. While many fairness-aware methods manipulate graph structures to mitigate prediction disparities, the topological biases inherent to social graphs remain poorly understood and are consistently conflated with homophily alone. In this work, we study the relationship between structural biases and fairness outcomes in LP. To this end, we formalize a taxonomy of topological bias measures and introduce a graph generation method producing a diverse corpus of synthetic graphs with controlled structural properties. Using this corpus, we show empirically that fairness outcomes are strongly correlated with graph topology, and that current fairness-aware methods remain sensitive to structural biases beyond homophily. These findings highlight the need for structurally grounded evaluations in fair graph learning.


2602.11216 2026-06-01 cs.LG physics.bio-ph 版本更新

Protein Language Model Embeddings Improve Generalization of Implicit Transfer Operators

蛋白质语言模型嵌入提升隐式转移算子的泛化能力

Panagiotis Antoniadis, Beatrice Pavesi, Simon Olsson, Ole Winther

Affiliations: University of Copenhagen; Chalmers University of Technology; University of Gothenburg; Technical University of Denmark

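AI summary: Proposes PLaTITO, which augments transferable implicit transfer operators with protein language model embeddings, improving data efficiency and generalization across molecular systems and achieving state-of-the-art equilibrium sampling on out-of-distribution protein systems.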

Comments 29 pages, 14 figures and 11 tables, Accepted at ICML 2026

AI中文摘要

分子动力学（MD）是物理学、化学和生物学中的核心计算工具，能够将实验可观测量作为高维分子分布（如玻尔兹曼分布和转移密度）的期望进行定量预测。然而，传统MD受到生成独立样本所需高计算成本的根本限制。生成式分子动力学（GenMD）最近作为一种替代方案出现，通过数据或与能量模型交互学习分子分布的替代模型。尽管这些方法实现了高效采样，但它们在不同分子系统间的可迁移性通常有限。在本工作中，我们表明整合辅助信息源可以提高可迁移隐式转移算子（TITO）在分子动力学中的数据效率和泛化能力。我们发现粗粒化TITO模型比玻尔兹曼模拟器在数据效率上显著更高，并且整合蛋白质语言模型（pLM）嵌入进一步改善了分布外泛化。我们的方法PLaTITO在分布外蛋白质系统（包括快速折叠蛋白质）的平衡采样基准测试中达到了最先进的性能。我们进一步研究了额外条件信号（如结构嵌入、温度和大语言模型衍生嵌入）对模型性能的影响。

英文摘要

Molecular dynamics (MD) is a central computational tool in physics, chemistry, and biology, enabling quantitative prediction of experimental observables as expectations over high-dimensional molecular distributions such as Boltzmann distributions and transition densities. However, conventional MD is fundamentally limited by the high computational cost required to generate independent samples. Generative molecular dynamics (GenMD) has recently emerged as an alternative, learning surrogates of molecular distributions either from data or through interaction with energy models. While these methods enable efficient sampling, their transferability across molecular systems is often limited. In this work, we show that incorporating auxiliary sources of information can improve the data efficiency and generalization of transferable implicit transfer operators (TITO) for molecular dynamics. We find that coarse-grained TITO models are substantially more data-efficient than Boltzmann Emulators, and that incorporating protein language model (pLM) embeddings further improves out-of-distribution generalization. Our approach, PLaTITO, achieves state-of-the-art performance on equilibrium sampling benchmarks for out-of-distribution protein systems, including fast-folding proteins. We further study the impact of additional conditioning signals such as structural embeddings, temperature, and large-language-model-derived embeddings on model performance.


2602.11208 2026-06-01 cs.LG 版本更新

Adaptive Physics Transformer with Fused Global-Local Attention for Subsurface Energy Systems

自适应物理Transformer融合全局-局部注意力用于地下能源系统

Xin Ju, Nok Hei Fung, Yuyan Zhang, Carl Jacquemyn, Matthew Jackson, Randolph Settgast, Sally M. Benson, Gege Wen

Affiliations: Department of Energy Science and Engineering, Stanford University; Department of Earth Sciences and Engineering, Imperial College London; Lawrence Livermore National Laboratory; EarthFlow AI, Inc.

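AI summary: Proposes the Adaptive Physics Transformer (APT), which fuses a graph-based encoder with global attention to handle heterogeneous meshes and coupled physics in subsurface energy systems; it outperforms existing architectures on regular and irregular grids and is the first to learn directly from HR-adaptive mesh refinement simulations.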

AI中文摘要

地球地下空间是现代社会的基石，提供碳氢化合物、地热和矿物等基本能源资源，同时是$CO_2$封存的主要储层。然而，由于地质异质性、高分辨率要求以及具有不同传播时间尺度的物理过程的紧密耦合，这些系统的全物理数值模拟计算成本极高。本文提出$\textbf{自适应物理Transformer}$（APT），这是一种与几何、网格和物理无关的神经算子，明确解决了这些挑战。APT融合了基于图的编码器以提取高分辨率局部异质特征，并结合全局注意力机制以解析远程物理影响。我们的结果表明，APT在规则和不规则网格上的地下任务中均优于最先进的架构，并具有鲁棒的超分辨率能力。值得注意的是，APT是第一个直接从高分辨率自适应网格细化模拟中学习的架构。我们还展示了APT良好的扩展行为和跨数据集学习能力，使其成为大规模地下基础模型开发的稳健且可扩展的骨干网络。

英文摘要

The Earth's subsurface is a cornerstone of modern society, providing essential energy resources like hydrocarbons, geothermal, and minerals while serving as the primary reservoir for $CO_2$ sequestration. However, full physics numerical simulations of these systems are notoriously computationally expensive due to geological heterogeneity, high resolution requirements, and the tight coupling of physical processes with distinct propagation time scales. Here we propose the $\textbf{Adaptive Physics Transformer}$ (APT), a geometry-, mesh-, and physics-agnostic neural operator that explicitly addresses these challenges. APT fuses a graph-based encoder to extract high-resolution local heterogeneous features with a global attention mechanism to resolve long-range physical impacts. Our results demonstrate that APT outperforms state-of-the-art architectures in subsurface tasks across both regular and irregular grids with robust super-resolution capabilities. Notably, APT is the first architecture that learns directly from HR-adaptive mesh refinement simulations. We also demonstrate APT's favorable scaling behavior and cross-dataset learning capability, positioning it as a robust and scalable backbone for large-scale subsurface foundation model development.


2602.11137 2026-06-01 cs.LG cs.AI cs.CL 版本更新

Weight Decay Improves Language Model Plasticity

权重衰减提升语言模型可塑性

Tessa Han, Sebastian Bordt, Hanlin Zhang, Sham Kakade

Affiliations: Broad Institute, Schmidt Center; University of Tübingen, Tübingen AI Center; Harvard University

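AI summary: Shows through systematic experiments that larger weight decay during pretraining increases model plasticity, giving better downstream performance after fine-tuning, and identifies the mechanisms: linearly separable representations, regularized attention matrices, and reduced overfitting.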

AI中文摘要

大型语言模型通常分两个主要阶段训练：预训练以产生基础模型，然后进一步训练以提高下游性能。然而，超参数优化和缩放定律主要从基础模型验证损失的角度研究，忽略了一个关键的模型属性：下游适应性。在这项工作中，我们从模型可塑性的角度研究预训练，即基础模型在额外训练后成功适应下游任务的能力。我们关注权重衰减的作用，这是预训练中的一个关键正则化参数，并通过系统实验表明，较大的权重衰减提高了预训练模型的可塑性，导致微调后下游性能提升更大。这种效应可能导致反直觉的权衡，即预训练后表现较差的基础模型在进一步训练后可能表现更好。对权重衰减对模型行为的机制影响的进一步研究表明，它鼓励线性可分的表示，正则化注意力矩阵，并减少对训练数据的过拟合。这些发现共同强调了预训练模型可塑性的重要性，使用交叉熵损失作为超参数优化的唯一指标的局限性，以及单个优化超参数在塑造模型行为中的多方面作用。

英文摘要

Large language models are typically trained in two broad phases: pretraining to produce a base model, followed by further training to improve downstream performance. However, hyperparameter optimization and scaling laws are studied primarily from the perspective of the base model's validation loss, overlooking a crucial model property: downstream adaptability. In this work, we study pretraining from the perspective of model plasticity, that is, the ability of the base model to successfully adapt to downstream tasks upon additional training. We focus on the role of weight decay, a key regularization parameter during pretraining, and show through systematic experiments that larger weight decay increases the plasticity of the pretrained model, resulting in greater performance gains downstream after fine-tuning. This effect can lead to counterintuitive trade-offs where base models that perform worse after pretraining can perform better after further training. Further investigation of weight decay's mechanistic effects on model behavior reveals that it encourages linearly separable representations, regularizes attention matrices, and reduces overfitting on the training data. Together, these findings highlight the importance of pretrained model plasticity, the limits of using cross-entropy loss as the sole metric for hyperparameter optimization, and the multifaceted role that a single optimization hyperparameter plays in shaping model behavior.
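For readers unfamiliar with the hyperparameter in question, the sketch below shows where weight decay enters a single update. The abstract does not state the optimizer, so plain SGD is assumed here, and the function names and values are illustrative only.

```python
def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One SGD step where each weight is also pulled toward zero:
    w <- w - lr * (g + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

def l2_norm(values):
    """Euclidean norm of a list of numbers."""
    return sum(v * v for v in values) ** 0.5

# With zero gradient, the step simply shrinks every weight by (1 - lr * weight_decay).
old = [1.0, -2.0]
new = sgd_step_with_weight_decay(old, [0.0, 0.0], lr=0.1, weight_decay=0.5)
```

A larger `weight_decay` shrinks the weights more strongly at every step, independently of the loss gradient; that shrinkage is the regularization whose downstream effect the abstract studies.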


2602.11083 2026-06-01 cs.LG cs.CR 版本更新

Token-Efficient Change Detection in LLM APIs

LLM API中的令牌高效变化检测

Timothée Chauvin, Clément Lalanne, Erwan Le Merrer, Jean-Michel Loubes, François Taïani, Gilles Tredan

Affiliations: Université de Rennes, Inria, CNRS/IRISA; Equipe Regalia, Inria; LAAS, CNRS; Univ Toulouse, INUC, UT2J, INSA Toulouse, TSE, CNRS, IMT

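AI summary: Proposes B3IT, a black-box change detection scheme based on border inputs that detects changes in LLMs at low cost and with strong performance while observing only output tokens.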

Comments ICML 2026

AI中文摘要

远程检测LLM中的变化是一个难题。现有方法要么在大规模部署时成本过高，要么需要初始的白盒访问模型权重或灰盒访问对数概率。我们的目标是实现低成本和严格的黑盒操作，仅观察输出令牌。我们的方法依赖于我们称为边界输入的特定输入，对于这些输入，存在多个输出顶部令牌。从统计角度来看，最优变化检测取决于模型的雅可比矩阵和输出分布的Fisher信息。在低温状态下分析这些量表明，边界输入能够实现强大的变化检测测试。基于这一见解，我们提出了黑盒边界输入跟踪（B3IT）方案。大量的体内和体外实验表明，对于非推理测试端点，边界输入很容易找到，并且性能与最佳可用的灰盒方法相当。与现有方法相比，B3IT将成本降低了30倍，同时在严格的黑盒设置中运行。

英文摘要

Remote change detection in LLMs is a difficult problem. Existing methods are either too expensive for deployment at scale, or require initial white-box access to model weights or grey-box access to log probabilities. We aim to achieve both low cost and strict black-box operation, observing only output tokens. Our approach hinges on specific inputs we call Border Inputs, for which there exists more than one output top token. From a statistical perspective, optimal change detection depends on the model's Jacobian and the Fisher information of the output distribution. Analyzing these quantities in low-temperature regimes shows that border inputs enable powerful change detection tests. Building on this insight, we propose the Black-Box Border Input Tracking (B3IT) scheme. Extensive in-vivo and in-vitro experiments show that border inputs are easily found for non-reasoning tested endpoints, and achieve performance on par with the best available grey-box approaches. B3IT reduces costs by $30\times$ compared to existing methods, while operating in a strict black-box setting.
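The intuition for why border inputs are sensitive at low temperature can be sketched numerically. This is an illustration of the general idea, not the B3IT test itself: the logits, the temperature, and the size of the logit change are made-up values, and a "model update" is modeled as a small additive shift to one logit.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_token_shift(logits, delta, temperature):
    """Change in P(token 0) when token 0's logit moves by `delta`."""
    before = softmax(logits, temperature)[0]
    after = softmax([logits[0] + delta] + list(logits[1:]), temperature)[0]
    return after - before

border = [2.0, 2.0, -1.0]      # two tied top tokens (hypothetical border input)
ordinary = [2.0, 0.0, -1.0]    # one clear top token
shift_border = top_token_shift(border, 0.05, 0.1)
shift_ordinary = top_token_shift(ordinary, 0.05, 0.1)
```

With these values the same small logit change moves the output distribution on the border input by more than ten percentage points, while the ordinary input's distribution is essentially unchanged, so sampled output tokens alone reveal the change far sooner on the border input.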


2602.10286 2026-06-01 cs.LG 版本更新

What Does Preference Learning Recover from Pairwise Comparison Data?

成对比较数据中的偏好学习恢复了什么？

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

Affiliation: Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA

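AI summary: Formalizes the preference information in pairwise comparison data through the conditional preference distribution (CPRD), analyzes what the Bradley-Terry model recovers when the data violate its assumptions, and identifies the factors governing sample efficiency: margin and connectivity.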

Journal ref: ICML 2026

AI中文摘要

成对偏好学习是机器学习的核心，最近应用于将语言模型与人类偏好对齐。典型数据集由三元组 $(x, y^+, y^-)$ 组成，其中对于上下文 $x$，响应 $y^+$ 优于响应 $y^-$。Bradley-Terry (BT) 模型是主要方法，将偏好概率建模为潜在得分差异的函数。标准实践假设数据遵循此模型，并相应地学习潜在得分。然而，真实数据可能违反这一假设，目前尚不清楚 BT 学习在这种情况下恢复了什么。从三元组比较数据出发，我们通过条件偏好分布 (CPRD) 形式化其编码的偏好信息。我们给出了 BT 适用于建模 CPRD 的精确条件，并确定了影响样本效率的因素——即边界和连通性。这些结果共同为理解偏好学习实际恢复了什么提供了以数据为中心的基础。

英文摘要

Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets $(x, y^+, y^-)$, where response $y^+$ is preferred over response $y^-$ for context $x$. The Bradley--Terry (BT) model is the predominant approach, modeling preference probabilities as a function of latent score differences. Standard practice assumes data follows this model and learns the latent scores accordingly. However, real data may violate this assumption, and it remains unclear what BT learning recovers in such cases. Starting from triplet comparison data, we formalize the preference information it encodes through the conditional preference distribution (CPRD). We give precise conditions for when BT is appropriate for modeling the CPRD, and identify factors governing sample efficiency -- namely, margin and connectivity. Together, these results offer a data-centric foundation for understanding what preference learning actually recovers.
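The Bradley-Terry model mentioned above can be written down in a few lines. The sketch uses the standard logistic form of the model; the function names are hypothetical, and the scores are assumed to be already computed for a fixed context $x$.

```python
import math

def bt_preference_probability(score_preferred, score_other):
    """Bradley-Terry: P(preferred beats other) = sigmoid(score difference)."""
    return 1.0 / (1.0 + math.exp(-(score_preferred - score_other)))

def bt_negative_log_likelihood(score_pairs):
    """Average negative log-likelihood over (score of y+, score of y-) pairs."""
    losses = [-math.log(bt_preference_probability(s_plus, s_minus))
              for s_plus, s_minus in score_pairs]
    return sum(losses) / len(losses)
```

Only the score difference matters, so adding a constant to every score leaves the model unchanged; minimizing `bt_negative_log_likelihood` over the scores is the usual way such a model is fit to triplets $(x, y^+, y^-)$.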


2602.07721 2026-06-01 cs.LG cs.CL cs.DB 版本更新

ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs

ParisKV：面向长上下文LLM的快速且漂移鲁棒的KV缓存检索

Yanlin Qi, Xinhang Chen, Huiqiang Jiang, Qitong Wang, Botao Peng, Themis Palpanas

Affiliations: Xi'an Jiaotong University, Xi'an, China; Qwen Team, Alibaba Group, China; Harvard University, Cambridge, MA, USA; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

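AI summary: Proposes ParisKV, a GPU-native KV-cache retrieval framework built on collision-based candidate selection and quantized inner-product reranking; at million-token contexts it delivers low-latency, high-throughput, drift-robust retrieval with quality matching or exceeding full attention.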

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

KV缓存检索对于长上下文LLM推理至关重要，但现有方法在处理大规模分布漂移和高延迟时存在困难。我们提出ParisKV，一种基于碰撞候选选择、随后使用量化内积重排序估计器的漂移鲁棒、GPU原生的KV缓存检索框架。对于百万token上下文，ParisKV通过统一虚拟寻址（UVA）支持CPU卸载的KV缓存，实现按需的top-$k$获取，开销极小。ParisKV在长输入和长生成基准测试中匹配或超越全注意力质量。它实现了最先进的长上下文解码效率：即使在长上下文的批大小为1时，也能匹配或超过全注意力速度；在全注意力可运行范围内提供高达2.8倍的吞吐量；并扩展到全注意力内存不足的百万token上下文。在百万token规模下，与两个最先进的KV缓存Top-$k$检索基线MagicPIG和PQCache相比，ParisKV分别将解码延迟降低了17倍和44倍。代码可在https://github.com/amy-77/ParisKV/tree/main获取。

英文摘要

KV-cache retrieval is essential for long-context LLM inference, yet existing methods struggle with distribution drift and high latency at scale. We introduce ParisKV, a drift-robust, GPU-native KV-cache retrieval framework based on collision-based candidate selection, followed by a quantized inner-product reranking estimator. For million-token contexts, ParisKV supports CPU-offloaded KV caches via Unified Virtual Addressing (UVA), enabling on-demand top-$k$ fetching with minimal overhead. ParisKV matches or outperforms full attention quality on long-input and long-generation benchmarks. It achieves state-of-the-art long-context decoding efficiency: it matches or exceeds full attention speed even at batch size 1 for long contexts, delivers up to 2.8$\times$ higher throughput within full attention's runnable range, and scales to million-token contexts where full attention runs out of memory. At million-token scale, ParisKV reduces decode latency by 17$\times$ and 44$\times$ compared to MagicPIG and PQCache, respectively, two state-of-the-art KV-cache Top-$k$ retrieval baselines. Code is available at https://github.com/amy-77/ParisKV/tree/main.


2602.09405 2026-06-01 stat.ML cs.IT cs.LG math.IT math.ST stat.TH 版本更新

Is Memorization Helpful or Harmful? Prior Information Sets the Threshold

记忆是有益还是有害？先验信息设定阈值

Chen Cheng, Rina Foygel Barber

Affiliation: Department of Statistics, University of Chicago

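AI summary: For overparameterized linear models in a Bayesian framework, studies how the prior distribution determines the relationship between training error and generalization error, and gives conditions under which memorization is necessary or overfitting is harmful.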

Comments 33 pages, 3 figures. Accepted to the Conference on Learning Theory (COLT) 2026

2602.09309 2026-06-01 cond-mat.mtrl-sci cond-mat.mes-hall cs.LG physics.atm-clus 版本更新

How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science

你能长多远？表征材料科学中图生成模型的外推前沿

Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban

Affiliations: Texas A&M University, College Station, Texas, USA; Ankara University, Ankara, Turkey; Texas A&M University at Qatar, Doha, Qatar; College of Science & Engineering, Hamad Bin Khalifa University, Doha, Qatar

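AI summary: Introduces RADII, a benchmark of radius-resolved nanoparticle structures for evaluating how crystal generative models extrapolate; finds that error grows beyond the training radii and that the extrapolation frontier is forecastable.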

DOI: 10.1145/3770855.3818872

AI中文摘要

每种晶体材料生成模型都存在一个临界结构尺寸，超出该尺寸其输出变得不可靠；我们称之为外推前沿。尽管这对纳米材料设计有重要影响，但这一前沿从未被系统测量过。我们引入RADII，一个半径分辨的基准，包含约75,000个晶体衍生的纳米粒子结构（33-11,298个原子），将半径视为连续缩放旋钮，在无泄漏分割下追踪从分布内到分布外的生成质量。每个模型以目标组成和原子数为条件，将几何外推作为评估变量。RADII提供前沿特定的诊断：每个半径的误差曲线精确定位每个架构的缩放上限，表面-内部分解分离边界和体相失效，跨度量排序揭示结构保真度的哪个方面首先失效。对五种最先进架构进行基准测试，我们发现：(i) 表现良好的模型在训练半径外全局位置误差增加约13%，而发散模型在所有尺度上保真度差，局部键合保真度从可忽略的退化到超过2倍的误差增长；(ii) 没有两个架构共享相同的失效序列，揭示前沿是由模型族决定的多维表面；(iii) 表现良好的模型遵循预期的几何缩放指数α ~ 1/3，其分布内拟合可预测分布外误差，使前沿可预测。将MatterGen扩展到其公布的参数数量稳定了采样，但并未关闭前沿，而DiffCSP在公布规模下仍不稳定。这些发现将输出尺度确立为几何生成模型的一级评估轴。代码和数据：https://github.com/KurbanIntelligenceLab/RADII。

英文摘要

Every generative model for crystalline materials harbors a critical structure size beyond which its outputs become unreliable; we call this the extrapolation frontier. Despite its consequences for nanomaterial design, this frontier has never been systematically measured. We introduce RADII, a radius-resolved benchmark of ~75,000 crystal-derived nanoparticle structures (33-11,298 atoms) that treats radius as a continuous scaling knob, tracing generation quality from in- to out-of-distribution under leakage-free splits. Each model is conditioned on target composition and atom count, isolating geometric extrapolation as the evaluation variable. RADII provides frontier-specific diagnostics: per-radius error profiles pinpoint each architecture's scaling ceiling, surface-interior decomposition separates boundary from bulk failures, and cross-metric sequencing reveals which aspect of structural fidelity breaks first. Benchmarking five state-of-the-art architectures, we find that: (i) well-behaved models degrade by ~13% in global positional error beyond training radii, while divergent models show poor fidelity across scales, with local bond fidelity ranging from negligible degradation to over 2x error growth; (ii) no two architectures share a failure sequence, revealing the frontier as a multi-dimensional surface shaped by model family; and (iii) well-behaved models follow the expected geometric scaling exponent alpha ~ 1/3, whose in-distribution fit predicts out-of-distribution error, making frontiers forecastable. Scaling MatterGen to its published parameter count stabilizes sampling but does not close the frontier, while DiffCSP remains unstable at published scale. These findings establish output scale as a first-class evaluation axis for geometric generative models. Code and data: https://github.com/KurbanIntelligenceLab/RADII.
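A scaling exponent such as the $\alpha \approx 1/3$ quoted above is typically estimated as a slope in log-log space. The sketch below shows that estimate on synthetic numbers generated from an exact power law; the data are not results from the paper, and treating error as a power of atom count is an assumption made only for this illustration.

```python
import math

def fit_power_law_exponent(sizes, errors):
    """Least-squares slope of log(error) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic illustration only: errors constructed to follow size ** (1/3).
synthetic_sizes = [33, 100, 1000, 11298]
synthetic_errors = [0.2 * n ** (1.0 / 3.0) for n in synthetic_sizes]
alpha = fit_power_law_exponent(synthetic_sizes, synthetic_errors)
```

An exponent of 1/3 is what one would expect if error tracks a length scale, since the radius of a roughly spherical particle grows like the cube root of its atom count.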


2602.09276 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Effective Reasoning Chains Reduce Intrinsic Dimensionality

有效推理链降低内在维度

Archiki Prasad, Mandar Joshi, Kenton Lee, Mohit Bansal, Peter Shaw

Affiliation: Google DeepMind

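AI summary: Uses intrinsic dimensionality to quantify the effectiveness of reasoning chains, finds that effective reasoning strategies reduce a task's intrinsic dimensionality, and confirms on GSM8K a strong inverse correlation with generalization performance.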

Comments ICML (spotlight) camera-ready; 22 pages, 3 figures

AI中文摘要

思维链推理及其变体显著提升了语言模型在复杂推理任务上的性能，但不同策略促进泛化的精确机制仍不明确。虽然当前解释常指向增加测试时计算或结构引导，但建立这些因素与泛化之间一致、可量化的联系仍具挑战。本文中，我们将内在维度识别为表征推理链有效性的定量度量。内在维度量化了在给定任务上达到特定准确率阈值所需的最小模型维度数。通过固定模型架构并改变不同推理策略下的任务表述，我们证明有效推理策略持续降低任务的内在维度。在GSM8K上使用Gemma-3 1B和4B验证这一点，我们观察到推理策略的内在维度与其在分布内和分布外数据上的泛化性能之间存在强负相关。我们的发现表明，有效推理链通过使用更少参数更好地压缩任务来促进学习，为分析推理过程提供了新的定量度量。

英文摘要

Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different strategies facilitate generalization remain poorly understood. While current explanations often point to increased test-time computation or structural guidance, establishing a consistent, quantifiable link between these factors and generalization remains challenging. In this work, we identify intrinsic dimensionality as a quantitative measure for characterizing the effectiveness of reasoning chains. Intrinsic dimensionality quantifies the minimum number of model dimensions needed to reach a given accuracy threshold on a given task. By keeping the model architecture fixed and varying the task formulation through different reasoning strategies, we demonstrate that effective reasoning strategies consistently reduce the intrinsic dimensionality of the task. Validating this on GSM8K with Gemma-3 1B and 4B, we observe a strong inverse correlation between the intrinsic dimensionality of a reasoning strategy and its generalization performance on both in-distribution and out-of-distribution data. Our findings suggest that effective reasoning chains facilitate learning by better compressing the task using fewer parameters, offering a new quantitative metric for analyzing reasoning processes.


2602.08964 2026-06-01 cs.LG cs.AI cs.CL cs.CY 版本更新

A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

语言模型智能体中目标导向性的行为与表征评估

Raghu Arghal, Fade Chen, Niall Dalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan, Gabriele Sarti, Mario Giulianelli

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； New York University（纽约大学）； Indiana University, Bloomington（印第安纳大学，布卢明顿）； Northeastern University（东北大学）； University College London（伦敦大学学院）

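AI summary: Proposes a framework for evaluating goal-directedness that combines behavioural evaluation with interpretability-based analysis of internal representations, applied to an LLM agent navigating a 2D grid world, whose actions are found to be broadly consistent with its internal representations.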

Comments Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

理解智能体的目标有助于解释和预测其行为，但目前尚无可靠的方法来归因智能系统的目标。我们提出一个评估目标导向性的框架，该框架将行为评估与基于可解释性的模型内部表征分析相结合。作为案例研究，我们考察了一个在二维网格世界中导航至目标状态的LLM智能体。在行为上，我们评估智能体在不同网格大小、障碍物密度和目标结构下的最优策略，发现其性能随任务难度扩展，同时对保持难度的变换和多目标结构具有鲁棒性。然后，我们使用探测方法解码环境及多步行动计划的内部表征。我们发现，LLM智能体非线性地编码了一个粗略的空间地图，保留了关于其位置和目标位置的任务相关近似线索；其行动与这些内部表征大致一致；推理过程重新组织这些表征，从空间线索转向即时行动选择。我们的研究结果支持这样的观点：除了行为评估之外，还需要内省检查来表征智能体如何表示和追求其目标。

英文摘要

Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study, we examine an LLM agent navigating a 2D grid world towards a goal state. Behaviourally, we evaluate the agent against optimal policies across varying grid sizes, obstacle densities, and goal structures, finding that performance scales with task difficulty while remaining robust to difficulty-preserving transformations and multi-goal structures. We then use probing methods to decode internal representations of the environment and multi-step action plans. We find that the LLM agent non-linearly encodes a coarse spatial map, preserving approximate task-relevant cues about its position and the goal location; that its actions are broadly consistent with these internal representations; and that reasoning reorganises them, shifting from spatial cues towards immediate action selection. Our findings support the view that introspective examination is required beyond behavioural evaluations to characterise how agents represent and pursue their objectives.
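One simple way to obtain an optimal baseline for a grid-world navigation task is breadth-first search, sketched below. The abstract does not say how the optimal policies were computed, so this is an assumption; the grid encoding (`#` for obstacles, four-directional moves) and the function name are hypothetical.

```python
from collections import deque

def shortest_path_length(grid, start, goal):
    """BFS over non-obstacle cells; returns the optimal step count, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

# Hypothetical 3x3 grid with two obstacle cells.
GRID = ["..#",
        ".#.",
        "..."]
optimal_steps = shortest_path_length(GRID, (0, 0), (2, 2))
```

Comparing the number of steps an agent actually takes with `optimal_steps` gives a per-episode measure of how close its behaviour is to optimal.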


2602.08267 2026-06-01 cs.LG cs.AI 版本更新

Inverting Data Transformations via Diffusion Sampling

通过扩散采样逆变换数据变换

Jinwoo Kim, Sékou-Oumar Kaba, Jiyun Park, Seunghoon Hong, Siamak Ravanbakhsh

Affiliations: Mila - Quebec Artificial Intelligence Institute, Montréal, Canada; School of Computer Science, McGill University, Montréal, Canada

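AI summary: Proposes a diffusion-sampling method on general Lie groups for inverting unknown transformations so that data map back to the original distribution, and applies it to test-time equivariance to improve the robustness of pretrained neural networks.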

Comments 31 pages, 11 figures

AI中文摘要

我们研究了一般李群上的变换逆问题：一个数据被未知群元素变换，目标是恢复一个逆变换，将其映射回原始数据分布。这种未知变换在机器学习和科学建模中广泛出现，会显著扭曲观测数据。我们采用概率视角，将变换的后验建模为玻尔兹曼分布，由数据空间上的能量函数定义。为了从该后验中采样，我们引入了一个李群上的扩散过程，该过程保持所有更新在流形上，并且仅需在关联的李代数中进行计算。我们的方法，即变换逆能量扩散（TIED），依赖于一个新的平凡化目标分数恒等式，能够高效地对变换后验进行基于分数的采样。作为一个关键应用，我们专注于测试时等变性，其目标是提高预训练神经网络对输入变换的鲁棒性。在图像单应性和PDE对称性上的实验表明，TIED可以在测试时将变换后的输入恢复到训练分布，表现出优于强规范化和采样基线的性能。代码可在 https://github.com/jw9730/tied 获取。

英文摘要

We study the problem of transformation inversion on general Lie groups: a datum is transformed by an unknown group element, and the goal is to recover an inverse transformation that maps it back to the original data distribution. Such unknown transformations arise widely in machine learning and scientific modeling, where they can significantly distort observations. We take a probabilistic view and model the posterior over transformations as a Boltzmann distribution defined by an energy function on the data space. To sample from this posterior, we introduce a diffusion process on Lie groups that keeps all updates on-manifold and only requires computations in the associated Lie algebra. Our method, Transformation-Inverting Energy Diffusion (TIED), relies on a new trivialized target-score identity that enables efficient score-based sampling of the transformation posterior. As a key application, we focus on test-time equivariance, where the objective is to improve the robustness of pretrained neural networks to input transformations. Experiments on image homographies and PDE symmetries demonstrate that TIED can restore transformed inputs to the training distribution at test time, showing improved performance over strong canonicalization and sampling baselines. Code is available at https://github.com/jw9730/tied.


2506.00175 2026-06-01 cs.LG cs.AI 版本更新

PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, Ryan Horache, Nadia Kabachi, Omar Rivasplata

Affiliation: Université Claude Bernard Lyon 1, LIRIS, UMR CNRS 5205, France

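AI summary: Proposes a new PAC-Bayesian generalization bound that explicitly accounts for Markov dependencies in the data through the chain's mixing time, and builds on it the PB-SAC algorithm, which optimizes the bound to guide exploration; on continuous control tasks it provides meaningful confidence certificates while remaining competitive in performance.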

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Camera-ready version

2602.06902 2026-06-01 cs.LG stat.ML 版本更新

Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory

无参数动态遗憾：时变移动成本、延迟反馈和记忆

Hao Qiu, Andrew Jacobsen, Emmanuel Esposito, Mengxiao Zhang

Affiliations: University of Iowa; National University of Singapore

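AI summary: Proposes a new algorithm that achieves the first comparator-adaptive dynamic regret bound for online convex optimization with time-varying movement costs, with applications to delayed feedback and time-varying memory.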

Comments 28 pages; v2: ICML 2026

AI中文摘要

在本文中，我们研究了具有移动成本的无约束在线凸优化（OCO）中的动态遗憾。具体来说，我们通过允许移动成本系数$λ_t$随时间任意变化来推广标准设置。我们的主要贡献是一种新颖的算法，该算法为此设置建立了第一个比较器自适应动态遗憾界，保证$\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t λ_t)})$遗憾，其中$P_T$是比较器序列在$T$轮上的路径长度，$M$是最大比较器范数。我们的结果恢复了OCO中静态和动态遗憾的最优自适应率，作为所有轮次中$λ_t=0$的特例。为了展示我们结果的多功能性，我们考虑了两个应用：具有延迟反馈的OCO和具有时变记忆的OCO。我们表明这两个问题都可以转化为时变移动成本，特别是为延迟反馈设置建立了一种新颖的归约，这具有独立的意义。一个关键的观察是，我们的遗憾界中对移动成本的一阶依赖在实现两种设置中的最优比较器自适应动态遗憾保证中起着关键作用。

英文摘要

In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $λ_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that establishes the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t λ_t)})$ regret, where $P_T$ is the path length of the comparator sequence over $T$ rounds and $M$ is the maximal comparator norm. Our result recovers the optimal adaptive rates for both static and dynamic regret in OCO as the special case where $λ_t=0$ for all rounds. To demonstrate the versatility of our results, we consider two applications: OCO with delayed feedback and OCO with time-varying memory. We show that both problems can be translated into time-varying movement costs, establishing a novel reduction specifically for the delayed feedback setting that is of independent interest. A crucial observation is that the first-order dependence on movement costs in our regret bound plays a key role in enabling optimal comparator-adaptive dynamic regret guarantees in both settings.
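To make the special case concrete, the following is a direct substitution into the bound quoted above, with constants and logarithmic factors still hidden in the $\widetilde{\mathcal{O}}$ notation. Setting $λ_t = 0$ removes the movement-cost term; additionally fixing the comparator so that $P_T = 0$ (a step added here for illustration) leaves a static-regret bound of order $M\sqrt{T}$.

```latex
\widetilde{\mathcal{O}}\!\left(\sqrt{(M^2 + M P_T)\Bigl(T + \sum_t \lambda_t\Bigr)}\right)
\;\xrightarrow{\;\lambda_t = 0\;}\;
\widetilde{\mathcal{O}}\!\left(\sqrt{(M^2 + M P_T)\,T}\right)
\;\xrightarrow{\;P_T = 0\;}\;
\widetilde{\mathcal{O}}\!\left(M\sqrt{T}\right)
```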


2602.00942 2026-06-01 cs.LG 版本更新

SALAAD: Sparse And Low-Rank Adaptation via ADMM for Large Language Model Inference

SALAAD: 基于ADMM的稀疏低秩适配用于大语言模型推理

Hao Ma, Melis Ilayda Bal, Liang Zhang, Bingcong Li, Niao He, Melanie Zeilinger, Michael Muehlebach

Affiliations: ETH Zurich; Max Planck Institute for Intelligent Systems; École polytechnique fédérale de Lausanne (EPFL)

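AI summary: Proposes SALAAD, a framework that uses an augmented Lagrangian method to induce sparse and low-rank structure during training, giving flexible control of model capacity and lower deployment memory without retraining.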

AI中文摘要

现代大型语言模型越来越多地在计算和内存限制下部署，使得模型容量的灵活控制成为核心挑战。虽然稀疏和低秩结构自然地权衡了容量和性能，但现有方法通常依赖于忽略层和矩阵异质性的启发式设计，或需要特定于模型的架构修改。我们提出了SALAAD，一个适用于不同模型架构的即插即用框架，在训练过程中诱导稀疏和低秩结构。通过在增广拉格朗日框架下制定结构化权重学习，并引入自适应控制器动态平衡训练损失和结构约束，SALAAD保持了标准训练动态的稳定性，同时实现了对训练过程中有效模型容量演变的显式控制。跨模型规模的实验表明，SALAAD在部署期间显著减少了内存消耗，同时实现了与特设方法相当的性能。此外，单次训练运行产生了一个连续谱的模型容量，使得能够在不同的内存预算下实现平滑和弹性的部署，而无需重新训练。

英文摘要

Modern large language models are increasingly deployed under compute and memory constraints, making flexible control of model capacity a central challenge. While sparse and low-rank structures naturally trade off capacity and performance, existing approaches often rely on heuristic designs that ignore layer and matrix heterogeneity or require model-specific architectural modifications. We propose SALAAD, a plug-and-play framework applicable to different model architectures that induces sparse and low-rank structures during training. By formulating structured weight learning under an augmented Lagrangian framework and introducing an adaptive controller that dynamically balances the training loss and structural constraints, SALAAD preserves the stability of standard training dynamics while enabling explicit control over the evolution of effective model capacity during training. Experiments across model scales show that SALAAD substantially reduces memory consumption during deployment while achieving performance comparable to ad-hoc methods. Moreover, a single training run yields a continuous spectrum of model capacities, enabling smooth and elastic deployment across diverse memory budgets without the need for retraining.

2601.01754 2026-06-01 cs.LG cs.CC cs.CL cs.FL 版本更新

Context-Free Recognition with Transformers

使用Transformer进行上下文无关语言识别

Selim Jerad, Anej Svete, Sophie Hao, Ryan Cotterell, William Merrill

发表机构 * ETH Zürich（苏黎世联邦理工学院）； Boston University（波士顿大学）； Allen Institute for AI（艾伦人工智能研究所）

AI总结本文证明循环Transformer通过O(log N)层和O(N^6)填充符号可识别所有上下文无关语言，并针对无歧义子类将填充需求降至O(N^3)。

AI中文摘要

Transformer在处理符合某种语法的良好形式输入（如自然语言和代码）的任务中表现出色。然而，它们如何处理语法句法仍不清楚。事实上，在标准复杂性猜想下，标准Transformer无法识别上下文无关语言（CFL）——一种描述句法的规范形式，甚至无法识别正则语言（CFL的子类）。过去的工作表明，O(log(N))循环层（相对于输入长度N）允许Transformer识别正则语言，但循环Transformer识别上下文无关语言的问题仍然开放。在这项工作中，我们证明具有O(log(N))循环层和O(N^6)填充符号的循环Transformer可以识别所有CFL。然而，使用O(N^6)填充符号的训练和推理可能不切实际。幸运的是，我们表明，对于无歧义CFL等自然子类，Transformer上的识别问题变得更加易处理，只需要O(N^3)填充。实验上，循环和填充Transformer在识别CFL方面比固定深度Transformer表现更好。总体而言，我们的结果揭示了Transformer识别CFL的复杂性：虽然一般识别可能需要难以处理的填充量，但无歧义性等自然约束产生了高效的识别算法。

英文摘要

Transformers excel empirically on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax. In fact, under standard complexity conjectures, standard transformers cannot recognize context-free languages (CFLs), a canonical formalism to describe syntax, or even regular languages, a subclass of CFLs. Past work has shown that $\mathcal{O}(\log(N))$ looping layers (w.r.t. input length $N$) allow transformers to recognize regular languages, but the question of context-free recognition with looped transformers remained open. In this work, we show that looped transformers with $\mathcal{O}(\log(N))$ looping layers and $\mathcal{O}(N^6)$ padding symbols can recognize all CFLs. However, training and inference with $\mathcal{O}(N^6)$ padding symbols is potentially impractical. Fortunately, we show that, for natural subclasses such as unambiguous CFLs, the recognition problem on transformers becomes more tractable, requiring $\mathcal{O}(N^3)$ padding. Empirically, looped and padded transformers perform better than fixed-depth transformers in recognizing CFLs. Overall, our results shed light on the intricacy of CFL recognition by transformers: while general recognition may require an intractable amount of padding, natural constraints such as unambiguity yield efficient recognition algorithms.
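
For readers who want a concrete picture of what "recognizing a CFL" means, here is a minimal sketch of the classical CYK dynamic program. It is not the looped-transformer construction from the paper. The grammar for $a^n b^n$ and all function and variable names are hypothetical, chosen only for illustration.

```python
def cyk_recognize(word, terminal_rules, binary_rules, start="S"):
    """Return True if `word` is derivable from `start` (grammar in Chomsky normal form)."""
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][l - 1] holds the nonterminals that derive word[i : i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = {lhs for lhs, terminal in terminal_rules if terminal == symbol}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (first, second) in binary_rules:
                    if first in left and second in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


# Hypothetical CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A X,   X -> S B,   A -> 'a',   B -> 'b'
TERMINAL_RULES = [("A", "a"), ("B", "b")]
BINARY_RULES = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

if __name__ == "__main__":
    for candidate in ["ab", "aabb", "aab", "ba"]:
        print(candidate, cyk_recognize(candidate, TERMINAL_RULES, BINARY_RULES))
```

The triple loop over span length, start position, and split point is what gives CYK its cubic running time in the input length on a sequential machine.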

2405.07836 2026-06-01 cs.LG stat.ME 版本更新

Forecasting with Hyper-Trees

超树预测

Alexander März, Kashif Rasul

发表机构 * Independent Researcher（独立研究者）； Morgan Stanley Research（摩根士丹利研究）

AI总结提出超树框架，通过梯度提升树学习目标时间序列模型（如ARIMA或指数平滑）的参数，结合决策树与经典预测模型，并引入混合架构解决高维参数估计的缩放限制。

Comments Gradient Boosted Trees, Hyper Models, Hybrid Models, Time Series Forecasting, Time-Varying Parameters

AI中文摘要

我们引入超树作为一种新颖的框架，用于使用梯度提升树对时间序列数据进行建模。与直接预测时间序列的传统树方法不同，超树学习目标时间序列模型（如ARIMA或指数平滑）的参数，这些参数是特征的函数。然后，目标模型使用这些参数生成最终预测。我们的框架将决策树在表格数据上的有效性与经典预测模型相结合，从而将时间序列归纳偏差引入树模型。为了解决提升树在估计高维目标模型参数时的缩放限制，我们将决策树和神经网络结合在一个统一的框架中。在这种混合方法中，树从输入特征生成信息表示，然后浅层网络将其作为输入来学习时间序列模型的参数。通过我们的研究，我们探索了超树在各种预测任务中的有效性，并将基于树的建模扩展到时间序列分析中的传统用途之外。

英文摘要

We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-Trees learn the parameters of a target time series model, such as ARIMA or Exponential Smoothing, as functions of features. These parameters are then used by the target model to generate the final forecasts. Our framework combines the effectiveness of decision trees on tabular data with classical forecasting models, thereby inducing a time series inductive bias into tree-based models. To resolve the scaling limitations of boosted trees when estimating a high-dimensional set of target model parameters, we combine decision trees and neural networks within a unified framework. In this hybrid approach, the trees generate informative representations from the input features, which a shallow network then uses as input to learn the parameters of a time series model. With our research, we explore the effectiveness of Hyper-Trees across a range of forecasting tasks and extend tree-based modeling beyond its conventional use in time series analysis.
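
The idea of learning model parameters "as functions of features" can be sketched with simple exponential smoothing whose smoothing weight changes per step. This is only an illustration under assumptions: `feature_to_alpha` is a hypothetical stand-in (a logistic squashing) for the learned tree ensemble, and the single-feature setup is not taken from the paper.

```python
import math


def feature_to_alpha(feature):
    """Hypothetical stand-in for a learned tree ensemble: maps a feature into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-feature))


def smoothed_forecast(series, features, param_fn=feature_to_alpha):
    """One-step-ahead forecast from exponential smoothing with a feature-driven weight."""
    level = series[0]
    for observation, feature in zip(series[1:], features[1:]):
        alpha = param_fn(feature)  # time-varying parameter produced from the feature
        level = alpha * observation + (1.0 - alpha) * level
    return level


if __name__ == "__main__":
    print(smoothed_forecast([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

The target model (here, the smoothing recursion) still produces the forecast; the feature model only decides how strongly each new observation is weighted.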

2601.19791 2026-06-01 cs.LG stat.ML 版本更新

To Grok Grokking: Provable Grokking in Ridge Regression

理解Grokking：岭回归中可证明的Grokking现象

Mingyue Xu, Gal Vardi, Itay Safran

发表机构 * Department of Computer Science, Purdue University, West Lafayette, IN, USA（普渡大学计算机科学系）； Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Israel（魏茨曼科学研究所计算机科学与应用数学系）； Stein Faculty of Computer and Information Science, Ben-Gurion University of the Negev, Israel（内盖夫本-古里安大学斯坦因计算机与信息科学学院）

AI总结本文在经典岭回归设置中研究grokking现象，证明使用带权重衰减的梯度下降学习过参数化线性回归模型时，存在过拟合、泛化延迟和最终泛化误差任意小的三个阶段，并首次给出泛化延迟（grokking时间）的严格定量界，同时通过实验表明该界也适用于非线性神经网络。

AI中文摘要

我们在经典岭回归设置中研究grokking现象，即过拟合后很久才出现泛化。我们证明了使用带权重衰减的梯度下降学习过参数化线性回归模型的端到端grokking结果。具体地，我们证明以下阶段发生：(i) 训练早期模型过拟合训练数据；(ii) 过拟合显现后长时间泛化性能差；(iii) 泛化误差最终变得任意小。此外，我们从理论和实验上表明，通过适当的超参数调优，可以以原则性的方式放大或消除grokking。据我们所知，这是首次以训练超参数表示的泛化延迟（我们称之为“grokking时间”）的严格定量界。最后，超越线性设置，我们实验证明我们的定量界也捕捉了非线性神经网络上grokking的行为。我们的结果表明，grokking不是深度学习固有的失败模式，而是特定训练条件的结果，因此不需要对模型架构或学习算法进行根本性改变来避免。

英文摘要

We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and (iii) the generalization error eventually becomes arbitrarily small. Moreover, we show, both theoretically and empirically, that grokking can be amplified or eliminated in a principled manner through proper hyperparameter tuning. To the best of our knowledge, these are the first rigorous quantitative bounds on the generalization delay (which we refer to as the "grokking time") in terms of training hyperparameters. Lastly, going beyond the linear setting, we empirically demonstrate that our quantitative bounds also capture the behavior of grokking on non-linear neural networks. Our results suggest that grokking is not an inherent failure mode of deep learning, but rather a consequence of specific training conditions, and thus does not require fundamental changes to the model architecture or learning algorithm to avoid.
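
The training procedure named in the abstract, gradient descent with weight decay on an over-parameterized linear model, can be sketched as follows. The loss normalization $\frac{1}{2n}\lVert Xw - y\rVert^2 + \frac{\lambda}{2}\lVert w\rVert^2$, the zero initialization, and all names are assumptions made for this illustration. The sketch does not reproduce the paper's grokking-time bounds.

```python
def ridge_gradient_descent(features, targets, weight_decay, learning_rate, steps):
    """Full-batch gradient descent on squared loss plus an L2 (weight decay) penalty."""
    n_samples = len(features)
    n_dims = len(features[0])
    weights = [0.0] * n_dims  # assumed zero initialization
    for _ in range(steps):
        # Residuals are computed once per step, so all coordinates update simultaneously.
        residuals = [
            sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(features, targets)
        ]
        for j in range(n_dims):
            data_grad = sum(r * row[j] for r, row in zip(residuals, features)) / n_samples
            weights[j] -= learning_rate * (data_grad + weight_decay * weights[j])
    return weights


if __name__ == "__main__":
    # Over-parameterized toy problem: 2 samples, 3 dimensions.
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    y = [1.0, 2.0]
    print(ridge_gradient_descent(X, y, weight_decay=0.0, learning_rate=0.5, steps=200))
    print(ridge_gradient_descent(X, y, weight_decay=0.5, learning_rate=0.5, steps=200))
```

With `weight_decay=0.0` the iterates interpolate the training data, while a positive value pulls them toward the ridge solution, which is the hyperparameter the abstract says controls the delay.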

2602.05649 2026-06-01 cs.LG 版本更新

End-to-End Compression for Tabular Foundation Models

表格基础模型的端到端压缩

Guri Zabërgja, Rafiq Kamel, Arlind Kadra, Christian M. M. Frey, Josif Grabocka

发表机构 * Department of Computer Science, Technical University of Nuremberg（纽伦堡技术大学计算机科学系）； Department of Computer Science University of Freiburg（弗赖堡大学计算机科学系）

AI总结提出TACO，一种端到端表格压缩模型，在潜在空间压缩训练数据，以解决表格Transformer在推理时间和内存上的二次复杂度问题，在TabArena基准上实现高达94倍加速和97%内存节省，且性能无明显下降。

Comments Accepted as Spotlight at ICML 2026

AI中文摘要

长期以来，梯度提升决策树在表格数据上的主导地位最近受到了上下文学习表格基础模型的挑战。上下文学习方法通过将训练数据作为上下文来预测查询测试点，无需参数更新即可在一次前向传播中完成拟合和预测。尽管最近的表格基础模型达到了最先进的性能，但基于注意力机制的Transformer架构在数据集大小上具有二次复杂度，这增加了训练和推理时间的开销，并限制了模型处理大规模数据集的能力。在这项工作中，我们提出了TACO，一种端到端的表格压缩模型，它在潜在空间中压缩训练数据集。我们在TabArena基准上测试了我们的方法，与最先进的表格Transformer架构相比，我们的方法在推理时间上快了高达94倍，同时内存消耗减少了97%，且性能没有显著下降。最后，我们的方法不仅随着数据集规模的增大而更好地扩展，而且与其他基线相比也取得了更好的性能。

英文摘要

The long-standing dominance of gradient-boosted decision trees for tabular data has recently been challenged by in-context learning tabular foundation models. In-context learning methods fit and predict in one forward pass without parameter updates by leveraging the training data as context for predicting on query test points. While recent tabular foundation models achieve state-of-the-art performance, their transformer architecture based on the attention mechanism has quadratic complexity regarding dataset size, which in turn increases the overhead on training and inference time, and limits the capacity of the models to handle large-scale datasets. In this work, we propose TACO, an end-to-end tabular compression model that compresses the training dataset in a latent space. We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time, while consuming up to 97\% less memory compared to the state-of-the-art tabular transformer architecture, all while retaining performance without significant degradation. Lastly, our method not only scales better with increased dataset sizes, but it also achieves better performance compared to other baselines.

2512.14980 2026-06-01 cs.LG 版本更新

Softly Constrained Denoisers for Diffusion Models Applied to Partial Differential Equations

应用于偏微分方程的扩散模型的软约束去噪器

Victor M. Yeom-Song, Severi Rissanen, Arno Solin, Samuel Kaski, Mingfei Sun

发表机构 * ELLIS Institute Finland（芬兰ELLIS研究所）； Aalto University（阿尔托大学）； University of Manchester（曼彻斯特大学）

AI总结提出在扩散模型的去噪器中引入基于偏微分方程的软归纳偏置，以在提高约束遵从性的同时保持对模型错误指定的适应性。

Comments 22 pages including appendix, 8 figures including appendix, preprint

2602.04737 2026-06-01 cs.LG 版本更新

Rationality Measurement and Theory for Reinforcement Learning Agents

强化学习智能体的理性度量与理论

Kejiang Qian, Amos Storkey, Fengxiang He

发表机构 * University of Edinburgh（爱丁堡大学）

AI总结本文提出一套理性度量及其理论，用于评估强化学习智能体在部署中的行为理性，并分解理性风险差距为环境变化和算法泛化能力两部分。

AI中文摘要

本文针对强化学习智能体提出了一套理性度量及其相关理论，该属性日益关键但鲜有探索。我们定义部署中的行动为完全理性，如果它在最陡方向上最大化隐藏的真实价值函数。策略行动与其理性对应物的期望价值差异，在部署轨迹上累积，被定义为期望理性风险；训练中的经验平均版本也被定义。它们的差异称为理性风险差距，被分解为（1）由训练和部署之间环境变化引起的外在成分，以及（2）由算法在动态环境中的泛化能力引起的内在成分。它们分别被（1）训练和部署中转移核与初始状态分布之间的$1$-Wasserstein距离，以及（2）价值函数类的经验Rademacher复杂度所上界。我们的理论提出了关于正则化（包括层归一化、$\ell_2$正则化和权重归一化）和领域随机化的益处，以及环境变化的危害的假设。实验与这些假设完全一致。代码可在https://github.com/EVIEHub/Rationality获取。

英文摘要

This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, culminating over the trajectory in deployment, is defined to be expected rational risk; an empirical average version in training is also defined. Their difference, termed as rational risk gap, is decomposed into (1) an extrinsic component caused by environment shifts between training and deployment, and (2) an intrinsic one due to the algorithm's generalisability in a dynamic environment. They are upper bounded by, respectively, (1) the $1$-Wasserstein distance between transition kernels and initial state distributions in training and deployment, and (2) the empirical Rademacher complexity of the value function class. Our theory suggests hypotheses on the benefits from regularisers (including layer normalisation, $\ell_2$ regularisation, and weight normalisation) and domain randomisation, as well as the harm from environment shifts. Experiments are in full agreement with these hypotheses. The code is available at https://github.com/EVIEHub/Rationality.

2506.05994 2026-06-01 cs.LG cs.AR cs.ET 版本更新

RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory

RETENTION: 基于内容可寻址存储器的资源高效树集成模型加速

Yi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang, Camélia Slimani, Jalil Boukhobza, Tei-Wei Kuo

发表机构 * Department of Computer Science and Information Engineering, National Taiwan University（国立台湾大学计算机科学与资讯工程学系）； IRIT, Université de Toulouse, Toulouse INP–UT3, CNRS（图卢兹大学IRIT实验室）； Lab-STICC, CNRS UMR 6285, ENSTA, Institut Polytechnique de Paris（ENSTA巴黎理工学院Lab-STICC实验室）

AI总结提出RETENTION框架，通过迭代剪枝算法和树映射方案，显著减少内容可寻址存储器容量需求，实现资源高效的树集成模型加速。

Comments Under review by IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems

DOI: 10.1109/TCAD.2026.3691288

AI中文摘要

尽管深度学习在处理非结构化数据方面展现了卓越的能力，但现代基于树的集成模型在从结构化数据中提取相关信息和学习方面仍然更胜一筹。虽然已有若干工作致力于加速树模型，但模型的固有特性对传统加速器构成了重大挑战。最近利用内容可寻址存储器（CAM）的研究为加速树模型提供了有前景的解决方案，然而现有设计存在内存消耗过大和利用率低的问题。本文通过引入RETENTION，一个端到端框架，显著降低了树模型推理的CAM容量需求。我们提出了一种迭代剪枝算法，该算法具有针对基于装袋模型（例如随机森林）的新颖剪枝准则，在确保受控精度下降的同时最小化模型复杂度。此外，我们提出了一种树映射方案，其中包含两种创新的数据放置策略，以缓解CAM中广泛使用的无关状态导致的内存冗余。实验结果表明，仅实施树映射方案即可将CAM容量需求降低1.46倍至21.30倍，而完整的RETENTION框架在精度损失小于3%的情况下实现了4.35倍至207.12倍的降低。这些结果表明，RETENTION在最小化CAM资源需求方面非常有效，为树模型加速提供了一种资源高效的方向。

英文摘要

Although deep learning has demonstrated remarkable capability in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces CAM capacity requirement for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored for bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don't care states in CAM. Experimental results show that implementing the tree mapping scheme alone reduces CAM capacity requirement by $1.46\times$ to $21.30 \times$, while the full RETENTION framework achieves $4.35\times$ to $207.12\times$ reduction with less than 3\% accuracy loss. These results demonstrate that RETENTION is highly effective in minimizing CAM resource demand, providing a resource-efficient direction for tree-based model acceleration.

2602.04107 2026-06-01 cs.LG cs.IT math.IT 版本更新

Supervised Learning as Lossy Compression: Characterizing Generalization and Sample Complexity via Finite Blocklength Analysis

监督学习作为有损压缩：通过有限块长分析刻画泛化与样本复杂度

Kosuke Sugiyama, Masato Uchida

发表机构 * Waseda University（早稻田大学）

AI总结本文通过将学习问题置于有损压缩框架中并应用有限块长分析，从信息论角度推导了固定随机学习算法及其最优采样策略的样本复杂度和泛化误差下界，显式分离了过拟合程度与归纳偏置-任务不匹配项。

Comments 40 pages, 1 figure

AI中文摘要

本文通过将学习问题置于有损压缩的背景下并应用有限块长分析，提出了一种关于机器学习中泛化的新颖信息论视角。在我们的方法中，训练数据的采样形式上对应于编码过程，而模型构建对应于解码过程。通过利用有限块长分析，我们推导了固定随机学习算法及其相关最优采样策略的样本复杂度和泛化误差的下界。我们的界限明确地将学习算法的过拟合程度与其归纳偏置和任务之间的不匹配作为不同的项进行刻画。这种分离提供了相对于现有框架的显著优势。此外，我们分解了过拟合项，以显示其与信息论界限和稳定性理论中现有度量的理论联系，从而在我们的提议框架下统一了这些视角。

英文摘要

This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finite blocklength analysis. In our approach, the sampling of training data formally corresponds to an encoding process, and the model construction to a decoding process. By leveraging finite blocklength analysis, we derive lower bounds on sample complexity and generalization error for a fixed randomized learning algorithm and its associated optimal sampling strategy. Our bounds explicitly characterize the degree of overfitting of the learning algorithm and the mismatch between its inductive bias and the task as distinct terms. This separation provides a significant advantage over existing frameworks. Additionally, we decompose the overfitting term to show its theoretical connection to existing metrics found in information-theoretic bounds and stability theory, unifying these perspectives under our proposed framework.

2602.04031 2026-06-01 cs.LG 版本更新

The Illusion of Generalization in Tabular Language Models

表格语言模型中的泛化错觉

Aditya Gorla, Ratish Puduppully

发表机构 * University of California, Los Angeles, USA（加州大学洛杉矶分校）； IT University of Copenhagen, Denmark（哥本哈根IT大学）

AI总结通过系统评估Tabula-8B在165个数据集上的表现，发现其声称的泛化能力主要源于评估伪影（如数据污染和格式熟悉度），而非真正的表格推理。

Journal ref: In Proc. 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

表格语言模型（TLMs）据称在表格预测中实现了强大的泛化能力。我们对代表性TLM——Tabula-8B进行了系统性的重新评估，使用了UniPredict基准中的165个数据集。我们的研究揭示了三个发现。首先，二分类和多类别分类在多数类基线上实现了接近零的中位数提升，而强大的聚合性能完全由四分位数分类任务驱动。其次，表现最好的数据集存在普遍的数据污染，包括完整的训练-测试重叠和任务级泄露，这些污染规避了标准的去重方法。第三，在没有表格数据暴露的情况下进行指令微调，恢复了标准分类性能的92.2%，而在四分位数分类上，格式熟悉度缩小了71.3%的差距，剩余部分归因于污染数据集。这些发现表明，声称的泛化能力可能反映的是评估伪影，而非学到的表格推理。最后，我们提出了加强TLM评估的建议。

英文摘要

Tabular Language Models (TLMs) have been claimed to achieve strong generalization for tabular prediction. We conduct a systematic re-evaluation of Tabula-8B as a representative TLM, utilizing 165 datasets from the UniPredict benchmark. Our investigation reveals three findings. First, binary and categorical classification achieve near-zero median lift over majority-class baselines and strong aggregate performance is driven entirely by quartile classification tasks. Second, top-performing datasets exhibit pervasive contamination, including complete train-test overlap and task-level leakage that evades standard deduplication. Third, instruction-tuning without tabular exposure recovers 92.2% of standard classification performance and on quartile classification, format familiarity closes 71.3% of the gap with the residual attributable to contaminated datasets. These findings suggest claimed generalization likely reflects evaluation artifacts rather than learned tabular reasoning. We conclude with recommendations for strengthening TLM evaluation.
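
The "lift over majority-class baselines" comparison can be made concrete with a small sketch. It assumes lift is the accuracy difference between the model and a predictor that always outputs the most frequent label, and that the baseline is computed on the evaluated labels. The paper's exact definition may differ, and the function name is hypothetical.

```python
from collections import Counter


def majority_class_lift(labels, predictions):
    """Accuracy of `predictions` minus accuracy of always predicting the most common label."""
    total = len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / total
    majority_accuracy = Counter(labels).most_common(1)[0][1] / total
    return accuracy - majority_accuracy


if __name__ == "__main__":
    labels = [0, 0, 0, 1]
    print(majority_class_lift(labels, [0, 0, 0, 0]))  # model that mimics the majority class
    print(majority_class_lift(labels, [0, 0, 0, 1]))  # model that gets every label right
```

A lift near zero means a model is doing no better than the trivial baseline, however high its raw accuracy looks on an imbalanced dataset.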

2602.03896 2026-06-01 stat.ML cs.LG q-bio.NC 版本更新

A hitchhiker's guide to Poisson gradient estimation

泊松梯度估计漫游指南

Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； Reality Labs, Meta（Meta现实实验室）； Aerospace Information Research Institute, Chinese Academy of Sciences（中国科学院空天信息创新研究院）； Redwood Center for Theoretical Neuroscience, UC Berkeley（加州大学伯克利分校红木理论神经科学中心）； VERSES AI Research Lab, Los Angeles, USA（洛杉矶VERSES AI研究实验室）

AI总结本文系统比较了指数到达时间模拟和Gumbel-SoftMax松弛两种方法，提出改进的EAT方法以降低偏差，并在泊松潜变量模型上验证其优越性能。

Comments Published at ICML2026 --- code: https://github.com/hadivafaii/PoissonGradientEstimation

AI中文摘要

泊松分布潜变量模型在计算神经科学中广泛使用，但通过离散随机样本进行微分仍然具有挑战性。两种方法解决了这一问题：*指数到达时间*（EAT）模拟和*Gumbel-SoftMax*（GSM）松弛。我们首次对这些方法进行了系统比较，并为实践者提供了实用指导。我们的主要技术贡献是对EAT方法的修改，理论上保证了无偏的一阶矩（精确匹配发放率），并减少了二阶矩偏差。我们在分布保真度、梯度质量以及两个任务上的性能对这些方法进行了评估：（1）具有泊松潜变量的变分自编码器，以及（2）部分可观测的广义线性模型，其中必须从观测到的脉冲序列推断潜在的神经连接性。在所有指标上，我们修改后的EAT方法表现出更好的整体性能（通常与精确梯度相当），并且对超参数选择具有更高的鲁棒性。这些结果扩展到过度分散的负二项潜变量，其中修改后的EAT再次表现最佳。然而，只有GSM可以推广到任意非泊松分布，包括欠分散的情况。总之，我们的结果阐明了这些方法之间的权衡，并为使用泊松潜变量模型的实践者提供了具体建议。

英文摘要

Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: *Exponential Arrival Time* (EAT) simulation and *Gumbel-SoftMax* (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. These results extend to over-dispersed Negative Binomial latents, where modified EAT again performs best. However, only GSM generalizes to arbitrary non-Poisson distributions, including the under-dispersed regime. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
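
The arrival-time view behind EAT can be sketched with the classical sampler: a Poisson count with rate $r$ equals the number of exponential inter-arrival times (each with rate $r$) that fit into a unit window. This is the plain, non-differentiable version only. The relaxation and the bias-reducing modification described in the abstract are not shown, and the function name is hypothetical.

```python
import random


def poisson_via_arrival_times(rate, rng):
    """Count exponential arrivals (rate > 0) that land inside the unit time window."""
    count = 0
    elapsed = rng.expovariate(rate)  # time of the first arrival
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)  # waiting time until the next arrival
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [poisson_via_arrival_times(3.0, rng) for _ in range(20000)]
    print(sum(draws) / len(draws))  # sample mean, expected to be close to the rate
```

The hard comparison `elapsed <= 1.0` is the discrete step that blocks gradients, which is why a smooth surrogate is needed when the rate must be learned by backpropagation.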

2602.03655 2026-06-01 cs.LG 版本更新

Sequential Group Composition: A Window into the Mechanics of Deep Learning

序列群组合：深度学习机制的一扇窗口

Giovanni Luca Marchetti, Daniel Kunin, Adele Myers, Francisco Acosta, Nina Miolane

发表机构 * KTH Royal Institute of Technology（皇家理工学院）

AI总结通过序列群组合任务，研究神经网络如何学习结构化运算，揭示群结构、编码统计和序列长度对学习的影响，并证明深度架构能显著改善宽度需求。

Comments Accepted at ICML 2026

AI中文摘要

经过序列训练的神经网络如何获得执行结构化运算（如算术、几何和算法计算）的能力？为了深入了解这个问题，我们引入了序列群组合任务。在该任务中，网络接收来自有限群的元素序列（这些元素编码在实向量空间中），并必须预测它们的累积乘积。该任务可能对顺序敏感，且无法通过线性模型解决。我们的分析隔离了群结构、编码统计和序列长度在塑造学习中的作用。我们证明，从零初始化开始的两层网络一次学习群的一个不可约表示，顺序由编码的傅里叶统计决定。为了完美学习该任务，这些网络需要隐藏宽度随序列长度 $k$ 呈指数增长。相比之下，我们构建了利用结合律的更深层架构，显著改善了这种缩放：循环神经网络可以在 $k$ 步内顺序组合元素，而多层网络可以在 $\log k$ 层内并行组合相邻对。总体而言，序列群组合任务为深度学习机制提供了一个可处理的窗口。

英文摘要

How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. This task can be order-sensitive and cannot be solved by a linear model. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks from vanishing initialization learn this task one irreducible representation of the group at a time in an order determined by the Fourier statistics of the encoding. To perfectly learn the task, these networks require a hidden width exponential in the sequence length $k$. In contrast, we construct deeper architectures that exploit associativity to dramatically improve this scaling: recurrent neural networks can compose elements sequentially in $k$ steps, while multilayer networks can compose adjacent pairs in parallel in $\log k$ layers. Overall, the sequential group composition task offers a tractable window into the mechanics of deep learning.
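
The contrast between composing $k$ elements sequentially and composing adjacent pairs in parallel can be sketched on a small non-commutative group. Permutations of three items are used here as an assumed example; the encoding and function names are hypothetical and no neural network is involved.

```python
from functools import reduce


def compose(p, q):
    """Group product of two permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def sequential_product(elements):
    """Left-to-right cumulative product, one composition per step (k - 1 steps)."""
    return reduce(compose, elements)


def parallel_product(elements):
    """Combine adjacent pairs round by round; returns (product, number of rounds)."""
    layer = list(elements)
    rounds = 0
    while len(layer) > 1:
        paired = [compose(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            paired.append(layer[-1])  # an unpaired last element is carried forward
        layer = paired
        rounds += 1
    return layer[0], rounds


if __name__ == "__main__":
    swap01, swap12 = (1, 0, 2), (0, 2, 1)
    sequence = [swap01, swap12, swap01, swap12, swap01]
    print(sequential_product(sequence))
    print(parallel_product(sequence))
```

Because the product is associative, regrouping adjacent pairs never changes the result, so the number of rounds grows like $\lceil \log_2 k \rceil$ instead of $k$. The order of the elements still matters, since `compose(swap01, swap12)` and `compose(swap12, swap01)` differ.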

2601.20789 2026-06-01 cs.CL cs.LG cs.SE 版本更新

SERA: Soft-Verified Efficient Repository Agents

SERA：软验证的高效仓库智能体

Ethan Shen, Daniel Tormoen, Saurabh Shah, Ali Farhadi, Tim Dettmers

发表机构 * Allen Institute for AI（艾伦人工智能研究所）； University of Washington（华盛顿大学）； Carnegie Mellon University（卡内基梅隆大学）； Paul G. Allen School of Computer Science and Engineering（保罗·G·艾伦计算机科学与工程学院）； Allen Institute of Artificial Intelligence（艾伦人工智能研究所）； Machine Learning Department（机器学习系）

AI总结提出SERA方法，通过软验证生成（SVG）高效训练编码智能体，使其快速适应私有代码库，在开源模型中取得领先性能且成本极低。

Comments 21 main pages, 6 pages appendix

AI中文摘要

开放权重编码智能体应比闭源系统具有根本优势，因为它们可以专门化到私有代码库，将仓库特定信息直接编码在其权重中。然而，训练的成本和复杂性此前一直使这一优势停留在理论层面。我们提出了软验证高效仓库智能体（SERA），一种高效的编码智能体训练方法，能够快速、廉价地创建专门化到私有代码库的智能体。利用软验证生成（SVG），我们可以从任何代码仓库生成数千条轨迹，而无需单元测试。除了仓库专门化，我们将SVG应用于更大的代码库语料库，生成了超过200,000条合成轨迹。仅使用监督微调（SFT），SERA在全开源（开放数据、方法、代码）模型中取得了领先结果，同时匹配了如Devstral-Small-2等开放权重模型的性能。创建SERA模型的成本比强化学习便宜26倍，比先前达到同等性能的合成数据方法便宜57倍。我们利用数据集提供了关于训练编码智能体的缩放定律、消融实验和混淆因素的详细分析。总体而言，我们相信我们的工作将极大加速开放编码智能体的研究，并展示能够适应私有代码库的开源模型的优势。我们将SERA作为Ai2 Open Coding Agents系列的第一个模型发布，同时公开所有代码、数据和Claude Code集成，以支持研究社区。

英文摘要

Open-weight coding agents should hold a fundamental advantage over closed-source systems because they can specialize to private codebases, encoding repository-specific information directly in their weights. Yet the cost and complexity of training has kept this advantage theoretical until now. We present Soft-Verified Efficient Repository Agents (SERA), an efficient method for training coding agents that enables the rapid and cheap creation of agents specialized to private codebases. Using Soft Verified Generation (SVG), we generate thousands of trajectories from any code repository, without requiring unit tests. Beyond repository specialization, we apply SVG to a larger corpus of codebases, generating 200,000+ synthetic trajectories. Using only supervised finetuning (SFT), SERA achieves leading results among fully open-source (open data, method, code) models while matching the performance of open-weight models like Devstral-Small-2. Creating SERA models is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic data methods to reach equivalent performance. We use our dataset to provide detailed analysis of scaling laws, ablations, and confounding factors for training coding agents. Overall, we believe our work will greatly accelerate research on open coding agents and showcase the advantage of open-source models that can adapt to private codebases. We release SERA as the first model in Ai2's Open Coding Agents series, along with all our code, data, and Claude Code integration to support the research community.

2510.00845 2026-06-01 cs.LG cs.AI cs.CL 版本更新

深入Kronecker适配器：组件设计至关重要

Jiayu Bai, Danchen Yu, Zhenyu Liao, TianQi Hou, Feng Zhou, Robert C. Qiu, Zenan Ling

发表机构 * School of Electronic Information and Communications, Huazhong University of Science and Technology（华中科技大学电子信息与通信学院）； Huawei（华为）； Center for Applied Statistics and School of Statistics, Renmin University of China（中国人民大学应用统计中心和统计学院）

AI总结本文通过分析Kronecker适配器的组件维度和数量，提出组件设计的Kronecker适配器（CDKA），并给出参数预算感知的配置指南和训练稳定策略，实验证明其有效性。

AI中文摘要

Kronecker适配器已成为微调大规模模型的一种有前景的方法，通过可调组件结构实现高秩更新。然而，现有工作大多将组件结构视为固定或启发式设计选择，对Kronecker组件的维度和数量探索不足。在本文中，我们确定组件结构是控制Kronecker适配器能力的关键因素。我们对Kronecker组件的维度和数量进行了细粒度分析。特别地，我们展示了Kronecker适配器与全微调之间的对齐取决于组件配置。在这些见解的指导下，我们提出了组件设计的Kronecker适配器（CDKA）。我们进一步提供了参数预算感知的配置指南和针对实际部署的定制训练稳定策略。跨各种架构和模态的实验证明了CDKA的有效性。代码可在https://github.com/rainstonee/CDKA获取。

英文摘要

Kronecker adapters have emerged as a promising approach for fine-tuning large-scale models, enabling high-rank updates through tunable component structures. However, existing work largely treats the component structure as a fixed or heuristic design choice, leaving the dimensions and number of Kronecker components underexplored. In this paper, we identify component structure as a key factor governing the capacity of Kronecker adapters. We perform a fine-grained analysis of both the dimensions and number of Kronecker components. In particular, we show that the alignment between Kronecker adapters and full fine-tuning depends on component configurations. Guided by these insights, we propose Component Designed Kronecker Adapters (CDKA). We further provide parameter-budget-aware configuration guidelines and a tailored training stabilization strategy for practical deployment. Experiments across various architectures and modalities demonstrate the effectiveness of CDKA. Code is available at https://github.com/rainstonee/CDKA.
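
Why Kronecker structure permits high-rank updates from few parameters follows from the standard identity $\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\,\operatorname{rank}(B)$. The sketch below checks it on tiny matrices with exact arithmetic. It illustrates the identity only and is not the CDKA method; the helper names are hypothetical.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_value * b_value for a_value in a_row for b_value in b_row]
        for a_row in a
        for b_row in b
    ]


def matrix_rank(rows):
    """Rank via Gaussian elimination over exact fractions."""
    m = [[Fraction(value) for value in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


if __name__ == "__main__":
    A = [[1, 0], [0, 1]]
    B = [[1, 2], [3, 4]]
    update = kron(A, B)  # 4x4 matrix built from 4 + 4 = 8 numbers
    print(matrix_rank(A), matrix_rank(B), matrix_rank(update))
```

Here two 2x2 factors (8 numbers) yield a full-rank 4x4 update, whereas a generic 4x4 matrix needs 16 numbers. How the factor dimensions and the number of components are chosen is exactly the design question the abstract raises.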

2602.01186 2026-06-01 cs.LG cs.AI 版本更新

The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics

高斯头OFL系列：基于客户端全局统计的一次性联邦学习

Fabio Turazza, Marco Picone, Marco Mamei

发表机构 * Department of Sciences and Methods for Engineering（工程科学与方法系）； Artificial Intelligence Research and Innovation Center（人工智能研究与创新中心）； University of Modena and Reggio Emilia（摩德纳和雷吉奥艾米利亚大学）

AI总结提出高斯头OFL系列方法，通过客户端仅传输每类计数和一二阶矩，服务器利用闭式高斯头、FisherMix和Proto-Hyper三种组件构建模型，实现严格无数据的一次性联邦学习，在强非独立同分布下达到最先进鲁棒性和准确性。

Comments Accepted at the International Conference on Learning Representations (ICLR) 2026 - Final Version

AI中文摘要

经典联邦学习依赖于服务器与客户端之间多轮迭代的模型交换和聚合过程，存在高通信成本和重复模型传输带来的隐私风险。相比之下，一次性联邦学习（OFL）通过将通信减少到单轮来缓解这些限制，从而降低开销并增强实际部署能力。然而，现有大多数一次性方法仍然不切实际或受限，例如，它们通常依赖公共数据集的可用性、假设同质客户端模型，或需要上传额外数据或模型信息。为克服这些问题，我们引入了高斯头OFL（GH-OFL）系列，这是一套一次性联邦方法，假设预训练嵌入具有类条件高斯性。客户端仅传输充分统计量（每类计数和一阶/二阶矩），服务器通过三个组件构建头部：（i）直接从接收统计量计算的闭式高斯头（NB/LDA/QDA）；（ii）FisherMix，一种在估计的Fisher子空间中采样的合成样本上训练的带余弦边界的线性头；以及（iii）Proto-Hyper，一种轻量级低秩残差头，通过知识蒸馏在这些合成样本上细化高斯logits。在我们的实验中，GH-OFL方法在强非独立同分布偏移下提供了最先进的鲁棒性和准确性，同时保持严格无数据。

英文摘要

Classical Federated Learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, with high communication costs and privacy risks from repeated model transmissions. In contrast, one-shot federated learning (OFL) alleviates these limitations by reducing communication to a single round, thereby lowering overhead and enhancing practical deployability. Nevertheless, most existing one-shot approaches remain either impractical or constrained, for example, they often depend on the availability of a public dataset, assume homogeneous client models, or require uploading additional data or model information. To overcome these issues, we introduce the Gaussian-Head OFL (GH-OFL) family, a suite of one-shot federated methods that assume class-conditional Gaussianity of pretrained embeddings. Clients transmit only sufficient statistics (per-class counts and first/second-order moments) and the server builds heads via three components: (i) Closed-form Gaussian heads (NB/LDA/QDA) computed directly from the received statistics; (ii) FisherMix, a linear head with cosine margin trained on synthetic samples drawn in an estimated Fisher subspace; and (iii) Proto-Hyper, a lightweight low-rank residual head that refines Gaussian logits via knowledge distillation on those synthetic samples. In our experiments, GH-OFL methods deliver state-of-the-art robustness and accuracy under strong non-IID skew while remaining strictly data-free.
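
The single-round exchange of sufficient statistics can be sketched as follows. For brevity the sketch assumes one-dimensional embeddings and a per-class variance, which corresponds roughly to a naive-Bayes-style Gaussian head. The function names and data are hypothetical, and FisherMix and Proto-Hyper are not covered.

```python
def client_statistics(values, labels):
    """Per-class (count, sum, sum of squares) computed locally on one client."""
    stats = {}
    for value, label in zip(values, labels):
        count, total, total_sq = stats.get(label, (0, 0.0, 0.0))
        stats[label] = (count + 1, total + value, total_sq + value * value)
    return stats


def aggregate_gaussian_heads(all_client_stats):
    """Server side: pool the statistics and return {class: (mean, variance)}."""
    pooled = {}
    for stats in all_client_stats:
        for label, (count, total, total_sq) in stats.items():
            n, s, q = pooled.get(label, (0, 0.0, 0.0))
            pooled[label] = (n + count, s + total, q + total_sq)
    # Variance from moments: E[x^2] - (E[x])^2
    return {label: (s / n, q / n - (s / n) ** 2) for label, (n, s, q) in pooled.items()}


if __name__ == "__main__":
    client_a = client_statistics([1.0, 2.0], [0, 0])
    client_b = client_statistics([3.0, 10.0], [0, 1])
    print(aggregate_gaussian_heads([client_a, client_b]))
```

Because counts and moments add across clients, the server recovers the same class means and variances it would compute on the pooled data, without ever receiving a raw sample.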

2601.13433 2026-06-01 cs.CL cs.LG 版本更新

Who Endorsed It? Measuring Authority Bias Across Expertise Levels in Language Models

谁背书了它？测量语言模型中跨专业水平的权威偏差

Priyanka Mary Mammen, Emil Joswin, Shankar Venkitachalam

发表机构 * UMass Amherst（马萨诸塞大学阿默斯特分校）； Independent Research（独立研究）

AI总结研究语言模型在推理任务中是否因背书来源的专业水平而产生系统性偏差，发现模型对高权威来源的错误背书更易受影响，导致准确率下降和错误答案置信度增加，但可通过机制干预减轻偏差。

AI中文摘要

先前研究表明，语言模型在推理任务上的表现可能受到建议、提示和背书的影响。然而，背书来源可信度的影响仍未充分探索。我们调查语言模型是否根据背书提供者的感知专业水平表现出系统性偏差。跨越数学、法律和医学推理的4个数据集，我们使用代表每个领域四个专业水平的角色评估了11个模型。我们的结果表明，随着来源专业水平的增加，模型越来越容易受到错误/误导性背书的影响，更高权威的来源不仅导致准确率下降，还增加了对错误答案的置信度。我们还表明，这种权威偏差在模型内部被机制性地编码，并且模型可以被引导远离偏差，从而即使在专家给出误导性背书时也能提高其性能。

英文摘要

Prior research demonstrates that performance of language models on reasoning tasks can be influenced by suggestions, hints and endorsements. However, the influence of endorsement source credibility remains underexplored. We investigate whether language models exhibit systematic bias based on the perceived expertise of the provider of the endorsement. Across 4 datasets spanning mathematical, legal, and medical reasoning, we evaluate 11 models using personas representing four expertise levels per domain. Our results reveal that models are increasingly susceptible to incorrect/misleading endorsements as source expertise increases, with higher-authority sources inducing not only accuracy degradation but also increased confidence in wrong answers. We also show that this authority bias is mechanistically encoded within the model and a model can be steered away from the bias, thereby improving its performance even when an expert gives a misleading endorsement.

2601.22985 2026-06-01 cs.LG 版本更新

dgMARK: Decoding-Guided Watermarking for Diffusion Language Models

dgMARK: 面向扩散语言模型的解码引导水印方法

Pyo Min Hong, Albert No

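Affiliations: Department of Computer Engineering, Hongik University; Department of Artificial Intelligence, Yonsei University

AI summary: Proposes dgMARK, which embeds a text watermark by steering the unmasking order of discrete diffusion language models so that a parity constraint is satisfied, without explicitly reweighting probabilities. A sliding-window detector provides robustness to edits.

Comments Accepted at ICML 2026. Project page: https://dgmark-watermarking.github.io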

AI中文摘要

我们提出了dgMARK，一种面向离散扩散语言模型（dLLMs）的解码引导水印方法。与自回归模型不同，dLLMs可以以任意顺序生成token。虽然理想的条件预测器应对此顺序不变，但实际dLLMs对去掩码顺序表现出强敏感性，这为水印创建了一个新通道。dgMARK将去掩码顺序引导至那些高奖励候选token满足由二元哈希引入的简单奇偶约束的位置，而不显式重新加权模型学习到的概率。该方法可与常见解码策略（例如基于置信度、熵和边界的排序）即插即用，并可通过一步前瞻变体增强。水印通过升高的奇偶匹配统计量检测，滑动窗口检测器确保在插入、删除、替换和释义等后期编辑操作下的鲁棒性。项目网站：https://dgmark-watermarking.github.io

英文摘要

We propose dgMARK, a decoding-guided watermarking method for discrete diffusion language models (dLLMs). Unlike autoregressive models, dLLMs can generate tokens in arbitrary order. While an ideal conditional predictor would be invariant to this order, practical dLLMs exhibit strong sensitivity to the unmasking order, creating a new channel for watermarking. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint induced by a binary hash, without explicitly reweighting the model's learned probabilities. The method is plug-and-play with common decoding strategies (e.g., confidence, entropy, and margin-based ordering) and can be strengthened with a one-step lookahead variant. Watermarks are detected via elevated parity-matching statistics, and a sliding-window detector ensures robustness under post-editing operations including insertion, deletion, substitution, and paraphrasing. Project website: https://dgmark-watermarking.github.io
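The detection side described above (elevated parity-matching statistics plus a sliding window) can be sketched as follows. This is a hypothetical illustration only: the abstract does not specify the hash or the exact parity rule, so `pair_bit` (a SHA-256 bit over a keyed token pair) and the "bit equals 0" match rule are assumptions.

```python
import hashlib
import math


def pair_bit(key: bytes, prev_token: int, token: int) -> int:
    """Keyed binary hash of a (previous token, token) pair (assumed form)."""
    data = key + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    return hashlib.sha256(data).digest()[0] & 1


def parity_hits(key: bytes, tokens: list[int]) -> list[int]:
    """1 where the pair's hash bit is 0 (the hypothetical parity match), else 0."""
    return [int(pair_bit(key, p, t) == 0) for p, t in zip(tokens, tokens[1:])]


def z_score(hits: list[int]) -> float:
    """Standardized excess of matches over the 50% expected without a watermark."""
    n = len(hits)
    return (sum(hits) - 0.5 * n) / math.sqrt(0.25 * n)


def max_window_z(hits: list[int], window: int) -> float:
    """Largest z-score over all contiguous windows of the given length."""
    if len(hits) <= window:
        return z_score(hits)
    current = sum(hits[:window])
    best = current
    for i in range(window, len(hits)):
        current += hits[i] - hits[i - window]  # slide the window by one position
        best = max(best, current)
    return (best - 0.5 * window) / math.sqrt(0.25 * window)
```

The windowed statistic is what lets a watermarked span stand out after surrounding text has been inserted or replaced, since the global z-score is diluted by the unmarked tokens.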

2601.22943 2026-06-01 cs.LG 版本更新

Scalable Topology-Preserving Graph Coarsening: Concepts and Algorithms

可扩展的拓扑保持图粗化：概念与算法

Xiang Wu, Rong-Hua Li, Xunkai Li, Kangfei Zhao, Hongchao Qin, Guoren Wang

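Affiliations: Department of Computer Science, Beijing Institute of Technology

AI summary: To address the high time complexity of existing topology-preserving graph coarsening methods, proposes Scalable Topology-Preserving Graph Coarsening (STPGC), built on the algebraic-topology notions of graph strong collapse and graph edge collapse. Three new algorithms eliminate dominated nodes and edges while strictly preserving topological features; the method is shown to preserve GNN receptive fields and to speed up GNN training.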

2601.22787 2026-06-01 cs.LG 版本更新

Float8@2bits: Entropy Coding Enables Data-Free Model Compression

Float8@2bits: 熵编码实现无数据模型压缩

Patrick Putzky, Martin Genzel, Mattes Mollenhauer, Sebastian Schulze, Thomas Wollmann, Stefan Dietzel

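Affiliations: Merantix Momentum GmbH

AI summary: Proposes EntQuant, a framework that uses entropy coding to decouple numerical precision from storage cost, reaching extreme 2-bit compression without data or fine-tuning. It compresses a 70B-parameter model within 10 minutes while preserving performance.

Comments ICML 2026. Code available at https://github.com/merantix-momentum/entquant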

AI中文摘要

Don't Be So Stief! Learning KV Cache Low-Rank Approximation over the Stiefel Manifold

Luca Benfenati, Matteo Risso, Andrea Vannozzi, Ahmet Caner Yüzügüler, Lukas Cavigelli, Enrico Macii, Daniele Jahier Pagliari, Alessio Burrello

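Affiliations: Department of Control and Computer Engineering, Politecnico di Torino; Huawei Zurich Research Center

AI summary: Proposes StiefAttention, which compresses the KV cache by learning orthonormal projection bases on the Stiefel manifold that minimize decoder-layer output reconstruction error, outperforming existing SVD-based methods.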

AI中文摘要

键值（KV）缓存能够实现快速自回归解码，但在长上下文中成为高带宽内存（HBM）容量和带宽的主要瓶颈。一种常见的缓解方法是通过将每个头的矩阵投影到较低秩来压缩缓存的键和值，仅将投影存储在HBM中。然而，现有的训练后方法通常使用SVD风格的代理目标来拟合这些投影，这可能无法很好地反映softmax、值混合以及后续解码器层变换后的端到端重建。为此，我们引入了StiefAttention，一种训练后KV缓存压缩方法，通过直接最小化解码器层输出重建误差来学习正交投影基。StiefAttention还构建了候选秩上的逐层误差-秩分布，从而能够在用户指定的KV缓存预算下进行顺序秩分配。值得注意的是，在相同条件下，对于Llama3-8B，StiefAttention在C4困惑度上比EigenAttention高出4.2个点，在0-shot MMLU准确率上高出8.9个点，在等压缩率下，相对于原始解码器层输出，实现了更低的相对误差和更高的余弦相似度。

英文摘要

Key-value (KV) caching enables fast autoregressive decoding but at long contexts becomes a dominant bottleneck in High Bandwidth Memory (HBM) capacity and bandwidth. A common mitigation is to compress cached keys and values by projecting per-head matrices to a lower rank, storing only the projections in the HBM. However, existing post-training approaches typically fit these projections using SVD-style proxy objectives, which may poorly reflect end-to-end reconstruction after softmax, value mixing, and subsequent decoder-layer transformations. For these reasons, we introduce StiefAttention, a post-training KV-cache compression method that learns orthonormal projection bases by directly minimizing decoder-layer output reconstruction error. StiefAttention additionally constructs layer-wise error-rank profiles over candidate ranks, enabling sequential rank allocation under a user-specified KV cache budget. Notably, on Llama3-8B under the same conditions, StiefAttention outperforms EigenAttention by $4.2$ points on C4 perplexity and $8.9$ points on 0-shot MMLU accuracy at iso-compression, yielding lower relative error and higher cosine similarity with respect to the original decoder-layer outputs.

2601.21645 2026-06-01 cs.LG math.CT math.RT 版本更新

Identifiable Equivariant Networks are Layerwise Equivariant

可识别的等变网络是逐层等变的

Vahid Shahverdi, Giovanni Luca Marchetti, Georg Bökman, Kathlén Kohn

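Affiliations: Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden; University of Amsterdam, The Netherlands

AI summary: Proves that, under suitable identifiability conditions, an end-to-end equivariant network admits a parameter choice that makes every layer equivariant on the latent spaces, giving a mathematical explanation for the emergence of equivariant structure in the weights during training.

Comments Accepted at ICML 2026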

AI中文摘要

我们研究了深度神经网络中端到端等变性与逐层等变性之间的关系。我们证明：对于一个端到端函数关于输入和输出空间上的群作用等变的网络，存在一个参数选择使得该网络产生相同的端到端函数，并且其每一层关于潜在空间上的某些群作用是等变的。我们的结果假设模型参数在适当意义下是可识别的。对于一大类网络，这种可识别性已在文献中得到确立，我们的结果立即适用；而对于其他网络，它仍是推测性的。我们发展的理论基于抽象形式化，因此与架构无关。总体而言，我们的结果为训练过程中神经网络权重中等变结构的涌现——这一在实践中持续观察到的现象——提供了数学解释。

英文摘要

We investigate the relation between end-to-end equivariance and layerwise equivariance in deep neural networks. We prove the following: For a network whose end-to-end function is equivariant with respect to group actions on the input and output spaces, there is a parameter choice yielding the same end-to-end function such that its layers are equivariant with respect to some group actions on the latent spaces. Our result assumes that the parameters of the model are identifiable in an appropriate sense. This identifiability property has been established in the literature for a large class of networks, to which our results apply immediately, while it is conjectural for others. The theory we develop is grounded in an abstract formalism, and is therefore architecture-agnostic. Overall, our results provide a mathematical explanation for the emergence of equivariant structures in the weights of neural networks during training -- a phenomenon that is consistently observed in practice.

2601.20774 2026-06-01 cs.LG 版本更新

When More Data Doesn't Help: Limits of Adaptation in Multitask Learning

当更多数据无济于事：多任务学习中适应的极限

Steve Hanneke, Mingyue Xu

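Affiliations: Department of Computer Science, Purdue University, West Lafayette, IN, USA

AI summary: Establishes a stronger impossibility result for adaptation, showing that multitask learning has statistical limits that aggregating samples cannot overcome, even when the amount of data per task is arbitrarily large.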

AI中文摘要

多任务学习及相关框架在现代应用中取得了巨大成功。在多任务学习问题中，我们有一组从相关源任务收集的异构数据集，并希望提高性能，超过单独解决每个任务所能达到的效果。arXiv:2006.15785 的最新工作表明，在无法访问分布信息的情况下，只要每个任务的样本量有界，任何基于聚合样本的算法都无法保证最优风险。在本文中，我们专注于理解多任务学习的统计极限。我们超越了 arXiv:2006.15785 中的无免费午餐定理，建立了一个更强的适应性不可能性结果，该结果对每个任务的任意大样本量都成立。这一改进传达了一个重要信息：多任务学习的困难无法通过每个任务拥有大量数据来克服。我们还讨论了可能对未来研究感兴趣的最优适应性的概念。

英文摘要

Multitask learning and related frameworks have achieved tremendous success in modern applications. In a multitask learning problem, we are given a set of heterogeneous datasets collected from related source tasks and hope to enhance the performance above what we could hope to achieve by solving each of them individually. The recent work of arXiv:2006.15785 showed that, without access to distributional information, no algorithm based on aggregating samples alone can guarantee optimal risk as long as the sample size per task is bounded. In this paper, we focus on understanding the statistical limits of multitask learning. We go beyond the no-free-lunch theorem in arXiv:2006.15785 by establishing a stronger impossibility result of adaptation that holds for arbitrarily large sample size per task. This improvement conveys an important message: the hardness of multitask learning cannot be overcome by having abundant data per task. We also discuss the notion of optimal adaptivity, which may be of future interest.

2601.20076 2026-06-01 math.OC cs.LG 版本更新

Safe Multitask Molecular Graph Networks for Vapor Pressure and Odor Threshold Prediction

Shuang Wu, Meijie Wang, Lun Yu

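Affiliations: Department of Civil, Environmental and Geomatic Engineering, University College London; Metanovas Biotech, Inc.

AI summary: Proposes a safe multitask approach with vapor pressure as the primary task and odor threshold as the auxiliary task, combining A20/E17 molecular graph features with a PNA backbone to obtain the best vapor-pressure generalization under a Bemis-Murcko scaffold split.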

DOI: 10.1016/j.chemolab.2026.105766

AI中文摘要

我们研究了气味相关属性建模中的两个重要任务：蒸气压（VP）和气味阈值（OP）。为了评估模型的分布外（OOD）能力，我们采用了Bemis-Murcko骨架划分。在特征方面，我们引入了丰富的A20/E17分子图特征（20维原子特征+17维键特征），并系统比较了GINE和PNA骨干网络。结果表明：对于VP，使用简单回归头的PNA实现了验证MSE≈0.21（归一化空间）；对于相同骨架划分下的OP单任务，使用A20/E17和鲁棒训练（Huber/winsor）实现了验证MSE≈0.60-0.61。对于多任务训练，我们提出了一种**“安全多任务”**方法：以VP为主任务，OP为辅助任务，使用延迟激活+梯度裁剪+小权重，这避免了对主任务的损害，同时获得了最佳的VP泛化性能。本文提供了完整的可重复实验、消融研究和误差相似性分析，同时讨论了数据噪声的影响和方法的局限性。

英文摘要

We investigate two important tasks in odor-related property modeling: Vapor Pressure (VP) and Odor Threshold (OP). To evaluate the model's out-of-distribution (OOD) capability, we adopt the Bemis-Murcko scaffold split. In terms of features, we introduce the rich A20/E17 molecular graph features (20-dimensional atom features + 17-dimensional bond features) and systematically compare GINE and PNA backbones. The results show: for VP, PNA with a simple regression head achieves Val MSE $\approx$ 0.21 (normalized space); for the OP single task under the same scaffold split, using A20/E17 with robust training (Huber/winsor) achieves Val MSE $\approx$ 0.60-0.61. For multitask training, we propose a "safe multitask" approach: VP as the primary task and OP as the auxiliary task, using delayed activation + gradient clipping + small weight, which avoids harming the primary task and simultaneously yields the best VP generalization performance. This paper provides complete reproducible experiments, ablation studies, and error-similarity analysis while discussing the impact of data noise and method limitations.

2601.16366 2026-06-01 cs.LG cs.SC 版本更新

Post-Training Neural Network Pruning using Graph Curvature

使用图曲率的训练后神经网络剪枝

Shuhang Tan, Jayson Sia, Paul Bogdan, Radoslav Ivanov

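Affiliations: Rensselaer Polytechnic Institute; University of Southern California

AI summary: Introduces neural curvature (NC), based on Ollivier-Ricci curvature (ORC), which computes edge curvature under activation patterns to identify unimportant connections in a neural network for efficient pruning.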

AI中文摘要

本文通过图论的视角为神经网络（NN）剪枝问题提供了新的视角。为了实现有效的剪枝，我们旨在识别主要的NN数据流以及相应的NN连接，这些连接对于完整模型的性能最重要和最不重要。与基于信息论的NN数据分析标准方法不同，我们采用了图曲率的概念，特别是Ollivier-Ricci曲率（ORC）。ORC已成功用于识别各种领域中的重要图边，如道路交通分析、生物网络和社交网络。特别是，具有负ORC的边被认为是瓶颈，因此对图的整体连通性至关重要，而正ORC的边则不那么重要。我们将这种直觉用于NN：（1）构建由NN结构诱导的图，并基于ORC引入神经曲率（NC）的概念；（2）根据一组输入示例的激活模式计算曲率；（3）证明NC可用于根据边对整体NN功能的重要性对边进行排序。我们通过在三个图像数据集（MNIST、CIFAR-10和CIFAR-100）上训练的各种中小型模型上进行剪枝实验来评估我们的方法。结果表明，与现有剪枝方法相比，我们的方法可以识别出更多不重要的边。

英文摘要

This paper provides a fresh view of the neural network (NN) pruning problem through the lens of graph theory. To achieve effective pruning, we aim to identify the main NN data flows and the corresponding NN connections that are most and least important for the performance of the full model. Unlike the standard approach to NN data flow analysis, which is based on information theory, we employ the notion of graph curvature, specifically Ollivier-Ricci curvature (ORC). ORC has been successfully used to identify important graph edges in various domains such as road traffic analysis, biological networks, and social networks. In particular, edges with negative ORC are considered bottlenecks and are therefore critical to the graph's overall connectivity, whereas positive-ORC edges are less essential. We use this intuition for NNs to (1) construct a graph induced by the NN structure and introduce the notion of neural curvature (NC) based on ORC; (2) calculate curvatures based on activation patterns for a set of input examples; and (3) demonstrate that NC can be used to rank edges according to their importance for overall NN functionality. We evaluate our method through pruning experiments on a variety of small and medium size models trained on three image datasets: MNIST, CIFAR-10, and CIFAR-100. The results indicate that our method can identify a larger number of unimportant edges compared to existing pruning methods.

2601.13704 2026-06-01 cs.SD cs.AI cs.LG eess.AS 版本更新

Performance and Complexity Trade-off Optimization of Speech Models During Training

训练过程中语音模型的性能与复杂度权衡优化

Esteban Gómez, Tom Backström

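Affiliations: Department of Information and Communications Engineering, Aalto University

AI summary: Proposes a reparameterization technique based on feature noise injection that lets SGD jointly optimize the performance and computational complexity of speech models during training, enabling dynamic adjustment of model size.

Comments This work has been submitted to the IEEE for possible publication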

AI中文摘要

在语音机器学习中，神经网络模型通常通过选择具有固定层大小和结构的架构来设计。这些模型随后被训练以最大化与任务目标相关的性能指标。虽然整体架构通常由任务的先验知识指导，但各层的大小往往是启发式选择的。然而，这种方法并不能保证性能与计算复杂度之间的最优权衡；因此，通常采用权重量化或模型剪枝等后处理方法以降低计算成本。这是因为随机梯度下降（SGD）方法只能优化可微函数，而影响计算复杂度的因素（如层大小和每秒浮点运算次数（FLOP/s））是不可微的，需要在训练过程中修改模型结构。我们提出了一种基于特征噪声注入的重新参数化技术，使得在训练过程中能够使用基于SGD的方法联合优化性能和计算复杂度。与传统的剪枝方法不同，我们的方法允许模型大小针对目标性能-复杂度权衡进行动态优化，而无需依赖启发式标准来选择要移除的权重或结构。我们通过三个案例研究证明了我们方法的有效性，包括一个合成示例和两个实际应用：语音活动检测和音频反欺骗。与我们的工作相关的代码已公开，以鼓励进一步研究。

英文摘要

In speech machine learning, neural network models are typically designed by choosing an architecture with fixed layer sizes and structure. These models are then trained to maximize performance on metrics aligned with the task's objective. While the overall architecture is usually guided by prior knowledge of the task, the sizes of individual layers are often chosen heuristically. However, this approach does not guarantee an optimal trade-off between performance and computational complexity; consequently, post hoc methods such as weight quantization or model pruning are typically employed to reduce computational cost. This occurs because stochastic gradient descent (SGD) methods can only optimize differentiable functions, while factors influencing computational complexity, such as layer sizes and floating-point operations per second (FLOP/s), are non-differentiable and require modifying the model structure during training. We propose a reparameterization technique based on feature noise injection that enables joint optimization of performance and computational complexity during training using SGD-based methods. Unlike traditional pruning methods, our approach allows the model size to be dynamically optimized for a target performance-complexity trade-off, without relying on heuristic criteria to select which weights or structures to remove. We demonstrate the effectiveness of our method through three case studies, including a synthetic example and two practical real-world applications: voice activity detection and audio anti-spoofing. The code related to our work is publicly available to encourage further research.

2510.01137 2026-06-01 cs.LG 版本更新

Re-examining Low Rank adaptation for private LLM fine-tuning

重新审视用于私有LLM微调的低秩适应

Ali Dadsetan, Frank Rudzicz

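Affiliations: Dalhousie University; Vector Institute

AI summary: Studies the inflation of gradient singular values caused by noise in differentially private SGD and proposes partially restoring the original singular-value distribution to improve the sample efficiency of DP-SGD.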

Conditional Coverage Diagnostics for Conformal Prediction

Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach

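Affiliations: Sierra team, Inria Paris, France; Ecole Normale Supérieure, PSL Research University, Paris; Soda team, Inria Paris-Saclay, France; Departments of EECS

AI summary: Casts conditional coverage estimation as a classification problem and diagnoses conditional coverage deviations of conformal prediction with the excess risk of the target coverage (ERT). Experiments show that modern classifiers give higher statistical power than traditional metrics.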

AI中文摘要

评估条件覆盖仍然是评估预测系统可靠性中最持久的挑战之一。尽管共形方法可以保证边际覆盖，但没有方法能保证产生具有正确条件覆盖的集合，这使得实践者无法清晰解释局部偏差。为了克服现有指标的样本低效和过拟合问题，我们将条件覆盖估计转化为一个分类问题。当且仅当某个分类器能够达到比目标覆盖更低的风险时，条件覆盖被违反。通过选择（适当的）损失函数，得到的风险差异给出了自然误覆盖度量（如L1和L2距离）的保守估计，甚至可以分离过覆盖和欠覆盖以及非恒定目标覆盖的影响。我们将得到的度量族称为目标覆盖的超额风险（ERT）。实验表明，使用现代分类器比基于简单分类器的现有指标（如CovGap）具有更高的统计功效。此外，我们使用我们的度量来基准测试不同的共形预测方法。最后，我们发布了ERT以及先前条件覆盖度量的开源软件包。这些贡献共同为理解、诊断和改进预测系统的条件可靠性提供了新视角。

英文摘要

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome sample-inefficiency and overfitting issues of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if some classifier can achieve lower risk than the target coverage. Through the choice of a (proper) loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage, and non-constant target coverages. We call the resulting family of metrics excess risk of the target coverage (ERT). We show experimentally that the use of modern classifiers provides much higher statistical power than simple classifiers underlying established metrics like CovGap. Additionally, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.
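The classification view described in this abstract can be sketched with the squared (Brier) loss: compare the held-out risk of always predicting the target coverage with the risk of a classifier that predicts coverage from the features. The code below is an illustrative sketch, not the released ERT package; the binned classifier and the names `fit_binned_classifier` and `ert_brier` are assumptions, and a one-dimensional feature in [0, 1] is assumed.

```python
import numpy as np


def fit_binned_classifier(x, covered, bins=10):
    """Tiny stand-in classifier: coverage frequency per bin of a feature in [0, 1]."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    fallback = covered.mean()
    rates = np.array([covered[idx == b].mean() if np.any(idx == b) else fallback
                      for b in range(bins)])

    def predict(x_new):
        j = np.clip(np.digitize(x_new, edges) - 1, 0, bins - 1)
        return rates[j]

    return predict


def ert_brier(covered, predicted, target):
    """Risk of always predicting the target coverage minus the classifier's risk,
    under squared (Brier) loss, evaluated on held-out data."""
    covered = np.asarray(covered, dtype=float)
    return np.mean((covered - target) ** 2) - np.mean((covered - predicted) ** 2)
```

With this loss, the best possible classifier attains an excess risk equal to the mean squared gap between the true conditional coverage and the target, so any fitted classifier yields a conservative (lower) estimate in expectation. A stronger classifier than the binned one used here would tighten that estimate.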

2512.11561 2026-06-01 cs.LG 版本更新

View Space: Learning Representation across Arbitrary Graphs

视图空间：跨任意图的学习表示

Dooho Lee, Myeong Kong, Minho Jeong, Jaemin Yoo

Affiliations: School of Electrical Engineering, KAIST, Daejeon, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea

AI summary: Introduces the view space and uses Graph View Transformation (GVT) for inductive node representation learning across arbitrary graphs, clearly outperforming existing methods on node classification.

Comments Accepted to ICML 2026

AI中文摘要

将预训练模型泛化到未见数据集而无需重新训练是基础模型的核心挑战。由于数据集间特征维度和语义的巨大差异，在数值数据上实现完全归纳推理尤为困难。我们观察到，在存在图结构的情况下，数值数据在特征空间之外还允许一个由结构诱导的独特表示轴，我们将其形式化为视图空间。该视图空间能够统一表示具有异构特征的图，并激发了图视图变换（GVT），这是一类可在任意图间共享的参数化映射。我们通过循环GVT实例化该框架，这是一种用于节点分类中完全归纳节点表示学习的架构。在OGBN-Arxiv上预训练并在27个基准上评估，循环GVT比先前的完全归纳图模型GraphAny高出8.93%，并超过12个单独调优的GNN至少3.30%。这些结果确立了视图空间作为跨异构特征空间图学习的原理性和实用基础。代码和检查点可在https://github.com/dooho00/graph-view-space获取。

英文摘要

Generalizing pretrained models to unseen datasets without retraining is a central challenge toward foundation models. Achieving fully inductive inference on numerical data is particularly difficult due to large variations in feature dimensionality and semantics across datasets. We observe that, in the presence of graph structure, numerical data admits a distinct structure-induced representational axis beyond the feature space, which we formalize as the view space. This view space enables a unified representation of graphs with heterogeneous features and motivates Graph View Transformation (GVT), a class of parametric mappings that can be shared across arbitrary graphs. We instantiate this framework with Recurrent GVT, an architecture for fully inductive node representation learning in node classification. Pretrained on OGBN-Arxiv and evaluated on 27 benchmarks, Recurrent GVT outperforms GraphAny, the prior fully inductive graph model, by +8.93%, and surpasses 12 individually tuned GNNs by at least +3.30%. These results establish the view space as a principled and practical foundation for learning across graphs with heterogeneous feature spaces. Code and checkpoints are available at https://github.com/dooho00/graph-view-space.

2512.05038 2026-06-01 cs.LG 版本更新

IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference

Wanli Zhong, Haibo Feng, Zirui Zhou, Hanyang Peng, Shiqi Yu

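Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

AI summary: Targets the datatype-conversion bottleneck of the softmax path when deploying Transformers on edge devices. The fully integer IntAttention pipeline combines the IndexSoftmax operator, sparsity-aware clipping, a 32-entry lookup-table approximation, and direct integer normalization to eliminate datatype conversion overhead, achieving up to 3.7x speedup and 61% energy reduction on Armv8 CPUs.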

AI中文摘要

在边缘设备上部署Transformer模型受到延迟和能量预算的限制。虽然INT8量化有效加速了主要的矩阵乘法，但它将softmax相关路径暴露为主要瓶颈。该阶段需要进行昂贵的反量化->softmax->再量化绕行，这可以占到总注意力延迟的65%，并破坏了边缘硬件效率至关重要的端到端整数数据流。为了解决这一限制，我们提出了IntAttention，这是第一个全整数注意力流水线，可作为无需训练的即插即用替代方案。我们方法的核心是IndexSoftmax，一种在整数域内完全替代浮点指数运算的硬件友好算子。IntAttention集成了稀疏感知裁剪、32项查找表近似和直接整数归一化，从而消除了注意力路径上的数据类型转换开销。在Armv8 CPU上的实验表明，与FP16基线相比，我们的方法实现了高达3.7倍的加速和61%的能耗降低，与传统的INT8注意力流水线相比，加速高达2.0倍。在多种语言和视觉模型以及额外的推理和长上下文评估中，IntAttention保持了强大的整体保真度，并展示了比现有基于LUT的softmax近似更有利的权衡。代码可在https://github.com/WanliZhong/IntAttention获取。

英文摘要

Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the softmax-related path as the dominant bottleneck. This stage incurs a costly dequantize -> softmax -> requantize detour, which can account for up to 65% of total attention latency and disrupts the end-to-end integer dataflow critical for edge hardware efficiency. To address this limitation, we present IntAttention, the first fully integer attention pipeline that serves as a training-free drop-in replacement. At the core of our approach lies IndexSoftmax, a hardware-friendly operator that replaces floating-point exponentials entirely within the integer domain. IntAttention integrates sparsity-aware clipping, a 32-entry lookup table approximation, and direct integer normalization, thereby eliminating datatype conversion overhead along the attention path. Experiments on Armv8 CPUs show that our method achieves up to 3.7x speedup and 61% energy reduction over FP16 baselines, and up to 2.0x speedup over conventional INT8 attention pipelines. Across diverse language and vision models, as well as additional reasoning and long-context evaluations, IntAttention maintains strong overall fidelity and demonstrates a more favorable trade-off than existing LUT-based softmax approximations. Code is available at https://github.com/WanliZhong/IntAttention
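To make the lookup-table idea concrete, here is a minimal sketch of an integer softmax that clips entries far below the row maximum, reads exponentials from a 32-entry table, and normalizes with integer division. It is not the paper's IndexSoftmax operator, whose details the abstract does not give; the table scaling (`LUT_ONE`), the clip value of 8.0, the 0..255 output range, and all function names are assumptions for illustration.

```python
import numpy as np

LUT_SIZE = 32
LUT_ONE = 32767  # fixed-point value representing exp(0) = 1 (assumed scaling)


def build_lut(clip):
    """exp(-t) sampled at LUT_SIZE evenly spaced points on [0, clip), as integers."""
    t = np.arange(LUT_SIZE) * (clip / LUT_SIZE)
    return np.round(LUT_ONE * np.exp(-t)).astype(np.int64)


def lut_softmax(q, scale, clip=8.0):
    """Approximate softmax(q * scale) for integer logits q; returns probabilities
    scaled to 0..255. Only build_lut and clip_q use floats, and both can be
    precomputed offline."""
    q = np.asarray(q, dtype=np.int64)
    lut = build_lut(clip)
    clip_q = int(round(clip / scale))        # clip distance in integer logit units
    d = q.max(axis=-1, keepdims=True) - q     # distance below the row maximum, >= 0
    keep = d < clip_q                         # entries far below the max are zeroed
    idx = np.minimum(d * LUT_SIZE // clip_q, LUT_SIZE - 1)
    w = np.where(keep, lut[idx], 0)
    total = w.sum(axis=-1, keepdims=True)     # >= LUT_ONE, so never zero
    return (w * 255 + total // 2) // total    # rounded integer normalization
```

For example, `lut_softmax(np.array([40, 20, 0, -200]), scale=0.05)` approximates the softmax of the real-valued logits 2.0, 1.0, 0.0, and -10.0. The table index is floored, so logits that fall between table points are slightly over-weighted; a finer table or interpolation trades memory for accuracy.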

2511.19513 2026-06-01 cs.LG 版本更新

Row-Stochastic Matrices Can Provably Outperform Doubly Stochastic Matrices in Decentralized Learning

行随机矩阵在去中心化学习中可证明优于双随机矩阵

Bing Liu, Boao Kong, Limin Lu, Kun Yuan, Chengcheng Zhao

Affiliations: College of Control Science and Engineering, Zhejiang University; Center for Data Science, Peking University; Center for Machine Learning Research, Peking University

AI summary: Using a weighted Hilbert-space framework, rigorously proves that row-stochastic matrices can converge faster than doubly stochastic matrices in decentralized learning, and gives topology conditions to guide design.

AI中文摘要

去中心化学习通常涉及具有异构节点权重$λ$的加权全局损失。我们重新审视了两种融入这些权重的自然策略：(i) 将权重嵌入局部损失以保持均匀权重（从而得到双随机矩阵），以及(ii) 保留原始损失同时采用由$λ$诱导的行随机矩阵。尽管先前的工作表明两种策略都针对相同的$λ$加权全局损失，但尚不清楚欧几里得空间中的保证是否紧致，以及它们的表现有何根本差异。为了澄清这一点，我们开发了一个加权希尔伯特空间框架$L^2(λ;\mathbb{R}^d)$，并获得了比标准欧几里得分析严格更紧的收敛速率。在该几何中，行随机矩阵成为\emph{自伴的}，而双随机矩阵则不是，从而产生了额外的\emph{惩罚项}，放大了共识误差，进而减缓了收敛。因此，收敛差异不仅来自谱间隙，还来自这些惩罚项。然后，我们推导了行随机设计即使具有更小的谱间隙也能更快收敛的充分条件。最后，通过使用瑞利商和Loewner序特征值比较，我们进一步获得了保证这一优势的拓扑条件，并给出了实用的拓扑设计指南。

英文摘要

Decentralized learning often involves a weighted global loss with heterogeneous node weights $λ$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $λ$-induced row-stochastic matrix. Although prior work shows that both strategies target the same $λ$-weighted global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(λ;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from standard Euclidean analysis. In this geometry, the row-stochastic matrix becomes \emph{self-adjoint} whereas the doubly stochastic one does not, creating additional \emph{penalty terms} that amplify consensus error, thereby slowing convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, by using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we further obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.

2511.17826 2026-06-01 cs.LG cs.CL stat.ML 版本更新

Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch

跨张量并行大小的确定性推理，消除训练-推理不匹配

Ziyang Zhang, Xinheng Ding, Jiayi Yuan, Rixin Liu, Huizi Mao, Jiarong Xing, Zirui Liu

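Affiliations: Independent Researcher; University of Minnesota, Minneapolis, Minnesota, USA; Rice University, Houston, Texas, USA; NVIDIA Corp., Santa Clara, California, USA

AI summary: Addresses inference nondeterminism caused by floating-point non-associativity under different tensor-parallel (TP) sizes. Tree-Based Invariant Kernels (TBIK) give bit-wise identical results across TP sizes, eliminating the precision mismatch between inference and training engines in RL training.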

AI中文摘要

确定性推理对于大型语言模型（LLM）应用（如LLM-as-a-judge评估、多智能体系统和强化学习（RL））日益关键。然而，现有的LLM服务框架表现出非确定性行为：当系统配置（例如张量并行（TP）大小、批大小）变化时，即使采用贪心解码，相同的输入也可能产生不同的输出。这是由于浮点运算的非结合性以及GPU间归约顺序不一致导致的。虽然先前的工作通过批不变核解决了与批大小相关的非确定性，但跨不同TP大小的确定性仍然是一个开放问题，特别是在RL设置中，训练引擎通常使用全分片数据并行（即TP=1），而部署引擎依赖多GPU TP以最大化推理吞吐量，从而在两者之间产生自然的不匹配。这种精度不匹配问题可能导致RL训练性能次优甚至崩溃。我们识别并分析了TP引起不一致的根本原因，并提出了基于树的核（TBIK），这是一组TP不变的矩阵乘法和归约原语，无论TP大小如何，都能保证比特级相同的结果。我们的关键见解是通过统一的层次二叉树结构对齐GPU内和GPU间的归约顺序。我们在Triton中实现了这些核，并将其集成到vLLM和FSDP中。实验证明，在不同TP大小下，确定性推理的概率发散为零，且具有比特级可重复性。此外，在采用不同并行策略的RL训练流程中，我们在vLLM和FSDP之间实现了比特级相同的结果。代码可在https://github.com/nanomaoli/llm_reproducibility获取。

英文摘要

Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However, existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs. While prior work has addressed batch-size-related nondeterminism through batch-invariant kernels, determinism across different TP sizes remains an open problem, particularly in RL settings, where the training engine typically uses Fully Sharded Data Parallel (i.e., TP = 1) while the rollout engine relies on multi-GPU TP to maximize the inference throughput, creating a natural mismatch between the two. This precision mismatch problem may lead to suboptimal performance or even collapse for RL training. We identify and analyze the root causes of TP-induced inconsistency and propose Tree-Based Invariant Kernels (TBIK), a set of TP-invariant matrix multiplication and reduction primitives that guarantee bit-wise identical results regardless of TP size. Our key insight is to align intra- and inter-GPU reduction orders through a unified hierarchical binary tree structure. We implement these kernels in Triton and integrate them into vLLM and FSDP. Experiments confirm zero probability divergence and bit-wise reproducibility for deterministic inference across different TP sizes. We also achieve bit-wise identical results between vLLM and FSDP in RL training pipelines with different parallel strategies. Code is available at https://github.com/nanomaoli/llm_reproducibility.
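The core idea of aligning reduction orders with one binary tree can be shown on a plain sum. The NumPy sketch below simulates sharding a float32 vector over several devices; it is a toy illustration, not the Triton kernels from the paper, and `tree_sum` and `sharded_tree_sum` are names invented here. It assumes the vector length and the number of shards are powers of two.

```python
import numpy as np


def tree_sum(x):
    """Sum float32 values with a fixed balanced binary tree over adjacent pairs."""
    x = np.asarray(x, dtype=np.float32)
    if x.size == 0 or x.size & (x.size - 1):
        raise ValueError("length must be a power of two in this sketch")
    while x.size > 1:
        x = x[0::2] + x[1::2]  # one tree level: add neighbours pairwise
    return x[0]


def sharded_tree_sum(x, tp_size):
    """Simulate tp_size devices: each reduces its contiguous shard with the tree,
    then the per-device partial sums are reduced with the same tree."""
    shards = np.split(np.asarray(x, dtype=np.float32), tp_size)
    partials = np.array([tree_sum(s) for s in shards], dtype=np.float32)
    return tree_sum(partials)
```

Each contiguous power-of-two shard is exactly one subtree of the global tree, so every addition happens between the same operands in the same order for any `tp_size`, and the result is bit-wise identical. A left-to-right accumulation per shard has no such guarantee, because float32 addition is not associative: `(1e8 + -1e8) + 1` gives 1 while `1e8 + (-1e8 + 1)` gives 0.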

2506.08255 2026-06-01 cs.LG cs.AI cs.CR 版本更新

SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense

SHIELD: 用于增量扩展学习防御的安全超网络

Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek

Affiliations: Jagiellonian University, Faculty of Mathematics and Computer Science; Jagiellonian University, Doctoral School of Exact and Natural Sciences; Akces NCBR; IDEAS Research Institute

AI summary: Proposes SHIELD, a framework combining Interval Bound Propagation (IBP) with hypernetworks. By generating task-specific parameters and training with Interval MixUp, it achieves certifiably robust continual learning with state-of-the-art average accuracy while remaining scalable.

Comments Accepted to CVPR 2026 (Findings track)

AI中文摘要

在对抗条件下的持续学习仍然是一个开放问题，现有方法往往在鲁棒性、可扩展性或两者之间做出妥协。我们提出了一种新颖的框架，将区间边界传播（IBP）与基于超网络的架构相结合，以实现跨顺序任务的可认证鲁棒持续学习。我们的方法SHIELD通过一个共享的超网络生成任务特定的模型参数，该超网络仅依赖于紧凑的任务嵌入，从而消除了对重放缓冲区或完整模型副本的需求，并实现了高效的时间扩展。为了进一步增强鲁棒性，我们引入了区间混合（Interval MixUp），这是一种新颖的训练策略，它将表示为以MixUp点为中心的$\ell_{\infty}$球的虚拟示例混合。利用区间算术，该技术保证了可认证的鲁棒性，同时减轻了包裹效应，从而产生更平滑的决策边界。我们在多个基准测试上评估了SHIELD在强白盒对抗攻击（包括PGD和AutoAttack）下的表现。它持续优于现有的鲁棒持续学习方法，在保持可扩展性和认证性的同时，实现了最先进的平均准确率。这些结果向在对抗环境中实现实用且理论扎实的持续学习迈出了重要一步。

英文摘要

Continual learning under adversarial conditions remains an open problem, as existing methods often compromise either robustness, scalability, or both. We propose a novel framework that integrates Interval Bound Propagation (IBP) with a hypernetwork-based architecture to enable certifiably robust continual learning across sequential tasks. Our method, SHIELD, generates task-specific model parameters via a shared hypernetwork conditioned solely on compact task embeddings, eliminating the need for replay buffers or full model copies and enabling efficient scaling over time. To further enhance robustness, we introduce Interval MixUp, a novel training strategy that blends virtual examples represented as $\ell_{\infty}$ balls centered around MixUp points. Leveraging interval arithmetic, this technique guarantees certified robustness while mitigating the wrapping effect, resulting in smoother decision boundaries. We evaluate SHIELD under strong white-box adversarial attacks, including PGD and AutoAttack, across multiple benchmarks. It consistently outperforms existing robust continual learning methods, achieving state-of-the-art average accuracy while maintaining both scalability and certification. These results represent a significant step toward practical and theoretically grounded continual learning in adversarial settings.
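For readers unfamiliar with Interval Bound Propagation, the sketch below pushes an $\ell_{\infty}$ ball around a MixUp point through a linear layer and a ReLU using standard interval arithmetic. It illustrates the general technique only; how SHIELD chooses the ball radius or combines the bounds into a loss is not stated in the abstract, so `mixup_ball` and its fixed `eps` are assumptions.

```python
import numpy as np


def mixup_ball(x1, x2, lam, eps):
    """l_inf ball of radius eps around the MixUp point lam*x1 + (1-lam)*x2
    (a fixed radius is an assumption of this sketch)."""
    center = lam * x1 + (1.0 - lam) * x2
    return center - eps, center + eps


def ibp_linear(lower, upper, W, b):
    """Sound output bounds of x -> W @ x + b for any x with lower <= x <= upper."""
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius  # worst case per output coordinate
    return out_center - out_radius, out_center + out_radius


def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)
```

Chaining `ibp_linear` and `ibp_relu` layer by layer gives output bounds that hold for every input in the ball. A prediction is certified when the lower bound of the true class logit exceeds the upper bounds of all other logits.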

2511.17380 2026-06-01 cs.CV cs.LG 版本更新

Non-Parametric Probabilistic Robustness: A Conservative Risk Estimator under Unknown Perturbation Distributions

非参数概率鲁棒性：未知扰动分布下的保守风险估计

Zheng Wang, Yi Zhang, Siddartha Khastgir, Carsten Maple, Xingyu Zhao

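Affiliations: WMG, University of Warwick, Coventry, United Kingdom; Wuhan University, Wuhan, China

AI summary: Proposes non-parametric probabilistic robustness (NPPR), a metric that learns the perturbation distribution from data to give conservative probabilistic robustness estimates under distributional uncertainty, with an estimator based on Gaussian mixture models.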

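AI abstract (translated from Chinese)

Despite their remarkable success, deep learning models remain vulnerable to small input perturbations that cause erroneous outputs, which has motivated the recent proposal of probabilistic robustness (PR) as a complementary alternative to adversarial robustness (AR). However, existing PR formulations assume a fixed and known perturbation distribution, an unrealistic expectation in practice. To address this limitation, we propose non-parametric probabilistic robustness (NPPR), a more practical PR metric that does not rely on any predefined perturbation distribution. Following the non-parametric paradigm in statistical modeling, NPPR learns an optimized perturbation distribution directly from data, enabling conservative PR evaluation under distributional uncertainty. We further develop an NPPR estimator based on Gaussian mixture models (GMMs) that covers various input-dependent and input-independent perturbation scenarios. Theoretical analysis establishes the relationships among AR, PR, and NPPR. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet with ResNet18/50, WideResNet50, and VGG16 validate NPPR as a more practical robustness metric, showing conservative (lower) PR estimates than those obtained by assuming the common perturbation distributions used in state-of-the-art work.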

Expert Merging in Sparse Mixture of Experts with Nash Bargaining

Dung V. Nguyen, Anh T. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, Shiqi Jiang, Ethan Fetaya, Linh Duy Tran, Gal Chechik, Tan M. Nguyen

Affiliations: Department of Mathematics, National University of Singapore; Viettel AI, Viettel Group; Faculty of Mathematics and Informatics, Hanoi University of Science and Technology; Bar Ilan University, Israel; AI Imaging Team, Data Solution Department, FPT Software Japan

AI summary: To address the lack of a principled weighting mechanism in expert merging for sparse mixture-of-experts models, proposes NAMEx, a framework based on Nash bargaining that yields more balanced and efficient collaboration among experts and outperforms existing methods on multiple tasks.

Comments 10 pages in the main text. ICLR 2026 Poster

AI中文摘要

现有的稀疏混合专家模型（SMoE）专家合并策略通常依赖于输入相关或输入无关的专家参数平均，但往往缺乏原则性的加权机制。在这项工作中，我们通过博弈论的视角重新解释专家合并，揭示了专家之间的合作与竞争动态。基于这一视角，我们引入了专家纳什合并（NAMEx），这是一个将纳什谈判融入合并过程的新框架，使专家之间能够实现更平衡和高效的协作。此外，我们将复杂动量纳入NAMEx，以加速专家传播，并提供了收敛的理论保证。在语言建模、文本分类、图像分类以及数据损坏下的零样本鲁棒性等广泛实验中，NAMEx始终优于竞争方法，同时与流行的MoE架构无缝集成。最后，我们通过将NAMEx应用于大规模系统（包括Qwen1.5-MoE (14B)和DeepSeek-MoE (16B)）展示了其可扩展性，在零样本和微调设置中均证明了其有效性。代码公开于：https://github.com/anh147/NAMEx。

英文摘要

Existing expert merging strategies for Sparse Mixture of Experts (SMoE) typically rely on input-dependent or input-independent averaging of expert parameters, but often lack a principled weighting mechanism. In this work, we reinterpret expert merging through the lens of game theory, revealing cooperative and competitive dynamics among experts. Based on this perspective, we introduce Nash Merging of Experts (NAMEx), a novel framework that incorporates Nash Bargaining into the merging process, enabling more balanced and efficient collaboration among experts. Additionally, we incorporate complex momentum into NAMEx to accelerate expert propagation with theoretical guarantees for convergence. Extensive experiments across language modelling, text classification, image classification, and zero-shot robustness under data corruption show that NAMEx consistently outperforms competing methods while integrating seamlessly with popular MoE architectures. Finally, we demonstrate NAMEx's scalability by applying it to large-scale systems, including Qwen1.5-MoE (14B) and DeepSeek-MoE (16B), where it proves effective in both zero-shot and fine-tuning settings. The code is publicly available at: https://github.com/anh147/NAMEx.


2510.11683 2026-06-01 cs.LG cs.AI cs.CL 版本更新

关于通过鞅驱动的Fisher提示进行顺序测试时间自适应的技术说明

Behraj Khan, Tahir Qasim Syed

发表机构 * Institute of Business Administration（商业管理学院）

AI总结提出M-FISHER框架，通过指数鞅检测分布漂移并利用Fisher预条件更新实现稳定自适应，提供时间一致的错误控制保证和最优检测延迟。

详情

AI中文摘要

我们提出了M-FISHER的理论框架，这是一种用于流数据中顺序分布漂移检测和稳定自适应的方法。对于检测，我们从非一致性分数构建指数鞅，并应用Ville不等式获得关于误报控制的时间一致保证，确保在任何停止时间下的统计有效性。在持续漂移下，我们进一步将期望检测延迟界定为$\mathcal{O}(\log(1/δ)/Γ)$，其中$Γ$反映了漂移后的信息增益，从而将检测效率与分布散度联系起来。对于自适应，我们展示了提示参数的Fisher预条件更新实现了在分布流形上的自然梯度下降，产生局部最优更新，最小化KL散度同时保持稳定性和参数化不变性。总之，这些结果确立了M-FISHER作为一种在协变量漂移下的顺序决策中实现鲁棒、任意时间有效检测和几何稳定自适应的原则性方法。

英文摘要

We present a theoretical framework for M-FISHER, a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control, ensuring statistical validity at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/δ)/Γ)$, where $Γ$ reflects the post-shift information gain, thereby linking detection efficiency to distributional divergence. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold, yielding locally optimal updates that minimize KL divergence while preserving stability and parameterization invariance. Together, these results establish M-FISHER as a principled approach for robust, anytime-valid detection and geometrically stable adaptation in sequential decision-making under covariate shift.
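
The detection step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that pre-shift non-conformity scores are i.i.d. standard normal, so that `exp(lam * s - lam**2 / 2)` has expectation 1; the function name `first_alarm`, the Gaussian score model, and the fixed `lam` are hypothetical choices, not M-FISHER's actual construction.

```python
import math


def first_alarm(scores, lam=1.0, delta=0.01):
    """Return the first index where the martingale reaches 1/delta, else None.

    Assumption (illustrative only): pre-shift scores are i.i.d. N(0, 1), so the
    running product of exp(lam * s - lam**2 / 2) is a nonnegative martingale
    that starts at 1.
    """
    log_wealth = 0.0
    # Ville's inequality: P(sup_t M_t >= 1/delta) <= delta under the null.
    threshold = math.log(1.0 / delta)
    for t, s in enumerate(scores):
        log_wealth += lam * s - 0.5 * lam * lam
        if log_wealth >= threshold:
            return t
    return None


print(first_alarm([2.0] * 10))  # scores shifted to mean 2
print(first_alarm([0.0] * 10))  # no evidence of a shift
```

Under this toy model, choosing `lam` equal to the post-shift mean μ gives an expected log-growth of μ²/2 per step, which is the KL divergence between N(μ, 1) and N(0, 1); the alarm time then scales like log(1/δ) divided by that growth rate, matching the form of the stated delay bound.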


2505.05168 2026-06-01 math.ST cs.LG stat.ML stat.TH 版本更新

Dynamical local Fréchet curve regression in manifolds

流形上的动态局部Fréchet曲线回归

M. D. Ruiz-Medina, A. Torres-Signes

发表机构 * University of Granada, Spain（西班牙格拉纳达大学）； University of Málaga, Spain（西班牙马拉加大学）

AI总结本文在可分离希尔伯特空间中推导了响应和回归变量的最小二乘局部线性Fréchet曲线预测器，并提出了基于加权Fréchet均值的流形内蕴局部线性Fréchet曲线预测器，证明了其渐近最优性。

Comments This paper is currently under journal second revision

详情

AI中文摘要

在温和条件下，本文推导了在可分离希尔伯特空间中评估的响应和回归变量的最小二乘局部线性Fréchet曲线预测器。我们获得了允许在向量函数的L^{2}空间中实现该局部线性Fréchet函数预测器的条件，该空间的值位于紧致黎曼流形上的时变切空间。其次，基于加权Fréchet均值方法，提出了在该流形上评估的内蕴局部线性Fréchet曲线预测器。证明了其渐近最优性。模拟研究和实际数据分析分析了两种预测器经验版本的有限样本性能，并与测地线Nadaraya-Watson型曲线预测器进行了比较。在实际数据分析中，基于NASA MAGSAT卫星的地心纬度和经度观测，对地球磁场的时变球坐标进行了函数预测。

英文摘要

Under mild conditions, this paper derives a least-squares local linear Fréchet curve predictor for response and regressor evaluated in a separable Hilbert space. We obtain the conditions allowing the implementation of this local linear Fréchet functional predictor in the ambient $L^{2}$-space of vector functions, with values in the time-varying tangent space on a compact Riemannian manifold. An intrinsic local linear Fréchet curve predictor evaluated in such a manifold is then proposed, based on a weighted Fréchet mean approach. Its asymptotic optimality is proved. The simulation study and real-data application analyze the finite-sample performance of the empirical versions of both predictors, compared with a geodesic Nadaraya-Watson-type curve predictor. In the real-data application, the functional prediction of the time-varying spherical coordinates of the Earth's magnetic field is addressed, from observations of the geocentric latitude and longitude of NASA's MAGSAT satellite.


2510.02060 2026-06-01 cs.AI cs.LG 版本更新

ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection

ReTabAD: 恢复表格异常检测中语义上下文的基准

Sanghyu Yoon, Dongmin Kim, Suhee Yoon, Ye Seul Sim, Seungdong Yoa, Hye-Seung Cho, Soonyoung Lee, Hankook Lee, Woohyung Lim

发表机构 * LG AI Research, Seoul, South Korea（LG人工智能研究实验室，首尔，韩国）； Sungkyunkwan University, Suwon, South Korea（成均馆大学，水原，韩国）

AI总结针对现有表格异常检测基准缺乏语义上下文的问题，提出ReTabAD基准，通过丰富结构化文本元数据并集成零样本LLM框架，验证了语义上下文能提升检测性能和可解释性。

Comments Accepted to ICLR 2026

详情

AI中文摘要

在表格异常检测（AD）中，文本语义通常承载关键信号，因为异常的定义与特定领域的上下文紧密相关。然而，现有基准仅提供原始数据点，缺乏语义上下文，忽略了专家在实践中依赖的丰富文本元数据，如特征描述和领域知识。这一限制阻碍了研究灵活性，并阻止模型充分利用领域知识进行检测。ReTabAD通过恢复文本语义来解决这一差距，以实现上下文感知的表格AD研究。我们提供（1）20个精心策划的表格数据集，这些数据集丰富了结构化的文本元数据，以及最先进的AD算法的实现，包括经典方法、深度学习和基于LLM的方法，以及（2）一个零样本LLM框架，该框架利用语义上下文而无需特定任务训练，为未来研究建立了强大的基线。此外，本工作通过实验和分析提供了关于文本元数据在AD中的作用和实用性的见解。结果表明，语义上下文通过支持领域感知推理提高了检测性能并增强了可解释性。这些发现将ReTabAD确立为系统探索上下文感知AD的基准。

英文摘要

In tabular anomaly detection (AD), textual semantics often carry critical signals, as the definition of an anomaly is closely tied to domain-specific context. However, existing benchmarks provide only raw data points without semantic context, overlooking rich textual metadata such as feature descriptions and domain knowledge that experts rely on in practice. This limitation restricts research flexibility and prevents models from fully leveraging domain knowledge for detection. ReTabAD addresses this gap by restoring textual semantics to enable context-aware tabular AD research. We provide (1) 20 carefully curated tabular datasets enriched with structured textual metadata, together with implementations of state-of-the-art AD algorithms including classical, deep learning, and LLM-based approaches, and (2) a zero-shot LLM framework that leverages semantic context without task-specific training, establishing a strong baseline for future research. Furthermore, this work provides insights into the role and utility of textual metadata in AD through experiments and analysis. Results show that semantic context improves detection performance and enhances interpretability by supporting domain-aware reasoning. These findings establish ReTabAD as a benchmark for systematic exploration of context-aware AD.


2510.00419 2026-06-01 cs.LG 版本更新

Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs

学习零阶优化器以微调大语言模型

Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结提出一种基于学习的零阶优化器ZO-Finetuner，通过紧凑且内存高效的设计自动学习高效扰动策略，实现大语言模型微调时避免反向传播并降低内存开销，在4个LLM和7个数据集上82.1%的任务-模型组合中优于现有零阶基线方法。

Comments ICML 2026

详情

AI中文摘要

零阶优化器最近成为微调大语言模型（LLM）的一种有吸引力的方法，因为它们避免了反向传播，并且相对于标准一阶训练可以大幅减少内存开销。然而，现有的零阶方法依赖于手工设计的静态采样策略，无法适应模型特定的结构。为了解决这个问题，我们提出了ZO-Finetuner，一种基于学习的零阶优化器，通过紧凑且内存高效的设计自动学习高效的扰动策略。基于少量基础LLM在多个任务上被重复微调这一事实，ZO-Finetuner支持一次性每模型训练，并在下游任务中以最小开销重用。因此，为给定LLM学习一次优化器并在不同下游任务中重用既是可行的也是高度可取的。相应地，ZO-Finetuner旨在通过支持一次性每模型训练且开销最小，将学习优化（L2L）扩展到基础模型时代。在4个LLM和7个数据集上的实验表明，ZO-Finetuner在82.1%的任务-模型组合中优于先前的零阶基线方法，从而展示了其在高效LLM微调中的强大性能和可扩展性。代码可在https://github.com/ASTRAL-Group/ZO_Fine_tuner找到。

英文摘要

Zeroth-order optimizers have recently emerged as an attractive approach for fine-tuning large language models (LLMs), as they avoid backpropagation and can substantially reduce memory overhead relative to standard first-order training. However, existing zeroth-order methods rely on hand-crafted, static sampling strategies that are not adaptable to model-specific structures. To address this, we propose ZO-Finetuner, a learning-based zeroth-order optimizer for LLMs that automatically learns efficient perturbation strategies through a compact and memory-efficient design. Motivated by the fact that a small set of base LLMs is repeatedly fine-tuned across tasks, ZO-Finetuner supports one-time per-model training and reuse across downstream tasks with minimal overhead. Therefore, learning the optimizer once for a given LLM and reusing it across diverse downstream tasks is both feasible and highly desirable. Accordingly, ZO-Finetuner is designed to scale learning to learn (L2L) to the foundation-model era by supporting one-time per-model training with minimal overhead. Experiments on 4 LLMs and 7 datasets show that ZO-Finetuner outperforms prior zeroth-order baselines in 82.1% of task-model combinations, thereby demonstrating strong performance and scalability for efficient LLM fine-tuning. The code can be found at https://github.com/ASTRAL-Group/ZO_Fine_tuner.
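
For context, the kind of hand-crafted, static zeroth-order update that the abstract contrasts against can be sketched as follows. This is a generic two-point estimator with fixed Gaussian perturbations, not ZO-Finetuner's learned perturbation strategy; the names `zo_gradient` and `zo_sgd`, the quadratic toy loss, and all hyperparameters are assumptions chosen for illustration.

```python
import random


def zo_gradient(loss, theta, eps, rng):
    """Two-point zeroth-order gradient estimate using only forward evaluations."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]            # static Gaussian perturbation
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    scale = (loss(plus) - loss(minus)) / (2.0 * eps)    # directional derivative along z
    return [scale * zi for zi in z]


def zo_sgd(loss, theta, lr=0.05, eps=1e-3, steps=500, seed=0):
    """Plain SGD driven by the zeroth-order estimate; no backpropagation involved."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = zo_gradient(loss, theta, eps, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def toy_loss(theta):
    # Hypothetical stand-in for a fine-tuning loss: squared distance to (1, 1).
    return sum((t - 1.0) ** 2 for t in theta)


final = zo_sgd(toy_loss, [3.0, -2.0])
print(toy_loss(final))
```

Only two loss evaluations are needed per step and no activations are stored for a backward pass, which is the memory argument made in the abstract; the perturbation distribution here is fixed, which is the part a learned optimizer would replace.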


2509.25906 2026-06-01 cs.LG 版本更新

Federated Learning with Enhanced Privacy via Model Splitting and Random Client Participation

通过模型拆分和随机客户端参与增强隐私的联邦学习

Yiwei Li, Shuai Wang, Zhuojun Tian, Xiuhua Wang, Shijian Su

发表机构 * School of Optoelectronic & Communication Engineering, Xiamen University of Technology（厦门理工学院光电与通信工程学院）； National Key Laboratory of Wireless Communications, University of Electronic Science and Technology of China（电子科技大学无线通信国家重点实验室）； Division of Information Science and Engineering, KTH Royal Institute of Technology（瑞典皇家理工学院信息科学与工程系）； School of Cyber Science and Engineering, Huazhong University of Science and Technology（华中科技大学网络安全科学与工程学院）； School of Engineering, Huaqiao University（华侨大学工程学院）

AI总结提出MS-PAFL框架，通过将模型拆分为私有和公共子模型并仅向公共子模型注入噪声，结合随机客户端参与和本地数据子采样的隐私放大分析，在强隐私保证下实现更优的隐私-效用权衡。

Comments Accepted for publication in IEEE Transactions on Cognitive Communications and Networking

详情

DOI: 10.1109/TCCN.2026.3694029

AI中文摘要

联邦学习（FL）通常采用差分隐私（DP）来保护客户端数据，但隐私保证所需的附加噪声会显著降低模型精度。为解决这一挑战，我们提出了模型拆分隐私放大联邦学习（MS-PAFL），一种结合结构模型拆分与统计隐私放大的新颖框架。在该框架中，每个客户端的模型被划分为保留在本地私有子模型和用于全局聚合的公共子模型。校准的高斯噪声仅注入公共子模型，从而限制其不利影响，同时保留本地模型的效用。我们进一步提供了严格的理论分析，刻画了在该架构下通过随机客户端参与和本地数据子采样实现的联合隐私放大。分析给出了单轮和总隐私损失的紧界，表明MS-PAFL显著减少了满足目标隐私保护水平所需的噪声。大量实验验证了我们的理论发现，表明MS-PAFL始终获得更优的隐私-效用权衡，并能在强隐私保证下训练高精度模型。

英文摘要

Federated Learning (FL) often adopts differential privacy (DP) to protect client data, but the added noise required for privacy guarantees can substantially degrade model accuracy. To resolve this challenge, we propose model-splitting privacy-amplified federated learning (MS-PAFL), a novel framework that combines structural model splitting with statistical privacy amplification. In this framework, each client's model is partitioned into a private submodel, retained locally, and a public submodel, shared for global aggregation. The calibrated Gaussian noise is injected only into the public submodel, thereby confining its adverse impact while preserving the utility of the local model. We further present a rigorous theoretical analysis that characterizes the joint privacy amplification achieved through random client participation and local data subsampling under this architecture. The analysis provides tight bounds on both single-round and total privacy loss, demonstrating that MS-PAFL significantly reduces the noise necessary to satisfy a target privacy protection level. Extensive experiments validate our theoretical findings, showing that MS-PAFL consistently attains a superior privacy-utility trade-off and enables the training of highly accurate models under strong privacy guarantees.
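
The core idea of adding noise only to the shared part of the model can be sketched as below. The L2 clipping step, the noise scale `noise_multiplier * clip_norm`, and the names `privatize_public` and `client_upload` are standard differential-privacy conventions assumed here for illustration; the abstract does not specify MS-PAFL's exact calibration.

```python
import math
import random


def privatize_public(public_update, clip_norm, noise_multiplier, rng):
    """Clip the public update to L2 norm clip_norm, then add Gaussian noise."""
    norm = math.sqrt(sum(x * x for x in public_update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm  # assumed calibration, not from the paper
    return [x * scale + rng.gauss(0.0, sigma) for x in public_update]


def client_upload(private_update, public_update, clip_norm=1.0,
                  noise_multiplier=1.0, seed=0):
    """Return what leaves the client: only the noised public submodel update."""
    rng = random.Random(seed)
    # private_update is applied locally and is never noised or transmitted.
    return privatize_public(public_update, clip_norm, noise_multiplier, rng)


upload = client_upload(private_update=[0.2, -0.1], public_update=[3.0, 4.0])
print(upload)
```

Because the private submodel never receives noise, its parameters keep their full utility, while the server only ever sees the clipped and perturbed public part.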


2506.01467 2026-06-01 cs.LG cs.DM 版本更新

Feature-Aware (Hyper)graph Generation via Next-Scale Prediction

特征感知的（超）图生成：基于下一尺度预测

Dorian Gailhard, Enzo Tartaglione, Lirida Naviner, Jhony H. Giraldo

发表机构 * GitHub

AI总结提出FAHNES框架，通过层次化下一尺度预测联合生成图/超图的拓扑和特征，实现大规模带特征图/超图的高效生成。

详情

AI中文摘要

图生成模型在小型结构化数据上表现良好，但难以扩展到大型复杂结构。层次化方法提高了可扩展性，但通常忽略节点和边特征，而这些特征在实际应用中至关重要，特别是对于建模高阶关系的超图。在本文中，我们提出FAHNES（通过下一尺度预测进行特征感知的（超）图生成），这是一个层次化框架，可联合生成图和超图的拓扑与特征。FAHNES通过节点粗化和局部扩展构建多尺度表示，并由一种新颖的层次化尺度编码引导，该编码控制粒度并确保跨尺度一致性。在合成数据集、3D网格和图点云数据集上的实验表明，该方法在独特扩展到带特征的大规模图和超图的同时，实现了具有竞争力或最先进的性能。我们的代码是开源的。

英文摘要

Graph generative models perform well on small structured data but struggle to scale to large, complex structures. Hierarchical approaches improve scalability but often ignore node and edge features, which are critical in real-world applications, particularly for hypergraphs that model higher-order relationships. In this paper, we propose FAHNES (feature-aware (hyper)graph generation via next-scale prediction), a hierarchical framework that jointly generates topology and features for graphs and hypergraphs. FAHNES builds multi-scale representations through node coarsening and localized expansion, guided by a novel hierarchical scale encoding that controls granularity and ensures cross-scale consistency. Experiments on synthetic, 3D mesh, and graph point cloud datasets demonstrate competitive or state-of-the-art performance while uniquely scaling to featured large-scale graphs and hypergraphs. Our code is open source.


2509.22335 2026-06-01 cs.LG cs.AI 版本更新

Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning

深度持续学习中的谱坍缩导致塑性丧失

Arjun Prakash, Naicheng He, Kaicheng Guo, Saket Tiwari, Ruo Yu Tao, Tyrone Serapio, Amy Greenwald, George Konidaris

发表机构 * Department of Computer Science, Brown University（布朗大学计算机科学系）

AI总结研究深度神经网络在持续学习中塑性丧失的原因，发现新任务初始化时的Hessian谱坍缩是主要因素，并提出基于Kronecker分解的两种正则化方法以保持塑性。

详情

AI中文摘要

我们研究为什么深度神经网络在持续学习中会丧失塑性，从而在不重新初始化参数的情况下无法学习新任务。我们表明，这种失败之前在新任务初始化时会出现Hessian谱坍缩，其中有意义的曲率方向消失，梯度下降变得无效。通过分析线性化ReLU网络，我们推导出成功训练的显式$ε$-秩条件，并证明损失加权Gram矩阵在谱上与广义高斯-牛顿近似等价，从而将NTK动力学与Hessian曲率联系起来。直接针对谱坍缩，我们讨论了Hessian的Kronecker因子近似，这激发了两种正则化增强：保持高有效特征秩和应用L2惩罚。在持续监督学习和强化学习任务上的实验证实，结合这两种正则化器可以有效保持塑性。

英文摘要

We investigate why deep neural networks suffer from loss of plasticity in continual learning, and thus fail to learn new tasks without reinitializing parameters. We show that this failure is preceded by Hessian spectral collapse at new-task initialization, where meaningful curvature directions vanish and gradient descent becomes ineffective. Analyzing a linearized ReLU network, we derive explicit $ε$-rank conditions for successful training and prove that the loss-weighted Gram matrix is spectrally equivalent to the Generalized Gauss-Newton approximation, thereby relating NTK dynamics to Hessian curvature. Targeting spectral collapse directly, we then discuss the Kronecker factored approximation of the Hessian, which motivates two regularization enhancements: maintaining high effective feature rank and applying L2 penalties. Experiments on continual supervised and reinforcement learning tasks confirm that combining these two regularizers effectively preserves plasticity.
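
Two simple spectral diagnostics related to the quantities mentioned above can be sketched as follows. Both definitions are assumptions made for illustration: the entropy-based effective rank is one common convention, and the relative-threshold `epsilon_rank` is a guess at the general flavor of an ε-rank, not the paper's exact definition.

```python
import math


def effective_rank(singular_values):
    """Exponential of the Shannon entropy of the normalized singular values."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def epsilon_rank(eigenvalues, eps):
    """Count eigenvalues whose magnitude exceeds eps times the largest magnitude."""
    top = max(abs(v) for v in eigenvalues)
    return sum(1 for v in eigenvalues if abs(v) > eps * top)


healthy = [1.0, 1.0, 1.0, 1.0]      # curvature spread over all directions
collapsed = [10.0, 1.0, 0.001, 0.0]  # most directions have vanished

print(effective_rank(healthy), effective_rank(collapsed))
print(epsilon_rank(healthy, 0.01), epsilon_rank(collapsed, 0.01))
```

A spectrum dominated by one or two directions yields a low value under either measure, which is the kind of collapse the abstract associates with ineffective gradient descent on a new task.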


2509.19452 2026-06-01 cs.RO cs.CV cs.LG 版本更新

HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames

HUNT：通过瞬时相对帧在非结构化环境中进行高速无人机导航与跟踪

Alessandro Saviolo, Jeffrey Mao, Giuseppe Loianno

发表机构 * New York University（纽约大学）； University of California Berkeley（加州大学伯克利分校）

AI总结提出HUNT框架，利用瞬时相对帧统一搜索与跟踪，实现高速飞行和鲁棒自主性。

详情

AI中文摘要

搜索与救援任务要求无人机既能高速穿越未知的非结构化环境，又能在检测到目标后跟踪目标。在感知退化且无全局定位的情况下实现这两种能力仍是一个开放挑战。最近的相对导航工作通过将规划和控制锚定到可见的检测目标上展示了鲁棒跟踪，但在视野中没有目标时无法进行导航。我们提出了HUNT（高速无人机导航与跟踪），一个实时框架，在单一相对公式中统一了穿越、获取和跟踪。HUNT直接从机载瞬时观测量（如姿态、高度和速度）定义导航目标，从而在搜索过程中实现反应式高速飞行。一旦检测到目标，相同的感知-控制管道无缝过渡到跟踪。在茂密森林、集装箱场地以及使用车辆和人体模型的搜索与救援任务中的户外实验表明，在全局方法失败的情况下，该框架实现了鲁棒自主性。

英文摘要

Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High-speed UAV Navigation and Tracking), a real-time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high-speed flight during search. Once a target is detected, the same perception-control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search-and-rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.


2506.11653 2026-06-01 cs.CV cs.AI cs.LG 版本更新

DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation

DISCO: 使用条件距离相关性减轻深度学习中的偏差

Emre Kavak, Tom Nuno Wolf, Christian Wachinger

发表机构 * Technical University of Munich, Germany（慕尼黑技术大学）； Konrad Zuse School of Excellence in Reliable AI, Germany（Konrad Zuse可靠性人工智能卓越学院）； Munich Center for Machine Learning (MCML), Germany（慕尼黑机器学习中心（MCML））

AI总结提出基于反因果模型的条件独立性准则，并设计条件距离相关性的高效估计器DISCO$_m$和sDISCO，通过正则化实现梯度模型中的偏差缓解，在多个数据集上优于或媲美现有方法。

Comments Accepted to ICML 2026 (oral)

2509.06856 2026-06-01 stat.ML cs.LG cs.NA math.NA 版本更新

Sequential Least-Squares Estimators with Fast Randomized Sketching for Linear Statistical Models

用于线性统计模型的快速随机草图化序贯最小二乘估计器

Guan-Yu Chen, Dong-Yue Xie, Xi Yang

发表机构 * School of Mathematics, Nanjing University of Aeronautics and Astronautics（南京航空航天大学数学学院）

AI总结提出一种融合草图-求解与迭代草图方法的序贯最小二乘估计框架，通过逐步增大草图尺寸迭代求解子问题，高效获得高精度参数估计。

2509.00834 2026-06-01 cs.AI cs.FL cs.LG cs.LO 版本更新

Neuro-Symbolic Predictive Process Monitoring

神经符号预测性过程监控

Axel Mezini, Elena Umili, Ivan Donadello, Fabrizio Maria Maggi, Matteo Mancanelli, Fabio Patrizi

发表机构 * Faculty of Engineering, Free University of Bozen-Bolzano（博尔扎诺自由大学工程学院）； Department of Computer, Control and Management Engineering, Sapienza, Università di Roma（罗马大学计算机、控制与管理工程系）

AI总结提出一种结合数据驱动学习与时序逻辑先验知识的神经符号方法，通过可微逻辑损失函数训练自回归序列预测器，以提升业务过程管理中后缀预测的准确性和逻辑一致性。

详情

AI中文摘要

基于辛神经网络的哈密顿动力学降阶建模

Yongsheng Chen, Wei Guo, Qi Tang, Xinghui Zhong

发表机构 * School of Mathematical Sciences, Zhejiang University（浙江大学数学科学学院）； Department of Mathematics and Statistics, Texas Tech University（德克萨斯理工大学数学与统计系）； School of Computational Science and Engineering, Georgia Institute of Technology（佐治亚理工学院计算科学与工程学院）

AI总结提出一种数据驱动的辛降阶建模框架，通过统一端到端神经架构同时发现潜空间和学习动力学，确保降阶模型精确保持辛结构，提升长期稳定性和保真度。

详情

AI中文摘要

我们为高维哈密顿系统引入了一种新颖的数据驱动辛降阶建模（ROM）框架，该框架在单个端到端神经架构中统一了潜空间发现和动力学学习。编码器-解码器由Henon神经网络构建，并可增加线性SGS-反射层，从而在全相空间和潜相空间之间产生精确的辛映射。潜动力学由作为HenonNet实现的辛流映射推进。这种统一的神经架构确保在降阶水平上精确保持底层辛结构，显著增强所得ROM的保真度和长期稳定性。我们通过在典型哈密顿系统上的全面数值实验验证了该方法。结果表明，该方法具有准确的轨迹重建能力、训练时间范围之外的鲁棒预测性能以及精确的哈密顿量保持。这些有希望的结果强调了我们的辛ROM框架在广泛科学和工程学科中复杂动力系统的有效性和潜在适用性。

英文摘要

We introduce a novel data-driven symplectic reduced-order modeling (ROM) framework for high-dimensional Hamiltonian systems that unifies latent-space discovery and dynamics learning within a single, end-to-end neural architecture. The encoder-decoder is built from Henon neural networks (HenonNets) and may be augmented with linear SGS-reflector layers. This yields an exact symplectic map between full and latent phase spaces. Latent dynamics are advanced by a symplectic flow map implemented as a HenonNet. This unified neural architecture ensures exact preservation of the underlying symplectic structure at the reduced-order level, significantly enhancing the fidelity and long-term stability of the resulting ROM. We validate our method through comprehensive numerical experiments on canonical Hamiltonian systems. The results demonstrate the method's capability for accurate trajectory reconstruction, robust predictive performance beyond the training horizon, and accurate Hamiltonian preservation. These promising outcomes underscore the effectiveness and potential applicability of our symplectic ROM framework for complex dynamical systems across a broad range of scientific and engineering disciplines.
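
Why a Hénon-type layer is symplectic by construction can be checked numerically in one degree of freedom, where symplecticity reduces to a Jacobian determinant of exactly 1. The sketch below assumes the usual form of a Hénon map, (x, y) -> (y + eta, -x + V'(y)); the cubic `grad_v`, the value of `eta`, and the helper names are hypothetical and not taken from the paper.

```python
def henon_map(x, y, grad_v, eta):
    """One Henon map on a single degree of freedom: (x, y) -> (y + eta, -x + V'(y))."""
    return y + eta, -x + grad_v(y)


def jacobian_det(f, x, y, h=1e-5):
    """Central-difference estimate of the Jacobian determinant of f at (x, y)."""
    fx_plus, fx_minus = f(x + h, y), f(x - h, y)
    fy_plus, fy_minus = f(x, y + h), f(x, y - h)
    a = (fx_plus[0] - fx_minus[0]) / (2 * h)  # d f1 / dx
    b = (fy_plus[0] - fy_minus[0]) / (2 * h)  # d f1 / dy
    c = (fx_plus[1] - fx_minus[1]) / (2 * h)  # d f2 / dx
    d = (fy_plus[1] - fy_minus[1]) / (2 * h)  # d f2 / dy
    return a * d - b * c


def layer(x, y):
    # Hypothetical potential V(y) = y**4 / 4, so V'(y) = y**3.
    return henon_map(x, y, lambda v: v ** 3, eta=0.3)


def two_layers(x, y):
    # Compositions of symplectic maps stay symplectic.
    return layer(*layer(x, y))


print(jacobian_det(layer, 0.7, -1.2))
print(jacobian_det(two_layers, 0.7, -1.2))
```

The determinant stays at 1 regardless of the choice of `grad_v`, so a network that only learns the potential cannot break the symplectic structure; this is the property the abstract relies on for long-term stability.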


2508.04457 2026-06-01 stat.ML cs.LG 版本更新

Benchmarking Uncertainty and its Disentanglement in multi-label Chest X-Ray Classification

多标签胸部X光分类中的不确定性及其解缠基准测试

Simon Baur, Wojciech Samek, Jackie Ma

发表机构 * Fraunhofer Heinrich-Hertz-Institut（弗劳恩霍夫海因里希-赫兹研究所）； Technische Universität Berlin（柏林技术大学）； The Berlin Institute for the Foundations of Learning and Data (BIFOLD)（柏林学习与数据基础研究所）

AI总结本研究使用MIMIC-CXR-JPG数据集，对多标签胸部X光分类任务中的13种不确定性量化方法进行基准测试，评估了卷积和Transformer架构，并扩展了三种方法到多标签设置，揭示了不同方法和架构在不确定性估计和解缠认知与偶然不确定性方面的优缺点。

详情

AI中文摘要

可靠的不确定性量化对于医疗影像中可信赖的决策和AI模型的部署至关重要。虽然先前的工作已经探索了神经网络在合成或定义良好的数据设置（如自然图像分类）中使用信息论方法量化预测、认知和偶然不确定性的能力，但其在真实医学诊断任务中的适用性仍未得到充分探索。在本研究中，我们使用MIMIC-CXR-JPG数据集为多标签胸部X光分类提供了广泛的不确定性量化基准。我们评估了基于卷积（ResNet）和基于Transformer（Vision Transformer）架构的13种不确定性量化方法，涵盖广泛的任务。此外，我们将证据深度学习、HetClass神经网络和深度确定性不确定性扩展到多标签设置。我们的分析提供了对不确定性估计有效性以及解缠认知和偶然不确定性能力的见解，揭示了方法和架构特定的优势和局限性。

英文摘要

Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric uncertainties using an information-theoretical approach in synthetic or well defined data settings like natural image classification, its applicability to real life medical diagnosis tasks remains underexplored. In this study, we provide an extensive uncertainty quantification benchmark for multi-label chest X-ray classification using the MIMIC-CXR-JPG dataset. We evaluate 13 uncertainty quantification methods for convolutional (ResNet) and transformer-based (Vision Transformer) architectures across a wide range of tasks. Additionally, we extend Evidential Deep Learning, HetClass NNs, and Deep Deterministic Uncertainty to the multi-label setting. Our analysis provides insights into uncertainty estimation effectiveness and the ability to disentangle epistemic and aleatoric uncertainties, revealing method- and architecture-specific strengths and limitations.
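
The information-theoretic split into aleatoric and epistemic parts mentioned above can be sketched for a single label in a multi-label setting. The sketch assumes an ensemble (or several stochastic forward passes) that yields one probability per member; the function names and the per-label binary treatment are illustrative assumptions, not the benchmark's implementation.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) prediction for one label."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose(member_probs):
    """Return (total, aleatoric, epistemic) uncertainty for one label.

    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual members
    epistemic = total - aleatoric (mutual information between label and member)
    """
    mean_p = sum(member_probs) / len(member_probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    return total, aleatoric, total - aleatoric


print(decompose([0.5, 0.5, 0.5]))  # members agree the label is ambiguous
print(decompose([0.0, 1.0]))       # members are confident but disagree
```

Agreement on an ambiguous prediction shows up as purely aleatoric uncertainty, while confident disagreement shows up as purely epistemic uncertainty; in a multi-label task the same computation would be repeated independently for each finding.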


2508.02217 2026-06-01 cs.LG 版本更新

Population-Free Pareto Tracking for Sample-Efficient Multi-Policy MORL

无种群的帕累托跟踪：面向样本高效的多策略多目标强化学习

Zeyu Zhao, Yueling Che, Kaichen Liu, Jian Li, Junmei Yao

发表机构 * College of Computer Science and Software Engineering, Shenzhen University, China（深圳大学计算机科学与软件工程学院）

AI总结提出MPFT框架，通过无自进化种群的帕累托跟踪机制，结合单目标极端策略初始化，高效逼近完整帕累托前沿，显著提升样本效率并减少智能体-环境交互。

Comments 37 pages, 10 figures, ICML26 accepted paper

详情

AI中文摘要

多目标强化学习（MORL）是涉及多个冲突标准的现实世界决策问题的基本框架。现有的多策略（MP）方法通常依赖于维护大型策略种群的在线进化框架，导致高样本复杂性和过多的智能体-环境交互。为了缓解这些限制，我们提出了多策略帕累托前沿跟踪（MPFT），一种无需自进化种群的框架。它利用高效的帕累托跟踪机制，以单目标极端策略初始化来追踪帕累托前沿，并进一步加密稀疏区域以实现对完整帕累托前沿的精确近似。MPFT可以无缝集成先进的离线MORL算法，从而显著提高样本效率。我们在最多三个目标的六个机器人控制任务和超过三个目标的三个高维任务上评估了MPFT。实验结果表明，MPFT在超体积和期望效用方面优于最先进的基线。它还显著减少了智能体-环境交互。这些结果进一步证明，MPFT是一个通用框架，可以无缝集成在线和离线MORL算法。

英文摘要

Multi-objective reinforcement learning (MORL) is a fundamental framework for real-world decision-making problems involving multiple conflicting criteria. Existing multi-policy (MP) methods typically rely on online evolutionary frameworks that maintain large policy populations, leading to high sample complexity and excessive agent-environment interactions. To mitigate these limitations, we present Multi-policy Pareto Front Tracking (MPFT), a framework without a self-evolving population. It leverages an efficient Pareto-tracking mechanism initialized with single-objective extreme policies to trace the Pareto front, and further densifies sparse regions to achieve an accurate approximation of the full Pareto front. MPFT can be seamlessly integrated with advanced offline MORL algorithms, thereby substantially improving sample efficiency. We evaluate MPFT on six robotic control tasks with up to three objectives and three high-dimensional tasks with more than three objectives. Experimental results show that MPFT outperforms state-of-the-art baselines in terms of hypervolume and expected utility. It also significantly reduces agent-environment interactions. These results further demonstrate that MPFT serves as a general-purpose framework that can seamlessly integrate both online and offline MORL algorithms.
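
The hypervolume metric used in the evaluation can be illustrated for the two-objective case. The sketch assumes both objectives are maximized and that a reference point is given; the function name and the example returns are hypothetical, and the benchmark tasks in the paper use up to and beyond three objectives, which this simple sweep does not cover.

```python
def hypervolume_2d(points, ref):
    """Area dominated by a set of 2-objective returns, relative to ref (maximization)."""
    # Keep only points that strictly improve on the reference in both objectives.
    candidates = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective downwards.
    candidates.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_second = ref[1]
    for first, second in candidates:
        if second > best_second:  # dominated points add nothing
            area += (first - ref[0]) * (second - best_second)
            best_second = second
    return area


front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))
print(hypervolume_2d(front + [(1.0, 1.0)], ref=(0.0, 0.0)))  # dominated point added
```

A denser approximation of the Pareto front fills in more of the dominated region, which is why densifying sparse regions of the front, as described in the abstract, tends to increase this metric.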


2507.17026 2026-06-01 stat.ML cs.LG 版本更新

Conformal C2ST: Turning weak classifiers into strong two-sample tests

Conformal C2ST：将弱分类器转化为强双样本检验

Vansh Bansal, Tianyu Chen, James G. Scott

发表机构 * Department of Statistics and Data Sciences, University of Texas at Austin, United States（统计与数据科学系，德克萨斯大学奥斯汀分校，美国）

AI总结本文提出基于共形预测的C2ST变体，使任意弱分类器都能产生精确有限样本p值，实现可控第一类错误和温和退化的检验功效，并应用于神经后验估计验证。

详情

AI中文摘要

双样本检验问题是统计学和机器学习中的一项基本任务，旨在判断来自潜在分布$p$和$q$的两组样本是否实际上同分布（即$p=q$）。一种流行且直观的方法是分类器双样本检验（C2ST），其中训练一个分类器来区分来自$p$和$q$的样本。然而，尽管C2ST简单，其可靠性依赖于接近贝叶斯最优的分类器，这一要求很少满足且难以验证。这引发了一个重要的开放问题：弱分类器是否仍能用于双样本检验？我们证明答案是肯定的。基于Hu和Lei（2024）的工作，我们分析了C2ST的两种共形变体，它们将任何训练好的分类器（即使是弱的、有偏的或过拟合的）的分数转化为精确的有限样本p值。我们建立了共形C2ST的两个关键理论性质：（i）有限样本第一类错误控制，以及（ii）非平凡的功效，该功效随训练分类器误差的增加而温和退化。结果是，即使是表现不佳的分类器也能产生强大且可靠的双样本检验。这一通用框架在贝叶斯推断中找到了强大的应用，特别是在验证神经后验估计（NPE）模型时，其中比较学习到的后验近似$q(θ\mid y)$与真实后验$p(θ\mid y)$的任务可以表述为双样本检验。实验上，共形C2ST在此任务的广泛基准测试中优于经典判别检验。我们的结果确立了共形C2ST作为一种实用、理论基础的诊断工具。

英文摘要

The two-sample testing problem, a fundamental task in statistics and machine learning, seeks to determine whether two sets of samples, drawn from underlying distributions $p$ and $q$, are in fact identically distributed (i.e. whether $p=q$). A popular and intuitive approach is the classifier two-sample test (C2ST), where a classifier is trained to distinguish between samples from $p$ and $q$. Yet despite the simplicity of the C2ST, its reliability hinges on access to a near-Bayes-optimal classifier, a requirement that is rarely met and difficult to verify. This raises a major open question: can a weak classifier still be useful for two-sample testing? We show that the answer is a definitive yes. Building on the work of Hu and Lei (2024), we analyze two conformal variants of the C2ST that convert the scores from any trained classifier -- even if weak, biased, or overfit -- into exact, finite-sample p-values. We establish two key theoretical properties of the conformal C2ST: (i) finite-sample Type-I error control, and (ii) non-trivial power that degrades gently in tandem with the error of the trained classifier. The upshot is that even poorly performing classifiers can yield powerful and reliable two-sample tests. This general framework finds a powerful application in Bayesian inference, particularly for validating Neural Posterior Estimation (NPE) models, where the task of comparing a learned posterior approximation $q(θ\mid y)$ to the true posterior $p(θ\mid y)$ can be framed as a two-sample test. Empirically, the Conformal C2ST outperforms classical discriminative tests across a wide range of benchmarks for this task. Our results establish the conformal C2ST as a practical, theoretically grounded diagnostic tool.
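
How classifier scores can be turned into finite-sample-valid p-values regardless of classifier quality can be sketched with the generic split-conformal recipe below. This is not necessarily either of the two variants analyzed in the paper; the function name, the example scores, and the convention that larger scores look more like $q$ are assumptions for illustration.

```python
def conformal_p_value(calibration_scores, test_score):
    """Split-conformal p-value for one test point.

    calibration_scores: classifier scores on held-out samples known to come from p.
    test_score: classifier score of the point under test (larger = looks more like q).
    Under exchangeability of the test point with the calibration points, the
    returned value is a valid p-value for any score function, however weak.
    """
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


calibration = [0.1, 0.2, 0.3, 0.4]
print(conformal_p_value(calibration, 0.35))
print(conformal_p_value(calibration, 0.90))
print(conformal_p_value(calibration, 0.00))
```

Validity comes from the rank of the test score among the calibration scores, not from the accuracy of the classifier; a weak classifier only costs power, which is consistent with the graceful degradation described in the abstract.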


2506.03779 2026-06-01 quant-ph cs.LG stat.ML 版本更新

Position: Quantum Kernel Machines Should Move Beyond Scalar-Valued Kernels to Realize Their Potential

立场：量子核机器应超越标量值核以实现其潜力

Hachem Kadri, Joachim Tomasi, Yuka Hashimoto, Sandrine Anthoine

发表机构 * Aix-Marseille University, CNRS, LIS, Marseille, France（艾克斯-马赛大学，法国国家科学研究中心，LIS实验室，马赛，法国）； Aix-Marseille University, CNRS, I2M, Marseille, France（艾克斯-马赛大学，法国国家科学研究中心，I2M实验室，马赛，法国）； NTT, Inc., Tokyo, Japan（日本NTT公司，东京）

AI总结本文主张量子核机器应转向算子值核等更富表达力的框架，以利用纠缠和非交换结构处理复杂结构化预测问题，并通过初步概念验证展示其优势。

详情

Journal ref: ICML 2026

AI中文摘要

基于量子力学原理构建的量子核函数已成为量子机器学习的核心。最近的研究表明，当从经典数据学习时，量子核无法提供显著的计算或统计优势，这削弱了最初对量子核机器的热情。然而，该领域的大多数研究都集中在标准分类或回归设置中的标量值核上，而经典核方法在这些设置中已经高效且有效，留给量子核改进的空间很小。在这篇立场论文中，我们认为该领域的进展需要超越标量值核，转向更富表达力的核框架。标量值核缺乏充分利用纠缠等内在量子资源所需的自由度，并且不足以处理经典学习方法难以应对的复杂学习任务。基于算子值核学习和$C^*$-代数核表示的最新进展，我们提出了一条设计能够利用纠缠和非交换结构来处理复杂结构化预测问题的量子核的路线图。为了支持这一观点，我们展示了一个初步的概念验证，说明量子算子值核公式如何揭示标量值核方法难以访问的结构依赖性。这一焦点的转移可能为新一代量子核机器及其潜在优势的更忠实探索开辟道路。

英文摘要

Quantum kernel functions are built using quantum-mechanical principles and have emerged as a centerpiece of quantum machine learning. The initial enthusiasm for quantum kernel machines has been tempered by recent studies suggesting that quantum kernels could not offer significant computational or statistical advantages when learning from classical data. However, most of the research in this area has been devoted to scalar-valued kernels in standard classification or regression settings for which classical kernel methods are efficient and effective, leaving very little room for improvement with quantum kernels. In this position paper, we argue that progress in this field requires moving beyond scalar-valued kernels toward more expressive kernel frameworks. Scalar-valued kernels lack the degrees of freedom necessary to fully exploit intrinsically quantum resources such as entanglement and are not rich enough to deal with complex learning tasks where classical learning methods struggle. Building on recent advances in operator-valued kernel learning and $C^*$-algebraic kernel representations, we propose a roadmap for designing quantum kernels capable of leveraging entanglement and non-commutative structures to tackle complex structured prediction problems. To support this viewpoint, we present an initial proof-of-concept illustrating how quantum operator-valued kernel formulations can reveal structural dependencies that remain difficult to access for scalar-valued kernel methods. This shift in focus could open a pathway toward a new generation of quantum kernel machines and a more faithful exploration of their potential advantages.


2505.22934 2026-06-01 cs.CL cs.AI cs.LG 版本更新

Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

解开LoRA干扰：用于鲁棒模型合并的正交子空间

Haobo Zhang, Jiayu Zhou

发表机构 * University of Michigan Ann Arbor（密歇根大学安娜堡分校）

AI总结针对LoRA微调模型合并时性能下降的问题，提出通过微调前约束LoRA子空间正交性来减少任务间干扰的方法OSRM，可无缝集成现有合并算法，提升合并性能并保持单任务准确率。

Comments 14 pages, 5 figures, 16 tables, accepted by ACL 2025

详情

AI中文摘要

针对单个任务微调大型语言模型（LM）虽然性能强劲，但部署和存储成本高昂。近期研究探索模型合并，将多个任务特定模型组合成单个多任务模型，无需额外训练。然而，现有合并方法对于使用低秩适应（LoRA）微调的模型往往失败，导致性能显著下降。本文表明，这一问题源于模型参数与数据分布之间先前被忽视的相互作用。我们提出用于鲁棒模型合并的正交子空间（OSRM），在微调*之前*约束LoRA子空间，确保与一个任务相关的更新不会对其他任务的输出产生不利偏移。我们的方法可以无缝集成到大多数现有合并算法中，减少任务间的意外干扰。在八个数据集上使用三种广泛使用的LM和两种大型LM进行的广泛实验表明，我们的方法不仅提升了合并性能，还保持了单任务准确率。此外，我们的方法对合并的超参数表现出更强的鲁棒性。这些结果突显了数据-参数交互在模型合并中的重要性，并为合并LoRA模型提供了一种即插即用的解决方案。

英文摘要

Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM) to constrain the LoRA subspace *prior* to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.


2505.20840 2026-06-01 cs.LG 版本更新

Aggregation Buffer: Revisiting DropEdge with a New Parameter Block

聚合缓冲区：用新参数块重新审视 DropEdge

Dooho Lee, Myeong Kong, Sagad Hamid, Cheonwoo Lee, Jaemin Yoo

发表机构 * School of Electrical Engineering, KAIST, Daejeon, Republic of Korea（韩国科学技术院电气工程学院，大田）； Computer Science Department, University of Münster, Münster, Germany（德国明斯特大学计算机科学系）

AI总结针对 DropEdge 在监督学习中性能受限的问题，提出一种名为 Aggregation Buffer 的参数块，通过改进 GNN 的鲁棒性来提升性能，并统一解决度偏差和结构差异等问题。

Comments Published at ICML 2025

详情

AI中文摘要

我们重新审视了 DropEdge，这是一种用于 GNN 的数据增强技术，通过在训练过程中随机移除边来暴露多样化的图结构。虽然这是一种有效减少对图中特定连接过拟合的有前途的方法，但我们观察到其在监督学习任务中的潜在性能提升非常有限。为了理解原因，我们提供了理论分析，表明 DropEdge 的有限性能来自于许多 GNN 架构中存在的根本性限制。基于此分析，我们提出了 Aggregation Buffer，这是一个专门设计的参数块，通过解决 DropEdge 的限制来提高 GNN 的鲁棒性。我们的方法与任何 GNN 模型兼容，并在多个数据集上展示了一致的性能提升。此外，我们的方法作为统一解决方案，有效解决了度偏差或结构差异等众所周知的问题。代码和数据集可在 https://github.com/dooho00/agg-buffer 获取。

英文摘要

We revisit DropEdge, a data augmentation technique for GNNs which randomly removes edges to expose diverse graph structures during training. While being a promising approach to effectively reduce overfitting on specific connections in the graph, we observe that its potential performance gain in supervised learning tasks is significantly limited. To understand why, we provide a theoretical analysis showing that the limited performance of DropEdge comes from the fundamental limitation that exists in many GNN architectures. Based on this analysis, we propose Aggregation Buffer, a parameter block specifically designed to improve the robustness of GNNs by addressing the limitation of DropEdge. Our method is compatible with any GNN model, and shows consistent performance improvements on multiple datasets. Moreover, our method effectively addresses well-known problems such as degree bias or structural disparity as a unifying solution. Code and datasets are available at https://github.com/dooho00/agg-buffer.
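
The augmentation being revisited can be sketched in a few lines. This shows plain DropEdge on an edge list only, not the proposed Aggregation Buffer; the function name, the edge-list representation, and the drop probability are assumptions for illustration.

```python
import random


def drop_edge(edges, drop_prob, rng):
    """Return a copy of the edge list with each edge removed independently."""
    # rng.random() lies in [0, 1), so drop_prob=0 keeps everything
    # and drop_prob=1 removes everything.
    return [edge for edge in edges if rng.random() >= drop_prob]


graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
rng = random.Random(0)

for epoch in range(3):
    # A fresh random subgraph is sampled for every training step.
    print(epoch, drop_edge(graph, drop_prob=0.4, rng=rng))
```

Each training step sees a different sparsified neighborhood, so the aggregated message at a node changes from step to step; the abstract's analysis concerns how well standard GNN layers can remain robust to exactly this variation.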

2411.13865 2026-06-01 cs.IR cs.AI cs.CL cs.LG (version update)

Breaking Information Cocoons: A Hyperbolic Framework for Balancing Exploration and Exploitation in Recommender Systems

Qiyao Ma, Menglin Yang, Mingxuan Ju, Tong Zhao, Neil Shah, Rex Ying

Affiliations: University of California, Davis; The Hong Kong University of Science; Snap Inc.; Yale University

AI summary: Proposes HERec, a hyperbolic framework that balances exploration and exploitation in recommender systems through a semantic-enhanced hierarchical mechanism and automatic hierarchical clustering, mitigating information cocoons.

Comments: Accepted to KDD 2026. Code: https://github.com/Martin-qyma/HERec

Abstract

Modern recommender systems often create information cocoons, restricting users' exposure to diverse content. The central challenge is to balance content exploration and exploitation while allowing users to adjust their recommendation preferences. Ideally, this balance can be captured with a hierarchical representation, where depth search facilitates exploitation and breadth search enables exploration. However, existing approaches face two fundamental limitations: Euclidean methods struggle to capture hierarchical structures, while hyperbolic methods, despite their superior hierarchical modeling, lack semantic understanding of user and item profiles and fail to provide a principled mechanism for balancing exploration and exploitation. To address these challenges, we propose HERec, a hyperbolic framework that effectively balances exploration and exploitation in recommender systems. Our framework introduces two key innovations: (1) a semantic-enhanced hierarchical mechanism that aligns rich textual descriptions with collaborative information directly in hyperbolic space. Theoretical gradient analysis demonstrates that this alignment effectively leverages the underlying hyperbolic manifold structure, resulting in more accurate modeling of users and items; (2) an automatic hierarchical clustering mechanism by optimizing Dasgupta's cost, which discovers hierarchical structures without requiring predefined hyperparameters, enabling user-adjustable exploration-exploitation trade-offs. Extensive experiments demonstrate that HERec consistently outperforms both Euclidean and hyperbolic baselines, achieving up to 5.49% improvement in utility metrics and 11.39% increase in diversity metrics, effectively mitigating information cocoons.
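
For readers unfamiliar with the objective named in the abstract, Dasgupta's cost of a hierarchy sums, over every pair of items, their similarity multiplied by the number of leaves under their lowest common ancestor, so similar items should be merged low in the tree. The sketch below only evaluates this cost for a fixed tree. How HERec optimizes it is not described here, and the nested-tuple tree encoding, the `weights` dictionary, and the toy similarities are assumptions for illustration.

```python
from itertools import combinations


def dasgupta_cost(tree, weights):
    """Return (cost, leaves) for a hierarchy given as nested tuples.

    A leaf is any non-tuple item id. `weights` maps an unordered pair of
    leaves to a similarity; missing pairs count as 0 (assumed encoding).
    """
    if not isinstance(tree, tuple):
        return 0.0, [tree]
    cost = 0.0
    child_leaves = []
    for child in tree:
        child_cost, leaves = dasgupta_cost(child, weights)
        cost += child_cost
        child_leaves.append(leaves)
    n_leaves = sum(len(leaves) for leaves in child_leaves)
    # Pairs split between two different children have this node as their
    # lowest common ancestor, so they are charged its leaf count.
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                w = weights.get((i, j), weights.get((j, i), 0.0))
                cost += w * n_leaves
    return cost, [leaf for leaves in child_leaves for leaf in leaves]


# Toy similarities (assumed data): items 0-1 and 2-3 are strongly related.
weights = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.5}
good_tree = ((0, 1), (2, 3))  # merges similar items first
bad_tree = ((0, 2), (1, 3))   # separates similar items until the root

good_cost, _ = dasgupta_cost(good_tree, weights)
bad_cost, _ = dasgupta_cost(bad_tree, weights)
print(good_cost, bad_cost)  # 6.0 10.0
```

The tree that groups similar items early receives the lower cost, which is the property that makes the objective suitable for discovering hierarchies.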

2504.10564 2026-06-01 q-bio.QM cs.LG q-bio.BM (version update)

FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation

Julian Cremer, Ross Irwin, Alessandro Tibo, Jon Paul Janet, Simon Olsson, Djork-Arné Clevert

Affiliations: Machine Learning & Computational Sciences, Pfizer Worldwide R&D; Molecular AI, Discovery Sciences, R&D, AstraZeneca; Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg

AI summary: Proposes FLOWR, a framework that combines continuous and categorical flow matching with equivariant optimal transport and efficient protein pocket conditioning to generate and optimize three-dimensional ligands. It surpasses existing methods in validity, pose accuracy, and interaction recovery, with up to 70-fold faster inference.

Abstract

We introduce FLOWR, a novel structure-based framework for the generation and optimization of three-dimensional ligands. FLOWR integrates continuous and categorical flow matching with equivariant optimal transport, enhanced by efficient protein pocket conditioning. Alongside FLOWR, we present SPINDR, a thoroughly curated dataset comprising ligand-pocket co-crystal complexes specifically designed to address existing data quality issues. Empirical evaluations demonstrate that FLOWR surpasses current state-of-the-art diffusion- and flow-based methods in terms of PoseBusters-validity, pose accuracy, and interaction recovery, while offering a significant inference speedup of up to 70-fold. In addition, we introduce FLOWR:multi, a highly accurate multi-purpose model allowing for the targeted sampling of novel ligands that adhere to predefined interaction profiles and chemical substructures for fragment-based design without the need for re-training or any re-sampling strategies.

2502.15224 2026-06-01 cs.LG cs.AI (version update)

Auto-Discovery-Bench: Diagnosing Structured State Tracking in Oracle-Guided Discovery

Tingting Chen, Beibei Lin, Srinivas Anumasa, Vedant Shah, Zifeng Yuan, Qiran Zou, Anirudh Goyal, Dianbo Liu

Affiliations: National University of Singapore; Mila-Quebec AI Institute; Meta Superintelligence Labs

AI summary: Proposes the Auto-Discovery-Bench benchmark, which uses deterministic oracle-guided hypothesis-intervention-feedback cycles to diagnose agents' capability bottlenecks in structured state tracking.

Comments: 13 pages

Abstract

Interactive discovery requires agents to maintain and update structured beliefs over many rounds of feedback. Before evaluating agents in noisy, open-ended scientific environments, it is useful to isolate this prerequisite capability under controlled conditions. We introduce Auto-Discovery-Bench, a deterministic oracle-guided diagnostic benchmark in which agents recover hidden structures through repeated hypothesis--intervention--feedback cycles. The benchmark instantiates three controlled discovery abstractions: directed graph discovery, undirected relational discovery, and symbolic equation discovery. Across models, performance degrades as the number of variables, trajectory length, and distractors increase. A separate trajectory-tracking diagnostic shows that many failures persist even when intervention selection and hypothesis generation are removed, suggesting that limitations in maintaining and integrating long-range structured information are an important bottleneck for oracle-guided discovery. Auto-Discovery-Bench is not intended to replace realistic discovery environments; rather, it provides a reproducible, low-confound diagnostic testbed for isolating a prerequisite capability for interactive scientific agents.

2502.04671 2026-06-01 cs.AI cs.LG cs.LO cs.PL (version update)

Advances and Challenges in Meta-Learning: A Technical Review

Anna Vettoruzzo, Mohamed-Rafik Bouguelia, Joaquin Vanschoren, Thorsteinn Rögnvaldsson, KC Santosh

Affiliations: Automated Machine Learning Group, Eindhoven University of Technology, Netherlands; Applied AI Research Lab, Department of Computer Science, University of South Dakota, USA

AI summary: A comprehensive survey of meta-learning techniques that examines their connections to multi-task learning, transfer learning, and related fields, and identifies directions for future research.

Abstract

Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The paper covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the paper demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the paper delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the paper highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this paper provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.

1709.08894 2026-06-01 stat.ML cs.LG (version update)

On the regularization of Wasserstein GANs

Henning Petzka, Asja Fischer, Denis Lukovnikov

Affiliations: Fraunhofer Institute IAIS; Department of Computer Science, University of Bonn

AI summary: Studies regularization methods for enforcing the Lipschitz constraint in Wasserstein GANs, showing through theoretical analysis and experiments that a weaker regularization term is preferable to weight clipping.

Comments: Published as a conference paper at ICLR 2018. Henning Petzka and Asja Fischer contributed equally to this work (11 pages + 13 pages appendix)