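2606.18326 2026-06-18 cs.LG New submission

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

Evgeny Nikulchev, Dmitry Ilin

Affiliations: MIREA – Russian Technological University

AI summary: Proposes RGNet, a neural network architecture based on renormalization group concepts that handles class imbalance and multidimensional noise by hierarchically coarse-graining the feature space; its effectiveness is validated on the AI4I dataset.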

Comments 8 pages

2606.18388 2026-06-18 cs.LG cs.AI cs.CL cs.MA New submission

InTrain: Intrinsic Trainability for Zero-Cost Neural Architecture Search

Qinqin Zhou, Fuhai Chen, Jipeng Wu, Zhiwei Chen, Zhikai Hu, Weiwei Cai

Affiliations: School of Computer and Data Science, Fuzhou University; School of Computer and Data Science, Minjiang University; School of Artificial Intelligence, Nanchang University; Department of Computer Science, Hong Kong Baptist University; School of Interdisciplinary Medicine and Engineering, Harbin Medical University

AI summary: Proposes InTrain, a unified theoretical proxy that formalizes an architecture's trainability through two synergistic components, geometric capacity and optimization resilience, and achieves ranking correlations on NAS benchmarks comparable to ensemble methods.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Abstract

Training-free neural architecture search promises efficient discovery of high-performance networks without costly training. However, existing zero-cost proxies rely on fragmented heuristics that fail to capture the fundamental question: what makes an architecture trainable? This paper introduces Intrinsic Trainability (InTrain), a unified theoretical proxy that formalizes trainability as an architectural invariant emerging from two synergistic components: geometric capacity and optimization resilience. We operationalize intrinsic trainability through analysis of neural information processing. Geometric capacity is quantified via the participation ratio of activation covariance eigenspectrum, capturing the effective dimensionality of representation manifolds. Optimization resilience is measured through cumulative gradient health, assessing the robustness of backpropagation across network depth. InTrain synthesizes these dimensions through a scale-invariant multiplicative coupling, which we hypothesize is essential for capturing their synergistic, non-additive relationship. Extensive experiments on standard NAS benchmarks and search spaces demonstrate that InTrain achieves ranking correlations on par with state-of-the-art ensemble-based proxies and outperforms other single-metric methods.
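
The geometric-capacity term in the abstract can be illustrated with the standard participation ratio, (Σλ)² / Σλ², computed over the eigenvalues λ of the activation covariance. The sketch below assumes that textbook definition; the listing does not give the paper's exact normalization, and the function name and the toy inputs are hypothetical.

```python
import numpy as np

def participation_ratio(activations: np.ndarray) -> float:
    """Participation ratio of the activation covariance eigenspectrum.

    activations: array of shape (n_samples, n_features).
    Returns (sum of eigenvalues)^2 / (sum of squared eigenvalues),
    which lies between 1 and n_features for non-degenerate input.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(activations.shape[0] - 1, 1)
    # eigvalsh is for symmetric matrices; clip tiny negative round-off values.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    denom = np.sum(eigvals ** 2)
    if denom == 0.0:
        return 0.0
    return float(np.sum(eigvals) ** 2 / denom)

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(2000, 8))                  # variance spread over 8 directions
rank_one = np.outer(rng.normal(size=2000), np.ones(8))  # all variance in one direction

pr_isotropic = participation_ratio(isotropic)
pr_rank_one = participation_ratio(rank_one)
```

Isotropic activations give a value close to the feature count, while activations confined to one direction give a value close to 1, which is why the quantity reads as an effective dimensionality.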

2606.18694 2026-06-18 cs.LG cond-mat.dis-nn cs.CL cs.NE nlin.AO New submission

Attention as Frustrated Synchronization

Joshua Nunley

Affiliations: Cognitive Science Program; Luddy School of Informatics, Computing, and Engineering; Indiana University Bloomington

AI summary: Proposes the Frustrated Synchronization Network (FSN), which implements synchronization-based attention through a complex-valued coupling kernel and a delay term, and outperforms a tuned RoPE-SwiGLU Transformer on character-level text and code tasks at the one-million-parameter scale.

Comments 25 pages, 4 figures. Preliminary report at the 1-10M parameter scale

Abstract

A network of oscillators that synchronizes perfectly computes nothing further, so an attention architecture built from synchronization must locate its computation in structured departures from agreement. We introduce the Frustrated Synchronization Network (FSN), whose token states are phases on a torus and whose entire value pathway is one learned complex coupling kernel over harmonics and a one-step delay. Each component of the kernel is a frustration in the sense of the synchronization literature. The complex phases are static Kuramoto-Sakaguchi frustration angles, the signed harmonics are repulsive Daido components, and the delay term, which couples each token to the successors of the tokens it attends to, is algebraically identical to Kuramoto-Sakaguchi coupling whose frustration angle is the data's own transition, so next-token prediction is implemented as synchronization frustrated by the data. At matched one-million-parameter and training budgets on character-level text and code, the FSN's validation loss is below a tuned RoPE-SwiGLU transformer's at every epoch measured, and the comparison survives training the baseline to convergence: every thirty-epoch enwik8 seed finishes below the transformer's converged fifty-epoch loss of 1.611, and the FSN's completed fifty-epoch runs converge to 1.5953 +/- 0.0014. A variant with every feed-forward block replaced by mean-field coupling to learned collective modes, leaving no multilayer perceptron in the stack, tracks the transformer. On natural text the unfrustrated base layer falls behind the converged transformer at every copy depth, worst on long-range copy events; the kernel reverses the deficit at every depth of four and beyond. Headline comparisons are at the one-million-parameter scale; a scale ladder is complete through four million parameters with the advantage persisting, and remaining arms are marked as in progress.
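
For readers unfamiliar with the frustration angle mentioned in the abstract, here is a minimal sketch of the classical Kuramoto-Sakaguchi model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i − α). It is not the FSN architecture; the function names, parameter values, and the explicit Euler integration are assumptions chosen for illustration.

```python
import numpy as np

def kuramoto_sakaguchi_velocity(theta, omega, coupling, alpha):
    """Phase velocities: omega_i + (K/N) * sum_j sin(theta_j - theta_i - alpha)."""
    n = theta.shape[0]
    # phase_gap[i, j] = theta_j - theta_i - alpha
    phase_gap = theta[None, :] - theta[:, None] - alpha
    return omega + (coupling / n) * np.sin(phase_gap).sum(axis=1)

def order_parameter(theta):
    """Magnitude of the mean phasor; 1.0 means perfect synchronization."""
    return float(np.abs(np.exp(1j * theta).mean()))

rng = np.random.default_rng(0)
theta = rng.uniform(-0.3, 0.3, size=16)   # start inside a narrow arc
omega = np.zeros(16)                      # identical natural frequencies
coupling, alpha, dt = 2.0, 0.5, 0.05      # illustrative values only

for _ in range(1000):                     # explicit Euler integration
    theta = theta + dt * kuramoto_sakaguchi_velocity(theta, omega, coupling, alpha)

sync_level = order_parameter(theta)
collective_velocity = kuramoto_sakaguchi_velocity(theta, omega, coupling, alpha).mean()
```

With identical oscillators the phases still lock, but the locked state rotates at −K sin α rather than sitting still, which is one simple way to see a frustration angle as a structured departure from plain agreement.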

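2606.18844 2026-06-18 cs.LG New submission

INDEQS: Informed Neural Controlled Differential Equations

Michael Detzel, Gabriel Nobis, Kristiyan Blagov, Juri Schubert, Jackie Ma, Wojciech Samek

AI summary: Proposes INDEQS, a graph-based NCDE forecasting method that injects prior knowledge of a directed graph at different architectural positions, combining inner and outer mixing with adaptive graph convolutions, and outperforms uninformed NCDEs on synthetic and real-world tasks.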

Abstract

Neural Controlled Differential Equations (NCDE) provide a powerful continuous-time framework for forecasting time series, but standard graph-based extensions typically learn spatial structure purely from data, even in settings where a directed graph structure is known a priori. We introduce Informed Neural controlled Differential EQuationS (INDEQS), a graph-based NCDE forecasting method that incorporates prior knowledge of a directed graph at distinct architectural positions. INDEQS separates inner mixing of hidden states across graph nodes from outer mixing between vector field and control, and offers both a lightweight graph-constrained variant and a more expressive variant, learning additional graph connections from data via adaptive graph convolutions. To systematically study when graph informedness is beneficial in forecasting, we devise a continuous advection simulation on directed graphs, yielding synthetic spatio-temporal datasets with known ground-truth flow structure. We then evaluate INDEQS on two real-world tasks: river discharge forecasting on a hydrological network and traffic flow prediction on PeMS08. Across these synthetic and real-world benchmarks, outer informedness consistently improves mean absolute error over an uninformed NCDE with comparable parameter count, particularly on larger graphs, while inner informedness offers a more parameter-efficient alternative when strict adherence to a known adjacency is desired. A comparison of discrete convolutional and continuous-time decoders further shows that continuous decoders yield better accuracy and greater temporal flexibility on real-world tasks. An implementation of INDEQS and the advection simulation is available at https://github.com/Mitchi1/indeqs.
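
As background for the abstract, a controlled differential equation has the form dz = f(z) dX, where the matrix-valued vector field f(z) is multiplied by increments of the control path X. The sketch below discretizes that with explicit Euler steps. It contains no graph structure and is not the INDEQS implementation; `ncde_euler`, the toy path, and the constant vector field are hypothetical.

```python
import numpy as np

def ncde_euler(z0, control_path, vector_field):
    """Integrate dz = f(z) dX with explicit Euler steps.

    z0:            initial hidden state, shape (d_z,)
    control_path:  sampled control X, shape (T, d_x)
    vector_field:  callable z -> matrix of shape (d_z, d_x)
    """
    z = np.asarray(z0, dtype=float)
    for dx in np.diff(control_path, axis=0):   # increments X[k+1] - X[k]
        z = z + vector_field(z) @ dx           # field applied to the control increment
    return z

# Toy check: with f(z) = I the state accumulates the total change in X.
path = np.array([[0.0, 0.0], [1.0, 0.5], [1.5, 2.0]])
z_final = ncde_euler(np.zeros(2), path, lambda z: np.eye(2))
```

In a neural CDE the callable would be a learned network; the product `vector_field(z) @ dx` is where the vector field and the control meet.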

2606.18275 2026-06-18 cs.ET cond-mat.mtrl-sci cs.LG Cross-list

A physical adaptive material motor unit neural network: a hygromorph composite material machine

Charles de Kergariou, David Correa, Adam W. Perriman, Helmut Hauser, Fabrizio Scarpa

Affiliations: Bristol Composites Institute, School of Civil, Aerospace and Mechanical Engineering, University of Bristol; School of Architecture, University of Waterloo; Research School of Chemistry and John Curtin School of Medical Research, Australian National University; School of Cellular and Molecular Medicine, University of Bristol; School of Engineering Mathematics and Technology, University of Bristol; Bristol Robotics Lab, Bristol, United Kingdom

AI summary: Proposes a physical adaptive motor unit neural network built from wood- and carbon black-based composites, trained with data-aware backpropagation, that performs dynamic shading control and learns incrementally as the database expands.

Comments 35 pages, 16 figures

Abstract

Advances in novel materials science enable structures to function as intelligent machines by embedding memory and learning capabilities directly into materials. Our work introduces a physical adaptive material motor unit neural network, leveraging a new generation of controllable actuators composed of wood- and carbon black-based composites that are sensitive to temperature and relative humidity. These material actuators are assembled into a motor unit-like structure inspired by the way muscle contraction is triggered, forming an intelligent machine capable of dynamic shading control that can be used, for example, in buildings. The machine is governed by a neural network trained on over 350 experimental data points collected under diverse environmental conditions. By establishing a new data-aware backpropagation training scheme, we show that the machine predicts shading responses and learns to predict appropriate behaviour incrementally as the database expands. We also demonstrate the ability of the machine to optimise configurations to achieve similar shading outputs under two distinct conditions.

2606.18305 2026-06-18 math.NA cs.LG cs.NA Cross-list

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

Kuilin Qin, Lianfang Wang, Xu Sun, Jiwei Jia, Yu Wang, Yong Wang, Yuping Duan

Affiliations: School of Mathematical Sciences, Beijing Normal University; School of Mathematics, Jilin University; Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang; School of Physics, Nankai University

AI summary: Proposes the Starter-Iterator Neural Operator (SINO), which reinterprets the initialization and iteration schemes of traditional iterative methods through neural networks to achieve spectral-spatiotemporal collaborative modeling, improving numerical accuracy and generalization on forward and inverse problems such as the Navier-Stokes and acoustic wave equations.

Abstract

Operator learning is an emerging interdisciplinary field that integrates machine learning with scientific computing. By mapping infinite-dimensional function spaces, this approach provides an efficient surrogate modeling framework for high-dimensional partial differential equations (PDEs). Compared to traditional numerical solvers, it achieves a superior trade-off between computational complexity and approximation accuracy, demonstrating significant advantages in many-query tasks such as real-time prediction and parameter sweeps. Given the stringent accuracy requirements of both forward simulation and inverse inference, as well as the precision bottlenecks of existing operator learning methods in handling complex boundaries or long-term evolution, we propose the Starter-Iterator Neural Operator (SINO). Our framework reinterprets the initialization strategies and iterative formats of traditional iterative methods through neural networks, establishing an efficient approach for spectral-spatiotemporal collaborative modeling. Specifically, the frequency-domain initialization module captures globally stable low-frequency features, while the time-domain learning module focuses on optimizing local solution residuals, thereby effectively overcoming the inherent limitations of conventional single-domain modeling approaches. Extensive experiments on typical dynamical systems such as the Navier-Stokes equations and acoustic wave equations, as well as practical applications including super-resolution imaging and weather forecasting, demonstrate that SINO achieves outstanding performance in numerical accuracy, generalization capability, and robustness.

2606.18611 2026-06-18 cs.SD cs.AI cs.LG stat.ML Cross-list

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

Affiliations: The Asahi Shimbun Company; Tokyo Woman's Christian University

AI summary: Proposes the parameter-efficient QC-GAN, which combines a quaternion Conformer generator with MetricGAN training and reduces the parameter count by sharing weights through the Hamilton product; it reaches PESQ 3.48 on VoiceBank+DEMAND with 0.89M parameters, matching models twice its size.

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech 2026
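
The weight sharing behind quaternion layers in general can be sketched with the Hamilton product: one quaternion weight (4 real numbers) acts on a 4-component input as a structured 4×4 real matrix, where an unconstrained real layer would need 16 numbers. This is a generic illustration, not QC-GAN's layer definition, and the helper names are hypothetical.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # k component
    ])

def quaternion_weight_matrix(w):
    """Real 4x4 matrix M with M @ q == hamilton_product(w, q); built from 4 parameters."""
    a, b, c, d = w
    return np.array([
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ])

rng = np.random.default_rng(0)
w, x = rng.normal(size=4), rng.normal(size=4)
via_product = hamilton_product(w, x)
via_matrix = quaternion_weight_matrix(w) @ x
i_times_j = hamilton_product([0, 1, 0, 0], [0, 0, 1, 0])   # i * j = k
```

The two routes give the same output, so the 4×4 map reuses the same four parameters in every row with fixed sign patterns.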

2606.18759 2026-06-18 cs.CG cs.LG cs.NA math.NA Cross-list

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

Sheng-Gwo Chen, Chen-Chang Peng

Affiliations: Department of Applied Mathematics, National Chiayi University, Chia-Yi 600, Taiwan

AI summary: Proposes a framework based on physics-informed neural networks (PINNs) that efficiently computes geodesic-like curves on parametric surfaces and supports multi-surface systems and surfaces of revolution.

Comments 22 pages, 16 figures, 8 tables

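2606.18837 2026-06-18 cs.MA cs.AI cs.LG Cross-list

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Hehai Lin, Qi Yang, Chengwei Qin

Affiliations: Ant Group; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: Proposes Skill-MAS, which decouples high-level orchestration capability into an evolvable Meta-Skill so that experience is retained without parameter updates, refines the Meta-Skill through multi-trajectory rollout and selective reflection, and achieves significant performance gains at controlled cost across multiple benchmarks and LLMs.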

Abstract

Large Language Model (LLM)-based automatic Multi-Agent Systems (MAS) generation has become a crucial frontier for tackling complex tasks. However, existing methods face a dilemma between model capability and experience retention. Inference-time MAS leverages frozen frontier LLMs but repeats identical searches without learning from past experience. Conversely, Training-time MAS internalizes experience via gradient updates but is constrained by the low capability ceiling of smaller models, and is hard to scale to large frontier LLMs. To bridge this gap, we propose Skill-MAS, a novel third path that decouples experience retention from parametric updates by conceptualizing the high-level orchestration capability as an evolvable Meta-Skill. Skill-MAS refines this architectural knowledge through a closed optimization loop: (1) Multi-Trajectory Rollout samples a behavioral distribution for each task under the current Meta-Skill; and (2) Selective Reflection adaptively selects priority tasks and applies hierarchical contrastive analysis to distill systemic experience into generalizable, strategy-level principles. Extensive experiments across four complex benchmarks and four distinct LLMs demonstrate that Skill-MAS not only achieves remarkable performance gains but also maintains a favorable cost-performance trade-off. Further analysis reveals that the evolved Meta-Skills are highly robust and exhibit strong transferability across unseen tasks and different LLMs.

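2606.18853 2026-06-18 stat.ML cs.LG Cross-list

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Nicolas Mahler

AI summary: Proposes the KPP kernel, which indexes the feature map by forest nodes weighted by a path metric and unifies prediction, exact additive attribution, a deterministic Lipschitz robust radius, and Rademacher risk bounds, providing a geometric framework for tree ensembles.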

Comments 31 pages

Abstract

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

2606.19039 2026-06-18 cs.NE cs.LG cs.SD Cross-list

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

Taharim Rahman Anon, Jakaria Islam Emon

Affiliations: PI LLC

AI summary: Proposes a learnable residual speech-to-spike encoder trained jointly with an R-LIF backbone; it reaches 94.97% accuracy on GSC-v2, is parameter-efficient, and learns task-aligned spike representations.

Comments Accepted at Interspeech 2026. This version is a preprint

Abstract

The mismatch between continuous acoustic signals and discrete event-driven processing remains a fundamental bottleneck for neuromorphic speech processing. Current systems typically rely on fixed spike encoders, forcing downstream Spiking Neural Networks (SNNs) to compensate for non-adaptive input representations. To address this, we present a learnable residual speech-to-spike encoder jointly trained end-to-end with a Recurrent Leaky Integrate-and-Fire (R-LIF) backbone. We validate this approach on the Google Speech Commands v2 (GSC-v2) benchmark, achieving up to 94.97% accuracy. Notably, the learned encoder remains highly parameter-efficient with a compact 35k-parameter variant that reaches 89.8%, matching or exceeding prior baselines that require an order of magnitude more parameters. Our encoder-focused analysis, including linear probing and gradient-residual inspection, indicates that the encoder does not target faithful signal reconstruction but instead learns task-aligned spike representations that enhance class separability. Finally, we benchmark bio-inspired, hardware-friendly credit assignment by comparing Direct Feedback Alignment (DFA) with surrogate-gradient BPTT under identical architectures and training conditions. We find that DFA reaches 91.5% accuracy, quantifying the performance trade-off of bio-inspired learning rules for modern neuromorphic audio.
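
To make the contrast with a learned encoder concrete, the sketch below shows a fixed, non-learned leaky integrate-and-fire (LIF) encoder of the kind the abstract describes as the usual front end. It is not the paper's residual encoder or its R-LIF backbone; the decay factor `beta`, the threshold, and the soft-reset rule are assumptions picked for illustration.

```python
import numpy as np

def lif_encode(signal, beta=0.9, threshold=1.0):
    """Turn a 1-D signal into a binary spike train with one LIF unit.

    beta:      membrane decay per step (1.0 means no leak)
    threshold: membrane level at which a spike is emitted
    """
    membrane = 0.0
    spikes = np.zeros(len(signal), dtype=np.int8)
    for t, x in enumerate(signal):
        membrane = beta * membrane + x   # leak, then integrate the input
        if membrane >= threshold:
            spikes[t] = 1
            membrane -= threshold        # soft reset: subtract the threshold
    return spikes

# A constant input of 0.5 with no leak crosses the threshold every second step.
spikes = lif_encode(np.full(6, 0.5), beta=1.0, threshold=1.0)
```

Nothing in this encoder adapts to the task, which is the limitation a jointly trained encoder is meant to address.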

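2606.19101 2026-06-18 eess.SP cs.LG Cross-list

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

Augusto Sarti

AI summary: Proposes explicit dynamical units based on wave-inspired interaction structures, in which modeling capability comes from structured organization rather than expressive nonlinearity; in nonlinear system identification, depth improves representation quality and generalization.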

Comments 11 pages, 2 figures, 2 tables

Abstract

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

2606.19168 2026-06-18 cs.AI cs.LG Cross-list

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

Jinhan Li, Kexian Tang, Yihan Xu, Zhuorui Ye, Kaifeng Lyu

Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University

AI summary: Proposes Safety Reflection Pretraining, which inserts safety reflections into the pretraining corpus to give the model a self-monitoring capability; experiments show that it effectively reduces the success rates of inference-stage and finetuning attacks.

Abstract

To achieve deeper safety alignment for large language models (LLMs), recent efforts have studied how to push safety interventions earlier into the pretraining stage, primarily by filtering unsafe data or rewriting it into safer forms. We argue that pretraining-stage alignment should go beyond making the data safe: LLMs may compose seemingly benign knowledge and capabilities into unsafe behaviors. To this end, we propose Safety Reflection Pretraining, a pretraining-stage alignment method which regularly inserts short safety reflections into pretraining corpora to integrate self-monitoring directly into language modeling, establishing a foundational capability that is subsequently reinforced by compatible post-training. Our experiments with 1.7B models pretrained on FineWeb-Edu show that Safety Reflection Pretraining improves safety classification accuracy and substantially reduces the success rates of inference-stage and finetuning attacks. Complementary to our real-world experiments, we also introduce a fully controlled synthetic environment, MedSafetyWorld, with a clear definition of safety and a reasoning structure under which models can easily generalize unsafe behaviors from safe data. Ablations in MedSafetyWorld further demonstrate a clear advantage of Safety Reflection Pretraining in preventing models from acting on unsafe behaviors generalized from safe data, compared with data filtering and rewriting. Taken together, our findings suggest that pretraining alignment should not only make the training data safe, but also shape the behaviors that models are likely to acquire from safe data.

2606.19279 2026-06-18 cs.AI cs.LG cs.LO math.CT math.LO math.PR Cross-list

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

Daniel Romero Schellhorn, Till Mossakowski, Björn Gehrke

Affiliations: University of Osnabrück

AI summary: Proposes the NeSyCat Torch framework, which unifies neurosymbolic semantics through a strong monad and an aggregation structure on truth values, uses the lazy log-tensor monad for differentiable training, and outperforms LTN and DeepProbLog on MNIST addition.

Abstract

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. NeSyCat has so far lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link and interpret computational symbols via neural networks, implementing the framework in probabilistic programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it by a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we furthermore employ a batch monad. The axioms are the source code: written once in monad-based do-notation, monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while achieving nearly the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay in a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation here is left for future work).
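
The role of the log-semiring in MNIST addition can be sketched without any framework: joint log-probabilities of independent digits are added (the semiring "times"), and every digit pair that yields the same sum is marginalised with log-sum-exp (the semiring "plus"). The function below is a plain NumPy illustration under those assumptions, not the NeSyCat Torch API; its name and the uniform test distribution are hypothetical.

```python
import numpy as np

def log_sum_distribution(log_p_a, log_p_b):
    """Log-probabilities of a + b for independent digits a and b.

    log_p_a, log_p_b: log-probabilities over digits 0..n-1.
    Returns an array of length 2n-1 indexed by the value of the sum.
    """
    n = len(log_p_a)
    out = np.full(2 * n - 1, -np.inf)            # log(0): the additive identity
    for a in range(n):
        for b in range(n):
            joint = log_p_a[a] + log_p_b[b]      # semiring "times": add log-probs
            out[a + b] = np.logaddexp(out[a + b], joint)   # semiring "plus"
    return out

uniform = np.log(np.full(10, 0.1))
log_p_sum = log_sum_distribution(uniform, uniform)
p_sum = np.exp(log_p_sum)
```

Working in log space avoids underflow when many small probabilities are multiplied, which is the numerical-stability motivation the abstract gives for a log-tensor representation.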

2209.01378 2026-06-18 cs.LG eess.SP q-fin.ST Updated version

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network and self-evolves agent roles, prompts, and coordination through forward task decomposition and backward feedback propagation, surpassing existing methods on seven benchmark datasets.

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

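2507.01414 2026-06-18 cs.LG Updated version

Decomposing Prediction Mechanisms for In-Context Recall

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

Affiliations: University of California, Berkeley; University of Pennsylvania

AI summary: By designing a new toy problem that combines continuous in-context learning with discrete associative recall, finds that Transformer models use two separate mechanisms with different learning dynamics for in-context recall: one relies on discrete symbolic labels for associative recall, and the other makes Bayesian-style predictions based on the previous token and the context.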

Comments 45 pages, 47 figures, 2 tables

Abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in their context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism phenomenon (manifesting as separate phase transitions) is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task and observed a similar phenomenon: a decisive gap between the emergence of first-task-token performance and that of second-task-token performance.

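2601.14968 2026-06-18 cs.LG cs.AI updated version

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Jevon Twitty, Vinh Pham, Nitiwith Rotchanarak, Viresh Pati, Yubin Kim, Shihao Yang, Jiecheng Lu

Affiliations: Georgia Institute of Technology

AI summary: This paper proposes Temporal Operator Attention (TOA), which augments the attention mechanism with learnable operators so that it can handle signed and oscillatory transformations in time-series data more effectively, improving performance on time-series forecasting, anomaly detection, and classification.

AI-translated abstract

A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. We argue that this gap stems from a mismatch in the basic sequence-modeling primitive: although many time-series dynamics are governed by global temporal operators (such as filtering and harmonic structure), standard attention treats each output as a convex combination of the inputs. This limits its ability to represent signed and oscillatory transformations, which are essential to temporal signal processing. We formalize this limitation as a simplex-constrained mixing bottleneck in softmax attention, which is especially restrictive for operator-driven time-series tasks. To address it, we propose Temporal Operator Attention (TOA), a framework that augments attention with explicit, learnable sequence-space operators, enabling signed mixing across time while preserving input-dependent adaptivity. To make dense N×N operators practical, we introduce Stochastic Operator Regularization, a high-variance dropout mechanism that stabilizes training and prevents memorization. Across forecasting, anomaly detection, and classification benchmarks, TOA consistently improves performance when integrated into standard backbones such as PatchTST and iTransformer, with especially strong gains on reconstruction-heavy tasks. These results suggest that explicit operator learning is a key ingredient of effective time-series modeling.

English abstract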

A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. We argue that this gap arises from a mismatch in the sequence-modeling primitive: while many time-series dynamics are governed by global temporal operators (e.g., filtering and harmonic structure), standard attention forms each output as a convex combination of inputs. This restricts its ability to represent signed and oscillatory transformations that are fundamental to temporal signal processing. We formalize this limitation as a simplex-constrained mixing bottleneck in softmax attention, which becomes especially restrictive for operator-driven time-series tasks. To address this, we propose Temporal Operator Attention (TOA), a framework that augments attention with explicit, learnable sequence-space operators, enabling direct signed mixing across time while preserving input-dependent adaptivity. To make dense $N \times N$ operators practical, we introduce Stochastic Operator Regularization, a high-variance dropout mechanism that stabilizes training and prevents trivial memorization. Across forecasting, anomaly detection, and classification benchmarks, TOA consistently improves performance when integrated into standard backbones such as PatchTST and iTransformer, with particularly strong gains in reconstruction-heavy tasks. These results suggest that explicit operator learning is a key ingredient for effective time-series modeling.
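
The simplex constraint mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the helper names and the toy signal are assumptions made for illustration. A softmax attention row is non-negative and sums to 1, so its output stays inside the range of the inputs, whereas a first-difference operator needs a negative weight and can leave that range.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix(weights, values):
    """Weighted sum of scalar values (one output position)."""
    return sum(w * v for w, v in zip(weights, values))

signal = [1.0, 3.0, 5.0, 2.0]  # toy scalar time series (hypothetical)

# Any softmax row lies on the probability simplex, so the output
# is a convex combination of the inputs.
attn_row = softmax([0.2, -1.0, 4.0, 0.5])
attn_out = mix(attn_row, signal)

# A signed operator row (first difference x[3] - x[2]) is not on the simplex.
diff_row = [0.0, 0.0, -1.0, 1.0]
diff_out = mix(diff_row, signal)
```

Here `diff_out` falls below the smallest input value, which no choice of softmax scores could produce; this is the kind of signed mixing that an explicit learnable operator is meant to allow.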

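2606.01249 2026-06-18 cs.LG cs.CL updated version

Trust Region On-Policy Distillation

Xingrun Xing, Haoqing Wang, Boyan Gao, Ziheng Li, Yehui Tang

Affiliations: Samsung Research; University of Oxford; Peking University

AI summary: Proposes Trust Region On-Policy Distillation (TrOPD), which uses credit-assignment strategies and trust-region learning to address the training instability caused by teacher-student distribution mismatch, and outperforms existing methods on mathematical reasoning, code generation, and general benchmarks.

AI-translated abstract

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, because teacher supervision on student-generated tokens may produce unreliable policy gradients and can even cause optimization to fail. This paper addresses reliable on-policy token-level supervision through credit-assignment strategies and proposes Trust Region On-Policy Distillation (TrOPD). It has the following features: 1) Trust-region on-policy learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, easing the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier estimation: for outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-policy guidance: the student continues generation from teacher prefixes and uses forward KL to imitate the off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms state-of-the-art OPD baselines, including OPD, EOPD, and REOPOLD, on mathematical reasoning, code generation, and general-domain benchmarks.

English abstract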

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, as teacher supervision on student-generated tokens may yield unreliable policy gradients and even cause optimization failure. This work addresses reliable on-policy token-level supervision through credit assignment strategies, and proposes Trust Region On-Policy Distillation, TrOPD. It features the following characteristics: 1) Trust-Region On-Policy Learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, mitigating the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier Estimation: For outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-Policy Guidance: The student continues generation from teacher prefixes and uses forward KL to imitate off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms SoTA OPD baselines, including OPD, EOPD, and REOPOLD, across mathematical reasoning, code generation, and general-domain benchmarks.
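
As a rough illustration of the first feature, the sketch below computes the per-token K1 estimate of the reverse KL (the student log-probability minus the teacher log-probability on student-sampled tokens) and averages it only over tokens inside a trust region. The abstract does not specify the trust-region criterion, so the threshold `max_gap` on the absolute log-ratio, the function names, and the toy log-probabilities are all assumptions, not the paper's method.

```python
def k1_estimates(student_logp, teacher_logp):
    """Per-token K1 estimate of KL(student || teacher) on student-sampled tokens."""
    return [s - t for s, t in zip(student_logp, teacher_logp)]

def trust_region_loss(student_logp, teacher_logp, max_gap=2.0):
    """Average K1 over tokens whose log-ratio stays within a hypothetical threshold.

    Returns the masked loss and the 0/1 mask of tokens that were kept.
    """
    k1 = k1_estimates(student_logp, teacher_logp)
    mask = [1 if abs(k) <= max_gap else 0 for k in k1]
    kept = [k for k, m in zip(k1, mask) if m]
    loss = sum(kept) / len(kept) if kept else 0.0
    return loss, mask

# Toy log-probabilities of three student-sampled tokens (hypothetical values).
student_logp = [-0.5, -1.0, -0.2]
teacher_logp = [-0.7, -6.0, -0.1]
loss, mask = trust_region_loss(student_logp, teacher_logp)
```

The second token, where the teacher assigns very low probability to the student's choice, is excluded from the on-policy loss; the abstract suggests such outlier tokens are instead handled by clipping, masking, or forward-KL estimation.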

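2606.06564 2026-06-18 cs.LG cs.AI updated version

HAARES Half-Split Residual Basis Routing for Deep Transformers

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

Kehan Wang

Affiliations: Chongqing University

AI summary: Proposes WAV v1, which strengthens residual routing by adding directional detail bases (a phase basis and a split basis) to each block; it outperforms existing methods in deep Transformers and achieves lower validation loss on TinyStories and Text8 at 48 layers.

Comments 6 pages, 4 figures, 3 tables

AI-translated abstract

Residual connections are essential for training deep Transformers, but the standard PreNorm residual stream aggregates sublayer updates with fixed unit weights. Recent Attention Residuals replace this fixed accumulation with content-dependent depth-wise routing, and Block Attention Residuals make the mechanism efficient by routing over block-level residual summaries. However, a single block summary stores only the low-frequency total residual displacement within a block and discards directional structure, such as the imbalance between attention and MLP updates and early-versus-late block dynamics. We propose WAV v1, a lightweight multi-resolution residual routing method for decoder-only Transformers. Rather than representing each block only by its cumulative residual sum, WAV v1 augments each block with two directional detail bases: a phase basis that contrasts attention and MLP updates, and a split basis that contrasts early and late sublayer updates. These bases are routed together with the standard block summary through the same depth-wise softmax mixer, while negative detail-source initialization and detached RMS matching stabilize training. On character-level TinyStories and Text8 language modeling, WAV v1 shows a clear depth-dependent advantage. Although it is not consistently beneficial at 12 layers, it becomes competitive at 24 layers and outperforms all baselines at 48 layers. At 48 layers, WAV v1 reduces validation loss from 0.4960 to 0.4738 on TinyStories and from 0.9363 to 0.9305 on Text8, with negligible additional parameters. These results suggest that directional residual detail, not just block-level sums, matters for scaling residual routing in deeper Transformers.

English abstract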

Block-level residual routing makes learned residual aggregation practical by routing over block summaries, but each summary compresses an ordered sequence of attention and MLP updates into one cumulative vector. We propose HAARES, a lightweight residual basis router that keeps the cumulative block source and adds one half-split detail basis, computed as the difference between first-half and second-half residual updates. The detail basis is RMS-matched and updated online, exposing coarse intra-block trajectory information without dense sublayer-level routing. Across OpenWebText, cross-domain character-level benchmarks, and BPE-tokenized OpenWebText, the empirical pattern is depth-dependent: gains are small or mixed at shallow depth and most reliable in 48-layer models. In the 201M 48-layer setting, HAARES improves over Block AttnRes across all three seeds, while a 453M two-seed probe shows the same direction. Ablations rule out source duplication, random signed details, fixed detail-source biases, or block-count changes alone. Cost analysis shows that the method is FLOP-light but not wall-clock-free: it adds memory and routing overhead, yet its relative arithmetic cost is amortized as width grows and earlier convergence can reduce time-to-target.
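
The half-split detail basis described in this abstract can be sketched in a few lines of pure Python. This is an illustrative reading, not the paper's implementation: the abstract says only that the detail is the difference between first-half and second-half residual updates and that it is RMS-matched, so matching it to the RMS of the cumulative source, the `eps` constant, and all names below are assumptions.

```python
import math

def rms(v):
    """Root mean square of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v) / len(v))

def vec_sum(vectors):
    """Element-wise sum of a list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]

def block_sources(updates, eps=1e-8):
    """Build the cumulative source and a half-split detail source for one block.

    updates: per-sublayer residual updates in order (attention, MLP, ...).
    """
    half = len(updates) // 2
    first = vec_sum(updates[:half])
    second = vec_sum(updates[half:])
    cumulative = [a + b for a, b in zip(first, second)]
    detail = [a - b for a, b in zip(first, second)]
    # Assumed convention: rescale the detail to the RMS of the cumulative source.
    scale = rms(cumulative) / (rms(detail) + eps)
    return cumulative, [scale * d for d in detail]

updates = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]  # toy block of 4 sublayers
cumulative, detail = block_sources(updates)
```

In this toy block the second half contributes larger updates than the first, so the detail vector points in the negative direction while the cumulative source alone would not reveal that ordering.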

2606.02800 2026-06-18 cs.CV cs.AI cs.LG cs.MM cs.RO updated version

Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA: Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski

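Affiliations: NVIDIA

AI summary: Proposes Cosmos 3, an omnimodal world model built on a unified mixture-of-transformers architecture that jointly processes language, image, video, audio, and action sequences, sets a new state of the art on understanding and generation tasks, and offers a scalable, general-purpose backbone for embodied agents.

AI-translated abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies the key modalities of Physical AI, effectively integrating vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation shows that Cosmos 3 sets a new state of the art across a diverse range of understanding and generation tasks, demonstrating the ability of omnimodal world models to serve as scalable, general-purpose backbones for embodied agents. At the time the technical report was written, our post-trained Cosmos 3 models were ranked as the best open-source text-to-image and image-to-video models by Artificial Analysis and as the best policy model by RoboArena. To accelerate open research and deployment in Physical AI, we release our code, model checkpoints, curated synthetic datasets, and evaluation benchmark under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

English abstract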

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

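2606.18383 2026-06-18 cs.LG cs.CL new submission

From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability

Dibyanayan Bandyopadhyay, Asif Ekbal

Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Patna

AI summary: Proposes a post-hoc generalization framework that certifies a language model through a sparse proxy (its SAE reconstruction), derives an upper bound on expected risk, verifies that the bound is non-vacuous on GPT-2 Small and other models, and shows that deeper layers are easier to certify and that the decomposition distinguishes semantic alignment from statistical sparsity.

AI-translated abstract

Sparse autoencoders (SAEs) are increasingly used to extract interpretable features from language models (LMs), but a central question remains: when can an SAE-based explanation be treated as a faithful view of the underlying frozen LM? We study this question through a post-hoc generalization framework that certifies the LM via a sparse proxy, obtained by replacing a native hidden activation with its pretrained SAE reconstruction. Our framework derives an upper bound on the base model's expected risk from four measurable quantities: proxy risk, SAE reconstruction gap, concept-pool mismatch, and sparse complexity. We interpret this certificate as an operational criterion for explanatory faithfulness. In particular, a non-vacuous bound indicates that the extracted sparse features retain meaningful predictive information, while small reconstruction and mismatch errors indicate that the proxy stays behaviorally close to the original model. Empirically, we show that the bound becomes non-vacuous on GPT-2 Small, Gemma-2B, and Llama-3-8B at practical sample sizes. A detailed layerwise analysis of Llama-3-8B reveals a strong depth dependence: later layers become easier to certify, which is associated with stronger local fidelity and weaker downstream error amplification. Finally, through feature-shuffling ablations, we show that the decomposition distinguishes genuine semantic alignment from mere statistical sparsity, providing a useful diagnostic for when SAE-based explanations become less reliable.

English abstract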

Sparse autoencoders (SAEs) are increasingly used to extract interpretable features from language models (LMs), yet a central question remains: when can an SAE-based explanation be treated as a faithful view of an underlying frozen LM? We study this through a post-hoc generalization framework that certifies the LM via a sparse proxy, obtained by replacing a native hidden activation with its pretrained SAE reconstruction. Our framework derives an upper bound on the base model's expected risk using four measurable quantities: proxy risk, SAE reconstruction gap, concept-pool mismatch, and sparse complexity. We interpret this certificate as an operational criterion for explanatory faithfulness. In particular, a non-vacuous bound indicates that the extracted sparse features retain meaningful predictive information, while small reconstruction and mismatch errors indicate that the proxy remains behaviorally close to the original model. Empirically, we show that the bound becomes non-vacuous on GPT-2 Small, Gemma-2B, and Llama-3-8B at practical sample sizes. A detailed layerwise analysis of Llama-3-8B reveals a strong depth dependence, with later layers becoming much easier to certify, associated with both stronger local fidelity and weaker downstream error amplification. Finally, through feature-shuffling ablations, we show that the decomposition distinguishes genuine semantic alignment from mere statistical sparsity, providing a useful diagnostic for when SAE-based explanations become less reliable.

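2606.18390 2026-06-18 cs.LG q-bio.QM new submission

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

Yingxu Wang, Kunyu Zhang, Nan Yin, Yu Li, Eran Segal

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; Zhengzhou University; The Education University of Hong Kong; The Chinese University of Hong Kong; Weizmann Institute of Science

AI summary: Proposes the MOLAR framework, which separates clean-property inference from label observation and uses residual evidence from the graph and text modalities to learn multimodal molecular representations from noisy labels, outperforming baselines on naturally noisy and label-flipping benchmarks.

AI-translated abstract

Motivation: Noisy labels are a common challenge in molecular property prediction, because molecular annotations usually come from experimental assays, curated databases, or weak annotation pipelines rather than from directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, graph-text fusion or alignment may amplify this problem and propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: the graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to the recorded labels used for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

English abstract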

Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.
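
A categorical label-observation channel of the kind the abstract mentions can be sketched as a row-stochastic transition matrix that maps a clean-property distribution to a distribution over recorded labels, with Bayes' rule giving a posterior over the clean property once a label is recorded. This is a generic noisy-label sketch under assumed names and toy numbers (`T`, `clean_probs`), not MOLAR's actual parameterization.

```python
def observed_distribution(clean_probs, T):
    """p(recorded = j) = sum_k p(clean = k) * T[k][j], with T row-stochastic."""
    n_labels = len(T[0])
    return [sum(clean_probs[k] * T[k][j] for k in range(len(clean_probs)))
            for j in range(n_labels)]

def clean_posterior(clean_probs, T, recorded):
    """p(clean = k | recorded label), by Bayes' rule."""
    joint = [clean_probs[k] * T[k][recorded] for k in range(len(clean_probs))]
    z = sum(joint)
    return [v / z for v in joint]

clean_probs = [0.9, 0.1]     # model's belief about the clean property (toy values)
T = [[0.8, 0.2],             # clean 0 is recorded as 1 with probability 0.2
     [0.3, 0.7]]             # clean 1 is recorded as 0 with probability 0.3

obs = observed_distribution(clean_probs, T)
post = clean_posterior(clean_probs, T, recorded=1)
```

In this toy case the recorded label is 1, yet the posterior still favors clean class 0, which is one way a model could flag a recorded label as unreliable.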

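2606.18688 2026-06-18 cs.LG cs.AI new submission

Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow

Akshay Hazare

Affiliations: Independent Researcher

AI summary: Proposes Dual-Channel Grounded World Modeling (DCGWM), which uses a partitioned latent space and inward-only gradient flow to structurally prevent the objective interference collapse caused by multi-objective grounding in joint embedding predictive architectures.

Comments Position paper. Experimental validation in progress

AI-translated abstract

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to representation learning for world models. We identify a failure mode that arises when JEPA-based world models are grounded against two qualitatively different external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We call it Objective Interference Collapse (OIC): we argue that joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, and that loss weighting alone cannot resolve this. We propose Dual-Channel Grounded World Modeling (DCGWM), which structurally prevents OIC through a partitioned latent space (physical subspace Z_p, behavioral subspace Z_b) and inward-only gradient flow. The Physical Grounding Channel updates only Z_p via VICReg-style alignment to physical measurements; the Social-Behavioral Grounding Channel updates only Z_b via alignment to trajectories from an emergent multi-agent simulation. An Inter-Channel Interface Module couples the subspaces at the task level without producing cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model. We give three theoretical results: the partition removes the gradient-interference pathway associated with OIC; each grounded subspace inherits anti-collapse guarantees from its alignment objective; and generative isolation is necessary under an assumption on the geometry of the generative objective. This paper establishes the problem formulation and the architecture; experimental validation is ongoing and will be reported in a future revision.

English abstract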

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to world model representation learning. We identify a failure mode in JEPA-based world models grounded against two qualitatively distinct external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We term this Objective Interference Collapse (OIC): we argue that joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, in a manner not resolvable by loss weighting alone. We propose Dual-Channel Grounded World Modeling (DCGWM), designed to structurally prevent OIC through a partitioned latent space (physical subspace Z_p, behavioral subspace Z_b) with inward-only gradient flow. A Physical Grounding Channel updates only Z_p via VICReg-style alignment to physical measurements; a Social-Behavioral Grounding Channel updates only Z_b via alignment to trajectories from an emergent multi-agent simulation. An Inter-Channel Interface Module couples the subspaces at the task level without cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model. We present three theoretical results: the partition removes the gradient-interference pathway implicated in OIC; each grounded subspace inherits anti-collapse guarantees from its alignment objective; and generative isolation is necessary under a stated assumption on the generative objective's geometry. This manuscript establishes the problem formulation and architecture; experimental validation is ongoing and will be reported in a future revision.

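2606.18703 2026-06-18 cs.LG q-bio.QM new submission

Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory

Kaustubh Kapil, Kishor P. Upla

Affiliations: Sardar Vallabhbhai National Institute of Technology (SVNIT), Surat, India

AI summary: Proposes the TGO framework, which analyzes the spectral geometry of ViT representations (effective rank, stable rank, participation ratio, spectral entropy, spectral flatness, spectral anisotropy, and others) and finds that during training dimensional utilization increases, anisotropy decreases, and spectral entropy and participation ratio rise, with the final CLS token representation showing the highest effective dimensionality and the lowest anisotropy.

AI-translated abstract

Although Vision Transformers (ViTs) are widely adopted and successful across many computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relatively underexplored. To close this gap, we introduce the Transformer Geometry Observatory (TGO), a systematic framework of experiments and analysis pipelines for studying the representational geometry and dynamics of Vision Transformers. TGO-I, the first installment of the framework, focuses on the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, we analyze effective rank, stable rank, participation ratio, spectral entropy, spectral flatness, spectral anisotropy, covariance structure, eigenspectra, and singular value spectra over the course of training. Our results reveal a consistent increase in dimensional utilization, accompanied by decreasing anisotropy, increasing spectral entropy, increasing participation ratio, and progressively flatter eigenspectra. Contrary to the common intuition that training should concentrate information into a few dominant directions, we observe a progressive redistribution of variance across representational dimensions. This phenomenon is especially pronounced in the final CLS token representation, which exhibits the highest effective dimensionality and the lowest anisotropy in the network.

English abstract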

Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relatively underexplored. To address this gap, we introduce Transformer Geometry Observatory (TGO), a systematic framework of experiments and analysis pipelines designed to investigate the representational geometry and dynamics of Vision Transformers. TGO-I, the first installment of the framework, focuses on the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, we analyze Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra throughout training. Our results reveal a consistent increase in dimensional utilization, accompanied by decreasing anisotropy, increasing spectral entropy, increasing participation ratio, and progressively flatter eigenspectra. Contrary to the common intuition that training should concentrate information into a small number of dominant directions, we observe a progressive redistribution of variance across representational dimensions. This phenomenon is particularly pronounced in the final CLS token representation, which exhibits the highest effective dimensionality and lowest anisotropy within the network.
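
Several of the spectral quantities listed above have widely used closed forms in terms of the eigenvalues of a representation's covariance matrix. The sketch below uses one common convention for each; the abstract does not state which definitions the paper adopts, so the formulas chosen here and the function name `spectral_metrics` are assumptions.

```python
import math

def spectral_metrics(eigvals):
    """Spectral summaries of a covariance eigenspectrum (non-negative eigenvalues)."""
    lam = [v for v in eigvals if v > 0]
    total = sum(lam)
    p = [v / total for v in lam]                    # normalized spectrum
    entropy = -sum(q * math.log(q) for q in p)      # spectral (Shannon) entropy
    return {
        "participation_ratio": total ** 2 / sum(v * v for v in lam),
        "spectral_entropy": entropy,
        "effective_rank": math.exp(entropy),        # exponential of the entropy
        "stable_rank": total / max(lam),            # sum of eigenvalues over the largest
    }

flat = spectral_metrics([1.0, 1.0, 1.0, 1.0])     # variance spread over 4 directions
peaked = spectral_metrics([1.0, 0.0, 0.0, 0.0])   # variance in a single direction
```

A flat spectrum gives values near the full dimension (4 here) and a single dominant direction gives values near 1, so the increase in these metrics reported in the abstract corresponds to variance being spread over more directions.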

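2406.07775 2026-06-18 cs.LG updated version

When in Rome: Learning General Behaviors from Heterogeneous Agents

Caleb Chang, Davin Win Kyi, Natasha Jaques, Karen Leung

Affiliations: University of Washington; NVIDIA

AI summary: Proposes GRID, which extracts a general reward from heterogeneous demonstrators pursuing different goals and trains a generalist agent to learn general environmental competencies, avoiding mode-averaging bias and making fine-tuning to downstream tasks more efficient.

AI-translated abstract

Humans often acquire new skills by observing others, because observed behaviors implicitly reveal how to act in an environment. However, observations drawn from a heterogeneous population introduce conflicting behavioral signals, making it hard to determine which behaviors are worth imitating. We address this challenge with General Reward Inference and Disentanglement (GRID), a social learning method that extracts universally useful behaviors from a heterogeneous population of demonstrators pursuing different goals. GRID decomposes each agent's reward function into a general reward, which captures behaviors shared by all agents, and a specific reward, which captures individual preferences and objectives. Training only on the general reward provides a new paradigm of generalist pretraining. It yields a generalist agent that internalizes universal environmental competencies, such as safety and basic task proficiency, without the mode-averaging bias that afflicts standard learning-from-demonstration techniques. This generalist agent serves as a superior prior for fine-tuning to downstream tasks, including preferences unseen during training. Experiments on a synthetic basis-function decomposition, multi-agent Craftax, and a continuous autonomous driving simulator (Highway-Env) confirm that GRID disentangles reward structure in a semantically meaningful way, outperforms standard learning-from-demonstration baselines, and enables more efficient and stable specialization.

English abstract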

Humans often acquire new skills by observing others, since observed behaviors implicitly reveal how to act in an environment. However, observations drawn from a heterogeneous population introduce conflicting behavioral signals, making it difficult to determine which behaviors are worth imitating. We address this challenge with General Reward Inference and Disentanglement (GRID), a social learning method that extracts universally useful behaviors from a heterogeneous population of demonstrators pursuing different goals. GRID decomposes per-agent reward functions into a general reward, capturing behaviors shared across all agents, and specific rewards, capturing individual preferences and objectives. Training exclusively on the general reward provides a new paradigm of generalist pretraining. It yields a generalist agent that internalizes universal environmental competencies, such as safety and basic task proficiency, without the mode-averaging bias that afflicts standard learning from demonstration techniques. This generalist serves as a superior prior for fine-tuning to downstream tasks, including preferences unseen during training. Experiments across a synthetic basis function decomposition, multi-agent Craftax, and a continuous autonomous driving simulator (Highway-Env) confirm that GRID successfully disentangles reward structure in a semantically meaningful way, outperforms standard learning from demonstration baselines, and enables more efficient and stable specialization.

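2606.18785 2026-06-18 cs.LG cs.AI new submission

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

Lennert Saerens, Bram Silue, Eleni Litsa, Peter Vrancx, Pieter Libin

Affiliations: imec; Data Science Institute, Interuniversity Institute of Biostatistics and Statistical Bioinformatics, UHasselt

AI summary: Proposes Top-Two Pareto Front Thompson Sampling (TTPFTS), the first anytime multi-objective multi-armed bandit algorithm for Pareto set identification, validates it on synthetic environments and an ultra-large molecular library, and introduces an uncertainty quantification metric.

Comments 26 pages, 13 figures

AI-translated abstract

Identifying Pareto-optimal solutions is essential for supporting multi-objective decision-making. We present the first anytime multi-objective multi-armed bandit algorithm for the Pareto set identification problem, taking a Bayesian approach: Top-Two Pareto Front Thompson Sampling (TTPFTS). We benchmark TTPFTS against state-of-the-art fixed-budget Pareto set identification algorithms on synthetic environments. Next, we demonstrate its practical utility in a challenging multi-objective molecular discovery setting by efficiently exploring an ultra-large synthesis-on-demand molecular library. In addition, we introduce a novel uncertainty quantification metric that estimates the algorithm's confidence in the predicted Pareto set. We show that this metric is an effective proxy for true performance, providing a robust way to monitor learning progress in complex settings. Finally, we complement these empirical findings with a theoretical proof of the algorithm's asymptotic correctness.

English abstract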

Identifying Pareto optimal solutions is critical to support multi-objective decision-making. We introduce the first anytime Multi-Objective Multi-Armed Bandit algorithm for the Pareto Set Identification problem, taking a Bayesian approach: Top-Two Pareto Front Thompson Sampling (TTPFTS). We benchmark TTPFTS against state-of-the-art fixed-budget Pareto Set Identification algorithms on synthetic environments. Next, we demonstrate its practical utility in a challenging multi-objective molecular discovery setting by efficiently exploring an ultra-large synthesis-on-demand molecular library. Furthermore, we introduce a novel uncertainty quantification metric that estimates our algorithm's confidence in the predicted Pareto set. We demonstrate that this metric effectively proxies true performance, yielding a robust methodology for monitoring learning progress in complex settings. Finally, we complement these empirical findings with a theoretical proof of the algorithm's asymptotic correctness.
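
The two building blocks named in this abstract, Pareto set identification and Thompson-style posterior sampling, can be sketched as follows. This is not TTPFTS itself (the top-two selection rule is not described in the abstract); it only shows how a Pareto set could be read off one posterior sample of the arm means. Independent Gaussian posteriors, maximization of every objective, and all function names are assumptions.

```python
import random

def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_set(means):
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [i for i, m in enumerate(means)
            if not any(dominates(o, m) for j, o in enumerate(means) if j != i)]

def sample_posterior(post_means, post_stds, rng):
    """Draw one mean vector per arm from independent Gaussian posteriors."""
    return [[rng.gauss(mu, sd) for mu, sd in zip(ms, ss)]
            for ms, ss in zip(post_means, post_stds)]

post_means = [[1.0, 5.0], [3.0, 3.0], [5.0, 1.0], [2.0, 2.0]]  # 4 arms, 2 objectives
post_stds = [[0.5, 0.5]] * 4

rng = random.Random(0)
sampled_front = pareto_set(sample_posterior(post_means, post_stds, rng))
true_front = pareto_set(post_means)
```

Repeating the sampling step and counting how often each arm appears in the sampled front gives a simple, informal picture of how certain the posterior is about the Pareto set.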

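2606.18810 2026-06-18 cs.LG cs.AI new submission

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

Yingyu Shan, Yuhang Guo, Zihao Cheng, Zeming Liu, Xiangrong Zhu, Xinyi Wang, Jiashu Yao, Wei Lin, Hongru Wang, Heyan Huang

Affiliations: Beijing Institute of Technology; Beihang University; Independent Researcher

AI summary: Proposes SC-GRPO, which uses the KL divergence between self-conditioned distributions as a multiplicative weight on GRPO gradients to achieve fine-grained credit assignment, with an average improvement of 8.1% on math, code, and agentic tasks.

AI-translated abstract

Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit to all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts. GRPO variants rely on process reward models or ground-truth answers. Knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation) or privileged information (On-Policy Self Distillation). These dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and we prove that distilling from a self-teacher built from verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses this KL divergence as a multiplicative weight on GRPO gradients. On five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms GRPO by 8.1% and DAPO by 5.9%, with stronger out-of-distribution performance. In addition, SC-GRPO achieves higher performance than OPD.

English abstract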

Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts. GRPO variants rely on process reward models or ground-truth answers. Knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation) or privileged information (On-Policy Self Distillation). These dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and prove that distilling from a self-teacher constructed from verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses this per-token KL divergence as a multiplicative weight on GRPO gradients. Across five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms GRPO by 8.1% and DAPO by 5.9%, with stronger OOD performance. Moreover, SC-GRPO achieves higher performance than OPD.
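
The weighting idea in this abstract can be sketched in pure Python: compute a per-token KL divergence between the model's original next-token distribution and the distribution obtained after conditioning on a verified trajectory, then scale a sequence-level advantage by it. The direction of the KL, the normalization by the mean, and every name below are assumptions made for illustration; the abstract does not specify these details.

```python
import math

def kl(p, q):
    """KL(p || q) for two categorical distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def token_weights(orig_dists, cond_dists):
    """Per-token weights proportional to KL(conditioned || original), mean-normalized."""
    kls = [kl(c, o) for o, c in zip(orig_dists, cond_dists)]
    mean = sum(kls) / len(kls)
    return [k / mean if mean > 0 else 1.0 for k in kls]

def weighted_advantages(advantage, weights):
    """Scale one sequence-level GRPO-style advantage by the per-token weights."""
    return [advantage * w for w in weights]

# Two token positions over a toy 2-word vocabulary (hypothetical numbers).
orig = [[0.5, 0.5], [0.5, 0.5]]
cond = [[0.5, 0.5], [0.9, 0.1]]   # conditioning only changes the second position

weights = token_weights(orig, cond)
per_token_adv = weighted_advantages(1.5, weights)
```

The first position, where conditioning changes nothing, receives zero weight, and the second position receives all of the credit, in contrast to the uniform per-token credit of plain GRPO.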

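2606.18812 2026-06-18 cs.LG cs.AI new submission

Reinforcement Learning Foundation Models Should Already Be A Thing

Abdelrahman Zighem, Jill-Jênn Vie

Affiliations: École normale supérieure de Paris, PSL University, Paris, France; Soda team, Inria Saclay, Palaiseau, France

AI summary: Proposes building reinforcement learning foundation models from synthetic MDPs, using a fixed-size sufficient statistic that makes attention-based architectures applicable; the model outperforms classical algorithms in both online and offline experiments.

AI-translated abstract

Foundation models for language and vision are powered by internet-scale data, whereas structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The alternative is synthetic data, which shifts the burden from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a Transformer pretrained on a synthetic Bayesian prior. We make two points. First, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. Second, MDPs admit a fixed-size sufficient statistic that is independent of the number of episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define an agenda for RL foundation models. As a proof of concept, we train a model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, using far fewer episodes than UCB-VI and tabular Q-learning; offline, competitively with VI-LCB.

English abstract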

Foundation models for language and vision are powered by internet-scale data, while structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The substitute is synthetic data, which shifts the burden from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. First, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. Second, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train one model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.
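
One natural candidate for a fixed-size, tabular-shaped statistic of a finite MDP is the table of transition counts and reward sums, whose size depends only on the numbers of states and actions and not on how many episodes were observed. Whether this is the exact statistic used in the paper is not stated in the abstract; the sketch below and its names are assumptions.

```python
def empty_stats(n_states, n_actions):
    """Allocate transition counts N[s][a][s'] and reward sums R[s][a]."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sums = [[0.0] * n_actions for _ in range(n_states)]
    return counts, reward_sums

def update(stats, transitions):
    """Fold a batch of (s, a, r, s_next) transitions into the statistic in place."""
    counts, reward_sums = stats
    for s, a, r, s_next in transitions:
        counts[s][a][s_next] += 1
        reward_sums[s][a] += r
    return stats

def shape(stats):
    """Dimensions of the count table: (states, actions, next states)."""
    counts, _ = stats
    return (len(counts), len(counts[0]), len(counts[0][0]))

stats = empty_stats(n_states=3, n_actions=2)
episode = [(0, 1, 0.0, 1), (1, 0, 1.0, 2)]   # toy episode (hypothetical)
update(stats, episode)
shape_after_one = shape(stats)
for _ in range(10):                           # ten more episodes of the same data
    update(stats, episode)
shape_after_many = shape(stats)
```

Because the table keeps the same shape however much data arrives, it can be presented to an attention-based model as a fixed set of rows, one per state-action pair, in the same way a tabular dataset would be.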

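2606.18820 2026-06-18 cs.LG cs.AI new submission

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

Jiaxi Liu, Aiping Yang, Yuhang Yang, Shuqi Zhang, Zewei Dong, Jiangming Yang, Xuebin Chen

Affiliations: Ant International; School of Economics, Sichuan University; School of Economics, Fudan University

AI summary: To handle the asymmetry between increasing information and shrinking action sets in decision processes, proposes the Maturing Markov Decision Process (MMDP) framework and develops a structure-aware reinforcement learning method based on an expiring-action priority principle; experiments show that it improves learning efficiency.

Comments 25 pages, 9 figures

AI-translated abstract

Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information, while feasible actions gradually expire because of operational cutoffs, commitments, or resource constraints. Standard MDP formulations usually flatten this structure into stage-dependent state descriptions and action masks, obscuring the nested information-action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information-action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework comprising stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.

English abstract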

Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information-action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information-action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.


2606.18910 2026-06-18 cs.LG cs.CL 新提交

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

REVES：通过修订与验证增强的测试时扩展训练

Yuanxin Liu, Ruida Zhou, Xinyan Zhao, Amr Sharaf, Hongzhou Lin, Arijit Biswas, Mohammad Ghavamzadeh, Zhaoran Wang, Mingyi Hong

发表机构 * Northwestern University（西北大学）； Amazon AGI（亚马逊人工智能实验室）； Qualcomm AI Research（高通人工智能研究）； University of Minnesota（明尼苏达大学）

AI总结提出REVES框架，通过将中间步骤的“接近正确”答案转化为解耦的修订和验证提示，实现高效的离策略数据生成，提升大语言模型的多步推理能力，在LiveCodeBench上比强化学习基线高6.5分。

AI中文摘要

通过顺序修订进行测试时扩展已成为增强大语言模型（LLM）推理能力的强大范式。然而，标准的后训练方法主要优化单次目标，与多步推理动态存在根本性不匹配。虽然最近的工作将其视为多轮强化学习（RL），但传统方法直接优化多步轨迹，未能进一步利用模型可以从纠正中学习的中间步骤中的高质量错误。我们提出了一个两阶段迭代框架，交替进行在线数据/提示增强和策略优化。通过将成功恢复轨迹中的中间步骤（“接近正确”答案）转化为解耦的修订和验证提示，我们的方法将训练集中在有效的答案转换和错误识别上。与标准的多轮RL相比，这种方法实现了高效的离策略数据生成，并减少了长程采样的计算开销。在LiveCodeBench上，使用公开可用的测试用例作为反馈，我们观察到比RL基线高6.5分，比标准多轮训练高4.0分。除了编码，我们的方法在圆填充问题上达到了先前报告的SOTA结果，同时使用了最小的基础模型（4B）和远少于更大进化搜索系统的采样次数。在真实验证下的数学结果进一步证实了改进的纠正能力。该方法还泛化到分布外的约束满足谜题，如n皇后和迷你数独，其中正确性完全由问题约束定义。代码可在该https URL获取。

英文摘要

Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a fundamental misalignment with multi-step inference dynamics. While recent work treats this as multi-turn reinforcement learning (RL), conventional approaches optimize over the multi-step trajectories directly, failing to further exploit the high-quality mistakes in intermediate steps, which the model can learn from by correcting them. We propose a two-stage iterative framework that alternates between online data/prompt augmentation and policy optimization. By converting the intermediate steps ("near-miss" answers) in successful recovery trajectories into decoupled revision and verification prompts, our approach concentrates training on both effective answer transformation and error identification. This approach enables efficient off-policy data generation and reduces the computational overhead of long-horizon sampling compared to standard multi-turn RL. On LiveCodeBench, using publicly available test cases as feedback, we observe gains of +6.5 points over the RL baseline and +4.0 points over standard multi-turn training. Beyond coding, our approach matches the previously reported SOTA result on circle packing while using the smallest base model (4B) and far fewer rollouts than the much larger evolutionary search systems. Math results under ground-truth verification further confirm improved correction ability. The approach also generalizes to out-of-distribution constraint-satisfaction puzzles such as n_queens and mini_sudoku, where correctness is defined entirely by problem constraints. Code is available at https://github.com/yxliu02/REVES.git.


2606.18963 2026-06-18 cs.LG 新提交

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

无环境奖励的固定通道感知事件流在线奖惩学习

Zirong Li

发表机构 * Zirong Li

AI总结提出OHIRL框架，在无标量奖励下通过固定通道感知流进行在线奖惩学习，利用内部轨迹评估器推断感知维度的效价，在XOR任务和CartPole等控制任务中达到高准确率。

Comments 9 pages, 5 figures, 6 tables; 13-page technical supplement

AI中文摘要

我们研究当环境不提供标量奖励或评估标签时的在线奖惩学习。在每一步，智能体仅接收一个固定通道的感知数据包，诸如疼痛、能量、接触、损伤或认知错误等量被视为感知维度，其效价必须从转移后果中推断。OHIRL分离了四个角色：M_psi学习下一数据包预测，D_omega建模残差动力学，C_eta是一个固定的内部转移后轨迹评估器，B_xi学习使用由此产生的价值证据进行后续策略更新和动作评分。C_eta采用恢复正性、持久/增长负性的残差调节取向；系数来源审计显示，等单元、原始等值和随机单调变体保留了超过92%的已发布顶级动作排名，而符号反转保留了0%。无奖励协议暴露观察转移，同时隐藏环境奖励、延迟外部评估器、成功标签和动作好坏标签。条件误差分解将B_xi的证据估计误差与残差策略优化误差分离。在2x2-XOR数据包任务中，药物和辣椒在视觉XOR上下文中获得相反的价值，并且相同的疼痛或辣度增加可能根据后果结构为正或负；B_xi达到0.952的平衡奖励符号准确率。在完整的在线交错审计中，M_psi达到留出R2=0.907，B_xi达到0.940的符号准确率，策略达到0.979的最优动作准确率，而即时数据包分数、预测误差奖励、打乱目标、零奖励和误差减少控制均崩溃。隐藏奖励的CartPole和Taxi控制、公共上下文无泄漏审计以及模块角色消融进一步测试了信息边界和组件必要性。

英文摘要

We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, damage, or cognitive error are treated as perceptual dimensions whose valence must be inferred from transition consequences. OHIRL separates four roles: M_psi learns next-packet prediction, D_omega models residual dynamics, C_eta is a fixed internal post-transition trajectory evaluator, and B_xi learns to use the resulting value evidence for later policy updates and action scoring. C_eta uses a recovery-positive and persistence/growth-negative residual-regulation orientation; a coefficient-origin audit shows that equal-unit, raw-equal, and random monotone variants preserve more than 92% of the released top-action rankings, while sign inversion preserves 0%. The reward-free protocol exposes observation transitions while withholding environment rewards, delayed external evaluators, success labels, and action-goodness labels. A conditional error decomposition separates B_xi evidence-estimation error from residual policy-optimization error. In a 2x2-XOR packet task, medicine and chili acquire opposite value under visual XOR contexts, and the same pain or spice increase can be positive or negative depending on consequence structure; B_xi reaches 0.952 balanced reward-sign accuracy. In a full online-interleaved audit, M_psi reaches holdout R2=0.907, B_xi reaches 0.940 sign accuracy, and the policy reaches 0.979 optimal-action accuracy, while immediate packet scores, prediction-error rewards, shuffled targets, zero reward, and error-reduction controls collapse. Hidden-reward CartPole and Taxi controls, public-context no-leakage audits, and module-role ablations further test information boundaries and component necessity.


2606.19134 2026-06-18 cs.LG cs.AI 新提交

Pareto Q-Learning with Reward Machines

带奖励机的帕累托Q学习

Arnaud Lequen, Clément Legrand-Lixon, Léo Saulières

AI总结提出PQLRM算法，结合帕累托Q学习和奖励机，在多目标强化学习中高效逼近帕累托前沿，并处理非马尔可夫奖励。

Comments Accepted at the ICAPS 2026 Workshop on Bridging the Gap Between AI Planning and (Reinforcement) Learning (PRL)

2606.19199 2026-06-18 cs.LG cs.AI 新提交

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

预测关键因素：面向决策的强化学习用于未知离开时间的受控电动汽车充电

Giuseppe Gabriele, Fabio Pavirani, Seyed Soroush Karimi Madahi, Chris Develder

发表机构 * Ghent University - imec（根特大学 - imec）

AI总结针对电动汽车充电中离开时间未知导致强化学习策略效果差的问题，提出面向决策的强化学习框架，联合训练预测器与控制器，实现端到端优化，使总奖励提升14%，未供应能量减少55%。

Comments ACM e-Energy 2026 5 pages, 1 figure, 1 table

DOI: 10.1145/3744255.3811736

AI中文摘要

近年来电动汽车的普及给电力系统带来了挑战，包括峰值需求增加和潜在的电网不稳定。基于强化学习的智能充电控制可以通过从历史数据中学习时间和上下文模式来缓解这些问题。然而，在现实场景中，关键特征（如离开时间）通常不可用。这使得强化学习智能体更难学习和执行有效的充电策略。为了减轻这种不确定性，训练好的预测器可以从可用数据中近似未知特征。然而，由于这些预测模型通常针对准确性（而非对下游智能体决策质量的影响）进行训练，它们的误差可能会传播并阻碍使用预测的控制器的整体性能。为了避免这种情况，我们提出了一种面向决策的强化学习框架，其中预测器是端到端训练的，即通过强化学习智能体采取的充电策略动作的反馈。这种预测器和控制器的联合训练最终产生了更高质量的动作：与没有离开时间预测的强化学习方法相比，我们提出的面向决策的强化学习方法产生了更优的充电决策，总奖励提高了14%，未供应能量（即由于电动汽车已离开而未能进行的充电）减少了55%。

英文摘要

The recent growth of EV adoption poses challenges for power systems, including increased peak demand and potential grid instability. Smart control of EV charging, e.g., based on reinforcement learning (RL), can alleviate these issues by learning temporal and contextual patterns from historical data. Yet, in real-world scenarios, key features, such as departure time, are often unavailable. This, in turn, makes it harder for an RL agent to learn and execute an effective charging policy. To mitigate this uncertainty, a trained forecaster can approximate the unknown features from available data. However, since these forecasting models are typically trained for accuracy (rather than their impact on a downstream agent's decision quality), their errors may propagate and hinder the overall performance of a controller that is using the forecasts. To avoid this, we propose a decision-focused RL (DF-RL) framework in which the forecaster is trained end-to-end, i.e., with feedback from the charging policy actions taken by the RL agent. Such joint training of both the forecaster and controller ultimately results in higher-quality actions: our proposed DF-RL method yields superior charging decisions compared to other baselines, achieving up to a 14% improvement in total reward and a 55% reduction of unsupplied energy (i.e., charging that failed to happen because the EV already left), relative to the RL method without departure time forecasting.


2606.19236 2026-06-18 cs.LG cs.AI cs.CL 新提交

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

STARE: 基于惊讶度的令牌级优势重加权以实现策略熵稳定性

Haipeng Luo, Qingfeng Sun, Songli Wu, Can Xu, Wenfeng Deng, Han Hu, Yansong Tang

发表机构 * Shenzhen International Graduate School, Tsinghua University（清华大学深圳国际研究生院）； Tencent Hunyuan（腾讯混元）

AI总结针对GRPO等RL算法中策略熵崩溃问题，提出STARE方法，通过惊讶度分位数识别熵关键令牌并重加权其优势，结合目标熵闭环门控稳定熵，在1.5B-32B模型和多种任务上实现稳定训练，AIME24/25准确率提升4%-8%。

Comments LLM, Reinforcement Learning

AI中文摘要

基于可验证奖励的强化学习算法（如GRPO）已成为LLMs复杂推理的主流后训练范式，但通常在训练中遭受策略熵崩溃。我们对GRPO下的令牌级熵动态进行一阶梯度分析，识别出令牌级信用分配不匹配：每个令牌的熵变化分解为轨迹级优势与下一个令牌分布上的熵敏感函数的乘积，产生优势-惊讶度四象限结构和近临界性质。受此启发，我们提出STARE（基于惊讶度的令牌级优势重加权以实现策略熵稳定性），该方法通过批次内惊讶度分位数识别熵关键令牌子集，选择性重加权其有效优势，并引入目标熵闭环门控以实现稳定的熵调节。在1.5B至32B的模型规模以及三个任务族（短思维链、长思维链和多轮工具使用）上，STARE在数千步内维持稳定的RL训练，同时将策略熵保持在目标带内。在AIME24和AIME25上，STARE在平均准确率上比DAPO和其他竞争基线高出4%-8%，反思令牌和响应长度同步增长，表明持续探索-利用平衡进一步释放了RL训练潜力。代码可在https://github.com/hp-luo/STARE获取。

英文摘要

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by this analysis, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating sustained exploration-exploitation balance that further unlocks RL training potential. Code is available at https://github.com/hp-luo/STARE.
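
The abstract's idea of selecting tokens by batch-internal surprisal quantiles and reweighting their advantages can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the function names, the nearest-rank quantile, the hyperparameters `q` and `scale`, and the simple "shrink the advantage of high-surprisal tokens" rule are not taken from the paper, and the four-quadrant structure and the target-entropy gate are omitted.

```python
import math

def surprisal_quantile_threshold(surprisals, q):
    """Return the q-quantile (nearest-rank) of a flat list of surprisals."""
    ordered = sorted(surprisals)
    # Nearest-rank index, clamped to the valid range.
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]

def reweight_advantages(token_logprobs, advantages, q=0.8, scale=0.5):
    """Scale the advantage of high-surprisal tokens (hypothetical rule).

    token_logprobs: log-probability of each sampled token under the policy.
    advantages: per-token copy of the trajectory-level advantage.
    q, scale: illustrative hyperparameters, not values from the paper.
    """
    # Surprisal of a token is the negative log-probability assigned to it.
    surprisals = [-lp for lp in token_logprobs]
    threshold = surprisal_quantile_threshold(surprisals, q)
    # Only tokens strictly above the batch quantile are reweighted.
    return [a * scale if s > threshold else a
            for s, a in zip(surprisals, advantages)]
```

In this sketch a token is touched only if its surprisal exceeds the batch quantile, so most tokens keep the unmodified trajectory-level advantage.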


2606.19328 2026-06-18 cs.LG cs.AI cs.RO 新提交

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

UBP2: 不确定性平衡的偏好规划用于高效基于偏好的强化学习

Mohamed Nabail, Leo Cheng, Jingmin Wang, Nicholas Rhinehart

发表机构 * Learning, Embodied Autonomy, and Forecasting (LEAF) Lab, University of Toronto（多伦多大学学习、具身自主与预测（LEAF）实验室）

AI总结提出UBP2方法，通过联合推理奖励、动力学和值函数的不确定性来主动引导探索，在Meta-World基准上显著提高了样本效率。

AI中文摘要

基于偏好的强化学习提供了一种从行为的成对比较中学习奖励模型的方法，绕过了显式奖励设计的需求。然而，现有方法通常依赖于被动数据收集，样本效率低下，尤其是在学习的早期阶段。我们引入了一种基于模型的方法，通过联合推理奖励、动力学和值函数的不确定性来主动引导探索。我们的方法，不确定性平衡的偏好规划（UBP2），使用奖励、动力学和值函数模型的集成，根据结合了期望奖励、终值和认知不确定性的统一评分来评估候选轨迹。在此目标下的规划产生了利用和信息获取之间的显式权衡，无需临时的探索启发式。在标准正则性假设下，我们为有限时域和无限时域设置建立了次线性遗憾保证。在Meta-World基准上的实验表明，UBP2比无模型的基于偏好的方法和非乐观的基于模型的基线方法实现了显著更高的样本效率。

英文摘要

Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introduce a model-based approach that actively directs exploration by jointly reasoning over uncertainties in the reward, dynamics, and value functions. Our method, Uncertainty-Balanced Preference Planning (UBP2), uses ensembles of reward, dynamics, and value function models to evaluate candidate trajectories according to a unified score that combines expected reward, terminal value, and epistemic uncertainty. Planning under this objective yields an explicit tradeoff between exploitation and information acquisition without requiring ad hoc exploration heuristics. Under standard regularity assumptions, we establish sublinear regret guarantees for both finite-horizon and infinite-horizon settings. Empirically, experiments on the Meta-World benchmark show UBP2 achieves substantially higher sample efficiency than model-free preference-based methods and non-optimistic model-based baselines.
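
A unified trajectory score of the kind the abstract describes (expected reward, terminal value, and epistemic uncertainty) might look like the sketch below. The specifics are assumptions: ensemble disagreement (population standard deviation of per-member returns) stands in for epistemic uncertainty, `beta` is a hypothetical bonus weight, and the dynamics ensemble is left out by scoring trajectories whose states are already given.

```python
from statistics import mean, pstdev

def trajectory_score(reward_preds, value_preds, beta=1.0):
    """Score one candidate trajectory from ensemble predictions.

    reward_preds: one list of per-step predicted rewards per ensemble member.
    value_preds: one predicted terminal value per ensemble member.
    beta: hypothetical weight on the uncertainty bonus.
    """
    # Return estimate of each member: summed rewards plus terminal value.
    returns = [sum(r) + v for r, v in zip(reward_preds, value_preds)]
    # Ensemble disagreement is used as a proxy for epistemic uncertainty.
    return mean(returns) + beta * pstdev(returns)

def plan(candidates, beta=1.0):
    """Return the index of the highest-scoring candidate trajectory."""
    scores = [trajectory_score(r, v, beta) for r, v in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `beta=0` the planner is purely exploitative; a positive `beta` favors trajectories on which the ensemble members disagree, which is one simple way to trade exploitation against information acquisition.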


2606.18438 2026-06-18 math.OC cs.LG 交叉投稿

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

基于学习优化的临时工顺序雇佣

Chris Lee, Xiuli Chao, Izak Duenyas

发表机构 * Department of Industrial and Operations Engineering, University of Michigan（工业与运营工程系，密歇根大学）； Ross School of Business, University of Michigan（罗斯商学院，密歇根大学）

AI总结针对临时工场景中工人产能和劳动力供给的不确定性，提出DR-UCB策略，通过学习周期顺序决策替换与雇佣，实现累积利润最大化，并证明其遗憾下界匹配。

AI中文摘要

在本文中，我们研究了临时工场景下存在工人产能和劳动力供给不确定性的顺序劳动力管理问题。企业通过维持固定规模的活跃团队并随时间学习工人生产力，以最大化累积利润。我们强调该问题中的两个关键运营摩擦：替换工人成本高昂，且工人可能因先前工作承诺、日程限制或入职流程等原因无法立即雇佣。因此，雇佣决策仅在随机延迟后生效。我们将该问题建模为具有昂贵切换和延迟动作的随机多选（multi-play）赌博机，并开发了一种基于学习的雇佣策略DR-UCB（延迟替换-UCB），该策略通过学习周期顺序做出替换和雇佣决策。在每个周期中，该策略使用实时生产数据确定何时启动劳动力变更以及替换和雇佣哪些工人。我们证明，所提策略的主阶遗憾在对时间范围的依赖上与其下界相匹配。数值实验表明，DR-UCB优于基准策略。

英文摘要

In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.
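
For readers unfamiliar with the UCB building block behind the policy's name, the sketch below shows a plain UCB1 index applied to a multi-play selection (choose a team of fixed size). It is not DR-UCB: the switching costs, random hiring delays, and learning cycles described in the abstract are omitted, and the function names and data layout are hypothetical.

```python
import math

def ucb_index(mean_output, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # workers never observed are tried first
    return mean_output + math.sqrt(2.0 * math.log(t) / pulls)

def select_team(stats, team_size, t):
    """Pick the team_size workers with the largest UCB indices.

    stats: dict mapping worker id -> (empirical mean output, observation count).
    t: current period, used in the exploration bonus.
    """
    ranked = sorted(stats, key=lambda w: ucb_index(*stats[w], t), reverse=True)
    return ranked[:team_size]
```

A policy that accounts for costly replacement would not re-select the team every period as this sketch does; it would change the team only when the evidence justifies the switching cost.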


2606.18514 2026-06-18 cs.RO cs.LG 交叉投稿

N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

N(CO)$^2$: 基于机会约束的神经组合优化求解随机定向问题

Anas Saeed, Marcos Abel Zuzuárregui, Stefano Carpin

发表机构 * Department of Computer Science and Engineering, University of California, Merced（加州大学默塞德分校计算机科学与工程系）

AI总结提出N(CO)$^2$框架，结合强化学习求解随机定向问题，无需手工启发式，在不确定环境下优化路径选择，性能媲美MILP。

Journal ref: In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), 2025

AI中文摘要

神经组合优化（NCO）通过学习启发式，为求解复杂图优化问题提供了一种有前景的替代传统启发式方法的方法。这类问题在自动化领域频繁出现，可用于建模多种应用。虽然NCO在确定性组合优化问题上已被广泛研究，但只有少数工作旨在解决随机组合优化问题。本文提出N(CO)$^2$：基于机会约束的神经组合优化，用于求解随机定向问题（SOP），无需手工设计的启发式。通过集成强化学习（RL）框架，模型在不确定性下优化路径选择，有效平衡探索与利用。实验结果表明，我们的方法在多种SOP实例上具有良好的泛化能力，与最先进的混合整数线性规划（MILP）相比性能具有竞争力。所提方法减少了启发式设计的人力投入，同时在不确定环境中实现自适应和高效的决策。

英文摘要

Neural combinatorial optimization (NCO) offers a promising alternative to traditional heuristic-based methods for solving complex graph optimization problems by proposing to learn heuristics through data. This class of problems frequently arises in automation, as it can be used to model a variety of applications. While NCO has been extensively studied for deterministic combinatorial optimization problems, there are only a few works that aim to solve stochastic combinatorial optimization problems. In this work, we present N(CO)$^2$: Neural Combinatorial Optimization with Chance cOnstraints to solve the Stochastic Orienteering Problem (SOP) without the use of hand-crafted heuristics. By integrating a reinforcement learning (RL) framework, the model optimizes path selection under uncertainty, effectively balancing exploration and exploitation. Empirical results demonstrate that our method generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP) for the task. The proposed approach reduces human effort in heuristic design while enabling adaptive and efficient decision-making in uncertain environments.
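
A chance constraint in this setting is commonly written as a bound on the probability that a path's random travel cost exceeds the budget. The sketch below estimates that probability by Monte Carlo sampling; the constraint form, the function names, and the `sample_travel_time` callback are assumptions for illustration and are not taken from the paper.

```python
import random

def failure_probability(path, sample_travel_time, budget,
                        n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(total travel time > budget) for a path.

    path: sequence of vertices visited in order.
    sample_travel_time(u, v, rng): draws one random cost for edge (u, v).
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        total = sum(sample_travel_time(u, v, rng)
                    for u, v in zip(path, path[1:]))
        failures += total > budget
    return failures / n_samples

def satisfies_chance_constraint(path, sample_travel_time, budget,
                                max_failure_prob):
    """Check P(cost > budget) <= max_failure_prob empirically."""
    estimate = failure_probability(path, sample_travel_time, budget)
    return estimate <= max_failure_prob
```

A learned policy would be evaluated or penalized against such an estimate rather than against a single deterministic path length.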


2606.18531 2026-06-18 stat.ML cs.LG 交叉投稿

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

轨迹级监督何时允许高效的离线强化学习？

Xuanfei Ren, Tengyang Xie

发表机构 * University of Wisconsin-Madison（威斯康星大学麦迪逊分校）

AI总结本文研究离线强化学习中仅使用轨迹级结果（如累积回报或偏好）进行策略优化的统计理论，提出OPAC算法并证明其样本复杂度，同时揭示在非线性聚合目标下存在的统计障碍。

Comments 69 pages

AI中文摘要

离线强化学习通常在过程级奖励监督下进行分析，然而许多序列决策数据集仅记录轨迹级结果。我们发展了从这种结果级监督进行离线策略优化的统计理论。首先研究规范设置，其中目标仍是期望累积奖励，但每个离线轨迹仅提供一个标量标签，其条件均值是累积回报。我们提出OPAC，一种悲观演员-评论家算法，它学习潜在奖励模型并从轨迹级标签优化策略。我们证明了阶为$\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$的高概率保证和匹配的下界，刻画了用单个轨迹级标签替代过程级奖励的尖锐统计代价。然后我们将该原理扩展到基于偏好的反馈，在偏好模型常数范围内保留了领先的视界和可集中性依赖。最后，我们研究广义基于结果的离线强化学习，其中监督和目标都是由潜在每步奖励的非线性聚合引起的轨迹级量。该问题通常不可学习：对于全成功目标，即使具有确定性转移和常数可集中性，任何离线学习器可能需要$\Omega(2^H)$个轨迹。然后我们通过两个结构系数$\kappa_\mu(\sigma)$和$\chi_\mu(\sigma)$识别出一个可处理的区域，这两个系数捕捉了结果聚合和广义贝尔曼更新中的信息损失，在此区域广义OPAC实现了多项式样本复杂度。我们的结果共同描绘了何时结果级监督能够实现样本高效的离线控制，以及何时缺失过程级奖励会带来根本性的统计障碍。

英文摘要

Offline reinforcement learning is typically analyzed under process-level reward supervision, yet many sequential decision datasets record only trajectory-level outcomes. We develop a statistical theory for offline policy optimization from such outcome-level supervision. We first study the canonical setting where the target remains the expected cumulative reward, but each offline trajectory provides only a scalar label whose conditional mean is the cumulative return. We propose OPAC, a pessimistic actor-critic algorithm that learns a latent reward model and optimizes a policy from trajectory-level labels. We prove a high-probability guarantee of order $\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$ and a matching lower bound, characterizing the sharp statistical cost of replacing process-level rewards with one trajectory-level label. We then extend the principle to preference-based feedback, preserving the leading horizon and concentrability dependence up to preference-model constants. Finally, we study generalized outcome-based offline RL, where both the supervision and the objective are trajectory-level quantities induced by a nonlinear aggregation of latent per-step rewards. This problem is not learnable in general: for all-success objectives, any offline learner may require $\Omega(2^H)$ trajectories even with deterministic transitions and constant concentrability. We then identify a tractable regime through two structural coefficients, $\kappa_\mu(\sigma)$ and $\chi_\mu(\sigma)$, capturing information loss in outcome aggregation and generalized Bellman updates, under which generalized OPAC achieves polynomial sample complexity. Together, our results delineate when outcome-level supervision enables sample-efficient offline control and when missing process-level rewards create fundamental statistical barriers.


2606.18598 2026-06-18 cs.AI cs.LG 交叉投稿

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

在地质、需求和定价不确定性下优化锂生产决策：多目标决策的POMDP框架

Anna C. Edmonds, Mansur M. Arief, Robert J. Moss, Mykel J. Kochenderfer, Jef Caers

发表机构 * Computer Science Department, Stanford University（斯坦福大学计算机科学系）； Aeronautics and Astronautics Department, Stanford University（斯坦福大学航空与航天系）； Earth and Planetary Sciences Department, Stanford University（斯坦福大学地球与行星科学系）

AI总结提出POMDP框架，通过信念状态规划优化锂矿开采决策，动态适应价格不确定性，实现更高需求满足和更平衡的经济环境效益。

Comments 24 pages, 14 tables, 4 figures

AI中文摘要

锂生产中的决策制定具有挑战性，无论是从投资者角度还是战略生产角度。决定开采哪些矿山以及何时开采，不仅涉及地质和价格不确定性，还涉及提取方法选择的复杂性，从直接锂提取到硬岩开采。先前的工作探索了该问题的模型和优化采矿决策的不同方法；这些模型没有考虑定价不确定性、需求不确定性或提取锂的不同采矿技术。将不同的定价模型和提取技术纳入这些模型，可以制定更稳健的策略，不仅决定何时何地开采矿山，还决定采用哪种生产方法。我们将问题表述为部分可观测马尔可夫决策过程（POMDP），并使用信念状态规划方法求解以获得最优决策。在我们的研究中，我们表明POMDP求解器通过信念状态规划和显式不确定性管理，动态适应变化的锂价格机制（静态、线性、指数和随机），优于受人类启发的启发式方法。通过优化勘探、生产和技术选择的顺序，该框架在所有不同的定价和矿床情景下，在项目生命周期内实现了更高的需求满足和更平衡的经济环境结果。

英文摘要

Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technologies into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve it using belief-state planning methods to obtain optimal decisions. In our study, we show that POMDP solvers outperform human-inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief-state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic-environmental outcomes over the project's lifetime across all pricing and deposit scenarios.


2606.19069 2026-06-18 eess.SY cs.LG cs.SY 交叉投稿

Model-Free Reinforcement Learning Control for Resilient Cyber-Physical Systems

面向弹性信息物理系统的无模型强化学习控制

Hugo O. Garcés, Alejandro J. Rojas, Bernardo A. Hernández, Andrés Escalona, Jonathan M. Palma, Md. Rezwan Parvez, Bhushan Gopaluni, Sirish L. Shah

发表机构 * Departamento de Ingeniería Eléctrica, Universidad de Concepción, Concepción, Chile； Department of Electrical & Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada； Department of Chemical & Biological Engineering, University of British Columbia, Vancouver, BC V6T 1Z3, Canada； Department of Chemical & Materials Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada

AI总结本文比较了无模型控制器在非线性系统遭受网络攻击（虚假数据注入和拒绝服务攻击）下的性能，分析了四种强化学习奖励类型，发现Lyapunov奖励在低跟踪误差下弹性最佳，指数奖励在中等训练条件下提供良好折衷，渐进和线性奖励收敛快但鲁棒性差。

Comments Accepted to the 23rd IFAC World Congress 2026

AI中文摘要

本文比较了无模型控制器在遭受网络攻击（包括虚假数据注入和拒绝服务攻击）的非线性系统上的性能。分析了四种强化学习奖励类型的准确性、成本和弹性。结果表明，Lyapunov奖励在低跟踪误差下提供最佳弹性。指数模式在中等训练条件下也提供了良好的折衷，具有可接受的弹性。渐进和线性奖励收敛更快，但鲁棒性较差。强化学习模型预测控制器（RL-MPC）表现出强稳态弹性，但需要更长的训练时间；强化学习比例-积分-微分控制器（RL-PID）更快，训练时间显著减少。近端策略优化（PPO）优于深度确定性策略梯度（DDPG），关键绩效指标（KPI）方差显著降低。本研究旨在强调精心设计的强化学习奖励如何提高性能和对网络威胁的弹性。

英文摘要

This paper compares the performance of model-free controllers on a nonlinear system under cyberattacks, including false data injection and denial-of-service attacks. Four RL reward types are analyzed for accuracy, cost, and resilience. Results show that the Lyapunov reward offers the best resilience with low tracking error. Exponential mode also provides good trade-offs with acceptable resilience under moderate training conditions. Progressive and linear rewards converge faster but are less robust. RL-MPCs show strong steady-state resilience but require longer training times; RL-PID controllers are faster with significantly less training time. Proximal Policy Optimization outperforms Deep Deterministic Policy Gradient with a significant reduction in KPI variance. This study serves to highlight how well-designed RL rewards can improve performance and resilience against cyber threats.


2507.17786 2026-06-18 cs.LG 版本更新

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

强化学习加速气动外形优化

Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky

发表机构 * Software Competence Center Hagenberg (SCCH)（软件竞争力中心哈根贝格）； Institut für Strömungsmechanik und Wärmeübertragung, TU Wien（流体力学与传热研究所，维也纳技术大学）； CERBSim GmbH（CERBSim公司）

AI总结提出基于强化学习的自适应优化算法，通过代理模型和演员-评论家策略评估的MCMC方法，冻结部分参数以降低维度，加速气动外形优化，并在简单流体动力学问题上验证了特征重要性解释能力。

AI中文摘要

我们引入了一种基于强化学习（RL）的自适应优化算法，用于气动外形优化，重点关注降维。这里应用RL的形式是一种基于代理的、演员-评论家策略评估的MCMC方法，允许对部分待优化参数进行时间上的“冻结”。目标是尽量减少计算量，并利用观察到的优化结果来解释所发现的极值点在实现所需流场中的作用。通过围绕作为真实值的中间CFD模拟进行一系列局部优化的参数变化，如果（a）参数必须驻留的局部邻域足够大，能够与网格大小的步长及其大量模拟相竞争，并且（b）对这些邻域所需的奖励和成本估计足够准确，以实现良好的逐步参数自适应，则可以加速全局优化。我们给出了一个简单流体动力学问题的例子，在该问题上，该方法允许在特征重要性评分意义上进行解释。

英文摘要

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.


2604.03208 2026-06-18 cs.LG 版本更新

Hierarchical Planning with Latent World Models

基于潜在世界模型的分层规划

Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas

发表机构 * FAIR at Meta（Meta旗下的FAIR）； New York University（纽约大学）； Mila - Québec AI Institute（魁北克AI研究院）； Brown University（布朗大学）

AI总结提出HWM架构，通过多时间尺度潜在世界模型和潜在匹配实现分层模型预测控制，解决长时域任务中单层规划失败和计算爆炸问题。

AI中文摘要

世界模型是通过规划实现零样本具身控制的一条有前景的路径。然而，现有的世界模型规划器在长时域、多阶段任务中面临困难：预测误差累积，且朴素搜索的复杂度随规划时域呈指数增长。分层方法通过将任务分解为更短、可处理的子问题来缓解这两个问题；然而，先前的分层方法要么将控制摊销为任务特定的策略（分层强化学习），要么假设低维状态和已知动力学（经典分层MPC）。我们提出了基于潜在世界模型的分层规划（HWM），这是一种直接在仅通过下一潜在预测训练的视觉世界模型上进行分层模型预测控制（MPC）的架构和规划范式。HWM在共享潜在空间内学习多个时间尺度的世界模型，因此长时域模型的预测通过潜在匹配作为短时域模型的子目标，无需任务特定的奖励、技能学习或分层策略。为了保持长时域搜索的可处理性，HWM学习了一个动作编码器，将原始动作块压缩为潜在宏动作。在真实世界的Franka操作中，HWM从单个目标图像中完成拾取和放置的成功率为70%，而单层规划的成功率为0%。在模拟的推操作和迷宫导航任务中，HWM在长时域任务上持续提升性能，同时所需规划计算量最多减少3倍。

英文摘要

World models are a promising path to zero-shot embodied control through planning. However, existing world model planners struggle on long-horizon, multi-stage tasks: prediction errors compound and naive search is exponential in the planning horizon. Hierarchy mitigates both by decomposing tasks into shorter, tractable subproblems; yet prior hierarchical approaches either amortize control into task-specific policies (hierarchical RL) or assume low-dimensional states and known dynamics (classical hierarchical MPC). We present Hierarchical Planning with Latent World Models (HWM), an architecture and planning paradigm for hierarchical model predictive control (MPC) directly on visual world models trained solely via next-latent prediction. HWM learns world models at multiple temporal scales within a shared latent space, so predictions from the long-horizon model serve as subgoals for the short-horizon model via latent matching, without task-specific rewards, skill learning, or hierarchical policies. To keep long-horizon search tractable, HWM learns an action encoder that compresses primitive action chunks into latent macro-actions. On real-world Franka manipulation, HWM solves pick-and-place from a single goal image at 70% success vs. 0% for single-level planning. Across simulated push manipulation and maze navigation, HWM consistently improves performance on long-horizon tasks while requiring up to 3x less planning compute.


2605.22142 2026-06-18 cs.LG cs.AI 版本更新

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

部分可观测性下知识图谱的短期到长期记忆转移

Taewoon Kim, Vincent François-Lavet, Michael Cochez

AI总结本文研究了在部分可观测性下知识图谱中的短期到长期记忆转移问题，提出了一种基于神经符号价值决策的方法，通过在长期插入前决定保留或丢弃观察到的三元组，从而提升记忆效率，并在RoomKG基准测试中优于符号和神经基线方法。

AI中文摘要

在部分可观测性下的强化学习需要决定保留哪些信息，但大多数基于记忆的方法并未显式建模符号观察的短期到长期转移。我们在时序知识图谱记忆设置中研究这一转移过程，将其建模为一个神经符号的基于价值的决策问题：对于每个观察到的三元组，智能体在长期插入前选择保留还是丢弃。为处理可变大小的短期缓冲区，我们采用了一种每项Q学习设计，使用共享参数以及一种实用的时间差分更新，该更新作用于连续步骤间相匹配的项目。在长期记忆容量为128的RoomKG基准测试中，学习到的转移决策优于符号和神经基线，包括带有时间注释的符号基线和基于历史的LSTM/Transformer基线。在转移策略消融分析中，一个轻量级的局部仅短期（short-term-only）变体表现最佳，且步骤层面的行为显示，策略保留与导航和查询相关的事实，同时丢弃价值较低的候选事实，支持在内存限制下做出显式且可解释的记忆决策。

英文摘要

Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

2606.12808 2026-06-18 cs.LG cs.AI 版本更新

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

SymQNet: 低延迟自适应哈密顿量学习的摊销获取

Yash Vardhan Tomar, Dheeraj Peddireddy

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出SymQNet，一种摊销强化学习方法，通过离线学习后验条件获取策略，在线快速前向传播，显著降低自适应哈密顿量学习的获取延迟。

AI中文摘要

自适应哈密顿量学习对于校准和表征量子设备至关重要。在自适应控制器中，选择下一个实验本身就是一个计算。贝叶斯设计规则在每次后验更新后重新计算，这一步可能需要几秒钟。在数百次试验中，这些秒数成为自适应性的显著墙钟成本。我们引入SymQNet，一种用于低延迟自适应哈密顿量学习的摊销强化学习方法。SymQNet离线学习后验条件获取策略，然后在线使用快速策略前向传播，同时保留贝叶斯后验反馈。在横向场伊辛基准测试中，相对于有界Fisher信息搜索和有界两步贝叶斯主动学习（BALD），SymQNet显著降低了获取延迟。在五量子比特时，相对于这些在线基线，它将仅含获取环节的决策延迟分别降低了$47.1\times$和$72.6\times$；在十二量子比特时，SymQNet的完整模拟步骤需要$1.02$秒，而有界两步BALD需要$13.27$秒。总体而言，我们表明学习获取可以使自适应哈密顿量学习对于重复的低延迟工作负载变得实用。

英文摘要

Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.

2511.00802 2026-06-18 cs.SE cs.CL cs.LG 版本更新

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

GrowthHacker: 使用代码修改型LLM代理的自动离线策略评估优化

Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard

发表机构 * Michigan Technological University, Houghton（密歇根理工大学，霍顿）； Birmingham City University（伯明翰城市大学）； University of British Columbia, Kelowna（不列颠哥伦比亚大学，基洛纳）

AI总结提出GrowthHacker基准，利用LLM代理自动迭代修改代码以优化离线策略评估（OPE）实现，在Open Bandit Pipeline和Scope-RL上评估多种框架，证明基于LLM的代理可作为自动增长黑客持续改进OPE系统。

Comments Accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

DOI: 10.1145/3815588

AI中文摘要

随着数据驱动开发的广泛采用，在线A/B测试已成为衡量新技术效果的既定方法。然而，部署在线实验需要设计、实现和部署资源，并可能对用户产生负面影响（例如，不安全或不道德的结果），同时需要数周的数据收集。为了解决这一问题，离线策略评估（OPE）或离线A/B测试这一日益增长的研究领域，使用先前收集的日志数据离线评估新技术。OPE也是强化学习中的一个基本问题，在在线测试昂贵或风险高的领域（如医疗保健、推荐系统、教育和机器人技术）中非常重要。尽管代码生成大语言模型（LLM）和代理工作流取得了进展，但关于LLM和基于LLM的代理是否以及如何自动优化OPE实现，我们知之甚少。我们提出了GrowthHacker，这是一个基准测试，用于在大规模公共数据集上评估基线LLM和基于LLM的代理。GrowthHacker自主迭代修改代码，运行OPE，并使用指标指导后续优化。我们在Open Bandit Pipeline（OBP）和Scope-RL上评估方法，并开发了一个双代理框架，该框架解决了现有框架的局限性，同时降低了复杂性。在两个库中，双代理显示出最高的可靠性（98.1%-100%成功率）和正向结果率（78%），正向结果的中位改进为4.4%；CrewAI实现了最高的平均改进（37.9%），并且是唯一没有极端值失败的框架。AutoGen和Default各达到65%的正向结果率。这些结果证明了使用基于LLM的代理作为自动“增长黑客”持续改进OPE系统的可行性，对在手动优化成本高昂的情况下扩展数据驱动决策具有重要意义。

英文摘要

With data-driven development now widely adopted, online A/B testing is an established method for measuring the effects of new technologies. However, deploying online experiments demands resources for design, implementation, and deployment, and may negatively impact users (e.g., unsafe or unethical outcomes) while requiring weeks of data collection. To address this, the growing research area of off-policy evaluation (OPE), or offline A/B testing, assesses new technologies offline using previously collected logged data. OPE is also a fundamental problem in reinforcement learning and is important where online testing is expensive or risky, such as healthcare, recommender systems, education, and robotics. Despite advances in code-generation large language models (LLMs) and agentic workflows, little is known about whether and how LLMs and LLM-based agents can automatically optimize OPE implementations. We propose GrowthHacker, a benchmark that evaluates baseline LLMs and LLM-based agents on large-scale public datasets. GrowthHacker autonomously and iteratively modifies code, runs OPE, and uses the metrics to guide subsequent optimization. We evaluate methods on Open Bandit Pipeline (OBP) and Scope-RL, and develop a two_agent framework that addresses limitations of existing frameworks while reducing complexity. Across both libraries, two_agent shows the highest reliability (98.1%-100% success rate) and positive-outcome rate (78%), with a median improvement of 4.4% among positive outcomes; CrewAI achieves the highest average improvement (37.9%) and is the only framework with zero extreme-value failures. AutoGen and Default each reach 65% positive-outcome rates. These results establish the feasibility of using LLM-based agents as automated "growth hackers" to continuously improve OPE systems, with implications for scaling data-driven decision-making where manual optimization is expensive.
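
To make the idea of off-policy evaluation concrete, the following is a minimal sketch of inverse propensity scoring (IPS), one standard OPE estimator. It is not taken from GrowthHacker, OBP, or Scope-RL; the function names, the toy logging policy, and the reward rule are all assumptions made for illustration.

```python
import random


def ips_value(logged, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    logged: list of (context, action, reward, logging_propensity) tuples.
    target_policy(context, action): probability that the new policy picks action.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logged)


# Hypothetical logged data: a uniform logging policy over two actions,
# where action 1 always pays reward 1 and action 0 pays reward 0.
rng = random.Random(0)
logged = []
for _ in range(10000):
    action = rng.randrange(2)
    logged.append((None, action, float(action), 0.5))


def always_one(context, action):
    """Candidate policy that always chooses action 1 (true value 1.0)."""
    return 1.0 if action == 1 else 0.0


def uniform(context, action):
    """Candidate policy identical to the logging policy (true value 0.5)."""
    return 0.5


estimate_always_one = ips_value(logged, always_one)
estimate_uniform = ips_value(logged, uniform)
```

Both estimates are computed from the same logged data without running either candidate policy online, which is the property that makes OPE attractive when online experiments are costly or risky.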

2606.18509 2026-06-18 cs.LG stat.ML 新提交

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

概念调制模型：可识别性与外推的统一框架

Soheun Yi, Yizhou Lu, Chandler Squires, Pradeep Ravikumar

发表机构 * Department of Statistics and Data Science, Carnegie Mellon University（卡内基梅隆大学统计与数据科学系）； Machine Learning Department, Carnegie Mellon University（卡内基梅隆大学机器学习系）

AI总结提出概念调制模型（CMMs），通过属性势统一条件潜变量模型的可识别性与外推分析，将基于转移的可识别性提升至条件设置，并导出代数外推准则。

AI中文摘要

条件潜变量模型中的可靠泛化需要理解可识别性和外推：观测属性间的变化如何决定潜在结构，以及该结构如何决定未见属性上的分布。然而，现有的可识别性和外推保证大多是模型特定的，在非线性ICA、因果表示学习、扰动建模及相关条件潜变量模型中分别进行分析。我们引入概念调制模型（CMMs），这是一类属性索引的条件生成模型，其结构为$A\to \Lambda \to C\to X$，其中属性选择调制器，调制器诱导潜在概念法则，概念生成观测特征。CMMs通过展示观测属性上的特征一致性诱导受CMM类约束的潜在概念转移，将基于转移的可识别性提升至条件设置。我们通过属性势（属性条件概念法则之间的对数密度比）表达这些约束，将通用提升步骤与模型特定的刚性论证分离。相同的势控制外推：当且仅当传输的属性势恒等式扩展到这些属性时，未见属性上的一致性成立。这导出了代数外推准则，识别出几个现有可识别性和外推结果背后的共同基于势的证明对象，并且当与这些工作中的模型特定刚性论证结合时，恢复了它们所述的结论。

英文摘要

Reliable generalization in conditional latent variable models requires understanding both identifiability and extrapolation: how observed variation across attributes determines latent structure, and how that structure determines distributions at unseen attributes. However, existing identifiability and extrapolation guarantees are largely model-specific, with separate analyses in nonlinear ICA, causal representation learning, perturbation modeling, and related conditional latent variable models. We introduce concept modulation models (CMMs), an attribute-indexed class of conditional generative models with structure $A\to \Lambda \to C\to X$, where attributes select modulators, modulators induce latent concept laws, and concepts generate observed features. CMMs lift transition-based identifiability to conditional settings by showing that feature agreement on observed attributes induces a latent concept transition constrained by the CMM class. We express these constraints through attribute potentials, log-density ratios between attribute-conditioned concept laws, separating the generic lifting step from model-specific rigidity arguments. The same potentials control extrapolation: agreement at unseen attributes holds exactly when the transported attribute-potential identities extend to those attributes. This yields algebraic extrapolation criteria, identifies the common potential-based proof objects behind several existing identifiability and extrapolation results, and, when combined with the model-specific rigidity arguments in those works, recovers their stated conclusions.

2606.18898 2026-06-18 cs.LG 新提交

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

基于潜在随机微分方程的稀疏不规则多元时间序列异常检测

Martin Uray, Dominik Geng, Florian Graf, Stefan Huber, Roland Kwitt

发表机构 * Josef Ressel Centre for Intelligent and Secure Industrial Automation, University of Applied Sciences, Salzburg, Austria（约瑟夫·雷斯尔智能与安全工业自动化中心，应用科学大学，萨尔茨堡，奥地利）； University of Salzburg, Austria（萨尔茨堡大学，奥地利）

AI总结针对现实世界中稀疏、不规则采样的多元时间序列，提出基于潜在随机微分方程的生成方法，将观测投影到连续时间随机动力系统，处理缺失和不规则采样，并捕获循环行为，在六个基准数据集上取得最优结果。

Comments Preprint

AI中文摘要

多元时间序列异常检测（MTSAD）在工业监控、网络安全或医疗保健等广泛应用领域至关重要。现实世界的数据通常是稀疏的、不规则采样的或部分观测的，但现有方法假设时间序列均匀采样。我们提出了一种基于潜在随机微分方程的生成方法，将观测到的时间序列投影到一个连续时间随机动力系统上，能够直接处理缺失观测和不规则采样，同时自然捕获许多现实世界用例固有的可能循环行为。在六个异常基准数据集上的实验表明，我们提出的方法在现有最先进基线中排名第一。我们进一步证明，在严重数据稀疏性下，我们的方法保持鲁棒性，而测试的基线方法性能显著下降。这些结果突显了潜在随机微分方程作为多元时间序列异常检测的自然归纳偏置，尤其是在存在现实世界不规则性的情况下。

英文摘要

Multivariate time series anomaly detection (MTSAD) is critical for a wide range of application areas, such as industrial monitoring, cybersecurity, and healthcare. Real-world data is often sparse, irregularly sampled, or partially observed, yet existing methods assume uniformly sampled time series. We propose a generative approach based on latent SDEs that projects the observed time series onto a continuous-time stochastic dynamical system. It can directly handle missing observations and irregular sampling, while also naturally capturing the possibly cyclic behavior inherent in many real-world use cases. Experiments on six anomaly benchmark datasets show that our proposed method ranks first among state-of-the-art baselines. We further demonstrate that our method remains robust under severe data sparsity, while the performance of the tested baseline methods degrades significantly. These results highlight latent SDEs as a natural inductive bias for anomaly detection in multivariate time series, especially in the presence of real-world irregularities.
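
Why a continuous-time model copes with irregular sampling can be seen in a small simulation. The sketch below integrates a one-dimensional SDE with the Euler-Maruyama scheme at arbitrary time stamps. It is an illustration only, not the paper's model: the mean-reverting drift and constant diffusion are hand-written stand-ins for the learned networks, and all names are hypothetical.

```python
import math
import random


def euler_maruyama(z0, times, drift, diffusion, rng):
    """Simulate dz = drift(z, t) dt + diffusion(z, t) dW at the given time stamps."""
    path = [z0]
    z = z0
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev                 # step size follows the data, not a fixed grid
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over this gap
        z = z + drift(z, t_prev) * dt + diffusion(z, t_prev) * dw
        path.append(z)
    return path


def drift(z, t):
    """Stand-in for a learned drift network: pull the state back toward zero."""
    return -1.5 * z


def diffusion(z, t):
    """Stand-in for a learned diffusion network: constant noise level."""
    return 0.3


rng = random.Random(0)
# Irregular observation times, as produced by sparse or event-driven sensors.
times = sorted(rng.uniform(0.0, 5.0) for _ in range(40))
path = euler_maruyama(2.0, times, drift, diffusion, rng)
```

Because the state is defined for every real-valued time, a gap between observations only changes the step size `dt`; no imputation or resampling onto a uniform grid is needed.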

2606.18997 2026-06-18 cs.LG 新提交

DIPHINE: Diffusion-based $\Phi$-ID Neural Estimator

DIPHINE: 基于扩散的 $\Phi$ID 神经估计器

Simon Pedro Galeano Munoz, Mustapha Bounoua, Giulio Franzese, Pietro Michiardi, Maurizio Filippone

发表机构 * KAUST（阿卜杜拉国王科技大学）； EURECOM（欧雷康）

AI总结提出首个基于扩散模型的神经估计器 DIPHINE，用于计算连续非高斯动力系统的集成信息分解（$\Phi$ID），通过单个摊销网络联合估计所有互信息项，并利用 Möbius 逆变换恢复十六个原子。

AI中文摘要

揭示真实世界复杂系统的真实信息架构需要厘清其组件如何随时间独特存储、冗余共享和协同整合信息。集成信息分解（$\Phi$ID）是一个框架，用于将多变量系统的信息动态分解为十六个非重叠原子，这些原子表征冗余、独特和协同的信息存储、传输和整合模式。现有的计算 $\Phi$ID 的方法仅限于高斯或离散系统，阻碍了其在连续非高斯动力系统中的应用。我们通过提出 DIPHINE（基于扩散的 $\Phi$ID 神经估计器）来解决这一限制，这是首个利用基于分数的扩散模型从单个摊销网络中联合估计 $\Phi$ID 所需的所有互信息项的神经估计器，并通过 Möbius 逆变换恢复十六个原子。我们提供了通过逆变换的误差传播的理论分析，表明从互信息到原子的映射的雅可比矩阵是整数值的，并且协同到协同原子被证明是最难估计的。我们在合成基准上展示了准确恢复真实原子，与已建立的互信息估计器相比具有优越性能，并在涉及真实数据的应用中无需任何分布假设即可提取生理上可解释的信息动态结构。

英文摘要

Uncovering the true informational architecture of real-world complex systems requires disentangling how their components uniquely store, redundantly share, and synergistically integrate information over time. Integrated Information Decomposition ($\Phi$ID) is a framework for decomposing the information dynamics of multivariate systems into sixteen non-overlapping atoms that characterize redundant, unique, and synergistic modes of information storage, transfer, and integration. Existing methods to compute $\Phi$ID are restricted to Gaussian or discrete systems, preventing its application to continuous non-Gaussian dynamical systems. We address this limitation by proposing DIPHINE (Diffusion-based $\Phi$-ID Neural Estimator), the first neural estimator that leverages score-based diffusion models to jointly estimate all the mutual information terms required by $\Phi$ID from a single amortized network, recovering the sixteen atoms through Möbius inversion. We provide a theoretical analysis of error propagation through the inversion, showing that the Jacobian of the mapping from mutual informations to atoms is integer-valued and that the synergy-to-synergy atom is provably the hardest to estimate. We demonstrate accurate recovery of ground-truth atoms on synthetic benchmarks, superior performance compared to established mutual information estimators, and the ability to extract physiologically interpretable information-dynamic structure on an application involving real data without any distributional assumptions.

2606.19162 2026-06-18 cs.LG cs.CV 新提交

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

奖励一直就在你的数据中：用判别器引导的强化学习纠正流匹配

Nicolas Beltran-Velez, Felix Friedrich, Zhang Xiaofeng, Reyhane Askari-Hemmat, Xiaochuang Han, Adriana Romero-Soriano, Michal Drozdzal

发表机构 * FAIR at Meta（Meta FAIR）； Columbia University（哥伦比亚大学）； McGill University（麦吉尔大学）； Canada CIFAR AI Chair（加拿大CIFAR人工智能主席）

AI总结针对流匹配模型因损失函数与样本质量不匹配导致的视觉缺陷，提出判别器引导的强化学习（DRL），利用预训练空间中判别器的logit作为奖励，显著提升无引导FID和语义FD，并改善偏好对齐。

Comments 84 pages, including appendices

AI中文摘要

得分匹配和流匹配模型通常依赖基于偏好的强化学习来实现两个目的：与主观偏好对齐，以及令人惊讶地恢复视觉真实性和连贯对象结构等属性——而这些属性本应通过匹配训练从数据本身学习。我们认为这反映了结构上的不匹配。匹配损失衡量训练时边缘分布下速度或得分场的$\ell_2$回归误差，这一代理指标与决定推理时样本质量的视觉和语义属性对齐不良。给定一个与这些属性对齐的奖励，强化学习通过评估模型自身生成的样本并直接遵循奖励景观来规避不匹配。挑战在于如何在不依赖人类偏好的情况下获得这样的奖励，因为人类偏好昂贵且会将数据真实性与标注者倾向混为一谈。我们提出判别器引导的强化学习（DRL）。DRL训练一个判别器，在预训练表示空间中区分数据样本和基础模型样本，并将其logit作为KL正则化强化学习中的奖励。预训练空间将判别器限制在感知有意义的方向上，而logit估计数据与模型之间的对数似然比，这是针对数据分布的最优奖励。在SiT、JiT、REPA和RAE上，DRL降低了无引导FID（例如，SiT上从9.38降至2.62）和语义空间FD（例如，SiT上DINOv3从88.2降至19.3），在所有骨干网络上均有一致提升，并且在没有经过偏好奖励训练的情况下改善了人类偏好奖励。在后续基于偏好的后训练中，DRL还在偏好奖励与图像保真度之间产生了更好的帕累托前沿，在提高对齐度的同时减少了过饱和和过亮等低级伪影。

英文摘要

Score- and flow-matching models often rely on preference-based reinforcement learning for two purposes: aligning with subjective preferences and, surprisingly, recovering properties such as visual realism and coherent object structure that matching-based training is intended to learn from the data itself. We argue that this reflects a structural mismatch. Matching losses measure $\ell_2$ regression error on the velocity or score field under training-time marginals, a proxy poorly aligned with the visual and semantic properties that determine sample quality at inference. Given a reward aligned with these properties, RL sidesteps the mismatch by evaluating the model on its own samples and following the reward landscape directly. The challenge is to obtain such a reward without relying on human preferences, which are expensive and conflate data realism with annotator inclinations. We propose Discriminator-Guided RL (DRL). DRL trains a discriminator to separate data from base-model samples in a pretrained representation space and uses its logit as the reward in KL-regularized RL. The pretrained space restricts the discriminator to perceptually meaningful directions, and the logit estimates the log-likelihood ratio between data and model, which is the optimal reward for targeting the data distribution. Across SiT, JiT, REPA, and RAE, DRL reduces guidance-free FID (e.g., $9.38 \to 2.62$ on SiT) and semantic-space FD (e.g., $88.2 \to 19.3$ on DINOv3 for SiT), with consistent gains across all backbones, and improves human-preference rewards without training on them. It also yields a better Pareto frontier between preference reward and image fidelity under subsequent preference-based post-training, increasing alignment while reducing low-level artifacts such as oversaturation and excessive brightness.
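
The statement that the discriminator logit is the optimal reward for targeting the data distribution can be checked with a short, standard derivation. The sketch below rests on simplifying assumptions that are not from the paper: equal numbers of data and model samples, a discriminator $D$ trained to optimality with binary cross-entropy directly on samples $x$ (the abstract's discriminator acts in a pretrained representation space instead), and a KL weight $\beta$. The symbols $p_{\text{data}}$, $p_\theta$, $q$, and $\beta$ are notation introduced here.

```latex
\begin{align}
D^\star(x) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_\theta(x)}
  \quad\Longrightarrow\quad
  r(x) := \operatorname{logit} D^\star(x)
        = \log\frac{p_{\text{data}}(x)}{p_\theta(x)}, \\
q^\star &= \arg\max_{q}\; \mathbb{E}_{x\sim q}\,[r(x)] - \beta\,\mathrm{KL}(q \,\|\, p_\theta)
  \quad\Longrightarrow\quad
  q^\star(x) \propto p_\theta(x)\,\exp\!\big(r(x)/\beta\big), \\
\beta = 1 &: \quad
  q^\star(x) \propto p_\theta(x)\cdot\frac{p_{\text{data}}(x)}{p_\theta(x)}
            = p_{\text{data}}(x).
\end{align}
```

Under these assumptions the KL-regularized optimum with the log-ratio reward is the data distribution itself, so the reward corrects exactly the mass that the base model misplaces. With an imperfect discriminator or $\beta \neq 1$ the identity holds only approximately.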

2606.19264 2026-06-18 cs.LG cs.CL 新提交

Structured Inference with Large Language Gibbs

基于大语言吉布斯的结构化推理

Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer

AI总结提出大语言吉布斯方法，利用大语言模型的条件分布作为转移算子进行结构化概率推理，通过迭代重采样变量避免顺序偏差，在合成分布、一致性推理和贝叶斯结构学习中验证有效性。

Comments Code: https://github.com/hyeok9855/large-language-gibbs

AI中文摘要

大型语言模型（LLMs）中编码的知识可以作为描述复杂世界变量的结构化推理的基础，但以概率一致的方式访问这些知识构成了一个困难的推理问题。我们提出了大语言吉布斯，一种结构化概率推理方案，它使用LLM的条件分布作为转移算子。不是通过单次自回归生成来采样结构化对象，而是利用LLM的下一个标记条件分布，在给定其他变量的条件下迭代地重采样单个变量。这种方法避免了顺序依赖偏差，并产生一个反映所有局部条件分布之间折衷的平稳分布。我们将这种方法应用于从合成分布中采样、一致性推理任务和贝叶斯结构学习。结果表明，在通过噪声LLM条件分布可访问的世界先验下，MCMC中使用LLM条件分布是用于结构化概率推理的一次性生成的实际替代方案。

英文摘要

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult inference problem. We propose Large Language Gibbs, a scheme for structured probabilistic inference that uses conditional distributions of an LLM as transition operators. Rather than sampling structured objects through single-pass autoregressive generation, we iteratively resample individual variables conditioned on others using an LLM's next-token conditionals. This approach avoids order-dependent biases and produces a stationary distribution that reflects a compromise between all local conditionals. We apply this approach to sampling from synthetic distributions, consistent reasoning tasks, and Bayesian structure learning. The results suggest that the use of LLM conditionals in MCMC is a practical alternative to one-pass generation for structured probabilistic inference under a world prior accessible through noisy LLM conditionals.
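
The resampling loop described above can be sketched with a generic Gibbs sampler. This is not the authors' implementation (their code is linked in the Comments line): here `toy_conditional` is a hand-written stand-in for the conditional that would be read off an LLM's next-token probabilities, and the variable names are hypothetical.

```python
import random


def gibbs_sample(variables, conditional, init, sweeps, rng):
    """Resample each variable in turn from conditional(name, state) -> {value: prob}."""
    state = dict(init)
    samples = []
    for _ in range(sweeps):
        for name in variables:
            probs = conditional(name, state)
            values = list(probs)
            weights = [probs[v] for v in values]
            state[name] = rng.choices(values, weights=weights)[0]
        samples.append(dict(state))   # record the state after a full sweep
    return samples


def toy_conditional(name, state):
    """Stand-in for an LLM conditional: two binary facts that tend to agree."""
    other_name = "wet_ground" if name == "rain" else "rain"
    p_true = 0.9 if state[other_name] else 0.1
    return {True: p_true, False: 1.0 - p_true}


rng = random.Random(0)
samples = gibbs_sample(
    ["rain", "wet_ground"],
    toy_conditional,
    init={"rain": True, "wet_ground": False},   # deliberately inconsistent start
    sweeps=5000,
    rng=rng,
)
agreement = sum(s["rain"] == s["wet_ground"] for s in samples) / len(samples)
```

Every variable is conditioned on all the others at each step, so no variable is privileged by appearing first in a generation order. In this toy setup the two facts agree in roughly 90% of the recorded samples, matching the conditionals, even though the chain starts from an inconsistent state.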

2606.19315 2026-06-18 cs.LG 新提交

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Diffusion-Proof：超越自回归生成的形式化定理证明方案

Ruida Wang, Rui Pan, Pengcheng Wang, Shizhe Diao, Tong Zhang

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； NVIDIA（英伟达）

AI总结提出Diffusion-Proof框架，首次将扩散语言模型应用于形式定理证明，通过全证明生成和局部校正方法，在ProofNet和MiniF2F上分别提升1.61%和6.14%，并解决了一个DeepSeek-Prover-V2-7B无法解决的IMO问题。

AI中文摘要

近年来，增强大型语言模型（LLMs）的形式数学推理能力已成为数学和计算机科学社区的关键焦点。虽然在使用最先进的自回归（AR）LLMs进行形式定理证明方面取得了显著进展，但这些模型存在固有局限性。它们的下一个词预测生成方法可能因长程连贯性挑战和长序列错误累积而导致次优性能。最近，扩散LLMs（dLLMs）通过多词块的迭代去噪生成文本，提供了一种有前景的替代方案。然而，dLLMs在形式数学中的应用（其中保持长程连贯性至关重要）仍然研究不足。为解决上述挑战，我们提出了**Diffusion-Proof**，据我们所知，这是第一个训练和应用dLLMs进行形式定理证明的框架。我们的框架包含两种模型的训练和推理方法。第一个是*dLLM-Prover-7B*，它执行具有长程连贯策略使用的全证明写作。第二个是*dLLM-Corrector-7B*，这是一种新颖的大块扩散校正模型。它利用dLLMs的填充能力，使用双向信息进行局部证明校正。大量实验表明，**Diffusion-Proof**相对显著优于在同一数据集上训练的AR LLM基线。与基线相比，**Diffusion-Proof**在ProofNet-Test和MiniF2F-Test基准上分别实现了**1.61%**和**6.14%**的绝对提升。值得注意的是，**Diffusion-Proof**成功解决了一个更先进的思考模型DeepSeek-Prover-V2-7B无法解决的IMO问题，展示了dLLMs在形式定理证明中的独特优势。

英文摘要

Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both the mathematics and computer science communities in recent years. While significant progress has been made in using state-of-the-art Auto-Regressive (AR) LLMs for formal theorem proving, these models suffer from inherent limitations. Their next-token prediction generation methods may yield suboptimal performance due to the challenges of long-range coherence and the compounding of errors over long sequences. Recent advancements in diffusion LLMs (dLLMs), which generate text through iterative denoising of a multi-token block, offer a promising alternative. However, the application of dLLMs to formal mathematics, where maintaining long-range coherence is critical, remains largely understudied. To address the challenges above, we propose **Diffusion-Proof**, to the best of our knowledge the first framework to train and apply dLLMs for formal theorem proving. Our framework contains training and inference methods for two models. The first is *dLLM-Prover-7B*, which performs whole-proof writing with long-range coherent tactic usage. The second is *dLLM-Corrector-7B*, a novel large-block diffusion-based correction model that leverages the in-filling capabilities of dLLMs to perform local proof correction using bi-directional information. Extensive experiments demonstrate that **Diffusion-Proof** outperforms the AR LLM baseline trained on the same dataset by a significant relative margin. **Diffusion-Proof** achieves an absolute improvement of **1.61%** on the ProofNet-Test benchmark and **6.14%** on the MiniF2F-Test benchmark compared to the baseline. Notably, **Diffusion-Proof** successfully resolves one IMO problem that the more advanced thinking model DeepSeek-Prover-V2-7B could not solve, showcasing the unique advantage of dLLMs in formal theorem proving.

2606.18290 2026-06-18 cond-mat.stat-mech cs.LG eess.SP 交叉投稿

Stochastic Thermodynamics and SDE-based Generative Models

随机热力学与基于SDE的生成模型

Yaowen Zhang

发表机构 * GitHub

AI总结本文在随机热力学框架下，为基于SDE的生成模型（如扩散模型和薛定谔桥）定义了轨迹层面的功、热和熵产生，并推广了Jarzynski恒等式和类第二定律不等式。

2606.18354 2026-06-18 eess.IV cs.LG 交叉投稿

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

基于解剖掩膜条件扩散的阿尔茨海默病结构MRI合成

Muge Zhang, Muhammad Ali Khaliq, Jamal Alsakran, Byeong Kil Lee, Jeeho Ryoo

发表机构 * Fairleigh Dickinson University（费尔利·狄金森大学）； University of Colorado at Colorado Springs（科罗拉多大学科罗拉多斯普林斯分校）

AI总结针对阿尔茨海默病结构MRI合成中细微解剖变化难以捕捉的问题，本文扩展Med-DDPM条件扩散模型，以解剖分割掩膜为条件生成3D结构MRI，实验表明合成数据训练的模型Dice分数与真实数据相当，混合数据训练则显著提升性能。

DOI: 10.1109/MIPR67560.2025.00037
Journal ref: 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)

AI中文摘要

生成式机器学习模型的最新进展显著改善了医学成像，为数据增强、隐私保护和模型泛化提供了有前景的解决方案。然而，由于神经退行性病变相关的细微、区域特异性和渐进性解剖变化，合成阿尔茨海默病（AD）的高质量结构MRI数据仍然具有挑战性。在本文中，我们将最初为脑肿瘤合成设计的Med-DDPM条件扩散模型扩展，以生成专门针对AD的3D结构MRI。我们采用Med-DDPM，因为与其他生成模型相比，它具有稳定的结构和保真度，特别适合捕捉AD特征的细微解剖变化。我们的方法以来自ADNI数据集的解剖分割掩膜为条件，将关键的AD相关脑结构纳入生成过程。我们通过在真实、合成和混合数据集上训练分割模型，系统评估了合成图像的质量和实用性。实验结果表明，仅在合成数据上训练的分割模型达到了与真实数据训练（0.6513）相当的Dice分数（0.6532），同时召回率显著提高。值得注意的是，在混合数据集（混合真实和合成图像）上训练的模型优于真实和纯合成基线，Dice分数达到0.7244。这些发现强调了条件扩散模型在生成解剖准确、AD特异性合成MRI方面的成功应用，并突出了它们在增强训练数据可用性、提高诊断准确性和促进神经影像研究可重复性方面的潜力。

英文摘要

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.
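
The Dice scores quoted above measure overlap between a predicted and a reference segmentation mask. As a reminder of what the number means, here is a minimal sketch for flat binary masks; the helper name and the toy masks are illustrative and do not come from the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: 3 foreground voxels each, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)   # 2 * 2 / (3 + 3) = 2/3
```

A score of 1.0 means the masks coincide and 0.0 means they are disjoint, so the reported rise from about 0.65 to 0.72 with hybrid training corresponds to a visibly larger shared region.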

2606.18790 2026-06-18 cs.SD cs.AI cs.LG 交叉投稿

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

闭环：用于符号音乐生成中可解释激活引导的PID反馈控制

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

发表机构 * Athens University of Economics and Business（雅典经济与商业大学）； Orfium Research（Orfium 研究）； Hellenic Mediterranean University（希腊地中海大学）； Archimedes / Athena Research Center（阿基米德/雅典娜研究中心）

AI总结提出基于PID反馈控制的推理时激活引导框架，通过差分均值法提取音高和时长潜在方向，并利用Gram-Schmidt正交化解耦多属性引导，实现符号音乐生成中细粒度、可解释的属性调制。

Comments Accepted at Learning to Listen: ICML 2026 Workshop on Machine Learning for Audio (43rd International Conference on Machine Learning - ICMLMLA26), 4 pages main (11 total), 2 figures

AI中文摘要

基于Transformer的架构在生成复杂符号序列方面取得了显著进展，但在实现对离散信号属性的细粒度、可解释控制方面仍存在明显差距。本文研究了多轨音乐Transformer（MMT）的机制可解释性，并提出了一种无需重新训练即可通过推理时激活引导实现确定性属性调制的框架。利用差分均值（DiffMean）方法，我们在残差流中分离出信号属性（特别是音高和时长）的潜在方向。我们验证了该领域的线性表示假设，实现了引导幅度与属性偏移之间的高相关性。为了解决多属性引导中固有的特征纠缠问题，我们引入了一种利用Gram-Schmidt正交化的双引导框架。实验结果表明，与朴素向量加法相比，这种几何解耦减少了概念干扰和信号退化，即使在强自回归条件下也能实现独立的确定性控制。

英文摘要

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.
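
The two geometric steps named in the abstract, difference-in-means direction extraction and Gram-Schmidt decoupling, can be sketched on plain vectors. The activations below are made-up 3-dimensional stand-ins for residual-stream states, and every function name is hypothetical; the sketch does not reproduce the paper's MMT setup.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_in_means(positive, negative):
    """Steering direction: mean activation with the attribute minus mean without it."""
    mean_pos, mean_neg = mean_vector(positive), mean_vector(negative)
    return [a - b for a, b in zip(mean_pos, mean_neg)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(v, against):
    """One Gram-Schmidt step: remove from v its component along `against`."""
    scale = dot(v, against) / dot(against, against)
    return [a - scale * b for a, b in zip(v, against)]


def steer(activation, direction, alpha):
    """Add a scaled steering direction to an activation vector."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Hypothetical activations grouped by attribute value.
high_pitch = [[1.0, 0.5, 0.0], [1.2, 0.7, 0.1]]
low_pitch = [[0.0, 0.1, 0.0], [0.2, 0.1, 0.1]]
long_duration = [[0.6, 1.0, 0.0], [0.8, 1.2, 0.2]]
short_duration = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.2]]

pitch_dir = diff_in_means(high_pitch, low_pitch)
raw_duration_dir = diff_in_means(long_duration, short_duration)
duration_dir = orthogonalize(raw_duration_dir, pitch_dir)

activation = [0.3, 0.3, 0.3]
steered = steer(activation, duration_dir, alpha=2.0)
```

After orthogonalization, pushing along `duration_dir` leaves the projection onto `pitch_dir` unchanged, which is the sense in which the two attributes are decoupled. Naive addition of `raw_duration_dir` would shift the pitch projection as well.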

2606.18856 2026-06-18 cs.CL cs.LG 交叉投稿

Approximate Structured Diffusion for Sequence Labelling

近似结构化扩散用于序列标注

Nicolas Floquet, Joseph Le Roux, Nadi Tomeh

发表机构 * Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN（索邦巴黎北大学、法国国家科学研究中心、巴黎北信息学实验室 LIPN）

AI总结提出一种基于扩散的条件随机场（CRF）训练方法，通过引入标签噪声条件来捕捉长距离依赖，结合近似推理在词性标注任务上实现16.5%的错误率降低。

2606.19005 2026-06-18 cs.CL cs.LG 交叉投稿

离散扩散模型的维度无关收敛性：伴随方程诱导了正确的空间

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

发表机构 * Department of Mathematics（数学系）； Oden Institute School of Data Science and Society（数据科学与社会学院）； UCLA（加州大学洛杉矶分校）； University of Texas at Austin（德克萨斯大学奥斯汀分校）； UNC Chapel Hill（北卡罗来纳大学教堂山分校）； Computational and Applied Sciences Group（计算与应用科学组）； Department of Mathematics and Statistics（数学与统计学系）； SRI International（SRI国际）； University of Massachusetts Amherst（马萨诸塞大学阿姆赫斯特分校）

AI总结本文提出了一种基于伴随方程的统一框架，实现了任何积分概率度量（IPM）下的维度无关收敛保证，克服了传统KL和TV方法在处理大规模状态空间时的局限性。

AI中文摘要

离散扩散已成为生成建模中的领先框架，广泛应用于语言、视觉和生物学等领域。然而，现有的收敛理论存在根本性局限。基于KL的分析在掩码分布等奇异先验下会发散，而总变差（TV）的界依赖于状态空间大小$S$，并在现代语言任务中变得空泛，因为词汇表包含数以万计的标记。我们开发了一种统一的基于伴随方程的框架，建立了任意积分概率度量（IPM）下的维度无关收敛保证。据我们所知，我们的界是首个完全不依赖$S$且同时适用于掩码先验和均匀先验的界。重要的是，我们的理论仅依赖于一个标准的速率矩阵正则性假设，并且兼容时间非齐次调度。四项新技术推动了我们的改进：借助伴随方程在可观测量空间中工作，而不是直接处理概率测度；一种可给出任意IPM界的正则性分析；一种在均匀转移下消除$S$依赖性的耦合论证；以及一种在掩码转移下消除$S$依赖性的分数-边际抵消技术。因此，我们的框架与先前分析显著不同，并避免了路径空间KL和现有TV方法的不足。除了收敛界外，我们的框架还提供了一种灵活的工具包，用于对离散扩散模型做进一步的理论研究。

英文摘要

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

2605.30920 2026-06-18 cs.LG 版本更新

Unsupervised Diffusion Solver for Combinatorial Optimization via Combinatorial Adjoint Matching

通过组合伴随匹配实现组合优化的无监督扩散求解器

Shengyu Feng, Tarun Suresh, Yiming Yang

发表机构 * Language Technologies Institute, Carnegie Mellon University（卡内基梅隆大学语言技术研究所）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结提出组合伴随匹配（CAM）框架，利用离散伴随动力学和随机控制公式，实现无监督训练离散扩散求解器，在多种组合优化问题上达到与监督方法竞争的性能。

Comments ICML26

AI中文摘要

基于扩散的神经求解器在组合优化（CO）中显示出强大潜力，但现有方法通常依赖于使用大量近最优解进行监督训练。在这项工作中，我们将基于伴随的轨迹优化方法扩展到离散组合域。我们将基于扩散的CO表述为连续时间马尔可夫链上的随机控制问题，并引入离散伴随动力学，用于通过离散生成轨迹传播优化信号。基于这一表述，我们提出了组合伴随匹配（CAM），一种用于离散扩散求解器的无监督训练框架，具有结构化和低方差的轨迹级优化信号。实验上，CAM在多种组合优化问题上始终优于现有的无监督扩散基线，并与强大的监督扩散求解器甚至传统求解器性能相当。我们的代码可在 https://github.com/Shengyu-Feng/CAM 获取。

英文摘要

Diffusion-based neural solvers have shown strong promise for combinatorial optimization (CO), but existing methods typically rely on supervised training with large collections of near-optimal solutions. In this work, we extend adjoint-based trajectory optimization methods to discrete combinatorial domains. We formulate diffusion-based CO as a stochastic control problem over Continuous-Time Markov Chains and introduce discrete adjoint dynamics for propagating optimization signals through discrete generative trajectories. Building on this formulation, we propose Combinatorial Adjoint Matching (CAM), an unsupervised training framework for discrete diffusion solvers with structured and low-variance trajectory-level optimization signals. Empirically, CAM consistently outperforms existing unsupervised diffusion baselines and achieves performance competitive with strong supervised diffusion solvers and even traditional solvers across diverse combinatorial optimization problems. Our code is available at https://github.com/Shengyu-Feng/CAM.

2606.10466 2026-06-18 cs.LG cs.AI 版本更新

UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation

UPLOTS: 一种用于约束时间序列生成的统一预训练语言模型

Du Yin, Hao Xue, Jinliang Deng, Yang Yang, Shuang Ao, Arian Prabowo, Flora Salim

发表机构 * University of New South Wales（新南威尔士大学）； HKUST(GZ)（香港科技大学（广州））； BUAA（北京航空航天大学）

AI总结提出UPLOTS，一种基于统一预训练语言模型和提示引导的框架，通过动态多数据集损失重加权和提示到模式映射，实现跨领域约束时间序列生成，在四个基准上验证了其泛化性和数据增强效果。

AI中文摘要

三角参考薛定谔桥用于时间序列生成

Gabriele Bocchi

发表机构 * Arakne S.r.l.（阿拉克内公司）

AI总结提出三角参考薛定谔桥框架，通过区间冻结的退化扩散参考和层次化潜在波动率结构，实现时间序列的保守生成，并保持熵最小化的变分核心。

AI中文摘要

我们引入了用于时间序列的三角参考薛定谔桥（TR-SBTS），这是SBTS框架的一种保守扩展，其中布朗参考被替换为区间冻结的、可能退化的扩散参考，在潜在波动率水平的层次上呈三角形。该构造是在增广状态空间上的单一熵投影，变分约束在时间和潜在水平上联合施加，并通过相对熵的分解层次展开。SBTS的变分核心得以保留：熵最小化器是参考的h-变换，在每个冻结区间上，最优动力学在活跃协方差方向的仿射叶上具有对数梯度漂移公式，即使冻结协方差是秩亏的也成立。我们建立了冻结近似的稳定性以及相应正则化核估计量的收敛性。该构造通过一个有限维条件映射实现，该映射由三种互补的过去约简组成——块PCR摘要、由运行时冻结协方差累积量诱导的过去增量的参考感知马氏核，以及在同一参考度量下的过去窗口WLS漂移回归器——以及一个耦合的状态-协方差桥步骤，其中每个潜在水平为上一水平产生动态参考，并由协方差描述符总结；该构造在数值实验上进行了评估。

英文摘要

Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the $h$-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form $b^\star(t,x)=A\,\nabla\log H(t,x)$, intrinsic to the active covariance directions when the frozen covariance $A$ is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

2605.28690 2026-06-18 quant-ph cs.LG 版本更新

Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States

潜在条件参数化量子电路作为量子态分布的通用近似器

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

发表机构 * Quantum Laboratory, Fujitsu Research, Fujitsu Limited（Fujitsu 研究所量子实验室， Fujitsu 有限公司）

AI总结提出潜在条件参数化量子电路（LPQC），通过经典神经网络将潜在变量映射到量子电路参数，证明其在1-Wasserstein距离下是密度算子概率测度的通用近似器，并引入多模态潜在先验和专家混合电路架构缓解贫瘠高原问题。

Comments 21 pages, 11 figures (fix the proof and update appendix for barren plateaus analysis)

AI中文摘要

量子模拟、量子化学和量子机器学习中的许多应用不仅需要单个量子态，还需要表征目标系统异质性的量子态系综。在变分和容错设置中，逐个状态地准备这样的系综是不可行的，这激发了生成式建模方法。我们引入了潜在条件参数化量子电路（LPQC），这是一种混合量子-经典框架，其中经典神经网络将从先验分布中采样的潜在变量映射到参数化量子电路的参数。我们证明了LPQC在1-Wasserstein距离下是密度算子概率测度的通用近似器，将经典通用近似定理扩展到量子分布设置。我们还引入了多模态潜在先验和专家混合电路架构，并表明它在优化过程中经验性地缓解了贫瘠高原问题。数值实验在合成多簇混合量子态系综和QM9衍生的3D分子结构系综上验证了该框架。在这些任务中，LPQC优于最近的量子生成基线，同时与典型的经典基线相比，在输出维度大幅降低的情况下保持竞争力。通过利用潜在空间中的经典表达能力，LPQC为量子生成建模提供了一条可行的途径。

英文摘要

Many applications in quantum simulation, quantum chemistry, and quantum machine learning require not a single quantum state but an ensemble of states characterizing the heterogeneity of a target system. Preparing such ensembles state-by-state is prohibitive in both variational and fault-tolerant settings, thereby motivating a generative modeling approach. We introduce latent-conditioned parameterized quantum circuits (LPQCs), a hybrid quantum-classical framework in which classical neural networks map a latent variable sampled from a prior distribution to the parameters of a parameterized quantum circuit. We prove that LPQCs are universal approximators for probability measures over density operators in the 1-Wasserstein distance, extending classical universal approximation theorems to the quantum-distribution setting. We additionally introduce a multimodal latent prior and a mixture-of-experts circuit architecture, and show empirically that the latent-conditioned parameterization alleviates the barren plateau problem during optimization, a behavior for which we provide rigorous partial guarantees. Numerical experiments validate the framework on a synthetic multi-cluster ensemble of mixed quantum states and on a QM9-derived ensemble of 3-D molecular structures. In these tasks, LPQC outperforms recent quantum generative baselines and matches the generation quality of a classical neural-network baseline, while requiring an output dimension that grows only linearly with the number of qubits rather than exponentially. By leveraging classical expressivity in the latent space, LPQCs offer a tractable route to quantum generative modeling.

2606.17491 2026-06-18 stat.ML cs.LG stat.ME 版本更新

测量噪声限制了非线性模型在生物医学预测中相对于线性模型的优势

Marc-Andre Schulz, Kerstin Ritter

发表机构 * Hertie Institute for AI in Brain Health, University of Tübingen（赫蒂人工智能脑健康研究所，图宾根大学）； Tübingen AI Center, University of Tübingen（图宾根人工智能中心，图宾根大学）； Department of Psychiatry and Neurosciences, Charité – Universitätsmedizin Berlin（精神病学与神经科学系，柏林夏里特医学院）； Bernstein Center for Computational Neuroscience, Berlin（伯恩斯坦计算神经科学中心，柏林）； German Center for Mental Health (DZPG), partner site Tübingen（德国心理健康中心（DZPG），图宾根合作站点）

AI总结本文指出，在生物医学表格数据中，测量噪声会削弱非线性结构，导致非线性模型与线性模型性能相当，并提出了一个精确的超额风险恒等式，揭示了测量可靠性、样本量和特征表示三个条件必须同时满足才能体现非线性优势。

AI中文摘要

在生物医学表格数据上，诸如深度网络、梯度提升树和核方法等灵活模型，在给定相同特征的情况下，反复被线性回归和逻辑回归匹配或击败。通常的反应是将其视为模型方面的不足，需要通过更多数据、更好的架构或调参来修复，假设非线性结构存在而模型未能捕捉到。我们认为，当限制因素是测量而非模型时（这在生物医学中经常发生），这些修复无法奏效。加性噪声模糊了群体最优预测器，并且由于模糊在去除函数的广泛形状之前先去除精细、快速变化的细节，它比线性结构更快地抹去非线性结构。一个k阶交互作用被特征可靠性的k次幂衰减，而线性部分只衰减一次。在生物医学测量典型的可靠性下，即使底层生物学是强非线性的，非线性优势也可能消失，并且噪声所移除的部分无法通过更大的队列或更灵活的模型恢复，只能通过更好的测量。非线性是隐藏的，而非缺失，线性模型与灵活模型之间的平局本身并不能对生物学做出定论。这些片段是经典的，来自测量误差统计、心理测量学和高斯分析，我们将它们组合成一个精确的超额风险恒等式。测量可靠性是与样本量和特征表示并列的三个条件之一，必须对齐才能使灵活模型发挥作用，而它们共同只留下一个狭窄的窗口，大多数生物医学任务落在此窗口之外。在140个英国生物银行任务中，灵活模型与线性模型之间的差距（如果存在）带有预测的噪声特征，并且这三个条件可以通过干预而非仅通过基准测试来分离。

英文摘要

On biomedical tabular data, flexible models such as deep networks, gradient-boosted trees, and kernel methods are repeatedly matched or beaten by linear and logistic regression given the same features. The usual reaction is to treat this as a model-side shortfall, to be fixed with more data, a better architecture, or tuning, on the assumption that the nonlinear structure is there and the model has failed to capture it. We argue that these fixes cannot help when the binding limit is the measurement rather than the model, as it frequently is in biomedicine. Additive noise blurs the population-optimal predictor, and because blurring removes a function's fine, rapidly varying detail before its broad shape, it erases nonlinear structure faster than linear structure. A degree-$k$ interaction is attenuated by the $k$-th power of feature reliability, while the linear part is attenuated only once. At the reliabilities typical of biomedical measurement, the nonlinear advantage can vanish even when the underlying biology is strongly nonlinear, and what the noise removes cannot be recovered by a larger cohort or a more flexible model, only by better measurement. The nonlinearity is hidden, not absent, and a tie between linear and flexible models is not by itself a verdict on the biology. These pieces are classical, drawn from measurement-error statistics, psychometrics, and Gaussian analysis, and we assemble them into an exact excess-risk identity. Measurement reliability is one of three conditions, alongside sample size and feature representation, that must align for a flexible model to help, and together they leave only a narrow window that most biomedical tasks fall outside. Across 140 UK Biobank tasks, the gap between flexible and linear models, where it exists, carries the predicted noise signature, and the three conditions can be separated by intervention but not by a benchmark alone.
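
The attenuation claim in the abstract can be made concrete with a small worked example. The sketch below assumes, purely for illustration, that every feature in an interaction has the same reliability; the function name `attenuation` and the value 0.7 are hypothetical and do not come from the paper.

```python
def attenuation(reliability: float, degree: int) -> float:
    """Factor by which a degree-k interaction term is shrunk.

    Illustrative assumption: all features in the interaction share one
    reliability value in [0, 1], so the factor is reliability ** degree.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return reliability ** degree


# Compare how much signal survives for linear, pairwise, and three-way terms.
for k in (1, 2, 3):
    print(f"degree {k}: surviving fraction {attenuation(0.7, k):.3f}")
```

Under this assumption a linear term keeps 70% of its signal at reliability 0.7, while a three-way interaction keeps only about a third, which is the gap the abstract describes.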

2606.18465 2026-06-18 cs.LG cs.AI 新提交

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

权重范数在Grokking中控制什么？交叉熵下的对数尺度中介作用

Truong Xuan Khanh

发表机构 * H&K Research Studio, Clevix LLC

AI总结本文通过固定权重范数并改变输出温度，发现Grokking延迟主要由对数尺度（logit scale）决定，权重范数仅通过影响对数尺度间接起作用。

Comments 16 pages, 10 tables and 4 figures. Code and data to reproduce all numbers, tables, and figures: https://github.com/ClevixLab/grokking-logit-scale

AI中文摘要

Grokking，即从记忆到泛化的延迟跳跃，通常与权重范数相关：范数越小，泛化越早。我们探究范数实际控制什么。通过钳位固定权重范数并仅改变输出温度，我们在交叉熵下将Grokking延迟滑动到其整个范数诱导范围；将有效对数尺度匹配回基线可恢复两个模数下约85%的延迟。在范数和温度的网格上，延迟仅由对数尺度决定（$R^2 = 0.97$），范数仅额外贡献1-2%。该效应依赖于损失函数：在均方误差下，对数尺度被固定，范数通过不同路径起作用。记忆控制、float64 softmax崩溃审计和无LayerNorm的Transformer均指向同一通道。从同一状态分叉，延迟遵循钳位的范数值而非钳位操作本身，这排除了重缩放伪影。近端变量是对数尺度及其驱动的softmax饱和；权重范数仅是上游手柄。所有数字、表格和图表均可从发布的代码和数据中复现。

英文摘要

Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying only an output temperature, we slide the grokking delay across its entire norm-induced range under cross-entropy; matching the effective logit scale back to baseline recovers about 85% of the delay at two moduli. Across a grid of norms and temperatures the delay collapses onto the logit scale alone ($R^2 = 0.97$), with the norm adding 1-2% beyond it. The effect is loss-dependent: under mean-squared error the logit scale is pinned and the norm acts through a different route. A memorization control, a float64 softmax-collapse audit, and a no-LayerNorm transformer point to the same channel. Forking arms from one identical state, the delay follows the held norm value and not the clamp operation, which closes a rescaling-artifact concern. The proximal variable is the logit scale and the softmax saturation it drives; the weight norm is only an upstream handle. All numbers, tables, and figures reproduce from released code and data.
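
The link between logit scale and softmax saturation can be illustrated with a generic temperature-scaled softmax. This is a minimal sketch rather than the paper's code: the convention of dividing logits by the temperature and the example logits are assumptions made here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature.

    Assumed convention: a smaller temperature means a larger effective
    logit scale.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for temperature in (4.0, 1.0, 0.25):
    probs = softmax(logits, temperature)
    print(f"T={temperature}: top probability {max(probs):.4f}")
```

Multiplying all logits by a constant c has the same effect as dividing the temperature by c, which is one way to see how a temperature can stand in for the logit scale that a larger weight norm would otherwise produce.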

2606.18538 2026-06-18 cs.LG stat.ML 新提交

Effects of sparsity and superposition on loss in simple autoencoders

稀疏性与叠加对简单自编码器损失的影响

Mriganka Basu Roy Chowdhury, Eric McLaughlin Weiner

发表机构 * Department of Statistics, UC Berkeley（伯克利大学统计学系）； Department of Materials Science, UC Berkeley（伯克利大学材料科学系）

AI总结研究神经网络中多语义性源于叠加现象，通过数学分析稀疏输入下自编码器的L2重构损失上下界，验证并扩展了Elhage等人的实证结果。

Comments 16 pages, 3 figures

AI中文摘要

神经网络机械可解释性的主要困难之一是出现多语义性，即每个神经元通常负责多个不同任务，阻碍了对其功能的清晰解释。Elhage等人（2022）的开创性论文认为，这是由于叠加现象，即神经网络将不同特征表示为低维空间中的非正交方向，这种策略可以在不牺牲保真度的情况下实现更大的数据压缩，因为输入向量具有特征稀疏性。Elhage等人（2022）在一个相当自然且简单的具有稀疏输入的自编码器中实证验证了这些假设。本文的贡献在于分析叠加现象发生和最优性的数学基础，同时严格证实了他们的一些发现。特别地，我们为幂激活函数提供了L2重构损失的上界和下界，在非常稀疏的情况下是紧的。文末还包含一个简短的开放问题列表。

英文摘要

One of the major difficulties in the mechanistic interpretability of neural networks is the occurrence of polysemanticity, which suggests that each neuron is typically responsible for multiple different tasks, impeding a clean interpretation of its function. The seminal paper of Elhage et al. (2022) argues that this occurs due to superposition, a phenomenon where the neural network represents distinct features as non-orthogonal directions in a lower-dimensional space, a strategy that allows much greater compression of the data without sacrificing fidelity due to the feature sparsity of input vectors. Elhage et al. (2022) empirically validate these hypotheses in a rather natural and simple autoencoder with sparse inputs. The contribution of the present work is to analyze the mathematical basis for the occurrence and optimality of superposition, while rigorously corroborating some of their findings. In particular, we provide upper and lower bounds for the L2 reconstruction loss, tight in the very sparse regime, for power activation functions. A short list of interesting open problems is also included at the end.
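
To see why sparsity makes superposition cheap, consider a toy autoencoder in the style of Elhage et al. (2022) that squeezes two features into one hidden dimension. The sketch below is illustrative only: the antipodal embedding, the zero bias, and the ReLU readout are choices made here, whereas the paper analyzes power activation functions.

```python
# Two features share ONE hidden dimension via antipodal directions.
EMBED = [1.0, -1.0]  # hypothetical per-feature embedding into the hidden unit
BIAS = [0.0, 0.0]


def relu(v: float) -> float:
    return max(0.0, v)


def reconstruct(x):
    """Encode x into a scalar hidden value, then decode with tied weights."""
    hidden = sum(w * xi for w, xi in zip(EMBED, x))
    return [relu(w * hidden + b) for w, b in zip(EMBED, BIAS)]


def l2_loss(x):
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x)))


for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(x, "->", reconstruct(x), "loss", l2_loss(x))
```

When only one feature is active the input is reconstructed exactly, and the loss is incurred only when both features fire together; with sparse inputs that event is rare, which is the trade-off the bounds in the abstract quantify.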

2606.18778 2026-06-18 cs.LG stat.ML 新提交

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

漂移与腐败下基于潜在簇几何的在线分布预测

Navyansh Mahla, Prateek Chanda, Ganesh Ramakrishnan

发表机构 * Indian Institute of Technology, Bombay（印度理工学院，孟买）

AI总结针对非平稳流中的在线分布预测问题，提出一种基于潜在簇几何的吉布斯准后验方法，通过可逆跳跃MCMC采样变维后验，并引入重启变体应对漂移，在亚线性腐败预算和运输代价下实现亚线性Wasserstein遗憾。

AI中文摘要

非平稳流中的在线学习通常被表述为跟踪点估计，但许多应用需要预测完整的数据生成分布。我们研究漂移和对抗性腐败下的在线分布预测。我们的方法通过潜在簇几何表示每个候选律：一个可变大小的中心配置，组织概率质量并诱导预测分布。这些配置上的吉布斯准后验通过后验平均产生在线预测器，所得变维后验可通过可逆跳跃MCMC采样。因此，该方法避免了指定参数化流律，同时保留了用于不确定性、正则化和比较的结构化潜在空间。我们通过累积Wasserstein-1遗憾相对于时变真实律来评估性能。分析分离了两种效应：腐败扰动基于损失的后验更新，而漂移使长时域后验记忆过时。我们通过一个重启变体来解决后者，该变体在时间上局部化相同的准贝叶斯更新。所得的高概率界分解为PAC-Bayesian复杂度项、腐败敏感的后验扰动项以及由$A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)$驱动的动态最优传输项。在有界支撑、稳定潜在几何、预测映射正则性、预言可实现性、局部化重启窗口、亚线性传输作用和亚线性腐败预算下，重启预测器实现了亚线性累积Wasserstein遗憾。这些保证不需要对流、漂移机制或腐败过程进行参数化建模。

英文摘要

Online learning in non-stationary streams is often formulated as tracking a point estimate, but many applications require predicting the full data-generating distribution. We study online distributional prediction under drift and adversarial corruption. Our approach represents each candidate law through a latent cluster geometry: a variable-size configuration of centers that organizes probability mass and induces a predictive distribution. A Gibbs quasi-posterior over these configurations yields an online predictor by posterior averaging, and the resulting variable-dimensional posterior can be sampled with reversible-jump MCMC. The method therefore avoids specifying a parametric streaming law while retaining a structured latent space for uncertainty, regularization, and comparison. We evaluate performance by cumulative Wasserstein-1 regret against the time-varying true law. The analysis separates two effects: corruption perturbs the loss-based posterior update, whereas drift makes long-horizon posterior memory stale. We address the latter with a restarted variant that temporally localizes the same quasi-Bayesian update. The resulting high-probability bounds decompose into a PAC-Bayesian complexity term, a corruption-sensitive posterior perturbation term, and a dynamic optimal-transport term driven by $A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)$. Under bounded support, stable latent geometry, predictive-map regularity, oracle realizability, localized restart windows, sublinear transport action, and sublinear corruption budget, the restarted predictor achieves sublinear cumulative Wasserstein regret. These guarantees require no parametric model for the stream, drift mechanism, or corruption process.
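
The transport action $A_T^{\mathrm{OT}}$ in the bound is easy to evaluate when the true laws are one-dimensional Gaussians, because $W_2^2$ then has the closed form $(\mu_1-\mu_2)^2+(\sigma_1-\sigma_2)^2$. The sketch below uses that special case only as an illustration; the Gaussian stream and the function names are assumptions, not part of the paper's setting.

```python
def w2_squared_gaussian_1d(mean_a, std_a, mean_b, std_b):
    """Squared 2-Wasserstein distance between two 1-D Gaussians."""
    return (mean_a - mean_b) ** 2 + (std_a - std_b) ** 2


def transport_action(laws):
    """Sum of W_2^2 between consecutive laws, each given as (mean, std)."""
    return sum(
        w2_squared_gaussian_1d(*laws[t - 1], *laws[t])
        for t in range(1, len(laws))
    )


# A hypothetical stream whose mean drifts by 1/t at step t: the per-step
# cost is 1/t**2, so the cumulative action stays bounded as T grows.
stream = [(0.0, 1.0)]
for t in range(1, 1000):
    prev_mean, prev_std = stream[-1]
    stream.append((prev_mean + 1.0 / t, prev_std))
print(f"A_T over {len(stream)} rounds: {transport_action(stream):.4f}")
```

A drift of this kind keeps the transport action sublinear in $T$, which is one of the conditions the abstract lists for sublinear Wasserstein regret.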

2606.18834 2026-06-18 cs.LG 新提交

Identifying Structural Biases from Causal Mechanism Shifts

从因果机制变化中识别结构性偏差

Praharsh Nanavati, Jilles Vreeken, David Kaltenpoth

发表机构 * CISPA Helmholtz Center for Information Security（CISPA赫尔姆霍茨信息安全中心）

AI总结提出利用环境间机制变化识别隐藏混淆和选择偏差，基于互信息构建可检验准则，并设计StruBI算法，在合成和真实数据上显著优于现有方法。

AI中文摘要

因果发现方法通常假设所有数据独立同分布（i.i.d.），且系统中没有未测量的变量影响。在实践中，这些假设经常被违反，导致推断不准确。在本文中，我们研究如何从因果机制变化中识别隐藏混淆和选择偏差。特别地，我们表明结构性偏差会导致依赖的机制变化。也就是说，通过考虑在不同环境下的数据中哪些变量的机制发生了变化，我们可以判断哪些变量是无偏的，哪些受到隐藏混淆的影响，哪些正在经历选择偏差。我们将此形式化为一个基于互信息的经验可检验准则，并展示在哪些条件下它能识别结构性偏差。为了判断哪些节点受到何种偏差的影响，我们引入了StruBI算法。在合成和真实数据上的实验表明，StruBI在实践中表现良好，准确恢复了受影响的变量集和偏差类型，以较大优势超越了现有技术水平。

英文摘要

Causal discovery methods commonly assume that all data is independently and identically distributed (i.i.d.) and that there are no unmeasured variables affecting the system. In practice, these assumptions are often violated, leading to inaccurate inference. In this paper, we study how to identify hidden confounding and selection biases from causal mechanism shifts. In particular, we show that structural biases lead to dependent mechanism shifts. That is, by considering for which variables the mechanisms change given data from different environments, we can tell which variables are unbiased, which are subject to hidden confounding, and which are undergoing selection bias. We formalize this into an empirically testable criterion based on mutual information, and show under which conditions it identifies structural biases. To tell which nodes are subject to what kind of bias, we introduce the StruBI algorithm. Experiments on synthetic and real-world data show that StruBI works well in practice, accurately recovering affected variable sets and types of biases, outperforming the state-of-the-art by a wide margin.

2606.18918 2026-06-18 cs.LG cs.CC 新提交

Some Complexity Results for Robustness Verification for Binarized Neural Networks

二值化神经网络鲁棒性验证的一些复杂性结果

Harshit Goyal, Sudakshina Dutta

发表机构 * Indian Institute of Technology Goa（印度理工学院Goa）

AI总结本文通过从布尔可满足性问题归约证明二值化神经网络的可满足性是NP完全的，并利用均匀遮挡导致的网络输出分段常数结构，提出多项式时间鲁棒性检查算法。

2606.19036 2026-06-18 cs.LG 新提交

Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

稀疏混合专家模型中不连续性的几何与随机分析

Tho Tran Huu, Huu-Tuan Nguyen, Thien-Hai Nguyen, Nhat-Tri Ho, Viet-Hoang Tran, Tho Quan, Tan Minh Nguyen

发表机构 * Department of Mathematics, National University of Singapore, Singapore（新加坡国立大学数学系）； Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, Ho Chi Minh City, Vietnam（胡志明市技术大学计算机科学与工程学院）

AI总结本文对稀疏混合专家模型中的不连续性进行几何与随机分析，分类不连续阶数，建立渐近体积估计，证明随机路径几乎必然击中一阶不连续，并提出低开销平滑机制以提升性能。

Comments ICML 2026 Spotlight

AI中文摘要

稀疏混合专家（SMoE）架构现已广泛应用于最先进的语言和视觉模型中，其中条件路由允许扩展到非常大的网络。然而，正是这种Top-$k$专家选择使得条件路由成为可能，同时也导致SMoE映射本质上不连续。在这些不连续曲面附近，即使任意接近的输入也可能激活截然不同的专家集，从而产生显著不同的输出。本文对这些不连续性进行了严格的几何和随机分析。首先，我们根据切换事件中并列专家的数量对不连续性进行阶数分类。利用测度论切片论证，我们建立了加厚不连续曲面的渐近体积估计，表明低阶不连续集占主导地位，而高阶不连续集占据的体积相对极小。接着，通过扩散过程对输入空间中的随机扰动建模，我们证明路径最终会遇到不连续，并且首次击中几乎必然发生在阶数为1的不连续上，同时给出了显式的有限时间概率界。我们进一步推导了占据时间界，量化了随机路径在每个不连续阶数邻域内停留的时长。这些理论结果表明输入更可能位于低阶不连续附近。受此启发，我们提出一种简单的平滑机制，可直接应用于现有SMoE，在接近不连续处软性地整合专家；我们的分析保证增加的额外计算开销很小，同时在不连续附近提供局部平滑，跨语言和视觉任务的实验表明，平滑不仅增强了SMoE映射的连续性，还提升了经验性能。

英文摘要

Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in state-of-the-art language and vision models, where conditional routing allows scaling to very large networks. However, this very Top-$k$ expert selection that enables conditional routing also renders the SMoE map inherently discontinuous. In the vicinity of these discontinuity surfaces, even inputs that are arbitrarily close may activate substantially different sets of experts, resulting in significantly different outputs. In this work we give a rigorous geometric and stochastic analysis of these discontinuities. We first classify them by order, determined by the number of tied experts at a switching event. Using measure-theoretic slicing arguments, we establish asymptotic volume estimates for the thickened discontinuity surfaces, showing that lower-order discontinuity sets dominate, whereas higher-order ones occupy a vanishingly small relative volume. Next, modeling random perturbations in the input space via a diffusion process, we prove that the path eventually encounters a discontinuity, and moreover that the first hit almost surely occurs on an order-1 discontinuity with explicit finite-time probability bounds. We further derive occupation-time bounds that quantify the duration the random path spends in the neighborhoods of each discontinuity order. These theoretical results imply that inputs are more likely to lie near lower-order discontinuities. Motivated by this insight, we propose a simple smoothing mechanism that can be directly applied to existing SMoEs, softly incorporating experts near discontinuities; our analysis guarantees that the added computational overhead remains small while providing localized smoothing near discontinuities, and experiments across language and vision tasks show that smoothing not only enforces continuity of the SMoE map but also enhances empirical performance.
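
The discontinuity caused by Top-$k$ selection shows up even in a one-dimensional toy router. The sketch below is a hypothetical illustration: the linear router scores, the constant expert outputs, and the uniform averaging of the selected experts are all assumptions made here, not the paper's construction.

```python
EXPERT_OUTPUTS = [10.0, -10.0, 0.0]  # each toy expert returns a constant


def router_scores(x: float):
    """Toy router: experts 0 and 1 tie exactly at x = 0.5."""
    return [x, 1.0 - x, -1.0]


def top_k(scores, k: int):
    """Indices of the k highest-scoring experts."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def smoe(x: float, k: int = 1) -> float:
    """Average the outputs of the selected experts (uniform gate)."""
    chosen = top_k(router_scores(x), k)
    return sum(EXPERT_OUTPUTS[i] for i in chosen) / k


# Two inputs 0.002 apart land on opposite sides of the switching surface.
print(smoe(0.499), smoe(0.501))
```

With k = 1 the output jumps from -10 to 10 across x = 0.5, where two experts tie; a smoothing mechanism of the kind the abstract proposes would blend the tied experts near that point instead of switching abruptly.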

2606.19105 2026-06-18 cs.LG stat.ML 新提交

Smoothness-Based Derandomization of PAC-Bayes Bounds

基于光滑性的PAC-Bayes去随机化

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

发表机构 * Department of Computer Science and Software Engineering（计算机科学与软件工程系）； Université Laval（拉瓦尔大学）

AI总结利用损失和预测器的光滑性，将Gibbs预测器去随机化为后验均值处的确定性预测器，通过Jensen间隙类的Rademacher复杂度控制泛化界，并导出涉及参数雅可比和海森矩阵的正则化器。

AI中文摘要

我们研究光滑损失函数的PAC-Bayes去随机化。我们的目标是通过利用损失和预测器类的光滑性，获得对确定性预测器以高概率成立的泛化界。我们表明，从Gibbs预测器到后验均值处的确定性预测器的转换有一个精确的代价，由Jensen间隙类的泛化间隙给出。我们通过其Rademacher复杂度控制该类，从而得到涉及以参数雅可比和得分图的海森矩阵表示的平坦度量的确定性预测器界。该框架适用于有界和无界光滑损失函数，并将结果专门应用于线性预测器和光滑神经网络。最后，理论中出现的雅可比和海森矩阵量激发了一个实用的正则化器。对于BatchNorm网络，我们通过将BatchNorm变换折叠到相邻的仿射权重中，相对于有效的BatchNorm权重计算该正则化器。在CIFAR-10上的实验说明了该正则化器在不同批量大小下的行为。

英文摘要

We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the predictor class. We show that passing from the Gibbs predictor to the deterministic predictor at the posterior mean has a precise cost, given by the generalization gap of the Jensen gap class. We control this class through its Rademacher complexity, leading to bounds for deterministic predictors that involve flatness quantities expressed in terms of parameter Jacobians and Hessians of the score map. The framework applies to both bounded and unbounded smooth loss functions, and we specialize the results to linear predictors and smooth neural networks. Finally, the Jacobian and Hessian quantities appearing in the theory motivate a practical regularizer. For BatchNorm networks, we compute this regularizer with respect to effective BatchNorm weights obtained by folding the BatchNorm transformation into the adjacent affine weights. Experiments on CIFAR-10 illustrate the behavior of this regularizer under different batch sizes.

2606.19145 2026-06-18 cs.LG cs.AI cs.SY eess.SY 新提交

OrthoReg: Orthogonal Regularization for Hybrid Symbolic-Neural Dynamical Systems

OrthoReg：混合符号-神经动力系统的正交正则化

Till Richter, Niki Kilbertus

发表机构 * Technical University of Munich（慕尼黑工业大学）； Helmholtz Munich（亥姆霍兹慕尼黑中心）

AI总结针对混合建模中神经部分可能重复学习符号结构导致模型冗余的问题，提出正交正则化方法OrthoReg，直接惩罚符号与神经组件间的重叠，实现互补分解，提升符号恢复和分布外行为。

AI中文摘要

动力系统是建模自然世界的基础，然而建模过程中存在持续的权衡：手动指定的机械模型设计上可解释但通常过于简单且设定错误；相反，灵活的数据驱动神经方法缺乏物理洞察。混合建模旨在通过结合指定的或基于符号的物理组件与灵活的神经网络来兼顾两者优势。然而，一个关键挑战是神经组件可能重新学习机械部分，产生冗余且不可解释的模型，特别是当符号结构本身是从数据中发现时。基于标准$L^2$正则化的现有方法依赖于投影论证，但当符号组件通过稀疏发现学习时，该论证失效，允许神经增强与符号结构重叠。我们引入OrthoReg（正交正则化），直接惩罚符号与神经组件之间的重叠，防止符号结构被神经残差吸收。这产生互补分解：符号部分捕捉库能表达的内容，神经部分捕捉剩余内容。在存在部分库不匹配的基准动力系统上，OrthoReg改善了符号恢复和分布外行为。

英文摘要

Dynamical systems are fundamental to modeling the natural world, yet modeling them involves a persistent trade-off: manually prescribed mechanistic models are interpretable by design but often overly simplistic and misspecified; in contrast, flexible data-driven neural methods lack physical insight. Hybrid modeling aims for the best of both worlds by combining a prescribed or symbolic, physics-based component with a flexible neural network. A critical challenge, however, is that the neural component may relearn mechanistic parts, yielding redundant and uninterpretable models, especially when the symbolic structure itself is discovered from data. Existing methods based on standard $L^2$ regularization rely on a projection argument that breaks when the symbolic component is learned through sparse discovery, allowing the neural augmentation to overlap with symbolic structure. We introduce OrthoReg (Orthogonal Regularization), which directly penalizes overlap between the symbolic and neural components, preventing symbolic structure from being absorbed by the neural residual. This yields a complementary decomposition: the symbolic part captures what the library can express, and the neural part captures what remains. On benchmark dynamical systems with partial library mismatch, OrthoReg improves symbolic recovery and out-of-distribution behavior.

2606.19179 2026-06-18 cs.LG cs.AI math.OC stat.ML 新提交

学习增强的精确指数时间算法

Tatiana Belova, Yuriy Dementiev, Danil Sagunov

发表机构 * ITMO University（ITMO大学）

AI总结提出一种通用方法，利用略优于随机猜测的噪声预测器，可证明地减少NP难子集选择问题的搜索空间，运行时间加速随预测质量平滑扩展，且仅需预测的成对独立性或无需知道预测器精度。

AI中文摘要

学习增强算法领域已经证明，机器学习预测可以在广泛的问题中绕过最坏情况下的下界。然而，到目前为止，关注点几乎完全集中在多项式时间算法上，其中预测改进了竞争比、近似保证或运行时间。在本文中，我们提出了一个问题：预测能否推动NP难问题的精确指数时间算法的前沿？我们通过提出一种通用方法对此问题给出肯定回答，该方法增强了一整类用于各种子集选择问题的最先进精确算法。我们表明，一个仅略优于随机猜测的噪声预测器足以可证明地减少搜索空间，并且由此产生的运行时间加速随预测质量平滑扩展。重要的是，我们的算法仅需要预测的成对独立性，或者，不需要知道预测器的精度——这两种设置都比通常假设的更弱且更现实。

英文摘要

The field of learning-augmented algorithms has demonstrated that machine-learned predictions can bypass worst-case lower bounds across a wide range of problems. So far, however, the focus has been almost exclusively on polynomial-time algorithms, where predictions improve competitive ratios, approximation guarantees, or running times. In this paper, we raise the question of whether predictions can push the frontier of exact exponential-time algorithms for NP-hard problems. We answer this question affirmatively by proposing a general approach that augments an entire family of state-of-the-art exact algorithms for a variety of subset selection problems. We show that a noisy predictor that is only marginally better than random guessing suffices to provably reduce the search space, and that the resulting runtime speedup scales smoothly with the prediction quality. Importantly, our algorithms require only pairwise independence of predictions or, alternatively, do not require knowledge of the predictor's accuracy; both settings are strictly weaker and more realistic than typically assumed.

2606.18993 2026-06-18 stat.ML cs.LG stat.ME 交叉投稿

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

基于自适应投注的序列核条件独立性检验

Zheng He, Danica J. Sutherland

AI总结提出一种对估计误差更鲁棒的序列条件独立性检验方法，通过自适应优化核条件独立性统计量、归一化及截断平移校准，在合成与真实数据上控制第一类错误并保持高功效。

Comments Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c

AI中文摘要

检验条件独立性是基础但本质上困难的问题：在没有额外假设的情况下，通常无法控制第一类错误。“Model-X”范式通过假设精确知道相关条件分布来解决这一困难。虽然经典的一次性检验有时可以容忍对该假设的小偏差，但现有的序列条件独立性检验通常要求精确知道Model-X条件分布，这使得当必须估计该分布时它们变得脆弱。我们提出了一种新方法，对这类估计误差具有更强的鲁棒性。我们的方法将测试-投注应用于自适应优化的核条件独立性统计量，并结合归一化方案和截断-移位校准策略。这些修改大大减少了第一类错误膨胀，同时在高维合成基准和现实世界公平性任务中保持了高功效，优于现有的序列Model-X方法。代码可在https://this URL获取。

英文摘要

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.
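
The testing-by-betting mechanism the method builds on can be sketched generically: a bettor's wealth is multiplied by one plus a bet on each payoff, and the null is rejected once the wealth reaches 1/alpha. The sketch below is not the paper's procedure; the fixed bet size and the payoff sequences are assumptions, whereas the paper bets on an adaptively optimized kernel statistic.

```python
def betting_test(payoffs, bet=0.5, alpha=0.05):
    """Sequential test by betting with a fixed bet fraction.

    Each payoff must lie in [-1, 1] and have non-positive conditional mean
    under the null, so the wealth is a nonnegative supermartingale and, by
    Ville's inequality, reaches 1/alpha with probability at most alpha.
    Returns (stopping_round or None, final_wealth).
    """
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must lie in [0, 1)")
    wealth = 1.0
    for round_index, payoff in enumerate(payoffs, start=1):
        if not -1.0 <= payoff <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * payoff
        if wealth >= 1.0 / alpha:
            return round_index, wealth  # reject the null and stop
    return None, wealth


print(betting_test([1.0] * 10))   # consistently favorable payoffs
print(betting_test([0.0] * 10))   # no evidence: wealth stays at 1
```

Because the guarantee holds at every stopping time, data can be monitored continuously; the calibration steps described in the abstract aim to keep the payoffs valid when the Model-X conditional is only estimated.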

2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML 交叉投稿

Wasserstein Policy Learning for Distributional Outcomes

Wasserstein 策略学习用于分布性结果

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

AI总结针对分布值结果，提出基于Wasserstein重心和效用泛函的策略学习框架，使用IPW和DR估计器，证明遗憾率由策略类复杂度主导，并给出极小化下界。

Comments Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)

AI中文摘要

离线策略学习在因果推断中受到越来越多的关注。主要目标是学习一个策略（个体化治疗规则），作为从协变量到治疗的映射，以最大化定义为标量值潜在结果均值的经验福利。在本文中，我们研究具有分布值结果的离线策略学习，其中每个潜在结果是$\mathbb{R}$上的概率测度，奖励通过应用于诱导结果分布的Wasserstein重心的效用泛函来定义。我们基于逆概率加权（IPW）和双稳健（DR）估计器为策略学习框架建立了统计保证。通过处理组合策略类和无限维分位数域乘积上的具有挑战性的均匀偏差，我们证明了有限样本遗憾具有主导依赖$\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$。在一维Wasserstein设定下，并在所述正则条件下，主导遗憾率仍由策略类复杂度控制。此外，我们提供了一个极小化下界，建立了对$N$和$\mathrm{N\text{-}dim}(\Pi)$主导依赖的尖锐性。

英文摘要

可扩展的批量贝叶斯优化：基于子空间采集函数

Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

发表机构 * School of Computing and Artificial Intelligence（计算与人工智能学院）

AI总结提出通过从原始问题的轴对齐子空间中各选一点来扩展贝叶斯优化至大规模批量评估，显著加速收敛，与十种批量算法相比极具竞争力。

DOI: 10.1145/3820495
Journal ref: ACM Transactions on Evolutionary Learning and Optimization, 2026

AI中文摘要

将贝叶斯优化扩展到批量评估可以使设计者充分利用并行计算技术。然而，当前大多数批量方法在批量大小增大时扩展性不佳，优化效率往往下降。为解决此问题，本文提出一种简单高效的方法，将贝叶斯优化扩展到大规模批量评估。与现有批量方法不同，新方法的思想是从原始问题中抽取一批轴对齐子空间，并使用现有采集函数从每个子空间中选择一个点。数值实验表明，与顺序贝叶斯优化算法相比，我们提出的方法显著加速收敛，并且与十种批量贝叶斯优化算法相比表现非常有竞争力。我们提出的方法的实现可在此 https URL 获取。

英文摘要

Extending Bayesian optimization to batch evaluation can enable the designer to make the most of parallel computing technology. However, most current batch approaches do not scale well with the batch size; that is, their optimization efficiency often deteriorates as the batch size increases. To address this issue, we propose a simple and efficient approach to extend Bayesian optimization to large-scale batch evaluation. Different from existing batch approaches, the idea of the new approach is to draw a batch of axis-aligned subspaces of the original problem and select one point from each subspace using existing acquisition functions. Numerical experiments show that our proposed approach speeds up convergence significantly when compared with the sequential Bayesian optimization algorithm, and performs very competitively when compared with ten batch Bayesian optimization algorithms. The implementation of our proposed approach is available at https://github.com/zhandawei/SubSpace_Acquisition_Functions.
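
The idea of drawing one query point per axis-aligned subspace can be sketched as follows. This is a rough illustration under several assumptions that are not taken from the paper or its repository: coordinates outside the subspace are held at the current incumbent, the acquisition function is a toy stand-in, and the inner search is plain random sampling.

```python
import random


def toy_acquisition(x):
    """Hypothetical stand-in for an acquisition function (higher is better)."""
    return -sum((xi - 0.3) ** 2 for xi in x)


def propose_batch(incumbent, batch_size, subspace_dim, n_candidates=200, seed=0):
    """Pick one point per randomly drawn axis-aligned subspace.

    Assumption: coordinates outside the drawn subspace stay at the incumbent.
    Returns a list of (sorted_axes, point) pairs.
    """
    rng = random.Random(seed)
    dim = len(incumbent)
    batch = []
    for _ in range(batch_size):
        axes = rng.sample(range(dim), subspace_dim)  # the free coordinates
        best_point, best_value = None, float("-inf")
        for _ in range(n_candidates):
            point = list(incumbent)
            for axis in axes:
                point[axis] = rng.random()  # search only inside the subspace
            value = toy_acquisition(point)
            if value > best_value:
                best_point, best_value = point, value
        batch.append((sorted(axes), best_point))
    return batch


for axes, point in propose_batch([0.9] * 6, batch_size=4, subspace_dim=2):
    print(axes, [round(v, 2) for v in point])
```

Each inner search runs over only `subspace_dim` coordinates, so the subproblems are independent of one another and could be solved in parallel, which matches the scaling motivation in the abstract.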

2506.08764 2026-06-18 cs.LG 版本更新

On the Stability of the Jacobian Matrix in Deep Neural Networks

深度神经网络中雅可比矩阵的稳定性

Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef

AI总结本文利用随机矩阵理论，建立了深度神经网络中雅可比矩阵谱稳定性的通用定理，适用于稀疏和非独立同分布权重，扩展了初始化方案的理论基础。

Comments 21 pages, 28 figures; the main theorem was wrong (again) and is now corrected

2509.14969 2026-06-18 cs.LG math.OC stat.ML 版本更新

Stochastic Adaptive Gradient Descent Without Descent

无需下降的随机自适应梯度下降

Jean-François Aujol, Jérémie Bigot, Camille Castera

发表机构 * Univ. Bordeaux CNRS, Bordeaux INP, IMB, UMR 5251（波尔多大学 CNRS，波尔多 INP，IMB，UMR 5251）

AI总结提出一种无需超参数调优的随机梯度自适应步长策略，利用一阶随机Oracle的局部几何信息，理论证明收敛性，实验与调优基线竞争。

2602.14789 2026-06-18 cs.LG stat.ML 版本更新

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

关于GD和SGD中非线性动力学的稳定性：超越二次势能

Rotem Mulayoff, Sebastian U. Stich

发表机构 * CISPA Helmholtz Center for Information Security（CISPA赫尔姆霍兹信息安全中心）

AI总结研究梯度下降和随机梯度下降中非线性项对动力学稳定性的影响，推导了多元设置下稳定振荡的精确条件，并发现SGD的稳定性由单个不稳定批次决定。

Comments Accepted to COLT 2026

AI中文摘要

训练过程中迭代的动力稳定性在确定优化算法所获得的极小值方面起着关键作用。例如，梯度下降（GD）的稳定解对应于平坦极小值，而平坦极小值被认为具有有利特征。虽然先前的工作通常依赖线性化来确定稳定性，但线性化动力学是否忠实捕捉完整的非线性行为仍不清楚。最近的研究表明，GD可能在线性不稳定的极小值附近稳定振荡，并在步长衰减后收敛，这表明线性分析可能具有误导性。在这项工作中，我们明确研究了非线性项的影响。具体而言，我们在多元设置下推导了GD在极小值附近稳定振荡的精确准则。我们的条件依赖于高阶导数，推广了现有结果。将分析扩展到随机梯度下降（SGD），我们表明即使单个批次不稳定，非线性动力学也可能在期望上发散。这意味着稳定性可能由单个不稳定振荡的批次决定，而非线性分析所暗示的平均效应。最后，我们证明如果所有批次都是线性稳定的，则SGD的非线性动力学在期望上是稳定的。

英文摘要

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if only a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
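
The linearized baseline that the abstract contrasts against is the classical threshold for a quadratic: on f(x) = curvature * x**2 / 2, gradient descent contracts only when the step size is below 2 / curvature. The sketch below illustrates that textbook fact alone; it does not implement the paper's nonlinear criterion, and the numbers are chosen here for illustration.

```python
def gd_quadratic(x0: float, curvature: float, step: float, n_steps: int) -> float:
    """Run gradient descent on f(x) = curvature * x**2 / 2.

    Each update multiplies x by (1 - step * curvature), so the iterates
    shrink exactly when |1 - step * curvature| < 1, i.e. step < 2 / curvature.
    """
    x = x0
    for _ in range(n_steps):
        x -= step * curvature * x  # gradient of the quadratic is curvature * x
    return x


curvature = 4.0  # linear stability threshold is 2 / 4 = 0.5
print(gd_quadratic(1.0, curvature, step=0.4, n_steps=50))  # below threshold
print(gd_quadratic(1.0, curvature, step=0.6, n_steps=50))  # above threshold
```

Sharper minima (larger curvature) tolerate only smaller steps, which is why linear stability is associated with flat minima; the paper asks what changes once higher-order terms are kept.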

2605.04267 2026-06-18 cs.LG cs.NE math.OC 版本更新

QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization

QUIVER: 代理辅助多目标进化优化中的成本自适应偏好查询

Florian A. D. Burnat

发表机构 * University of Warwick（沃里克大学）； Warwick Business School（沃里克商学院）

AI总结提出QUIVER方法，通过自适应选择目标评估与异质偏好查询（成对偏好陈述与无差异调整），在代理辅助多目标优化中最小化决策遗憾，实验显示在WFG难题上效用遗憾降低25%。

Comments Accepted at Genetic and Evolutionary Computation Conference (GECCO '26)

DOI: 10.1145/3795095.3805174

AI中文摘要

交互式多目标优化系统面临预算分配困境：资源可用于昂贵的目标评估，或用于引出决策者偏好以识别帕累托集的相关区域。此外，偏好引出本身跨越具有不同信息内容和认知负担的模态，从廉价、嘈杂的成对偏好陈述（PS）到更丰富但成本更高的无差异调整（IA）。我们研究了未知标量化下的成本感知优化，并引入了QUIVER（查询信息价值估计遗憾），这是一种代理辅助的进化多目标优化器，可自适应地在目标评估和异质偏好查询之间进行选择。在每一步，QUIVER通过最大化每单位总成本的预期决策质量改进来选择下一个动作。在合成决策者模型下的DTLZ和WFG基准测试中，QUIVER在具有挑战性的WFG问题上实现了最低的最终效用遗憾（WFG4上效用遗憾为2.14，WFG9上为2.82：比基线提高25%），优于所有单模态基线。我们分析了PS和IA的最优混合如何适应问题难度：在简单问题（DTLZ2）上，QUIVER选择80%的PS查询；在困难问题（WFG9）上，它转向35%的IA查询。这种自适应模态选择展示了成本感知偏好学习的实际应用。

英文摘要

Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA). We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.
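
The selection rule described in the abstract, maximizing expected decision-quality improvement per unit cost, reduces to an argmax over a gain-to-cost ratio. The sketch below shows only that rule; the action names, gains, and costs are hypothetical placeholders, and how QUIVER estimates the expected improvement is not reproduced here.

```python
def select_action(candidates):
    """Return the candidate with the highest expected gain per unit cost."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    for action in candidates:
        if action["cost"] <= 0:
            raise ValueError("costs must be positive")
    return max(candidates, key=lambda a: a["expected_gain"] / a["cost"])


# Hypothetical numbers for one decision step.
candidates = [
    {"name": "objective_evaluation", "expected_gain": 0.30, "cost": 10.0},
    {"name": "pairwise_statement", "expected_gain": 0.02, "cost": 1.0},
    {"name": "indifference_adjustment", "expected_gain": 0.12, "cost": 3.0},
]
print(select_action(candidates)["name"])
```

With these placeholder values the richer but costlier indifference adjustment wins on gain per cost; different estimates would tip the choice toward cheap pairwise statements or a direct objective evaluation.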

2505.15215 2026-06-18 stat.ML cs.LG stat.ME 版本更新

CODEBLOCK: 学习在正确的粒度上监督代码

Zhijie Deng, Ling Li, Jinlong Pang, Kaiqin Hu, Qi Xuan, Zhaowei Zhu, Jiaheng Wei

发表机构 * Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； UC Santa Cruz（加州大学圣克鲁兹分校）； Ant Group（蚂蚁集团）； BAIA, ZJUT（浙江工业大学智能信息处理实验室）； D5Data.ai

AI总结提出CodeBlock框架，通过选择结构完整的代码块而非孤立token进行稀疏监督，在仅使用1.9%监督token的情况下，在六个代码生成基准上取得优于全token微调的效果。

详情

AI中文摘要

代码大语言模型的监督微调通常对所有响应token应用统一的交叉熵损失，隐含假设每个token提供同等有用的学习信号。最近的token级选择方法通过仅监督高价值token挑战了自然语言SFT中的这一假设。然而，直接将token级掩码迁移到代码可能会破坏语法和语义连贯的程序单元，因为代码依赖于结构完整性和定义-使用关系。因此，我们提出CodeBlock，一个结构感知的稀疏监督框架，选择结构完整的代码证据而非孤立token。CodeBlock首先选择高质量的指令-响应对，然后将代码响应划分为语法连贯的编码项，通过聚合核心逻辑token上的广义交叉熵来估计其效用，并使用数据流可达性和桥接信号重新排序，以优先传播或连接重要程序依赖的块。在训练期间，完整响应仍作为上下文可用，但损失仅应用于选定的代码项和信息性自然语言token。在六个代码生成基准上的实验表明，CodeBlock在仅使用1.9%的监督响应token的情况下，实现了比全tokenSFT和竞争性选择基线更强的平均pass@1。

英文摘要

Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge this assumption in natural-language SFT by supervising only high-value tokens. However, directly transferring token-level masking to code can break syntactically and semantically coherent program units, because code depends on structural completeness and definition-use relations. We therefore propose CodeBlock, a structure-aware sparse supervision framework that selects structure-complete code evidence rather than isolated tokens. CodeBlock first selects high-quality instruction-response pairs, then partitions code responses into syntactically coherent coding items, estimates their utility by aggregating generalized cross-entropy over core logic tokens, and reranks them with data-flow reach and bridge signals to prioritize blocks that propagate or connect important program dependencies. During training, the full response remains available as context, while loss is applied only to selected code items and informative natural-language tokens. Experiments on six code-generation benchmarks show that CodeBlock achieves stronger average pass@1 than full-token SFT and competitive selection baselines, while using only 1.9% of supervised response tokens.

2606.18304 2026-06-18 cs.LG cs.AI 新提交

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

基于归因引导和覆盖最大化的结构MoE剪枝

Yifu Ding, Jiacheng Wang, Ge Yang, Yongcheng Jing, Jinyang Guo, Xianglong Liu, Dacheng Tao

发表机构 * School of Computer Science and Engineering, Beihang University（北京航空航天大学计算机科学与工程学院）； School of Artificial Intelligence, Beihang University（北京航空航天大学人工智能学院）； Nanyang Technological University（南洋理工大学）

AI总结针对MoE模型专家级剪枝粒度粗、冗余识别不足的问题，提出基于归因引导和覆盖最大化的结构剪枝框架，将剪枝分配转化为通道分数覆盖优化问题，在50%剪枝率下结合4位量化保持精度，内存减少5.27倍。

Comments 9 pages, 5 figures. Submitted to ICML 2026

详情

AI中文摘要

混合专家（MoE）模型在计算上高效扩展，但由于其巨大的内存占用和推理开销，部署成本仍然很高。先前的压缩方法主要在专家级别操作，要么移除整个专家，要么通过粗粒度的重要性分数对专家进行排序。然而，这种专家级别的决策通常过于粗糙，无法捕捉细粒度的冗余，导致剪枝预算分配不当和压缩效果有限。为了解决这个问题，我们观察到MoE专家内的信息高度集中在一小部分通道中，即使在被认为重要的专家中也存在大量冗余。基于这一观察，我们提出了一种针对MoE模型量身定制的结构剪枝框架。我们的方法将剪枝比例分配重新表述为通道分数覆盖最大化问题，并使用基于归因的近似方法高效求解。在DeepSeek和Qwen MoE模型上的实验表明，我们的方法在结合4位量化时，在50%或25%的结构化剪枝下仍能保持模型精度。在Qwen3-30B-A3B上，我们的方法将内存占用减少了5.27倍，并在各种基准测试中持续优于最先进的基线方法。

英文摘要

Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.

2606.18431 2026-06-18 cs.LG cs.DC 新提交

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

超越预测：面向LLM推理的尾延迟感知调度

Yueying Li, Yuanfan Chen, Jiayang Chen, Esha Choukse, Haoran Qiu, G. Edward Suh, Rodrigo Fonseca, Ziv Scully, Udit Gupta

发表机构 * Cornell University, Computer Science Department（康奈尔大学计算机科学系）； Cornell University, Electrical and Computer Engineering Department（康奈尔大学电气与计算机工程系）； Cornell University, Operations Research and Information Engineering Department（康奈尔大学运筹学与信息工程系）； Microsoft Azure System Research（微软Azure系统研究）； NVIDIA Corporation（英伟达公司）

AI总结针对LLM推理中长度预测调度在分布偏移和尾延迟控制上的脆弱性，提出无预测的分布感知调度框架，通过轻量统计信号实现软优先级提升，结合缓存感知抢占，在多种工作负载下将P99 TTLT降低35-50%，TTFT降低34-47%。

详情

Journal ref: Forty-Third International Conference on Machine Learning (2026)

AI中文摘要

LLM服务表现出极端的长度可变性，使得基于大小的调度在实践中变得困难。最近的LLM调度器使用预测的解码长度或排名来近似SJF/SRPT，并主要报告均值中心指标如TTFT和TBT。我们表明，这些预测驱动的策略在分布偏移、突发到达和GPU内存压力下可能脆弱，同时对主导用户体验的尾延迟（P90-P99）控制有限，即使拥有完美的解码长度知识。我们引入了一个分布感知、无预测的调度框架，用由轻量统计信号驱动的软优先级提升取代显式长度预测。我们的设计协同优化调度和缓存感知抢占，以考虑跨工作负载混合的内存耦合解码动态。在生产环境和开源轨迹上的评估表明，相对于具有完美长度知识的SRPT，我们的方法将P99 TTLT降低了高达35-50%，并在各种工作负载（包括推理密集型和聊天密集型任务）上将TTFT降低了34-47%。这些结果证明了在在线LLM服务中优化尾延迟的稳健替代方案。

英文摘要

LLM serving exhibits extreme length variability, making size-based scheduling difficult in practice. Recent LLM schedulers approximate SJF/SRPT using predicted decode lengths or ranks and primarily report mean-centric metrics such as TTFT and TBT. We show that these prediction-driven policies can be fragile under distribution shifts, bursty arrivals, and GPU memory pressure, while offering limited control over the tail latency (P90-P99) that dominates user experience, even with perfect decode-length knowledge. We introduce a distribution-aware, prediction-free scheduling framework that replaces explicit length prediction with soft priority boosting driven by lightweight statistical signals. Our design co-optimizes scheduling and cache-aware preemption to account for memory-coupled decode dynamics across workload mixes. Evaluated on production and open-source traces, our method reduces P99 TTLT by up to 35-50% relative to SRPT with perfect length knowledge and reduces TTFT by 34-47% across workloads, including reasoning-heavy and chat-heavy tasks. These results demonstrate a robust alternative for optimizing tail latency in online LLM serving.

2606.18650 2026-06-18 cs.LG 新提交

FoMoE: 打破全副本壁垒的专家混合联邦系统

Lorenzo Sani, Zeyu Cao, Meghdad Kurmanji, Alex Iacob, Andrej Jovanovic, Yan Gao, Wanru Zhao, Nicholas D. Lane

发表机构 * DeepSeek-AI

AI总结提出FoMoE系统，通过跨工作节点分区专家层打破全副本范式，结合部分专家复制和跳跃令牌机制，显著降低通信开销并提升吞吐量。

详情

AI中文摘要

预训练大型语言模型（LLMs）通常需要大规模基础设施，配备紧密耦合的硬件加速器。虽然增加模型和数据集规模仍是性能的主要驱动力，但专家混合（MoE）架构最近通过将参数数量与计算成本解耦，取得了最先进的结果。这种效率使得在受限计算预算下训练大规模模型成为可能，但通常需要单个数据中心的高速互连。为了克服这些物理限制，最近的方法如DiLoCo和Photon使用低通信数据并行方法，使得能够在地理分布、弱连接的数据中心之间进行扩展。然而，这些方法存在根本性的低效问题：它们需要在每个站点拥有完整的模型副本，这带来了高昂的内存约束和通信开销。在这项工作中，我们引入了FoMoE，一个通过跨工作节点分区专家层来打破全副本范式的系统。我们证明FoMoE：（I）通过部分专家复制，在所研究的场景中，相比高效基线降低了高达1.42倍的通信成本，相比DDP降低了45.44倍；（II）通过一种新颖的跳跃令牌机制，实现了高达1.4倍的经验吞吐量加速；（III）在训练代理场景中展示了稳定的路由，并通过系统建模将通信/内存优势推广到100B规模的配置。

英文摘要

Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance, Mixture-of-Experts (MoE) architectures have recently achieved state-of-the-art results by decoupling parameter count from computational cost. This efficiency enables training massive models on constrained compute budgets, yet it typically requires the high-speed interconnects of a single datacenter. To overcome these physical limits, recent approaches such as DiLoCo and Photon use low-communication data-parallel methods to enable scaling across geographically distributed, weakly connected datacenters. However, these methods suffer from a fundamental inefficiency: they require full model replicas at every site, which imposes prohibitive memory constraints and communication overheads. In this work, we introduce FoMoE, a system that breaks the full-replica paradigm by partitioning expert layers across workers. We demonstrate that FoMoE: (I) reduces communication costs by up to 1.42x over efficient baselines and 45.44x over DDP via partial expert replication in the studied regimes; (II) achieves empirical throughput speedups of up to 1.4x through a novel skip-token mechanism; and (III) shows stable routing in the trained proxy regimes and projects the communication/memory benefits to 100B-scale configurations through system modelling.

2606.19150 2026-06-18 cs.LG 新提交

Complementary Attention Head Pruning for Efficient Transformers

互补注意力头剪枝用于高效Transformer

Yaniv Livertovsky, Shahar Somin, Gonen Singer

发表机构 * Bar-Ilan University（巴伊兰大学）

AI总结提出CAHP框架，将注意力头选择建模为全局图论问题，通过图聚类和信息论距离保留互补头，自动确定剪枝数量，在SST-5和MNLI上优于现有方法。

Comments 9 pages, 4 figures, 3 tables. Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2026

详情

AI中文摘要

基于Transformer的模型在自然语言处理中的显著成功源于架构的规模化，这导致大量参数并阻碍了在资源受限环境中的部署。虽然结构化剪枝提供了一条压缩路径，但现有的最先进方法通常依赖于基于梯度的重要性排序或随机门控，这些方法存在不稳定性、结构退化以及需要大量手动超参数调整的问题。在本文中，我们引入了CAHP（互补注意力头剪枝），一种新颖的事后框架，将头选择重新定义为全局图论问题。CAHP不是孤立地评估头，而是利用基于图的聚类结合信息论距离度量来识别并保留一组拓扑多样化的互补注意力头。无需预定义稀疏度或剪枝比例，该框架通过识别递减的边际性能曲线自动确定各层中保留的注意力头数量，其中根据所选多项式次数，剪除额外头会导致性能急剧下降。在SST-5和MNLI基准上跨不同Transformer模型规模的广泛评估表明，CAHP始终优于竞争基线，特别是在高压缩率情况下。此外，我们的结构分析表明，CAHP避免了基于梯度的剪枝方法的“邻近偏差”（倾向于主要保留靠近输出层的头），而是保留了模型中间层中功能关键的注意力头集合。

英文摘要

The remarkable success of Transformer-based models in natural language processing stems from architectural scaling, which leads to a large number of parameters and hinders deployment in resource-constrained environments. While structured pruning offers a pathway to compression, existing state-of-the-art methods often rely on gradient-based importance ranking or stochastic gating, which suffer from instability, structural degeneration, and the need for extensive manual hyperparameter tuning. In this paper, we introduce CAHP (Complementary Attention Head Pruning), a novel post-hoc framework that redefines head selection as a global graph-theoretical problem. Rather than evaluating heads in isolation, CAHP utilizes graph-based clustering combined with information-theoretic distance measures to identify and preserve a topologically diverse subset of complementary attention heads. Without requiring a predefined sparsity level or pruning ratio, the framework automatically determines the number of selected attention heads across layers by identifying a diminishing marginal performance curve, where pruning additional heads leads to a sharp degradation in performance, as determined by the chosen polynomial degree. Extensive evaluations on the SST-5 and MNLI benchmarks, across different Transformer model scales, demonstrate that CAHP consistently outperforms competitive baselines, particularly in high-compression regimes. Furthermore, our structural analysis shows that CAHP avoids the "proximity bias" of gradient-based pruning methods, which tend to preserve heads mainly in layers close to the output, and instead retains a functionally critical set of attention heads in the model's intermediate layers.

2606.16290 2026-06-18 cs.LG cs.AI 新提交

An affordable hardware-aware neural architecture search for deploying convolutional neural networks on ultra-low-power computing platforms

一种经济实惠的硬件感知神经架构搜索，用于在超低功耗计算平台上部署卷积神经网络

Andrea Mattia Garavagno, Edoardo Ragusa, Antonio Frisoli, Paolo Gastaldo

发表机构 * University of Genoa（热那亚大学）； Scuola Superiore Sant’Anna（圣安娜高等研究学院）

AI总结提出一种轻量级硬件感知神经架构搜索方法，生成可在超低功耗微控制器上运行的微型CNN，在保持分类精度的同时降低搜索成本。

详情

DOI: 10.1109/LSENS.2024.3387056
Journal ref: IEEE Sensors Letters, vol. 8, no. 5, pp. 1-4, May 2024

AI中文摘要

硬件感知神经架构搜索（HW-NAS）通过自动设计能够满足预置硬件约束的神经架构，使得卷积神经网络（CNN）能够集成到微控制器设备中。然而，最先进的HW-NAS针对的是高性能微控制器，其功耗无法满足传感节点的要求。本文提出了一种HW-NAS方法，生成可在超低功耗微控制器上运行的微型CNN，其搜索过程轻量级，甚至可以在嵌入式设备上执行。在三个著名的微型计算机视觉基准测试上的实证结果表明，所提出的HW-NAS能够在保持最先进分类精度的同时生成微型CNN。

英文摘要

Hardware-aware neural architecture search (HW-NAS) allows the integration of Convolutional Neural Networks (CNNs) in microcontroller devices by automatically designing neural architectures that fit prearranged hardware constraints. However, state-of-the-art HW-NAS methods target high-performance microcontrollers, whose power consumption does not meet the requirements of sensing nodes. This work presents an HW-NAS that generates tiny CNNs that can run on ultra-low-power microcontrollers, featuring a lightweight search procedure that can execute even on embedded devices. Empirical results on three well-known benchmarks for tiny computer vision show that the proposed HW-NAS generates tiny CNNs while preserving state-of-the-art classification accuracy.

2606.18463 2026-06-18 cs.DC cs.LG cs.NA math.NA stat.ML 交叉投稿

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

面向GPU上广义线性模型的混合精度通信避免SGD

Aditya Devarakonda, Irene Simó Muñoz, Giulia Guidi

发表机构 * Department of Computer Science, Wake Forest University（维克森林大学计算机科学系）； Department of Computer Science, Cornell University（康奈尔大学计算机科学系）

AI总结提出混合精度通信避免SGD（CA-SGD），通过分析有限精度误差将精度选择分解为九个独立部分，在NVIDIA GPU上实现5.1-6.8倍加速，且损失与FP32 SGD匹配。

详情

AI中文摘要

分布式随机梯度下降（SGD）受限于通信而非计算，因为每次迭代都需要跨进程进行AllReduce。通信避免SGD（CA-SGD）通过将$s$次连续的AllReduce替换为单个$sb\times sb$ Gram矩阵的AllReduce，将通信开销分摊到$s$次迭代中，以更多的计算和带宽换取更少的同步点。现代GPU配备矩阵硬件和低精度格式，通过加速Gram GEMM和缩减BF16流量来抵消这一开销。我们研究了NVIDIA GPU上针对广义线性模型的混合精度CA-SGD。我们的有限精度分析将一次CA-SGD外迭代的局部舍入误差分解为九个独立的精度选择，仅通过低精度单元舍入误差依赖于硬件，因此所得方案原则上可跨GPU代际迁移。该方案将输入矩阵和边缘向量以低精度存储，从低精度输入计算Gram矩阵并采用高精度累加，以高精度通信该矩阵，并以高精度执行内部递推和权重更新。在NERSC Perlmutter A100 GPU上，混合精度CA-SGD在逻辑回归、线性回归和泊松问题上的损失与FP32 SGD相差在0.5%以内，并在epsilon、SUSY、HIGGS、synth和Poisson-synth数据集上达到5.1-6.8倍于FP32 SGD的加速。我们的软件可在以下网址获取：https://doi.org/10.5281/zenodo.20448273

英文摘要

Distributed stochastic gradient descent (SGD) is limited by communication rather than computation, since each iteration requires an AllReduce across processes. Communication-avoiding SGD (CA-SGD) amortizes communication over $s$ iterations by replacing $s$ consecutive AllReduces with a single AllReduce of an $sb\times sb$ Gram matrix, trading more computation and bandwidth for fewer synchronization points. Modern GPUs with matrix hardware and reduced-precision formats offset this by accelerating the Gram GEMM and shrinking BF16 traffic. We study mixed-precision CA-SGD for generalized linear models on NVIDIA GPUs. Our finite-precision analysis decomposes the local rounding error of one CA-SGD outer iteration into nine independent precision choices, depending on the hardware only through its low-precision unit roundoffs, so the resulting recipes transfer in principle across GPU generations. The recipe stores the input matrix and margin vector in low precision, computes the Gram matrix from low-precision inputs with high-precision accumulation, communicates it in high precision, and performs the inner recurrence and weight updates in high precision. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches FP32 SGD loss within $0.5\%$ on logistic, linear, and Poisson problems and reaches $5.1$--$6.8\times$ speedup over FP32 SGD on epsilon, SUSY, HIGGS, synth, and Poisson-synth. Our software is available at https://doi.org/10.5281/zenodo.20448273
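
One step of the recipe, "computes the Gram matrix from low-precision inputs with high-precision accumulation", can be illustrated with a small NumPy sketch. The assumptions are loud ones: NumPy has no native BF16, so `float16` stands in for the low-precision storage format; the shapes `s`, `b`, `n` are arbitrary; and the function name is hypothetical. This is a CPU illustration of the idea, not the authors' GPU implementation.

```python
import numpy as np


def gram_mixed_precision(Y_low):
    """Gram matrix Y @ Y.T from float16 storage with float32 accumulation."""
    if Y_low.dtype != np.float16:
        raise TypeError("expected low-precision (float16) input")
    # Upcast before the product so partial sums are accumulated in float32.
    Y_acc = Y_low.astype(np.float32)
    return Y_acc @ Y_acc.T


rng = np.random.default_rng(0)
s, b, n = 4, 8, 256  # hypothetical: s inner steps, batch size b, n features
Y = rng.standard_normal((s * b, n)).astype(np.float16)  # stacked mini-batch rows
G = gram_mixed_precision(Y)  # (s*b) x (s*b) matrix, the object that would be AllReduced
```

The input occupies half the memory of an FP32 copy, while the `sb x sb` result is kept in the higher precision that the abstract says is used for communication and the inner recurrence.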

2606.19004 2026-06-18 cs.DC cs.AI cs.LG 交叉投稿

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

Spotlight: 协同种子探索与抢占式GPU用于DiT强化学习后训练

Ruiqi Lai, Dakai An, Wei Gao, Ju Huang, Siran Yang, Jiamang Wang, Lin Qu, Dmitrii Ustiugov, Wei Wang

发表机构 * NTU Singapore（南洋理工大学）； Hong Kong University of Science and Technology（香港科技大学）； Alibaba Group（阿里巴巴集团）

AI总结针对DiT强化学习后训练成本高的问题，提出Spotlight系统，通过利用探索对旧权重的容忍性和SP组快速重配置，在抢占式GPU上实现高效训练，加速4倍并降低成本1.4-6.4倍。

详情

AI中文摘要

扩散Transformer（DiT）的强化学习（RL）后训练成本极高，需要数千块高端GPU。现有工作探索了两个降低成本的方向：种子探索通过选择高对比度样本来改善训练收敛，但增加了关键路径的计算量；抢占式GPU提供69-77%的成本降低，但在训练期间处于空闲状态，因为DiT rollout几乎同时完成，这阻止了类似LLM的rollout与训练流水线化。抢占式GPU的抢占进一步破坏了序列并行（SP）组，导致GPU拓扑碎片化。我们提出了Spotlight，这是第一个利用抢占式GPU进行DiT RL后训练的系统。Spotlight基于我们设计的两个关键洞察：（1）我们证明探索可以容忍过时的模型权重，因为使用前一次迭代模型权重的探索保留了随机种子的相对排序，允许探索在训练期间在空闲的抢占式GPU上运行。（2）SP重配置可以重用节点内状态，将组恢复时间从分钟级缩短到亚秒级启动。基于这些洞察，Spotlight引入了三种技术：基于bandit的探索规划器，在训练时间预算内最大化奖励方差；弹性序列并行，通过持久调度器和节点内权重复制动态重配置SP组；以及抢占感知的拉取式请求调度器，平衡负载并在抢占时提交进行中的状态。我们在开源RL平台ROLL上实现了Spotlight，并在Qwen-Image后训练上进行了评估。Spotlight达到相同目标验证分数的速度比基线快4倍，总成本降低1.4-6.4倍，同时在分辨率512×512和1280×1280的DeepSeek-OCR和Geneval数据集上实现了更优的图像质量。

英文摘要

Reinforcement learning (RL) post-training of Diffusion Transformers (DiTs) is prohibitively expensive, requiring thousands of high-end GPUs. Existing works explore two directions to reduce cost: seed exploration improves training convergence by selecting high-contrast samples, yet adds compute to the critical path; spot GPUs offer 69-77% lower cost, yet sit idle during training because DiT rollouts finish nearly simultaneously, which prevents LLM-style pipelining of rollout with training. Spot preemptions further break Sequence Parallelism (SP) groups, fragmenting GPU topology. We present Spotlight, the first system that harvests spot GPUs for DiT RL post-training. Spotlight rests on two key insights: (1) exploration can tolerate stale model weights, because exploration that uses the model weights from the previous iteration preserves the relative ranking of random seeds, allowing exploration to run on idle spot GPUs during training; (2) SP reconfiguration can reuse on-node state, reducing group recovery from minutes to sub-second launches. Built on these insights, Spotlight introduces three techniques: a bandit-based exploration planner that maximizes reward variance within the training time budget, elastic sequence parallelism that reconfigures SP groups on the fly via persistent schedulers and intra-node weight copying, and a preemption-aware pull-based request scheduler that balances load and commits in-flight state upon preemption. We implement Spotlight on the open-source RL platform ROLL and evaluate it on Qwen-Image post-training. Spotlight reaches the same target validation score $4\times$ faster than baselines, reducing total cost by $1.4$-$6.4\times$ while achieving superior image quality on DeepSeek-OCR and Geneval datasets with resolution $512\times512$ and $1280\times1280$.

2606.14824 2026-06-18 cs.AR cs.AI cs.LG 交叉投稿

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

在512MB内存下的嵌入式设备上运行硬件感知的神经架构搜索

Andrea Mattia Garavagno, Edoardo Ragusa, Paolo Gastaldo, Antonio Frisoli

发表机构 * University of Bologna（博洛尼亚大学）； Politecnico di Milano（米兰理工学院）

AI总结提出一种在资源受限的嵌入式设备上直接运行的硬件感知神经架构搜索方法，生成针对低端MCU的微型CNN，在Visual Wake Word数据集上达到最先进水平。

详情

DOI: 10.1109/ICCE59016.2024.10444268
Journal ref: 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2024, pp. 1-2

AI中文摘要

本文提出了一种新颖的硬件感知神经架构搜索（HW NAS）方法，该方法考虑了运行它的计算平台上的可用资源，使其能够在各种嵌入式设备上执行。所提出的HW NAS生成针对低端微控制器单元（MCU）的微型卷积神经网络（CNN），这些MCU通常用于物联网（IoT）或可穿戴机器人领域，从而开辟了新的应用场景。网关可以运行它来根据获取的数据定制CNN的架构，而无需使用外部服务器，从而确保隐私。所提出的技术在Visual Wake Word数据集（一个标准的TinyML基准）上的多个人体识别任务中，在多个嵌入式设备上取得了最先进的结果。

英文摘要

This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

2509.22020 2026-06-18 cs.LG 版本更新

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

面向天气基础模型的任务自适应参数高效微调

Shilei Cao, Hehai Lin, Jiashun Cheng, Yang Liu, Guowen Li, Xuehe Wang, Juepeng Zheng, Haoyuan Liang, Meng Jin, Chengwei Qin, Hong Cheng, Haohuan Fu

发表机构 * Sun Yat-sen University（中山大学）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； The Chinese University of Hong Kong（香港中文大学）； National Supercomputing Center in Shenzhen（深圳国家超算中心）； Huawei Technologies Co., Ltd（华为技术有限公司）； Tsinghua University（清华大学）

AI总结提出WeatherPEFT框架，通过任务自适应动态提示和随机Fisher引导自适应选择，在天气下游任务上以更少参数达到全微调性能。

详情

AI中文摘要

尽管机器学习的最新进展使天气基础模型（WFM）在多种下游任务中具备了强大的泛化能力，但随着模型规模扩大，计算需求不断攀升，实际部署愈发困难。当前为视觉或语言任务设计的参数高效微调（PEFT）方法无法应对天气下游任务的独特挑战，如变量异质性、分辨率多样性和时空覆盖变化，导致在WFM上性能欠佳。为弥补这一差距，我们提出WeatherPEFT，一种新颖的PEFT框架，包含两项协同创新。首先，在前向传播中，任务自适应动态提示（TADP）通过内部和外部模式提取，将编码器中的嵌入权重动态注入预训练骨干网络的输入令牌，实现针对特定下游任务的上下文感知特征重校准。其次，在反向传播中，随机Fisher引导自适应选择（SFAS）不仅利用Fisher信息识别并更新最关键的任务参数，从而保留不变的预训练知识，还引入随机性以稳定选择过程。我们在三个下游任务上验证了WeatherPEFT的有效性和效率，现有PEFT方法与全微调相比存在显著差距，而WeatherPEFT使用更少的可训练参数达到了与全微调相当的性能。本工作代码见 https://github.com/ShileiCao/WeatherPEFT。

英文摘要

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expanding scale increasingly hinder practical deployment. Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. First, during the forward pass, Task-Adaptive Dynamic Prompting (TADP) dynamically injects the embedding weights within the encoder to the input tokens of the pre-trained backbone via internal and external pattern extraction, enabling context-aware feature recalibration for specific downstream tasks. Furthermore, during backpropagation, Stochastic Fisher-Guided Adaptive Selection (SFAS) not only leverages Fisher information to identify and update the most task-critical parameters, thereby preserving invariant pre-trained knowledge, but also introduces randomness to stabilize the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, and WeatherPEFT achieves performance parity with Full-Tuning using fewer trainable parameters. The code of this work is available at https://github.com/ShileiCao/WeatherPEFT.

2601.21626 2026-06-18 cs.LG cs.AI 版本更新

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

HeRo-Q: 通过Hessian条件化实现稳定低比特量化的通用框架

Jinhao Zhang, Yunquan Zhang, Zicheng Yan, Boyang Zhang, Jun Sun, Daning Cheng

发表机构 * Beijing University of Posts and Telecommunications（北京邮电大学）； Institute of Computing Technology, Chinese Academy of Sciences（中国科学院计算技术研究所）； University of Science and Technology of China（中国科学技术大学）； Zhejiang Lab（浙江实验室）； Peng Cheng Laboratory（鹏城实验室）

AI总结针对后训练量化中“低误差、高损失”的矛盾，提出HeRo-Q算法，通过轻量可学习的旋转压缩矩阵重塑损失景观，降低最大Hessian特征值，增强对量化噪声的鲁棒性，在Llama和Qwen模型上优于现有方法。

详情

AI中文摘要

后训练量化（PTQ）是一种主流的模型压缩技术，但由于其仅专注于最小化量化误差，常常导致矛盾的“低误差、高损失”现象。根本原因在于LLM损失景观的Hessian矩阵：少数高曲率方向对扰动极其敏感。为了解决这个问题，我们提出了Hessian鲁棒量化（HeRo-Q）算法，该算法在量化前对权重空间应用一个轻量级、可学习的旋转压缩矩阵。这个联合框架通过降低最大的Hessian特征值来重塑损失景观，从而显著增强对量化噪声的鲁棒性。HeRo-Q不需要修改架构，计算开销可忽略不计，并且可以无缝集成到现有的PTQ流程中。在Llama和Qwen模型上的实验表明，HeRo-Q在标准W4A8设置下不仅持续优于包括GPTQ、AWQ和SpinQuant在内的最先进方法，而且在极具挑战性的W3A16超低比特场景中表现出色，将Llama3 8B在GSM8K上的准确率提升至70.15%，并有效避免了激进量化中常见的逻辑崩溃。

英文摘要

Post-Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high-curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo-Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo-Q consistently outperforms state-of-the-art methods, including GPTQ, AWQ, and SpinQuant. It achieves superior performance under standard W4A8 settings and also excels in the highly challenging W3A16 ultra-low-bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15% and effectively avoids the logical collapse commonly seen in aggressive quantization.

2602.00161 2026-06-18 cs.LG cs.AI cs.CL quant-ph 版本更新

LLM Compression by Block Removal with Constrained Binary Optimization

通过带约束二进制优化的块移除进行LLM压缩

David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús

发表机构 * Multiverse Computing（多维计算公司）； Donostia International Physics Center（多斯蒂亚国际物理中心）； Ikerbasque Foundation for Science（伊克尔巴斯克科学基金会）

AI总结提出将大语言模型块移除压缩问题建模为约束二进制优化，映射到Ising玻璃系统，实现高效排序和高质量非连续块移除，在50%压缩时MMLU提升近23个百分点，且计算高效、通用性强。

Comments 16 pages, 3 figures

详情

AI中文摘要

在本文中，我们将通过最优删除Transformer块（“块移除”）来压缩大语言模型（LLM）的问题，表述为一个约束二进制优化（CBO）问题，该问题可以映射到物理系统（Ising玻璃），其能量是下游模型性能的强代理。这种表述使得能够高效地对大量候选块移除配置进行排序，产生许多高质量、非平凡的解决方案，而不仅仅是移除连续区域。我们的方法在深度压缩场景中表现强劲，例如在Llama-3.3-70B-Instruct的50%压缩中，与其他最先进的块移除方法相比，我们在MMLU基准上取得了近23个百分点的提升。对于较轻的压缩，它在多个基准上与这些方法表现相当，适用于Llama-3.1-8B-Instruct、Qwen3-14B（重训练前后）以及Llama-3.3-70B-Instruct。该方法计算效率高，仅需在校准数据集上对少数活跃参数进行前向和反向传播。此外，我们证明，当无法精确求解CBO问题时，使用良好的启发式求解器可以在可忽略的运行时间内提供在下游任务上表现良好的解决方案。该方法可以轻松应用于任何架构。我们在最近的NVIDIA-Nemotron-3-Nano-30B-A3B-FP8模型上展示了这种通用性，该模型具有高度不均匀且具有挑战性的块结构，并且在移除2个注意力层或3个混合专家层时，我们在AIME25和GPQA上超越了最先进水平。

英文摘要

In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks ("block removal") as a constrained binary optimization (CBO) problem that can be mapped to a physical system (Ising glass), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations, yielding many high-quality, non-trivial solutions beyond those that only remove consecutive regions. Our method performs strongly in the deep compression regime, such as for 50% compression of Llama-3.3-70B-Instruct, where we achieve an almost 23 percentage point increase on the MMLU benchmark compared to other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.1-8B-Instruct, Qwen3-14B (both before and after retraining), as well as Llama-3.3-70B-Instruct. The approach is computationally efficient and requires only forward and backward passes on a calibration dataset for a few active parameters. Additionally, we demonstrate that using good heuristic solvers for the CBO problem provides solutions that perform well on downstream tasks in negligible runtime when it is infeasible to solve the problem exactly. The method can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure, and where we outperform SOTA for AIME25 and GPQA when removing either 2 attention layers or 3 mixture-of-experts layers.
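
To make the "constrained binary optimization" framing concrete, here is a toy sketch that ranks block-removal configurations by a quadratic, Ising-like energy under the constraint that exactly `k` blocks are removed. Everything specific here is an assumption for illustration: the abstract does not give the paper's energy function, so the linear terms `h`, the pairwise terms `J`, and the brute-force enumeration are hypothetical stand-ins (a real instance would need a heuristic or exact solver rather than enumeration).

```python
from itertools import combinations


def energy(removed, h, J):
    """Quadratic (Ising-like) energy of removing the block indices in `removed`."""
    e = sum(h[i] for i in removed)
    # Pairwise terms are stored with keys (i, j) where i < j.
    e += sum(J.get((i, j), 0.0) for i, j in combinations(sorted(removed), 2))
    return e


def rank_removals(n_blocks, k, h, J):
    """Enumerate all ways to remove exactly k of n_blocks, lowest energy first."""
    configs = combinations(range(n_blocks), k)
    return sorted(configs, key=lambda c: energy(c, h, J))


# Hypothetical coefficients for a 5-block toy model.
h = [0.9, 0.2, 0.4, 0.1, 0.8]                 # per-block removal penalty
J = {(1, 2): 0.6, (1, 3): 0.05, (2, 3): 0.5}  # pairwise interaction penalty
ranking = rank_removals(5, 2, h, J)
best = ranking[0]
```

In this toy instance the lowest-energy choice removes blocks 1 and 3, which are not adjacent. That mirrors the abstract's point that good solutions need not remove consecutive regions.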

2512.12850 2026-06-18 cs.AR cs.LG cs.SY eess.SY hep-ex 版本更新

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

KANELÉ：基于Kolmogorov-Arnold网络的高效LUT评估

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * Massachusetts Institute of Technology（麻省理工学院）

AI总结提出KANELÉ框架，利用Kolmogorov-Arnold网络（KAN）的独特性质，通过量化与剪枝协同优化，首次系统实现FPGA上的高效LUT映射，相比先前方法加速高达2700倍并节省大量资源。

Comments International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)

详情

DOI: 10.1145/3748173.3779202

AI中文摘要

低延迟、资源高效的FPGA神经网络推理对于需要实时能力和低功耗的应用至关重要。基于查找表（LUT）的神经网络是一种常见解决方案，结合了强大的表示能力和高效的FPGA实现。在这项工作中，我们介绍了KANELÉ，一个利用Kolmogorov-Arnold网络（KAN）独特性质进行FPGA部署的框架。与传统的多层感知器（MLP）不同，KAN使用可学习的一维样条作为边缘激活函数，其域固定，这种结构天然适合离散化和高效的LUT映射。我们提出了第一个在FPGA上实现KAN的系统设计流程，通过量化与剪枝协同优化训练，以实现紧凑、高吞吐量和低延迟的KAN架构。我们的结果表明，与先前的KAN-on-FPGA方法相比，加速高达2700倍，并节省了数量级的资源。此外，KANELÉ在广泛使用的基准测试中匹配或超越了其他基于LUT的架构，特别是在涉及符号或物理公式的任务中，同时平衡了FPGA硬件上的资源使用。最后，我们通过将框架扩展到实时、高能效的控制系统，展示了其多功能性。

英文摘要

Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
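
The claim that fixed-domain one-dimensional activations are "naturally suited to discretization and efficient LUT mapping" can be illustrated with a small software sketch. The assumptions: `math.tanh` stands in for a learned spline, the domain `[-2, 2]` and the 6-bit index width are arbitrary, and the helper names are hypothetical. This is not the KANELÉ design flow, only the underlying tabulation idea.

```python
import math


def build_lut(fn, lo, hi, bits):
    """Tabulate fn at the centers of 2**bits equal-width bins covering [lo, hi]."""
    n = 1 << bits
    step = (hi - lo) / n
    return [fn(lo + (i + 0.5) * step) for i in range(n)]


def lut_eval(table, lo, hi, x):
    """Approximate fn(x) by a table lookup; inputs outside [lo, hi] are clamped."""
    n = len(table)
    idx = int((x - lo) / (hi - lo) * n)
    return table[min(max(idx, 0), n - 1)]


table = build_lut(math.tanh, -2.0, 2.0, bits=6)  # 64-entry table
approx = lut_eval(table, -2.0, 2.0, 0.3)
```

Because the domain is fixed, the index computation is a single scale-and-truncate step, and the approximation error is bounded by half a bin width times the largest slope of the tabulated function.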

2602.02056 2026-06-18 cs.AR cs.LG cs.SY eess.SY stat.ML 版本更新

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

基于Kolmogorov-Arnold网络中样条局部性的超快片上在线学习

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * MIT（麻省理工学院）

AI总结针对量子计算和核聚变控制等高频系统对亚微秒级在线学习的需求，提出利用Kolmogorov-Arnold网络的B样条局部性实现稀疏更新和固定点量化鲁棒性，在FPGA上实现比MLP更高效、更具表达力的超快在线学习。

Comments Forty-Third International Conference on Machine Learning (ICML'26)

详情

AI中文摘要

PSyGenTAB: A Privacy-Preserving Framework for Generating Synthetic Clinical Tabular Data via Constrained Optimization

Arshia Ilaty, Hossein Shirazi, Manasi Chitale, Kedar Hegde, Dhanalakshmi Ramesh, Rashmi S. Manjunath, Amir Rahmani, Hajar Homayouni

Affiliations: San Diego State University; University of California, Irvine

AI summary: Proposes the PSyGenTAB framework, which formulates synthetic healthcare data generation as a constrained optimization problem and embeds configurable privacy constraints through the Augmented Lagrangian Method, maximizing clinical data utility while enforcing privacy thresholds; experiments show that models trained on the synthetic data perform comparably to models trained on real data.

Comments 20 pages

Abstract

The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution, but existing methods lack principled mechanisms to explicitly manage the privacy-utility trade-off, often degrading clinically meaningful patterns or risking patient re-identification. We present PSyGenTAB, a privacy-preserving generative framework that formulates synthetic healthcare data generation as a constrained optimization problem solved using the Augmented Lagrangian Method. By embedding configurable privacy constraints directly into model training, PSyGenTAB enforces minimum privacy thresholds while maximizing clinical data utility. Across multiple clinically motivated benchmarks, PSyGenTAB preserves inter-feature clinical relationships and minority-class diagnostic patterns essential for reliable health AI. Downstream evaluation using Train-on-Synthetic, Test-on-Real and Train-on-Real, Test-on-Synthetic protocols shows that models trained on synthetic data achieve performance comparable to those trained on real patient records. Privacy auditing further demonstrates reduced exact record reproduction and strong resilience to membership inference attacks. These results establish PSyGenTAB as a principled framework for balancing privacy protection and clinical utility in synthetic healthcare data, supporting secure cross-institutional AI development.
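The constrained formulation described in the abstract can be illustrated with a generic augmented Lagrangian loop. This is a minimal sketch on a one-dimensional toy problem, not PSyGenTAB's training procedure: the function name `augmented_lagrangian`, the penalty parameter `rho`, the step sizes, and the stand-in "utility loss" and "privacy constraint" are all assumptions made for illustration.

```python
def augmented_lagrangian(f_grad, g, g_grad, x0, rho=10.0, outer=20, inner=200, lr=0.01):
    """Minimize f(x) subject to g(x) <= 0 for a scalar x (toy setting)."""
    x, lam = x0, 0.0
    for _ in range(outer):
        # Inner loop: gradient descent on the augmented Lagrangian in x.
        for _ in range(inner):
            mult = max(0.0, lam + rho * g(x))  # effective multiplier
            x -= lr * (f_grad(x) + mult * g_grad(x))
        # Outer step: multiplier update, kept non-negative.
        lam = max(0.0, lam + rho * g(x))
    return x, lam


# Hypothetical stand-ins: f(x) = (x - 3)^2 plays the role of a utility loss
# (best at x = 3), and g(x) = x - 1 <= 0 plays the role of a privacy constraint.
x_star, lam_star = augmented_lagrangian(
    f_grad=lambda x: 2.0 * (x - 3.0),
    g=lambda x: x - 1.0,
    g_grad=lambda x: 1.0,
    x0=0.0,
)
```

In this toy case the unconstrained optimum violates the constraint, so the iterate settles on the boundary x = 1 and the multiplier converges to 4, the value at which the objective gradient and the constraint gradient balance.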

2606.18773 2026-06-18 cs.LG cs.AI (new submission)

Private Learning with Public Feature Conditioning

Shuli Jiang, Walid Krichene, Nicolas Mayoraz

Affiliations: Microsoft; Google Research

AI summary: For regression under label differential privacy, proposes Cond-DP, which uses the structure of the public feature matrix to build a conditioning matrix that accelerates optimization, provides convergence guarantees in convex, strongly convex, and non-convex settings, and converges faster than DPSGD in linear regression.

Comments Proceedings of the 43rd International Conference on Machine Learning (ICML 2026). 26 pages, 9 figures

Abstract

We study differentially private (DP) regression in settings where each data sample includes public, non-sensitive features -- common in applications such as recommendation and advertising systems. While such label-DP or semi-sensitive-feature settings have been primarily explored in the context of classification, effective approaches for regression remain underexplored. We introduce Cond-DP, a conditioned variant of DPSGD that leverages the structure of public feature matrices to improve optimization under privacy constraints. Motivated by the observation that these public features often exhibit rapidly decaying spectra, Cond-DP incorporates a data-driven conditioning matrix to reshape the optimization landscape and accelerate convergence. We provide convergence guarantees for convex, strongly convex, and non-convex settings, and recover standard DPSGD as a special case when the conditioning matrix is the identity. We show how to construct an effective conditioning matrix for Cond-DP directly from public features, enabling provably faster convergence than DPSGD in private linear regression without incurring additional privacy cost. Empirically, Cond-DP with this conditioning matrix consistently outperforms state-of-the-art baselines across a wide range of datasets and model architectures under label DP, demonstrating strong and robust performance in practice.

2606.19220 2026-06-18 cs.LG cs.AI (new submission)

Machine Unlearning for the XGBoost Model with Network Intrusion Datasets

Diana Magalhães, Eva Maia, João Vitorino, Isabel Praça

Affiliations: GECAD, ISEP, Polytechnic of Porto

AI summary: Proposes XGBoost-Forget, an unlearning method for XGBoost models that achieves efficient unlearning on tabular network intrusion datasets, preserving model performance while substantially speeding up unlearning.

Comments 12 pages, 7 tables, WorldCist'26 Conference

2606.19222 2026-06-18 cs.LG cs.AI (new submission)

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

Chenyu Zhou, Qiliang Jiang, Shuning Wu, Xu Zhou

Affiliations: School of Engineering, Institute of Science Tokyo, Japan; College of Control Science and Engineering, Zhejiang University, China; Department of Electrical and Computer Engineering, National University of Singapore, Singapore

AI summary: Proposes MAST, which uses mechanism-guided selective parameter updates to substantially reduce collateral damage to retained performance when unlearning RLVR-induced reasoning behavior.

Comments 15 pages, 4 figures, 7 tables

Abstract

We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent forgets only by damaging retain MATH and GSM8K. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, NPO/SimNPO objectives, and Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
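The reported McNemar p-value can be checked with a short worked computation. The abstract does not give the discordant-pair counts, so the sketch below assumes that 8 items went from correct to incorrect and none went the other way, which is one configuration consistent with the drop from 45/150 to 37/150; the function name `mcnemar_exact` is hypothetical, and the exact binomial form of the test is assumed.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability up to the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Assumed counts: 8 items correct before and incorrect after, 0 the reverse.
p_value = mcnemar_exact(8, 0)  # 2 * (1/2)**8 = 0.0078125
```

Under that assumption the value rounds to the reported p = 0.0078.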

2606.19262 2026-06-18 cs.LG (new submission)

Detecting Hidden ML Training With Zero-Overhead Telemetry

Robi Rahman, Sabiha Tajdari

Affiliations: Machine Intelligence Research Institute; University of Virginia

AI summary: Evaluates the adversarial robustness of GPU workload classification that uses only zero-overhead, privacy-preserving NVML telemetry (content-agnostic signals); the resulting classifier identifies training workloads with 98.2% binary accuracy and reaches 43-87% accuracy on the most challenging unexpected workloads.

Comments Technical AI Governance Research workshop at ICML 2026

2606.18312 2026-06-18 cs.CR cs.DC cs.LG (cross-list)

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

William Kalikman, Ivo Petrov, Dimitar I. Dimitrov, Martin Vechev

Affiliations: ETH Zürich; INSAIT, Sofia University "St. Kliment Ohridski"

AI summary: Proposes the TIGER attack, which turns the subspace signal into a differentiable objective and directly optimizes token embeddings to minimize their distance to the subspace, improving reconstruction quality and speed on encoder models and robustness to differential privacy on decoder models.

Comments 16 pages, 13 pages main text,

Abstract

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough information to reconstruct client inputs. Existing attacks on transformers either optimize dummy inputs to match the true client updates, which is costly and unstable for modern models, or exploit the low rank of attention gradients to identify a subspace containing the true layer embeddings, followed by a discrete membership test for candidate tokens. However, this token test is brittle under numerical noise, e.g., from quantization or Differential Privacy (DP), and scales poorly for encoder models with non-causal attention. We introduce TIGER, a continuous gradient inversion attack that turns this subspace signal into a differentiable objective. Instead of searching over tokens or matching full gradients, TIGER directly optimizes token embeddings to minimize their distance to the subspace. Our experiments demonstrate that on encoder-only models, TIGER substantially improves both reconstruction quality and runtime over existing attacks, while on decoder models, TIGER is more robust than prior subspace-based attacks, enabling the first successful reconstructions in DP-defended federated learning settings.
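The idea of turning a subspace into a differentiable objective can be sketched in a few lines. This is only an illustration of minimizing the squared distance from a vector to a known subspace by gradient descent; it is not the TIGER attack itself, which the abstract does not specify in detail. The helper names, the orthonormal-basis assumption, and the learning rate are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual(e, basis):
    """Component of e orthogonal to span(basis); basis is assumed orthonormal."""
    r = list(e)
    for q in basis:
        coeff = dot(q, e)
        r = [ri - coeff * qi for ri, qi in zip(r, q)]
    return r


def subspace_distance_sq(e, basis):
    r = residual(e, basis)
    return dot(r, r)


def optimize_embedding(e0, basis, lr=0.1, steps=100):
    """Gradient descent on the squared distance from e to the subspace."""
    e = list(e0)
    for _ in range(steps):
        r = residual(e, basis)
        # The gradient of ||r||^2 with respect to e is 2 * r.
        e = [ei - lr * 2.0 * ri for ei, ri in zip(e, r)]
    return e


# Toy subspace: the x-y plane in three dimensions.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
e_opt = optimize_embedding([1.0, 2.0, 3.0], basis)
```

Because the objective is smooth in the embedding, small perturbations of the subspace shift the minimizer only slightly, in contrast to a hard membership test on discrete tokens. A full attack would presumably need further terms to pick out valid token embeddings inside the subspace; that part is not sketched here.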

2606.19023 2026-06-18 cs.CR cs.LG (cross-list)

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

Arthur Hendricks Mendes de Oliveira, Giovani Valdrighi, Marcos Medeiros Raimundo

AI summary: Proposes the P$^2$CE algorithm, which uses an isolation forest outlier detector and SHAP values to generate plausible, Pareto-optimal counterfactual explanations that balance feasibility, plausibility, and computational efficiency.

Comments Under review in the Machine Learning journal

Abstract

The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations help individuals understand and potentially alter unfavorable decisions in areas such as loan applications and job selection by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.

2606.18430 2026-06-18 cs.LG cs.CR (new submission)

Target-confidence Recourse Using Tsetlin Machines: TRUST

K. Darshana Abeyrathna, Sara El Mekkaoui, Nils Enric Canut Taugbøl, Anuja Vats

Affiliations: Group Research and Development, Det Norske Veritas (DNV)

AI summary: Proposes the TRUST framework, which uses a Probabilistic Tsetlin Machine and Bayesian optimization to search directly for the minimal input change that meets a user-specified confidence target, yielding more robust and interpretable counterfactual explanations.

Abstract

Counterfactual explanations are widely used to provide algorithmic recourse in high-stakes decision-making systems. Most existing methods seek the smallest change to an input that flips a model's decision. However, decision-makers often rely not only on predicted labels but also on confidence thresholds and risk margins. Counterfactuals that barely cross a decision boundary can be fragile and unstable under noise or model variation. In this paper, we propose Target-confidence Recourse Using tSeTlin machines (TRUST), a framework in which users explicitly specify the desired prediction confidence for recourse. Rather than generating counterfactuals and evaluating confidence afterward, TRUST directly searches for minimal changes that satisfy a user-defined confidence target, enabling comparison of recourse options in terms of cost, confidence, and robustness. We instantiate TRUST using a Probabilistic Tsetlin Machine (PTM) combined with Bayesian optimization. The probabilistic clause-based structure of PTM links prediction confidence to the stability of decision rules. We show that counterfactuals satisfying the same rules can still differ substantially in reliability depending on how securely they satisfy those rules, revealing whether decisions are supported by robust or fragile clause activations. Experiments on synthetic and real-world datasets demonstrate that target-confidence counterfactuals produce more robust and interpretable recourse than conventional boundary-based approaches. Across multiple benchmarks, TRUST achieves perfect robustness while maintaining low recourse cost, including an L2 distance of 0.10 on the Haberman dataset at 0.92 confidence. By explicitly controlling confidence and exposing rule-level stability, TRUST provides actionable recourse for high-stakes decision support.

2606.18839 2026-06-18 cs.LG cs.CV (new submission)

Semantic Robustness Certification for Vision-Language Models

Peiyu Yang, Paul Montague, Feng Liu, Andrew C. Cullen, Amardeep Kaur, Christopher Leckie, Sarah M. Erfani

Affiliations: School of Computing & Information Systems, University of Melbourne, Australia

AI summary: Proposes the first framework that certifies the robustness of vision-language models to semantic-level variation (e.g., shape, size, and style) without additional data, using text prompts as semantic proxies and characterizing the decision boundary to certify that the predicted class is unchanged under semantic transformations.

Comments Accepted to ICML

Abstract

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

2606.18867 2026-06-18 cs.LG cs.CY stat.ML (new submission)

RUB: Evaluating Residual Knowledge in Unlearned Models

Hao Xuan, Xingyu Li

Affiliations: Electrical and Computer Engineering, University of Alberta

AI summary: Proposes the principle of Robust Unlearning and a unified benchmark, RUB, which detects residual information with the Unlearning Mapping Attack (UMA) and reveals that existing methods are vulnerable under adversarial evaluation.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2026, pages 8550-8559

Abstract

Machine Unlearning (MUL) has emerged as a key mechanism for privacy protection and content regulation, yet current techniques often fail to guarantee the complete removal of sensitive information. While most existing works focus on verifying the execution of unlearning, they overlook the critical question of whether models remain robust against adversarial attempts to recover forgotten knowledge. In this work, we advocate for the principle of Robust Unlearning, which requires models to be both indistinguishable from retrained counterparts and resilient against diverse adversarial threats. To instantiate this principle, we propose a unified benchmark, RUB (Robust Unlearning Benchmark), that systematically evaluates the robustness of unlearning algorithms across classification, image-to-image reconstruction, and text-to-image synthesis. Within this framework, we introduce the Unlearning Mapping Attack (UMA) as a generalizable method to detect residual information, and demonstrate how existing attack strategies can be adapted into this framework as long as they conform to the generic UMA framework. Our experiments across discriminative and generative tasks reveal that state-of-the-art unlearning methods remain vulnerable under these evaluations, even when passing standard verification metrics. By positioning robustness as the central criterion and providing a benchmark for adversarial evaluation, we hope RUB paves the way toward more reliable and secure unlearning practices. The codebase and model checkpoints in RUB will be published.

2505.03646 2026-06-18 cs.LG cs.AI cs.CV (updated version)

Revealing Hidden Vulnerabilities in Autoencoders through Gradient Signal Restoration

Chethan Krishnamurthy Ramanaik, Arjun Roy, Tobias Callies, Eirini Ntoutsi

Affiliations: University of the Bundeswehr Munich

AI summary: Addresses the overestimation of robustness caused by vanishing gradients in adversarial attacks on autoencoders by proposing GRILL, a framework that restores the gradient signal, markedly strengthening attacks and exposing hidden vulnerabilities.

Abstract

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize reconstruction damage, often converge to suboptimal perturbations, thereby potentially overstating AE robustness. We show that this limitation is linked to vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, associated with near-zero singular values in their intermediate weight matrices. To address this, we propose GRILL (Gradient Signal Restoration in Ill-Conditioned Layers), a framework designed to mitigate gradient degradation and improve the reliability of adversarial robustness evaluation in encoder-decoder architectures. GRILL is designed to mitigate adversarial gradient degradation during optimization, enabling attacks to better approximate high-distortion perturbations under fixed norm constraints. Through extensive experiments across multiple AE architectures, under both sample-specific and universal attacks, as well as standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, thereby exposing vulnerabilities hidden by existing attack limitations. Beyond AEs, we provide preliminary evidence that modern multimodal encoder-decoder architectures exhibit similar vulnerabilities.

2606.16214 2026-06-18 cs.LG cs.AI (updated version)

Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

Tobias Jan Wieczorek, Leon de Andrade, Thomas Möllenhoff, Marcus Rohrbach

Affiliations: TU Darmstadt & hessian.AI, Darmstadt, Germany; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

AI summary: Proposes Calibrated Variance Propagation (CVP), which combines a new propagation method for normalization layers, techniques for handling activation functions, and a light calibration step to estimate uncertainty in a single forward pass, with accuracy comparable to MC sampling on transformers and CNNs at much lower cost.

Abstract

Modern deep learning models remain notoriously prone to overconfidence, limiting their reliability in high-stakes applications. Bayesian methods aim to counter this by learning a distribution over model parameters, and recent advances now make this feasible for large-scale architectures at costs comparable to AdamW. However, a challenge remains at test time: predictions must be averaged across many forward passes with weights sampled from the posterior, which is prohibitively expensive. Variance propagation offers an efficient alternative, computing layer-wise analytical approximations of uncertainty in a single forward pass. While such techniques are effective for MLPs, their extension to modern architectures remains challenging, due to increased depth and diversity of layer types. To fill this gap, we propose Calibrated Variance Propagation (CVP), which introduces a new propagation method for normalization layers, combines it with recent techniques for handling activation functions, and absorbs residual error through a light calibration step. CVP yields comparably accurate uncertainty estimates to MC sampling across transformers and CNNs, at a fraction of the cost. Against prior variance propagation work, CVP improves coverage at $0.5\%$ risk from $8.2\%$ to $14.6\%$ with BEiT-3 on Visual Reasoning (NLVR2) and from $2.6\%$ to $10.8\%$ with ViLT on VQAv2, with gains extending to convolutional architectures.

2508.02158 2026-06-18 cs.IT cs.CR cs.DS cs.LG math.IT math.ST stat.TH (updated version)

Robust Detection of Planted Subgraphs in Semi-Random Models

Dor Elimelech, Wasim Huleihel

AI summary: Studies planted subgraph detection in a semi-random model, proving that with an adversary present, detection is information-theoretically impossible for subgraphs of strongly sub-logarithmic density while the statistical limits are unchanged for super-logarithmic density, and designs an efficient, robust detection algorithm.

Comments 38 pages, 2 figures

Abstract

Detection of planted subgraphs in Erdős-Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME (updated version)

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Mame Diarra Toure, David A. Stephens

Affiliations: Department of Mathematics and Statistics

AI summary: Because epistemic uncertainty measures in safety-critical classification cannot distinguish between classes, the paper decomposes mutual information into a per-class vector $C_k$ via a second-order Taylor expansion, with a $1/\mu_k$ weighting that corrects boundary suppression, and validates it on selective prediction for diabetic retinopathy, out-of-distribution detection, and a label-noise study.

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref: Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026, https://openreview.net/forum?id=cxuWscJmAr

Abstract

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7\% over MI and 56.2\% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
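The per-class quantity defined in the abstract is simple to compute from posterior samples. The sketch below assumes the samples are given as a list of class-probability vectors, uses the population variance, and measures entropy in nats; the function names and the toy sample values are illustrative only, and the companion skewness diagnostic is not included.

```python
import math


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def per_class_epistemic(samples):
    """samples: list of class-probability vectors, one per posterior draw."""
    n, k = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(k)]
    var = [sum((s[j] - mu[j]) ** 2 for s in samples) / n for j in range(k)]
    # C_k = sigma_k^2 / (2 * mu_k), as defined in the abstract.
    c = [var[j] / (2.0 * mu[j]) if mu[j] > 0.0 else 0.0 for j in range(k)]
    # Mutual information: entropy of the mean minus the mean entropy.
    mi = entropy(mu) - sum(entropy(s) for s in samples) / n
    return c, mi


# Toy posterior draws over three classes (made-up numbers).
samples = [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10], [0.65, 0.25, 0.10]]
c, mi = per_class_epistemic(samples)
```

In this toy case the first two classes have the same variance, but the rarer second class receives the larger $C_k$ because of the $1/\mu_k$ weighting, and the sum of the $C_k$ stays close to the MI.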

2606.18317 2026-06-18 cs.LG (new submission)

Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion

Xuling Zhang, Peng Wang, Daiyan Li, Aoran Huang, Zeiwei Chen, Yongkui Yang

Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Southern University of Science and Technology

AI summary: Proposes a K-hop Gaussian diffusion kernel as a preprocessing module that balances local and global information through multi-hop diffusion with Gaussian weights, outperforming conventional message passing and existing diffusion methods on noisy or structurally complex graphs.

Comments 5 pages, 3 figures

DOI: 10.1109/ICASSP55912.2026.11462070

Abstract

Most graph neural network (GNN) cores rely on graph convolutions, typically implemented as message passing between direct (single-hop) neighbors. In many real-world graphs, edges can be noisy or poorly defined, limiting information propagation to local neighborhoods. Existing diffusion kernels, such as Personalized PageRank (PPR) and Heat Kernel, alleviate this issue through global propagation, but still struggle with complex local structures and distant node noise. To address these limitations, we propose a K-Hop Gaussian (KHG) diffusion kernel as a preprocessing module for graph data. KHG introduces multi-hop diffusion with Gaussian weighting for remote nodes, balancing local and global information propagation before applying standard GNNs. Experiments on multiple benchmark datasets demonstrate that KHG significantly outperforms traditional message-passing GNNs, as well as PPR and Heat Kernel diffusion, particularly in noisy or structurally complex graphs.

2606.18444 2026-06-18 cs.LG cs.AI (new submission)

TMR-GGNN: Credit Card Fraud Detection based on Time-Aware Multi-Relational Guided Graph Neural Network

Rohit Tewari, Shubhankar Shilpi, Navin Chhibber, Devendra Singh Parmar, Sunil Khemka, Piyush Ranjan

Affiliations: Unysis Truist Banks Infinity Tech Group Technical Product; Fairfax, USA; Atlanta, USA; Sunnyvale, USA; Persistent Systems IEEE Vice Chair AeroSpace Chapter; Discover Financial Services; Edison, USA

AI summary: Proposes the TMR-GGNN framework, which addresses data imbalance and evolving fraud patterns by modeling heterogeneous entity interactions within time windows, constructing a dynamic multi-relational graph, and using a time-aware attention mechanism and a contrastive-learning decoder, trained with a composite loss that combines InfoNCE and Focal Loss.

Comments 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON), Pages 7

Abstract

In recent years, credit card fraud detection has faced significant challenges due to highly imbalanced data, evolving fraud patterns, and complex relational structures among transaction entities. To address these issues, this research proposes a novel framework called Time-Aware Multi-Relational Guided Graph Neural Network (TMR-GGNN). Specifically, TMR-GGNN extends the encoder-decoder graph neural network (GNN) architecture by modeling heterogeneous interactions across customers, merchants, devices, and IPs over temporal windows. It constructs a dynamic, multi-relational graph and incorporates a time-aware relational attention mechanism within the encoder to adaptively weigh transaction relevance based on temporal proximity and semantic context. The decoder employs a contrastive learning module to distinguish between real and synthesized transaction patterns while improving the model's generalization to rare fraud cases. Additionally, to manage severe class imbalance and emphasize discriminative learning, a composite loss function is introduced that combines an Information Noise Contrastive Estimation (InfoNCE) contrastive loss with Focal Loss. This combination improves fraud identification while reducing false negatives.
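A composite of InfoNCE and Focal Loss can be sketched for a single example as follows. The abstract does not say how the two terms are weighted or which similarity function is used, so the `weight` coefficient, the temperature `tau`, and the default `gamma` and `alpha` values below are assumptions, and the function names are hypothetical.

```python
import math


def info_nce(pos_sim, neg_sims, tau=0.1):
    """InfoNCE for one anchor: negative log-softmax of the positive candidate."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss; p is the predicted fraud probability, y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples.
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def composite_loss(pos_sim, neg_sims, p, y, weight=1.0):
    # `weight` is a hypothetical balancing coefficient between the two terms.
    return info_nce(pos_sim, neg_sims) + weight * focal_loss(p, y)


loss = composite_loss(pos_sim=0.9, neg_sims=[0.1, -0.2], p=0.3, y=1)
```

The focal term concentrates the gradient on hard, misclassified transactions, which is the usual motivation for using it under heavy class imbalance, while the contrastive term shapes the embedding space.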


2606.18621 2026-06-18 cs.LG 新提交

Towards Anomaly Detection on Relational Data

面向关系数据的异常检测

Shiyuan Li, Yunfeng Zhao, Yue Tan, Qingfeng Chen, Yixin Liu, Shirui Pan

发表机构 * Griffith University（格里菲斯大学）； Guangxi University（广西大学）

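AI summary: Proposes RelAD, a framework that detects both attribute anomalies and abnormal connection patterns in relational data through conditional sparse-gated attribute reconstruction and dual-view multi-relational edge reconstruction; it outperforms existing methods on 6 benchmark datasets.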


AI中文摘要

关系数据库广泛应用于现实系统中管理结构化数据。从这类关系数据中检测异常对于识别欺诈、风险和异常行为至关重要，但尚未得到充分探索。关键挑战在于关系数据的内在复杂性：多表属性是高维且异质的，使得稀疏的异常线索容易被正常或无关信息淹没；异常还可能表现为跨不同外键关系的异常连接模式，而现有的表格和图异常检测方法难以捕捉。为解决这些问题，我们提出RelAD，一个基于重建的框架，从属性和关系边重建中捕捉异常。RelAD包含两个核心模块：条件稀疏门控属性重建，抑制冗余的多表属性并强调异常语义块；以及双视图多关系边重建，从内在和行为实体画像中检测关系特定的异常连接。得到的属性和关系信号通过轻量级融合模块整合，产生最终异常分数。我们进一步构建了6个具有系统性异常的基准数据集，大量实验表明RelAD在取得竞争性效率的同时，始终优于其他基线方法。

英文摘要

Relational databases are widely used for managing structured data in real-world systems. Detecting anomalies in such relational data is crucial for identifying fraud, risks, and abnormal behaviors, yet remains under-explored. The key challenges lie in the intrinsic complexity of relational data: multi-table attributes are high-dimensional and heterogeneous, so sparse abnormal clues are easily overwhelmed by normal or irrelevant information; and anomalies may further manifest as abnormal connection patterns across different foreign-key relations, which existing tabular and graph anomaly detection methods are ill-suited to capture. To address these challenges, we propose RelAD, a reconstruction-based framework that captures anomalies from both attribute and relational edge reconstruction. RelAD contains two core modules: conditional sparse-gated attribute reconstruction, which suppresses redundant multi-table attributes and emphasizes abnormal semantic blocks, and dual-view multi-relational edge reconstruction, which detects relation-specific abnormal connections from both intrinsic and behavioral entity profiles. The resulting attribute and relational signals are integrated through a lightweight fusion module to produce the final anomaly score. We further construct 6 benchmark datasets with systematic anomalies, on which extensive experiments show that RelAD consistently outperforms other baselines while achieving competitive efficiency.


2606.19185 2026-06-18 cs.LG 新提交

AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network

AGDN：利用各向异性图扩散网络学习求解旅行商问题

Bolin Shen, Ziwei Huang, Zhiguang Cao, Yushun Dong

发表机构 * Florida State University（佛罗里达州立大学）； Singapore Management University（新加坡管理大学）

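AI summary: Proposes the Anisotropic Graph Diffusion Network (AGDN), which uses a MixScore transition matrix and an anisotropic diffusion strategy to exploit graph structure when solving the Traveling Salesman Problem, outperforming existing methods across instance sizes and distributions.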

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)


DOI: 10.1145/3770855.3817789

AI中文摘要

旅行商问题（TSP）是组合优化的基石，出现在许多实际场景中。尽管基于图的学习方法已被探索用于TSP，但如何更有效地利用图结构的问题仍然悬而未决。我们提出了各向异性图扩散网络（AGDN），一种新的图神经网络框架，旨在求解TSP。我们的方法解决了两个核心难点：（1）完全连接TSP图中缺乏信息丰富的拓扑先验，以及（2）在常用的图稀疏化技术后，最优解中丢失连接节点。为了克服这些问题，我们构建了一个MixScore转移矩阵，将节点相似性与成对距离相结合，并开发了一种各向异性图扩散策略，支持跨多跳的高效信息交换。涵盖不同实例规模和节点分布的全面实验表明，AGDN在保持计算时间竞争力的同时，始终优于现有方法。此外，AGDN能够很好地泛化到训练期间未见的问题规模和分布。实现代码已公开在：this https URL。

英文摘要

The Traveling Salesman Problem (TSP) is a cornerstone of combinatorial optimization and arises in many practical scenarios. Although graph-based learning approaches have been explored for TSP, the question of how to exploit graph structure more effectively remains open. We present the Anisotropic Graph Diffusion Network (AGDN), a new Graph Neural Network framework designed to solve TSP. Our method tackles two central difficulties: (1) the lack of informative topological priors in fully connected TSP graphs, and (2) the loss of connections between nodes that are adjacent in the optimal solution once commonly used graph sparsification techniques are applied. To overcome these issues, we construct a MixScore transition matrix that merges node similarity with pairwise distance, and we develop an anisotropic graph diffusion strategy that supports efficient information exchange across multiple hops. Comprehensive experiments spanning diverse instance sizes and node distributions show that AGDN consistently outperforms existing methods while keeping computation time competitive. Furthermore, AGDN generalizes well to problem sizes and distributions beyond those seen during training. The implementation is publicly available at: https://github.com/LabRAI/AGDN.
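
The idea of merging node similarity with pairwise distance and then diffusing over several hops can be sketched roughly as below. This is not the paper's MixScore definition, which the abstract does not give: the exponential scores, the mixing weight `beta`, and the hop weights are all assumptions chosen only so that the result is a row-stochastic transition matrix.

```python
import math

def row_normalize(matrix):
    """Scale each row so that it sums to 1."""
    return [[v / sum(row) for v in row] for row in matrix]

def mix_transition(coords, feats, beta=0.5):
    """Blend feature similarity and spatial closeness into one transition matrix.

    beta is a hypothetical mixing weight; self-transitions are set to zero.
    Requires at least two nodes.
    """
    n = len(coords)
    sim = [[0.0] * n for _ in range(n)]
    close = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(feats[i], feats[j]))
                sim[i][j] = math.exp(dot)  # larger for similar features
                close[i][j] = math.exp(-math.dist(coords[i], coords[j]))  # larger when nearer
    sim, close = row_normalize(sim), row_normalize(close)
    return [[beta * sim[i][j] + (1 - beta) * close[i][j] for j in range(n)]
            for i in range(n)]

def diffuse(T, H, hop_weights=(0.5, 0.3, 0.2)):
    """Multi-hop diffusion: sum over k of hop_weights[k] * T^k H."""
    out = [[0.0] * len(H[0]) for _ in H]
    cur = H
    for k, w in enumerate(hop_weights):
        if k > 0:  # advance one more hop: cur = T @ cur
            cur = [[sum(T[i][m] * cur[m][c] for m in range(len(T)))
                    for c in range(len(H[0]))] for i in range(len(T))]
        for i, row in enumerate(cur):
            for c, v in enumerate(row):
                out[i][c] += w * v
    return out
```

Because each row is normalized separately, the weight from node i to node j generally differs from the weight from j to i, so information does not flow symmetrically.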


2606.19303 2026-06-18 cs.LG 新提交

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution

P-K-GCN：物理增强的Koopman图卷积网络用于深度时空超分辨率

Xizhuo Zhang, Zekai Wang, Fei Liu, Bing Yao

发表机构 * Department of Industrial & Systems Engineering, The University of Tennessee, Knoxville（田纳西大学诺克斯维尔分校工业与系统工程系）； Charles F. Dolan School of Business, Fairfield University（费尔菲尔德大学查尔斯·F·多兰商学院）； Department of Electrical Engineering & Computer Science, The University of Tennessee, Knoxville（田纳西大学诺克斯维尔分校电气工程与计算机科学系）

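AI summary: Proposes P-K-GCN, which combines a spline-based GCN with Koopman operator theory for spatiotemporal super-resolution on irregular geometries, adding a physics-based loss and a theoretical analysis that guarantees error reduction.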


AI中文摘要

高保真时空动力学模拟计算成本高昂，因此需要高效的超分辨率技术从粗粒度输入重建高分辨率数据。传统数据驱动方法缺乏物理约束，而简单的物理信息学习难以处理不规则空间几何和复杂时间演化。为解决这些问题，我们提出了一种物理增强的Koopman图卷积网络（P-K-GCN），用于不规则几何上的时空超分辨率。具体地，首先设计了一个基于连续样条的GCN，直接从粗粒度图中提取空间依赖关系，并引入Koopman算子理论将非线性动力学投影到紧凑的潜空间，其中时间演化被线性化。其次，我们通过基于物理的损失增强优化目标，迫使数据驱动重建遵循物理定律，以提高预测保真度和鲁棒性。最后，我们提供了严格的理论分析，证明物理增强和Koopman正则化通过减少Rademacher复杂度和收紧泛化界，数学上保证了超分辨率误差的降低。我们在从稀疏低分辨率测量重建三维心脏几何上的高分辨率心脏电动力学上评估了我们的框架。数值实验表明，我们的方法相比基线模型实现了更高的精度。

英文摘要

High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, necessitating efficient super-resolution techniques to reconstruct high-resolution data from coarse-grained inputs. Traditional data-driven methods often lack physical constraints, and simple physics-informed learning struggles with irregular spatial geometries and intricately evolving temporal dynamics. To tackle these challenges, we propose a Physics-augmented Koopman-enhanced Graph Convolutional Network (P-K-GCN) for spatiotemporal super-resolution on irregular geometries. Specifically, a continuous spline-based GCN is first designed to extract spatial dependencies directly from the coarse graph, and Koopman operator theory is incorporated to project the nonlinear dynamics into a compact latent space where temporal progression is linearized. Second, we augment the optimization objective with a physics-based loss to force the data-driven reconstructions to adhere to physical laws, improving predictive fidelity and robustness. Finally, we provide a rigorous theoretical analysis, establishing that the physics augmentation and Koopman regularization mathematically guarantee a reduction in super-resolution error by diminishing Rademacher complexity and tightening generalization bounds. We evaluate our framework on reconstructing spatially high-resolution cardiac electrodynamics across a 3D heart geometry from sparse low-resolution measurements. Numerical experiments demonstrate that our method achieves superior accuracy compared to baseline models.
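
To illustrate what "a latent space where temporal progression is linearized" means, here is a small textbook-style example, unrelated to the cardiac model in the paper. The nonlinear map, the hand-picked observables, and the parameters `lam` and `mu` are assumptions for illustration; in the paper the lifting is learned rather than written by hand.

```python
def step(x1, x2, lam=0.9, mu=0.5):
    """One step of a nonlinear map: x2 is driven by the square of x1."""
    return lam * x1, mu * x2 + (lam ** 2 - mu) * x1 ** 2

def lift(x1, x2):
    """Observables (x1, x2, x1^2) in which the dynamics become linear."""
    return [x1, x2, x1 ** 2]

def koopman_matrix(lam=0.9, mu=0.5):
    """Finite-dimensional Koopman matrix acting on the lifted state."""
    return [[lam, 0.0, 0.0],
            [0.0, mu, lam ** 2 - mu],
            [0.0, 0.0, lam ** 2]]

def apply_linear(K, z):
    """Matrix-vector product K z."""
    return [sum(k * v for k, v in zip(row, z)) for row in K]

# Roll the nonlinear state and the linear latent state forward side by side.
x, z, K = (1.0, 0.5), lift(1.0, 0.5), koopman_matrix()
for _ in range(5):
    x = step(*x)
    z = apply_linear(K, z)
```

After every step `lift(*x)` and `z` agree up to floating-point error, so the nonlinear trajectory can be predicted with repeated multiplication by a single matrix.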


2504.04739 2026-06-18 cs.LG cs.CY 版本更新

UST-GNN: A Unified Spatial-Topological Graph Neural Network Framework for Urban Analytics, Demonstrated through a Case Study on Urban Health Prediction

UST-GNN：面向城市分析的空间-拓扑统一图神经网络框架——以城市健康预测为例

Minwei Zhao, Sanja Scepanovic, Stephen Law, Ivica Obadic, Cai Wu, Daniele Quercia

发表机构 * University College London（伦敦大学学院）； The Hong Kong University of Science（香港科学大学）； Nokia Bell Labs（诺基亚贝尔实验室）； Technical University of Munich（慕尼黑技术大学）； University of Oxford（牛津大学）

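AI summary: Proposes the UST-GNN framework, which integrates neighbourhood connectivity, heterogeneous urban features, and positional embeddings; for health prediction across 4,835 Greater London neighbourhoods it improves R² by 8.4-13.2% under strict spatial cross-validation, and a principal-component module is introduced to interpret the embeddings.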


AI中文摘要

理解社会、人口、环境与空间因素如何共同塑造城市结果，对于可持续城市发展和循证政策至关重要。传统统计方法往往难以捕捉复杂的非线性关系，而许多机器学习方法忽视了城市系统中空间自相关和网络拓扑的共同作用。近期GeoAI的进展仅部分解决了这些挑战，通常将空间效应、图结构、评估和可解释性分开处理。我们提出\textbf{UST-GNN}，一个统一的空间-拓扑图神经网络框架，将邻域连通性、异质城市特征和位置/区位嵌入整合到单一表示中。使用MedSAT数据集（包含大伦敦4835个邻域的150多个环境和社会人口变量及六种处方结果），UST-GNN在严格空间交叉验证下，比强统计基线、地理增强基线和图机器学习基线表现更优，样本外$R^2$提升8.4-13.2%。我们进一步引入轻量级主成分模块，从地理角度解释学习到的节点嵌入，并将其与政策相关的协变量联系起来。结果分析恢复了已知模式，为有争议的关联提供了新视角，并揭示了值得进一步因果研究的新预测因子。这些发现共同证明了基于图的空间机器学习在城市健康分析、环境不平等评估和循证城市政策中的价值。除预测增益外，UST-GNN提供了一个统一的GeoAI分析流程，可嵌入城市数字孪生工作流，用于情景测试、监测和数据驱动的决策，以建设更健康、更可持续的城市。

英文摘要

Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial-topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph machine learning baselines, improving out-of-sample $R^2$ by 8.4-13.2% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.


2606.15633 2026-06-18 cs.LG 版本更新

Formalizing and Mitigating Structural Distortion in LLM Attention for Graph Reasoning

形式化并缓解大语言模型注意力中的结构失真以实现零样本图推理

Donald Loveland, Puja Trivedi, Ari Weinstein, Edward W Huang, Danai Koutra

发表机构 * University of Michigan（密歇根大学）； Amazon（亚马逊）

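AI summary: Formalizes how graph linearization causes structural distortion when large language models process text-attributed graphs, and proposes GaLA, a lightweight inference-time modification that corrects the attention bias to improve zero-shot graph reasoning.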

Comments Accepted to KDD 2026


AI中文摘要

大语言模型（LLM）在文本属性图（TAG）推理中展现出潜力。然而，将LLM应用于图需要将其结构线性化为序列，这引入了根源于图带宽问题的失真。虽然这种失真已被证明会降低性能，但通常归因于提示设计或模型规模，其潜在机制尚不清楚。在这项工作中，我们展示了旋转位置嵌入如何将图线性化为带宽相关的注意力衰减，抑制了序列化序列中被强制分隔开的图相邻节点之间的注意力。这将基于LLM的图推理的焦点从提示工程和规模缩放转向纠正注意力错位。受此分析启发，我们提出了图对齐语言注意力（GaLA），一种轻量级的、推理时修改LLM的方法。GaLA将注意力偏向图相邻节点，同时保留LLM的序列归纳偏差。在TAG基准测试中，GaLA以可忽略的开销提升了性能，表明失真是基于LLM的图推理中可纠正的瓶颈。

英文摘要

Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph bandwidth problem. While this distortion has been shown to degrade performance, it is often attributed to prompt design or model scale, leaving the underlying mechanism unclear. In this work, we show how rotary positional embeddings turn graph linearization into bandwidth-dependent attention decay, suppressing attention between graph-adjacent nodes that are forced far apart in the serialized sequence. This shifts the focus of LLM-based graph reasoning from prompt engineering and scaling toward correcting attention misalignment. Motivated by this analysis, we propose Graph-aligned Language Attention (GaLA), a lightweight, inference-time modification for LLMs. GaLA biases attention toward graph-adjacent nodes while preserving the LLM's sequential inductive biases. Across TAG benchmarks, GaLA improves performance with negligible overhead, demonstrating that distortion is a correctable bottleneck in LLM-based graph reasoning.
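
A rough sketch of what "biasing attention toward graph-adjacent nodes" could look like at the level of attention logits follows. The additive constant `bias`, the one-token-per-node simplification, and the absence of causal masking are all assumptions; the abstract does not specify the form GaLA's bias actually takes.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def graph_biased_attention(logits, edges, bias=1.0):
    """Row-wise softmax after adding `bias` to the logits of graph-adjacent pairs.

    logits[i][j] is the raw attention score from token i to token j; each token
    stands for exactly one node here, which is a simplification.
    """
    adjacent = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return [softmax([s + (bias if (i, j) in adjacent else 0.0)
                     for j, s in enumerate(row)])
            for i, row in enumerate(logits)]

# Nodes 0 and 3 are neighbours in the graph but far apart in the sequence.
flat = [[0.0] * 4 for _ in range(4)]
plain = graph_biased_attention(flat, edges=[(0, 3)], bias=0.0)
biased = graph_biased_attention(flat, edges=[(0, 3)], bias=1.0)
```

Here `biased[0][3]` is larger than `plain[0][3]`, while the other logits are left untouched, so the sequential ordering among non-adjacent tokens is preserved.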


2505.12369 2026-06-18 cs.AI cs.LG cs.LO 版本更新

Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations

知识图谱上具有传递关系的全几何多跳推理

Fernando Zhapa-Camacho, Robert Hoehndorf

发表机构 * KAUST Center of Excellence for Smart Health (KCSH)（智能健康卓越中心）； KAUST Center of Excellence for Generative AI（生成人工智能卓越中心）

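AI summary: Proposes GeometrE, which maps logical operations to purely geometric transformations and introduces a transitive loss function, improving multi-hop reasoning performance while preserving interpretability.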

Comments Accepted at ESWC 2026


DOI: 10.1007/978-3-032-25156-5_14
Journal ref: The Semantic Web. ESWC 2026. Lecture Notes in Computer Science, vol 16549. Springer, Cham (2026)

AI中文摘要

知识图谱上的多跳逻辑推理需要将逻辑语义忠实地映射到潜在空间。当前的几何嵌入方法通过将实体映射到几何区域、逻辑操作映射到潜在变换，在此任务上表现出有效性。虽然几何嵌入可以为查询回答提供直接的可解释性框架，但当前方法仅利用了实体的几何构造，未能将逻辑操作映射为纯几何变换，而是使用神经组件来学习这些操作。另一方面，纯神经方法优于几何方法，但在潜在空间中缺乏可解释性。我们提出了GeometrE，一种用于多跳推理的几何嵌入方法，它将每个逻辑操作映射为潜在空间中的纯几何操作。此外，我们引入了一个传递损失函数，并表明与现有方法不同，它可以保留对所有a,b,c的逻辑规则：r(a,b)和r(b,c) -> r(a,c)。我们的实验表明，GeometrE优于当前最先进的几何方法，并在标准基准数据集上与现有的神经方法保持竞争力。

英文摘要

Multi-hop logical reasoning on knowledge graphs requires faithfully mapping the logical semantics to latent space. Current geometric embedding methods have proven useful for this task by mapping entities to geometric regions and logical operations to latent transformations. While a geometric embedding can provide a direct interpretability framework for query answering, current methods have only leveraged the geometric construction of entities, failing to map logical operations to pure geometric transformations and, instead, using neural components to learn these operations. On the other hand, purely neural methods outperform geometric methods, but they lack interpretability in the latent space. We introduce GeometrE, a geometric embedding method for multi-hop reasoning that maps every logical operation to a purely geometric operation in the latent space. Additionally, we introduce a transitive loss function and show that, unlike existing methods, it can preserve the logical rule for all a,b,c: r(a,b) and r(b,c) -> r(a,c). Our experiments show that GeometrE outperforms current state-of-the-art geometric methods and remains competitive with existing neural methods on standard benchmark datasets.


2606.19164 2026-06-18 cs.LG cs.AI 新提交

Do Neural Networks Lose Plasticity in a Gradually Changing World?

Tianhui Liu, Lili Mou

发表机构 * Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta； Canada CIFAR AI Chair

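AI summary: Studies how the abruptness of task transitions affects loss of plasticity in neural networks, simulating gradually changing environments through input/output interpolation and task sampling; theory and experiments show that the severity of plasticity loss is closely tied to transition abruptness and is substantially reduced in gradual environments.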

2303.18031 2026-06-18 cs.CV cs.AI cs.LG 版本更新

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

简单域泛化方法是开放域泛化的强基线

Masashi Noguchi, Shinichi Shirakawa

发表机构 * Graduate School of Environment and Information Sciences（环境与信息科学研究生院）； Yokohama National University（横滨国立大学）； Faculty of Environment（环境学系）

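AI summary: Evaluates existing domain generalization methods in open domain generalization and finds that the simple methods CORAL and MMD are competitive with the more complex DAML; simple extensions using ensemble learning and Dirichlet mixup data augmentation approach DAML's performance at lower computational cost.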

Comments Accepted at IJCNN 2024. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval


DOI: 10.1109/IJCNN60899.2024.10650639

AI中文摘要

在现实应用中，机器学习模型需要处理开放集识别（OSR），即在推理过程中出现未知类别，同时还要处理域偏移，即训练和推理阶段数据分布不同。域泛化（DG）旨在处理推理阶段目标域在模型训练期间不可访问的域偏移情况。开放域泛化（ODG）同时考虑DG和OSR。域增强元学习（DAML）是一种针对ODG的方法，但其学习过程复杂。相比之下，尽管已提出多种DG方法，但它们尚未在ODG场景下进行评估。在本研究中，我们全面评估了现有DG方法在ODG中的表现，并表明两种简单的DG方法——相关对齐（CORAL）和最大均值差异（MMD）——在多种情况下与DAML具有竞争力。此外，我们通过引入DAML中使用的技术（如集成学习和Dirichlet混合数据增强）提出了CORAL和MMD的简单扩展。实验评估表明，扩展后的CORAL和MMD可以以较低的计算成本达到与DAML相当的性能。这表明简单的DG方法及其简单扩展是ODG的强基线。

英文摘要

In real-world applications, a machine learning model is required to handle open-set recognition (OSR), where unknown classes appear during inference, in addition to domain shift, where the data distribution differs between the training and inference phases. Domain generalization (DG) aims to handle domain shift when the target domain of the inference phase is inaccessible during model training. Open domain generalization (ODG) considers DG and OSR together. Domain-augmented meta-learning (DAML) is a method targeting ODG; however, it has a complicated learning process. By contrast, although various DG methods have been proposed, they have not been evaluated in ODG situations. In this study, we comprehensively evaluate the existing DG methods in ODG and show that two simple DG methods, CORrelation ALignment (CORAL) and maximum mean discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG.
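
For readers unfamiliar with CORAL, the alignment term can be sketched as below. This follows the commonly cited Deep CORAL formulation (squared Frobenius distance between feature covariances, scaled by 1/(4d²)); whether the paper uses exactly this scaling is an assumption, and the toy feature lists are illustrative only.

```python
def covariance(X):
    """Unbiased sample covariance of the rows of X (n samples, d features)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]

def coral_loss(Xs, Xt):
    """Squared Frobenius distance between two domain covariances, over 4 d^2."""
    d = len(Xs[0])
    Cs, Ct = covariance(Xs), covariance(Xt)
    return sum((Cs[i][j] - Ct[i][j]) ** 2
               for i in range(d) for j in range(d)) / (4 * d * d)

source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[x + 5.0, y - 2.0] for x, y in source]   # same spread, different mean
scaled = [[2.0 * x, 2.0 * y] for x, y in source]    # different spread
```

The loss is zero for `shifted` (up to rounding), because covariance ignores mean shifts, and positive for `scaled`, which is why CORAL is usually paired with a task loss rather than used alone.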


2510.15551 2026-06-18 cs.CL cs.AI cs.LG 版本更新

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

从统计视角重新思考跨语言差距

Vihari Piratla, Purvam Jain, Darshan Singh, Trevor Cohn, Preethi Jyothi, Partha Talukdar

发表机构 * Google DeepMind（谷歌DeepMind）

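AI summary: Argues that the cross-lingual gap stems from the variance of responses in the target language, formalizes biased and unbiased errors, and uses inference-time ensembling to reduce variance, yielding relative gains in cross-lingual transfer scores of 8% to over 50%.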

Comments 30 pages


AI中文摘要

任何知识片段通常以一种或少数几种自然语言表达在网页或大型语料库中。大型语言模型（LLMs）通过从源语言获取知识，并在使用目标语言查询时使其可访问，从而充当桥梁。跨语言差距是指使用目标语言而非源语言查询知识时准确率的下降。现有研究侧重于导致跨语言差距的建模或训练失败。在这项工作中，我们采取另一种视角来表征跨语言错误的性质，并假设目标语言中响应的方差是造成这一差距的关键原因。我们首次将跨语言差距形式化为有偏误差和无偏误差。通过多种控制方差并减少跨语言差距的推理时干预，我们实证验证了我们的假设。我们展示了几种测试时集成方法，这些方法降低了响应方差，从而将源-目标迁移得分提高了多达12个绝对百分点，在各种LLMs上实现了8%到超过50%的相对提升。

英文摘要

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research has focused on modeling or training failures that lead to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance and thereby improve source-target transfer scores by up to 12 absolute points, yielding relative gains of 8% to over 50% across various LLMs.
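
The distinction between variance-driven and bias-driven errors can be illustrated with a simple majority-vote calculation. This is a stylized model, not the paper's ensemble method: it assumes binary answers, independent samples, and an odd number of votes `n`.

```python
import math

def majority_accuracy(p, n):
    """Probability that a majority of n independent answers is correct,
    when each answer is correct with probability p (n assumed odd)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

noisy_but_unbiased = majority_accuracy(0.6, 5)   # errors come from variance
systematically_wrong = majority_accuracy(0.4, 5) # errors come from bias
```

Voting lifts the 0.6 case to about 0.683 but pushes the 0.4 case down to about 0.317, which matches the intuition that ensembling helps only when the target-language errors are mostly variance rather than bias.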


2602.17187 2026-06-18 stat.ML cs.LG 版本更新

Anti-causal domain generalization: Leveraging unlabeled data

反因果域泛化：利用无标签数据

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

发表机构 * Apple（苹果公司）； ETH Zürich（苏黎世联邦理工学院）

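AI summary: For domain generalization in the anti-causal setting, proposes estimating environment perturbation directions from unlabeled data and penalizing the model's sensitivity to changes in covariate mean and covariance, with worst-case optimality guarantees.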

Comments Accepted at the International Conference on Machine Learning (ICML) 2026


AI中文摘要

域泛化问题关注的是学习在部署到新的、未见过的环境时对分布变化具有鲁棒性的预测模型。现有方法通常需要来自多个训练环境的标记数据，这在标记数据稀缺时限制了它们的适用性。在这项工作中，我们研究了反因果设置下的域泛化，其中结果导致观察到的协变量。在这种结构下，影响协变量的环境扰动不会传播到结果，这促使我们对模型对这些扰动的敏感性进行正则化。关键在于，估计这些扰动方向不需要标签，使我们能够利用来自多个环境的无标签数据。我们提出了两种方法，分别惩罚模型对跨环境协变量均值和协方差变化的敏感性，并证明这些方法在特定环境类别下具有最坏情况最优性保证。最后，我们在一个受控物理系统和一个生理信号数据集上展示了我们方法的实证性能。

英文摘要

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
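
A minimal sketch of the mean-shift idea, assuming a linear predictor with weights `w`: the shift directions are estimated from unlabeled covariates alone, and the penalty measures how strongly the prediction moves along them. The exact estimator and penalty in the paper may differ; the function names and toy environments below are hypothetical.

```python
def column_means(X):
    """Per-feature mean of a list of covariate vectors."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def mean_shift_penalty(w, environments):
    """Sum over environments of (w . (mean_e - grand_mean))^2.

    `environments` is a list of unlabeled covariate matrices, one per environment.
    """
    means = [column_means(X) for X in environments]
    grand = column_means(means)  # average of the environment means
    return sum(sum(wi * (m - g) for wi, m, g in zip(w, mean, grand)) ** 2
               for mean in means)

# Two environments whose covariates differ only in feature 0.
env_a = [[0.0, 1.0], [0.0, 3.0]]
env_b = [[2.0, 1.0], [2.0, 3.0]]
penalty_sensitive = mean_shift_penalty([1.0, 0.0], [env_a, env_b])  # 2.0
penalty_invariant = mean_shift_penalty([0.0, 1.0], [env_a, env_b])  # 0.0
```

A model that relies on the shifting feature is penalized, while one that relies on the stable feature is not, and no labels were needed to compute either value.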


2606.18307 2026-06-18 cs.LG cs.AI 新提交

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

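AI summary: Proposes a reproducible VLM-judge protocol for evaluating single-image 3D mesh quality and finds that cheap proxies such as geometry validity and render-CLIP cannot replace VLM judges.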


AI中文摘要

单图像到3D生成器正在快速改进，但目前没有公认的、无需人工的方法来判断生成的网格是否优于另一个。从业者通常依赖廉价的自动代理方法（渲染空间的CLIP相似性和网格几何有效性统计），但这些方法在多大程度上跟踪感知质量尚未确定。我们做出两项贡献。首先，我们提出并验证了一个可重复的VLM评判评估协议：一个固定的24视角无头渲染装置、两个独立的视觉语言评判家族，以及一个强制的位置偏差校正，该校正查询两种呈现顺序并仅保留顺序一致的判决。两个评判家族彼此高度一致（Cohen's kappa = 0.66），远高于随机一致性基线。其次，以该协议为参考，我们证明廉价代理方法无法替代它。几何有效性平均而言仅是一个弱信号（因为，如我们所示，它是双峰的），且低于我们预先注册的目标，而渲染CLIP则处于随机水平。一个学习的Bradley-Terry头部坍缩到一个单一流形统计量（给渲染CLIP赋予负权重），并且与仅几何方法完全匹配，因此学习特征权重毫无收益。该代理方法也是双峰的：在具有可见几何缺陷的对比中显著高于随机水平，但在模糊对比中处于随机水平，这与几何有效性仅在缺陷视觉显著时跟踪评判者的行为一致。因此，我们推荐VLM评判协议作为在测试条件下（Google Scanned Objects上的两个前馈生成器，采用面丢失退化机制）可靠且可重复的评估器，并建议不要将几何/CLIP代理方法作为优化目标。

英文摘要

Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP similarity and mesh geometry-validity statistics), yet how well these track perceived quality is unestablished. We make two contributions. First, we propose and validate a reproducible VLM-judge evaluation protocol: a fixed 24-view headless render rig, two independent vision-language judge families, and a mandatory position-bias correction that queries both presentation orders and keeps only order-consistent verdicts. The two judge families agree substantially with each other (Cohen's kappa = 0.66), well above the chance-agreement floor. Second, using this protocol as the reference, we show the cheap proxies do not substitute for it. Geometry validity is only a weak signal on average (because, as we show, it is bimodal) and stays below our pre-registered target, while render-CLIP is at chance. A learned Bradley-Terry head collapses onto a single manifoldness statistic (giving render-CLIP a negative weight) and matches geometry-only exactly, so learning the feature weights buys nothing. The proxy is also bimodal: it is significantly above chance on contrasts with visible geometric defects but at chance on ambiguous contrasts, consistent with geometry validity tracking the judge only when the defect is visually salient. We therefore recommend the VLM-judge protocol as a reliable, reproducible evaluator under the conditions tested (two feed-forward generators on Google Scanned Objects, with a face-drop degradation regime) and advise against geometry/CLIP proxies as optimization targets.
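
The two bookkeeping steps in the protocol, keeping only order-consistent verdicts and measuring agreement between judge families with Cohen's kappa, can be sketched as follows. The verdict encoding (the name of the preferred mesh) and the function names are assumptions; the kappa formula itself is the standard one.

```python
def order_consistent(verdict_ab, verdict_ba):
    """Return the preferred mesh if the judge picks the same mesh in both
    presentation orders, otherwise None (the comparison is discarded)."""
    return verdict_ab if verdict_ab == verdict_ba else None

def cohens_kappa(labels1, labels2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance).
    Undefined when chance agreement equals 1 (both raters use a single label)."""
    n = len(labels1)
    observed = sum(a == b for a, b in zip(labels1, labels2)) / n
    categories = set(labels1) | set(labels2)
    chance = sum((labels1.count(c) / n) * (labels2.count(c) / n) for c in categories)
    return (observed - chance) / (1 - chance)

# Each tuple: verdict with mesh A shown first, verdict with mesh B shown first.
raw = [("A", "A"), ("A", "B"), ("B", "B")]
kept = [v for v in (order_consistent(x, y) for x, y in raw) if v is not None]

judge1 = ["A", "A", "B", "B"]
judge2 = ["A", "A", "B", "A"]
kappa = cohens_kappa(judge1, judge2)
```

In this toy run `kept` is `["A", "B"]`, since the middle comparison flipped with presentation order, and `kappa` is 0.5: the judges agree on 75% of items against a chance level of 50%.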


2606.18539 2026-06-18 cs.LG stat.ML 新提交

Seed-Guided Semi-Supervised Clustering via A-Contrario Anomaly Detection

Nassir Mohammad

发表机构 * Cyber Innovation Lab, Airbus, Newport, UK（空中客车公司网络创新实验室（英国纽波特））

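AI summary: Proposes a semi-supervised clustering framework based on a statistical duality, using a-contrario reasoning and the Perception algorithm; seed labels initialize the clusters and anomalous points are iteratively excluded, giving robust clustering with strong performance from few seeds.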


AI中文摘要

本文介绍了一种基于分组原则与异常检测之间统计对偶性的半监督聚类框架。我们解决了噪声环境中鲁棒聚类定义的挑战——在该任务中，划分算法往往过度分配离群点，而基于密度的方法仍对启发式全局参数敏感。借鉴\textit{a-contrario}统计推理和格式塔邻近原则，我们将聚类定义为相对于均匀随机性零假设不包含任何异常点的最大数据点子集。该方法的核心是感知算法，该算法利用基于期望的原则性阈值（$\mathbb{E} < 1$）来识别异常点，无需手动参数调整。通过将聚类视为异常检测的对偶问题，我们采用迭代的“通过排除进行聚类”机制。该算法由种子引导，利用最少的用户提供标签来初始化鲁棒的聚类中位数并形成初始组，随后通过接纳非异常点进行扩展。这种方法自然地隔离了边缘点、孤立噪声和新兴的未知聚类。我们在合成和真实基准数据集上评估了该方法，包括通过原始、线性降维和邻域保持嵌入表示的图像和文本数据集。结果表明，在每个聚类仅使用10-30个种子的情况下，所提出的方法在实用的低调优基准测试协议下实现了具有竞争力且通常非常强的性能，同时在固定种子聚类数和迭代次数下，对观测数和维度均保持线性可扩展性。

英文摘要

This paper introduces a semi-supervised clustering framework grounded in the statistical duality between grouping principles and anomaly detection. We address the challenge of robust cluster definition in noisy environments, a task where partitioning algorithms often over-assign outliers and density-based methods remain sensitive to heuristic global parameters. Drawing on a-contrario statistical reasoning and Gestalt proximity principles, we define a cluster as a maximal subset of data points containing no anomalies relative to a null hypothesis of uniform randomness. Central to this approach is the Perception algorithm, which utilises a principled expectation-based threshold ($\mathbb{E} < 1$) to identify outliers without manual parameter tuning. By treating clustering as the dual of anomaly detection, we employ an iterative "clustering-by-exclusion" mechanism. The algorithm is seed-guided, leveraging minimal user-provided labels to initialise robust cluster medians and form initial groups, which are subsequently expanded by admitting non-anomalous points. This approach naturally isolates fringe points, isolated noise, and emerging unknown clusters. We evaluate the method on synthetic and real-world benchmarks, including image and text datasets represented through raw, linear-reduced, and neighbourhood-preserving embeddings. Results demonstrate that with as few as 10-30 seeds per cluster, the proposed method achieves competitive and often very strong performance under a practical low-tuning benchmarking protocol, while maintaining linear scalability with respect to both observations and dimensionality for a fixed number of seeded clusters and iterations.


2606.18970 2026-06-18 cs.LG cs.AI cs.CV 新提交

A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI

脑MRI的量子潜GAN增强的受控基准测试

Syed Mujtaba Haider, Silvia Figini

发表机构 * Department of Mathematics（数学系）； Department of Political and Social Sciences（政治与社会科学系）

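AI summary: Through a controlled benchmark comparing quantum and classical generators for brain-MRI data augmentation, finds that neither significantly outperforms real-data-only training and that the quantum generator offers no additional advantage.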

Comments This work has been submitted to the IEEE for possible publication.


AI中文摘要

医学图像分类常受限于有限的标注数据，因此生成式增强被提出；最近，量子生成模型被用于此目的，并经常报告准确率提升。然而，这些声称通常基于单次训练运行，未匹配量子与经典生成器的参数预算，也未表征任何收益出现的数据范围。我们提出了一个受控基准测试，隔离量子生成器对脑MRI增强的贡献。图像被编码到KL正则化的潜在空间中，在该空间中，使用变分量子生成器或参数数量几乎相同的经典生成器（1648 vs. 1632）训练带有梯度惩罚的条件Wasserstein GAN。合成样本被解码并用于增强预训练分类器，覆盖从5%到100%的标注数据比例，通过八个随机种子进行配对显著性检验（多重比较校正）以及集内多样性和潜在分布分析。在所有比例下，没有增强变体显著优于仅用真实数据训练，且量子与经典生成器在统计上无法区分。任何低数据优势表现为正则化而非忠实的数据扩展：合成样本分布外移，并且在数据稀缺时严重模式崩溃，而量子生成器并不比经典生成器更多样化。我们发布该协议作为医学成像中量子生成增强严格评估的测试平台。

英文摘要

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion: synthetic samples are off-distribution and severely mode-collapsed precisely where data is scarce, and the quantum generator is no more diverse than its classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.
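
The abstract mentions a multiple-comparison correction without naming it. As one common choice, the Holm step-down procedure is sketched below; treating it as the correction used in the paper is an assumption, and the p-values in the example are made up for illustration.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one rejection flag per hypothesis.

    P-values are visited in ascending order; the k-th smallest (k = 0, 1, ...)
    is compared against alpha / (m - k), and testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject

# Hypothetical p-values from four paired comparisons (e.g. one per data fraction).
flags = holm_reject([0.01, 0.04, 0.03, 0.005])
```

For these values `flags` is `[True, False, False, True]`: 0.005 and 0.01 pass their thresholds of 0.0125 and about 0.0167, while 0.03 fails against 0.025 and stops the procedure.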


2606.19297 2026-06-18 cs.LG cs.RO 新提交

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

VLA 甚至知道基础知识吗？衡量视觉-语言-动作模型中的常识和世界知识保留

Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro

发表机构 * CogAI Lab（CogAI实验室）； FusionBrain Lab（FusionBrain实验室）； IAI MSU（莫斯科大学人工智能研究所）； Lomonosov MSU（莫斯科国立罗蒙诺索夫大学）； NUST MISIS（国立研究型技术大学MISIS）； Applied AI Institute（应用人工智能研究所）； HSE University（高等经济大学）； Generalizable AI Systems（通用人工智能系统实验室）； ISP RAS（俄罗斯科学院系统编程研究所）； MIRAI ； Domain-specific NLP Group（领域特定自然语言处理组）

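AI summary: Proposes the Act2Answer protocol, which evaluates knowledge retention in VLA models by having them answer through actions; models perform well on simple concepts but show gaps on richer semantic categories, and VQA co-training aids knowledge retention.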

Comments Project page: https://tttonyalpha.github.io/act2answer/


AI中文摘要

具身视觉-语言-动作（VLA）模型通常通过在机器人数据上微调强大的预训练 VLM 获得，但目前尚不清楚它们在适应后保留了多少常识和事实知识。在知识敏感任务上的失败是模糊的，混淆了知识缺失与低级控制泛化能力差。我们引入 Act2Answer，一种轻量级协议，通过要求智能体通过动作来回答，将 VLM 知识基准适配到 VLA 评估。每个问题变成一个简短的桌面场景，其中智能体执行单个物体放置动作以选择候选答案，从而产生动作基础的、减少控制混淆的成功率。我们在不同的常识和世界知识类别中策划了这样的环境测试套件，并引入逐层意图探测以定位 VLM 骨干和动作头中与答案相关的信息。在对 7 个 VLA 模型和 9 个 VLM 基线的大规模研究中，我们系统地跨类别对模型进行排名，发现 VLA 在简单概念上表现稳健，但在更丰富的语义类别上相对于其源 VLM 显示出更大的差距，VQA 联合训练与更好的知识保留相关，并且答案相关信号在 VLA 中间层达到峰值，但在上层减弱。Act2Answer 可在以下网址获取：此 https URL。

英文摘要

Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-sensitive tasks are ambiguous, conflating missing knowledge with poor generalization of low-level control. We introduce Act2Answer, a lightweight protocol that adapts VLM knowledge benchmarks to VLA evaluation by requiring agents to answer through action. Each question becomes a short tabletop episode where the agent performs a single object-placement action to select among candidate answers, yielding an action-grounded success rate with reduced control confounds. We curate a test suite of such environments across diverse commonsense and world-knowledge categories and introduce layerwise intent probing to localize answer-relevant information across the VLM backbone and action head. In a large-scale study of 7 VLA models and 9 VLM baselines, we systematically rank models across categories, finding that VLAs show solid performance on simple concepts while exhibiting larger gaps on richer semantic categories relative to their source VLMs, that VQA co-training is associated with better knowledge retention, and that answer-relevant signals peak in middle VLA layers but attenuate in upper layers. Act2Answer is available at https://tttonyalpha.github.io/act2answer/.


2606.18267 2026-06-18 cs.SI cs.LG cs.NE 交叉投稿

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

图实例景观：当结构相似性（不）反映最短路径性能时

Maryam Gholami Shiri, Ivana Krminac, Marko Djukanović, Sašo Džeroski, Eva Tuba, Tome Eftimov

发表机构 * Jožef Stefan Institute（乔泽夫·斯塔芬研究所）； Ljubljana, Slovenia（斯洛文尼亚卢布尔雅那）； Jožef Stefan International Postgraduate School（乔泽夫·斯塔芬国际研究生学院）； University of Banja Luka（班贾卢卡大学）； Faculty of Natural Science and Mathematics（自然科学与数学学院）； University of Nova Gorica（诺瓦戈里察大学）； Institute of Information Sciences (IZUM)（信息科学研究所（IZUM））； Trinity University（特里尼蒂大学）

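AI summary: By embedding graphs into a low-dimensional structural feature space and clustering them, analyzes how shortest-path algorithms perform across regions of graph structure, finding that structural similarity does not guarantee similar performance.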

Comments Preprint version of a paper accepted at the 2026 IEEE Congress on Evolutionary Computation (IEEE CEC 2026)


AI中文摘要

最短路径算法的基准测试通常基于异构图集上的聚合性能，这限制了对不同搜索范式如何响应实例结构的理解。我们采用实例景观视角进行图基准测试，将图嵌入到低成本的结构特征空间中，并将其聚类为结构相似的区域。研究了三个基准套件：加权 Erdős--Rényi 图、随机几何（无线）图和真实世界道路网络。我们评估了四种代表性的最短路径求解器，涵盖无信息精确搜索（Dijkstra）、双向精确搜索（双向 Dijkstra）、启发式引导精确搜索（A$^{*}$）和基于双端队列的策略（DEQ）。在多种特征选择方案下分析聚类鲁棒性，并使用非参数检验比较不同景观区域内的运行时间分布。虽然生成器参数诱导出稳定的结构区域，但我们发现特征空间相似性并不一定意味着性能相似：即使在相同的景观区域内，也经常观察到显著的运行时间变化。合并套件分析进一步表明，不同的基准族占据大部分不相交的区域。这些结果突出了结构景观用于最短路径算法结构感知基准测试的潜力和局限性。

英文摘要

Benchmarking shortest-path algorithms is commonly based on aggregate performance over heterogeneous graph sets, which limits insight into how different search paradigms react to instance structure. We adopt an instance-landscape view of graph benchmarking by embedding graphs into a low-cost structural feature space and clustering them into regions of similar structure. Three benchmark suites are studied: weighted Erdős–Rényi graphs, random geometric (wireless) graphs, and real-world road networks. We evaluate four representative shortest-path solvers spanning uninformed exact search (Dijkstra), bidirectional exact search (bidirectional Dijkstra), heuristic-guided exact search (A$^{*}$), and deque-based strategies (DEQ). Clustering robustness is analyzed under multiple feature-selection schemes, and runtime distributions are compared across landscape regions using non-parametric tests. While generator parameters induce stable structural regions, we find that feature-space similarity does not necessarily imply performance similarity: significant runtime shifts are frequently observed even within the same landscape region. A merged-suite analysis further shows that different benchmark families occupy largely disjoint regions. These results highlight both the potential and the limits of structural landscapes for the structure-aware benchmarking of shortest-path algorithms.

2606.18281 2026-06-18 stat.AP cs.LG stat.ML cross-list

A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings

Daniel Klippert, Sarah Friedrich, Markus Pauly

Affiliations: Department of Statistics, TU Dortmund University; Research Center Trustworthy Data Science and Security, University Alliance Ruhr (UA Ruhr); Institute for Mathematics, University of Augsburg

AI summary: For survival data with competing risks, compares six meta-learners for estimating conditional average treatment effects to guide model selection, and provides the R package crsurvlearners, which implements them.

Abstract

Conditional average treatment effects (CATEs) are central to treatment decision-making in personalized medicine. In competing risks settings, estimating CATEs from survival data allows for patient-specific assessments of treatment effectiveness for a specific event of interest while properly accounting for alternative event types. This distinction is essential in the presence of comorbidities, where competing causes of death may otherwise confound the therapeutic benefit. Focusing on right-censored survival times with binary treatment, we examine CATEs defined as covariate-conditional differences in the absolute risk for the event of interest at a fixed time. To this end, we study meta-learners, which adapt machine learning algorithms for CATE estimation in competing risks scenarios. We systematically compare six meta-learners, combining Cox regression or random survival forests for risk modeling with elastic net regression or random forests for direct CATE modeling. To provide practical guidance on model selection, we evaluate their performance in multiple simulation settings that differ in hazard complexity, treatment heterogeneity, treatment assignment, event type distribution, and censoring. To facilitate applied use, we provide the R package crsurvlearners, which implements all considered approaches.
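
The estimand described in this abstract (a covariate-conditional difference in absolute risk at a fixed time) can be written out as below. The notation is an assumption made for illustration and may differ from the paper's: $T^{(a)}$ and $\varepsilon^{(a)}$ denote the potential event time and event type under treatment $a \in \{0, 1\}$, event type 1 is the event of interest, $X$ is the covariate vector, and $t$ is the fixed time horizon.

```latex
\tau_t(x)
  = \Pr\bigl(T^{(1)} \le t,\ \varepsilon^{(1)} = 1 \mid X = x\bigr)
  - \Pr\bigl(T^{(0)} \le t,\ \varepsilon^{(0)} = 1 \mid X = x\bigr)
```

Each term is a cumulative incidence for the event of interest, so competing events enter only by removing subjects from the pool that can still experience it.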

2606.18302 2026-06-18 q-bio.OT cs.LG cross-list

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib, Mohammad Shahidur Rahman

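Affiliations: Shahjalal University of Science

AI summary: Builds the first protein-sequence dataset for native Bangladeshi fish, systematically evaluates seven architectures, and proposes MotifCNN-Transformer+TA-PE, a lightweight hybrid model whose accuracy is statistically indistinguishable from that of the much larger protein language model ProtBERT while being far cheaper to deploy in resource-constrained settings.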

Comments Published in 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN). © 2026 IEEE. Personal use of this material is permitted

DOI: 10.1109/QPAIN69676.2026.11546620
Journal ref: 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)

Abstract

Correct identification of fish species is highly significant for food security, economic development, and climate resilience in Bangladesh. Protein sequences directly reflect functional and evolutionary constraints, which are important for species authentication and biodiversity monitoring. Yet no benchmark exists for identifying native Bangladeshi fish species from protein sequences. In this study, we address this gap by introducing the first curated dataset of 2845 high-quality protein sequences from nine native Bangladeshi fish species. We also establish the first protein sequence classification baseline for this domain through a systematic benchmarking of seven architectural paradigms. Moreover, we propose a practically deployable hybrid architecture combining MotifCNN and a Transformer with Terminal-Aware Positional-Encoding (MotifCNN-Transformer+TA-PE). This architecture achieves 79.80% accuracy with a macro-F1 of 0.80. The highest accuracy, 83.04%, is achieved by the fine-tuned protein language model ProtBERT, which has 420M parameters and requires dual 16GB GPUs for inference. According to McNemar's test, ProtBERT's accuracy gain of 3.24 percentage points over MotifCNN-Transformer+TA-PE is not statistically significant (p = 0.1120). Our architecture outperforms ProtBERT on six of the nine classes in per-class identification. In addition, MotifCNN-Transformer+TA-PE is approximately 5x faster and 42x smaller than ProtBERT, supports a 16x larger batch size, and runs inference without a GPU, making it more practical for deployment in resource-constrained areas such as rural Bangladesh. Beyond this, our foundational work shows the effects of phylogenetic relationships on sequence similarity and establishes pathways for fisheries management, food authentication, and biodiversity conservation in South Asia's protein-dependent economy.
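
For readers unfamiliar with McNemar's test, which this abstract uses to compare two classifiers on the same test set, a minimal sketch follows. It assumes the exact (binomial) form of the test; the abstract does not say which variant the authors used, and the discordant counts below are hypothetical, not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    b: test samples model A classified correctly and model B incorrectly
    c: test samples model B classified correctly and model A incorrectly
    Samples on which both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts for illustration only.
p_value = mcnemar_exact(b=30, c=18)
print(f"p = {p_value:.4f}")
```

Because only the disagreements matter, a gap of a few accuracy points can fail to reach significance when the test set is small, which is consistent with the situation the abstract reports.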

2606.18436 2026-06-18 stat.ML cs.LG cross-list

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

Wenqin Liu, Weizhi Quan, Aoqi Zuo, Erdun Gao, Vu Nguyen, Dino Sejdinovic, Howard Bondell, Mingming Gong

Affiliations: School of Mathematics and Statistics, The University of Melbourne; Statistics, The University of Melbourne; Statistics, University of Sydney; Responsible AI Research Centre, Australian Institute for Machine Learning; Amazon; School of Mathematical Sciences, Adelaide University; Department of Machine Learning, MBZUAI

AI summary: Proposes TimeLAVA, a learning-agnostic framework that uses wavelet transforms and optimal transport to value time-series segments by their marginal contribution to distributional discrepancy; it requires no model training and yields more informative value scores than existing methods on anomaly detection, data pruning, and label-noise detection.

Comments 34 pages

Journal ref: ICML 2026

Abstract

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, finance, and industrial monitoring, effective valuation methods are essential yet fundamentally lacking. Existing approaches are either model-dependent, limiting their generalizability, or designed for i.i.d. data and thus fail to capture temporal dependencies, multi-scale patterns, and non-stationary dynamics inherent to sequential data. We introduce TimeLAVA, a learning-agnostic framework that values temporal segments by their marginal contribution to minimizing distributional discrepancy between evaluated and reference data. At its core is a novel Selective Wavelet-based Wasserstein discrepancy combining multi-scale wavelet transforms for temporal localization with unbalanced optimal transport for robustness to distributional shifts. Segment values are efficiently computed via sensitivity analysis without requiring model training and aggregated into point-wise scores. We provide theoretical guarantees linking valuation to model-agnostic generalization and prove bounded sensitivity to outlier contamination. Extensive experiments across anomaly detection, data pruning, and label noise detection demonstrate that TimeLAVA produces significantly more informative value scores than existing methods on diverse real-world datasets.

2606.18750 2026-06-18 stat.AP cs.LG cross-list

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

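AI summary: Systematically addresses five common but overlooked questions about applying CUPED, covering the optimal adjustment specification, the validity of regression adjustment, and robust variance estimation, with extensions to multi-arm experiments and two-stage sampling designs; the recommended methods are supported by theory and experiments and are deployed on ByteDance's experimentation platform.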

Comments 15 pages, 3 figures

Abstract

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
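
As background for this entry, the textbook CUPED adjustment subtracts a scaled, centered pre-experiment covariate from each outcome. The sketch below illustrates only that standard form, not the estimators or variance corrections the paper studies; the data are synthetic and the function name `cuped_adjust` is made up for the example.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y_i - theta * (x_i - mean(x)).

    y: in-experiment metric per user
    x: pre-experiment covariate per user, in the same order as y
    theta is the OLS slope cov(x, y) / var(x), computed on the pooled sample.
    """
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Synthetic data: the outcome is correlated with the pre-period metric.
rng = random.Random(0)
pre = [rng.gauss(10.0, 2.0) for _ in range(2000)]
post = [0.8 * p + rng.gauss(0.0, 1.0) for p in pre]

adjusted = cuped_adjust(post, pre)
print("variance before:", variance(post))
print("variance after: ", variance(adjusted))
```

The adjustment leaves the sample mean unchanged because the centered covariate sums to zero, while the variance shrinks by roughly the squared correlation between the covariate and the outcome.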

2606.18972 2026-06-18 stat.ML cs.LG cross-list

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson, Ricardo J. G. B. Campello

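AI summary: Proposes FOSC-X, a dynamic-programming framework that extracts the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, supports constraints on the number of clusters, and guarantees an optimal ranking in linear time.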

Abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
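
The property this abstract relies on in the unconstrained case (locally optimal choices within subtrees combine into a globally optimal cut) can be illustrated with a bottom-up recursion for a single best solution. This is only a sketch of that single-solution idea under assumed inputs: the `Node` type, the per-cluster `quality` scores, and the toy tree are hypothetical, and the code does not implement the top-M ranking or the cluster-count constraints that FOSC-X adds.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    quality: float  # hypothetical per-cluster quality score
    children: List["Node"] = field(default_factory=list)


def best_cut(node: Node) -> Tuple[float, List[str]]:
    """Return (total quality, selected cluster names) for the subtree at node."""
    if not node.children:
        return node.quality, [node.name]
    child_total, child_selection = 0.0, []
    for child in node.children:
        q, selected_names = best_cut(child)
        child_total += q
        child_selection.extend(selected_names)
    # Keep the node itself unless its descendants jointly score higher.
    if node.quality >= child_total:
        return node.quality, [node.name]
    return child_total, child_selection


# Toy hierarchy with made-up scores.
tree = Node("root", 2.0, [
    Node("A", 6.0, [Node("A1", 2.0), Node("A2", 3.0)]),
    Node("B", 1.0, [Node("B1", 1.0), Node("B2", 1.0)]),
])
total, selected = best_cut(tree)
print(total, selected)
```

In the toy tree the selection keeps cluster A whole but splits B into its children, so the resulting flat clustering is a non-horizontal cut: its clusters sit at different depths of the hierarchy.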

2606.19057 2026-06-18 stat.ML cs.LG stat.CO stat.ME cross-list

Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh

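AI summary: To counter systematic biases of LLM judges such as verbosity bias, proposes a geometric auditing framework based on Partial Optimal Transport that uses a small set of human-verified positives to correct biased judges without retraining, improving agreement with human preferences.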

Abstract

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-Judge pipelines.

2606.19184 2026-06-18 cs.CV cs.LG cross-list

When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift

Dat Nguyen, Cosmin Radoi, Romain Hermary, Marcella Astrid, Nesryne Mejri, Enjie Ghorbel, Djamila Aouada

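Affiliations: Cristal Laboratory, National School of Computer Sciences, University of Manouba

AI summary: Because per-dataset AUC does not reflect real-world settings with mixed data sources and varying artifact types, proposes Cross-dataset AUC (Cross-AUC), which combines averaged per-domain AUCs with a prediction-polarization measure (the Wasserstein distance between class score distributions) to assess robustness to domain shift; experiments on seven benchmark datasets support its usefulness.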

Abstract

Recent advances in generative AI, such as diffusion models and face-swapping tools, have enabled the creation of highly realistic deepfakes, leading to real-world harms including financial fraud and non-consensual explicit content. In response, deepfake detection has become an active research area, with recent methods increasingly focusing on improving generalization to unseen manipulations. This is typically evaluated using the Area Under the ROC Curve (AUC) measured separately across multiple datasets. However, such an evaluation fails to reflect real-world scenarios where detectors face a mixture of data sources and varying artifact types. To address this limitation, we introduce a novel metric, Cross-dataset AUC (Cross-AUC) that averages per-domain AUCs with a measure of prediction polarization for taking into account the robustness to domain shift. The polarization extent is quantified by the Wasserstein Distance between class score distributions. Cross-AUC not only assesses the generalization capabilities of deepfake detectors under domain shifts more realistically, but it is also interpretable as it better explains the reason behind a drop in performance. Experiments performed on seven benchmark datasets demonstrate its practical relevance.
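
The two ingredients this abstract names, per-domain AUC and a Wasserstein distance between class score distributions, can be computed as sketched below. The abstract does not give the formula that combines them into Cross-AUC, so the sketch deliberately stops at the components; the detector scores and domain names are made up for illustration.

```python
from bisect import bisect_right


def auc(scores_neg, scores_pos):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two empirical 1-D distributions.

    Computed as the integral of |CDF_a - CDF_b| over the pooled sample range.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    dist = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        dist += abs(cdf_a - cdf_b) * (right - left)
    return dist


# Hypothetical detector scores (higher means "more likely fake") for two domains.
domains = {
    "domain_a": {"real": [0.1, 0.2, 0.3], "fake": [0.7, 0.8, 0.9]},
    "domain_b": {"real": [0.4, 0.5, 0.6], "fake": [0.45, 0.55, 0.65]},
}
per_domain_auc = [auc(d["real"], d["fake"]) for d in domains.values()]
mean_auc = sum(per_domain_auc) / len(per_domain_auc)
polarization = {name: wasserstein_1d(d["real"], d["fake"]) for name, d in domains.items()}
print(mean_auc, polarization)
```

In this toy data the second domain separates real and fake scores far less than the first, which shows up both as a lower AUC and as a smaller distance between the two score distributions.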

2606.19245 2026-06-18 cs.AI cs.LG cross-list

Self-Driving Datasets: From 20 Million Papers to Large-Scale, Fine-Grained Biomedical Knowledge

Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang, Kaiwen Wu, Jiaming Liang, Maggie Ziyu Huan, Yoseph Barash, Cesar de la Fuente-Nunez, Osbert Bastani, Zachary Ives, Mark Yatskar, Jacob R. Gardner

Affiliations: Department of Computer and Information Science, University of Pennsylvania; Department of Genetics, University of Pennsylvania; Departments of Bioengineering and Chemical and Biomolecular Engineering, University of Pennsylvania

AI summary: Shows that PubMed can be automatically turned into structured datasets that are larger, more nuanced, and more accurate than curated databases, and presents Starling, a system that generates large-scale datasets across six tasks with lower error rates than curated counterparts.

Abstract

Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks -- blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions -- Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance tabular databases discard -- e.g., oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.

2606.07591 2026-06-18 cs.LG cs.AI cs.CL updated version

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, Xiangyu Zhao, Zhangrui Zhao, Weijie Ma, Zijie Guo, Koutian Wu, Haoyu Zhou, Haoxiang Yin, Lixue Cheng, Chaofan Hu, Haoxuan Li, Lu Mi, Xuxuan Xie, Yifan Zhou, Ruizhe Chen, Zhiwang Zhou, Xingjian Guo, Yuhao Zhou, Xuming He, Shengyuan Xu, Xinyu Gu, Jiamin Wu, Mianxin Liu, Chunfeng Song, Fenghua Ling, Dongzhan Zhou, Shixiang Tang, Yuqiang Li, Mao Su, Peng Ye, Siqi Sun, Bin Wang, Xue Yang, Zhenfei Yin, Tianfan Fu, Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang

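Affiliations: Shanghai Artificial Intelligence Laboratory

AI summary: Introduces ResearchClawBench, a benchmark of 40 tasks across 10 scientific domains that scores autonomous research with expert-curated multimodal rubrics; the strongest agent averages only 21.5, with failures concentrated in experimental protocol mismatch, evidence mismatch, and a missing scientific core.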

Abstract

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.

2407.18245 2026-06-18 cs.CV cs.LG updated version

TopBench: A Benchmark for Implicit Predictive Reasoning in Table Question Answering

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye

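Affiliations: School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, China

AI summary: Introduces TopBench, a benchmark of 779 samples across four sub-tasks that evaluates whether LLMs can recognize implicit predictive intent in table question answering and reason reliably about it, and finds that current models struggle with intent recognition.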

Abstract

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries is implicitly predictive, requiring the inference of unobserved answers from historical patterns rather than mere retrieval. These queries introduce two challenges: recognizing latent intent and reliable predictive reasoning over massive tables. To assess LLMs in such Tabular questiOn answering with implicit Prediction tasks, we introduce TopBench, a benchmark consisting of 779 samples across four sub-tasks, ranging from single-point prediction to decision making, treatment effect analysis, and complex filtering, requiring models to generate outputs spanning reasoning text and structured tables. We evaluate diverse models under both text-based and agentic workflows. Experiments reveal that current models often struggle with intent recognition, defaulting to just lookups. Deeper analysis identifies that accurate intent disambiguation serves as the prerequisite for leading these predictive behaviors. Furthermore, elevating the upper bound of prediction precision requires the integration of more sophisticated modeling or reasoning capabilities.

2605.03460 2026-06-18 cs.AI cs.LG updated version

FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, Soonyoung Lee, Wonbin Ahn

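Affiliations: LG AI Research

AI summary: To address the failure of time-series reasoning models in finance, proposes FinSTaR, built on a 2 x 2 capability taxonomy and trained with Compute-in-CoT and Scenario-Aware CoT strategies, reaching 78.9% average accuracy on the FinTSR-Bench benchmark.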

Comments KDD Workshop on SciSoc Agents & LLMs 2026 (Oral Presentation)

Abstract

Time series (TS) reasoning models (TSRMs) have shown promising capabilities in general domains, yet they consistently fail in the financial domain, which exhibits unique characteristics. We propose a general 2 x 2 capability taxonomy for TSRMs by crossing 1) single-entity vs. multi-entity analysis with 2) assessment of the current state vs. prediction of future behavior. We instantiate this taxonomy in the financial domain (where the distinction between deterministic assessment and stochastic prediction is particularly critical) as ten financial reasoning tasks, forming the FinTSR-Bench benchmark based on S&P stocks. To this end, we propose FinSTaR (Financial Time Series Thinking and Reasoning), trained on FinTSR-Bench with distinct chain-of-thought (CoT) strategies tailored to each category. For assessment, which is deterministic (i.e., computable from observable data), we employ Compute-in-CoT, a programmatic CoT that enables models to derive answers directly from raw prices. For prediction, which is inherently stochastic (i.e., subject to unobservable factors), we adopt Scenario-Aware CoT, which generates diverse scenarios before making a judgment, mirroring how financial analysts reason under uncertainty. The proposed method achieves 78.9% average accuracy on FinTSR-Bench, substantially outperforming LLM and TSRM baselines. Furthermore, we show that the four capability categories are complementary and mutually reinforcing through joint training, and that Scenario-Aware CoT consistently improves prediction accuracy over standard CoT. Code is available at https://github.com/seunghan96/FinSTaR.

2606.16000 2026-06-18 cs.CL cs.LG updated version

GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science

Aleksandr Tsymbalov, Danis Zaripov, Artem Epifanov, Anastasiya Palienko

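Affiliations: ITMO University; HSE University

AI summary: Proposes GRACE-DS, an isolated environment for pre-deployment evaluation of LLM-powered AutoML agents in which hidden executable validators measure predictive performance, leakage avoidance, reproducibility, and related criteria; experiments show that its flexible iterative interaction regime outperforms the baseline regimes.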

Abstract

We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. GRACE-DS is a set of evaluation metrics in an isolated environment that can be applied to tabular ML tasks specific to a particular organization. It exposes agents to realistic workflow stages, from planning and data inspection through feature engineering, model development, validation, and code repair to final submission, while hidden executable validators measure not only final predictive performance but also leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The strongest structured regime, flexible iterative interaction (our approach), achieves higher end-to-end normalized hidden-test quality than single-shot generation, unstructured interaction, and restart-based baselines, while also improving protocol-valid completion. Validated across more than 7,000 episodes, these results establish GRACE-DS as a robust platform for assessing the capacity of LLM-based AutoML agents to execute machine learning workflows under production-like conditions and in accordance with organization-specific requirements.

2410.15595 2026-06-18 cs.AI cs.CL cs.LG updated version

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

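Affiliations: Zhejiang University; Nanyang Technological University; Alibaba Group

AI summary: Surveys progress on Direct Preference Optimization (DPO) across theory, variants, datasets, and applications, discusses its potential and limitations as an RL-free alternative to RLHF, and proposes future research directions.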

Comments Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey

DOI: 10.1109/TPAMI.2026.3704314

AI中文摘要

随着大语言模型（LLMs）的快速发展，将策略模型与人类偏好对齐变得日益关键。直接偏好优化（DPO）作为一种有前景的对齐方法，作为从人类反馈中强化学习（RLHF）的无RL替代方案而出现。尽管DPO取得了各种进展并存在固有局限性，但文献中目前缺乏对这些方面的深入综述。在这项工作中，我们对DPO中的挑战和机遇进行了全面回顾，涵盖理论分析、变体、相关偏好数据集和应用。具体而言，我们基于关键研究问题对近期DPO研究进行分类，以提供对DPO当前格局的透彻理解。此外，我们提出了几个未来研究方向，为研究社区提供模型对齐的见解。相关论文的更新合集可在此https URL找到。

英文摘要

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.
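
For readers new to the objective this survey is organized around, the standard per-pair DPO loss can be sketched as follows. The abstract does not state the formula; the sketch uses the commonly cited form, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). The function name, argument names, and the default β = 0.1 are illustrative assumptions, not taken from the survey.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    All arguments are sequence log-probabilities (sums of token log-probs).
    The names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more the policy prefers the chosen answer than the reference does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), written to avoid overflow for large |z|.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# A policy identical to the reference gives a loss of log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# A policy that raises the chosen answer and lowers the rejected one gets a smaller loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss depends only on log-probability ratios against a frozen reference model, which is why no separate reward model or RL loop is needed.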

2606.18287 2026-06-18 cs.LG 新提交

Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS

Artemis: 解剖分辨的干预方法用于消除多模态神经影像混杂因素

Siyuan Dai, Yang Du, Kun Zhao, Zhusuyi Chen, Heng Huang, Paul Thompson, Chao Shi, Haoteng Tang, Liang Zhan

发表机构 * University of Pittsburgh（匹兹堡大学）； University of Maryland（马里兰大学）； University of Southern California（南加州大学）； Binghamton University（宾汉姆顿大学）； University of Texas Rio Grande Valley（德克萨斯大学里奥格兰德河谷分校）

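AI summary: Proposes Artemis, a framework that learns region-specific confounder representations through region-level causal intervention, removing the influence of demographic confounders on GNNs in multimodal fMRI and DTI neuroimaging and improving performance on three benchmarks.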

Comments 11 pages, 8 figures

AI中文摘要

多模态神经影像学整合了来自fMRI的功能连接和来自DTI的结构连接，使得使用图神经网络对脑网络进行无创分析成为可能。然而，年龄和性别等人口统计学因素系统地混淆了脑连接与临床结果之间的关系，导致GNN利用虚假捷径而非学习因果不变表示。尽管最近的因果GNN方法在图建模层面引入因果关系，但其因果机制仍然是领域无关的，没有考虑临床神经影像数据中固有的真实世界混杂因素。此外，脑网络是基于图谱分区构建的，每个区域对人口统计学因素表现出不同的敏感性，因此需要区域感知的调整。我们提出了Artemis，一个区域级因果框架，通过在每个脑区域独立进行因果干预，使用轻量级参数学习区域特定的混杂因素表示，从而弥合了这一差距。我们的调整综合利用多模态功能和结构特征进行图推理，作为一个与任意GNN骨干兼容的插件模块。在三个基准（用于疾病诊断的ADNI、用于痴呆分期的OASIS和用于性别分类的HCP）上的实验表明，与代表性的基于GNN的基线相比，该方法具有一致的改进。多项支持实验进一步证明了统计显著性和神经科学可解释性。

英文摘要

Multimodal neuroimaging, integrating functional connectivity from fMRI and structural connectivity from DTI, enables non-invasive analysis of brain networks using graph neural networks. However, demographic factors such as age and sex systematically confound the relationship between brain connectivity and clinical outcomes, causing GNNs to exploit spurious shortcuts rather than learning causally invariant representations. While recent causal GNN methods introduce causality at the graph-modeling level, their causal mechanisms remain domain-agnostic without accounting for the real-world confounders inherent in clinical neuroimaging data. Moreover, brain networks are constructed from atlas-based parcellations where each region exhibits distinct sensitivity to demographic factors, necessitating region-aware adjustment. We propose Artemis, a region-level causal framework that bridges this gap by applying causal intervention at each brain region independently, learning region-specific confounder representations with lightweight parameters. Our adjustment comprehensively utilizes the multimodal functional and structural features for graph reasoning and acts as a plug-in module compatible with arbitrary GNN backbones. Experiments on three benchmarks, ADNI for disease diagnosis, OASIS for dementia staging, and HCP for sex classification, demonstrate consistent improvements over representative GNN-based baselines. Multiple supporting experiments further demonstrate statistical significance and neuroscientific interpretability.

2606.18316 2026-06-18 cs.LG 新提交

A Survey on Data-Driven Models for Soil Moisture Regression and Classification

基于数据驱动的土壤湿度回归与分类模型综述

Ilektra Tsimpidi, George Georgoulas, Vidya Sumathy, George Nikolakopoulos

发表机构 * Electrical Engineering, University of Technology, Sweden（电气工程，技术大学，瑞典）

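AI summary: Surveys AI-based soil moisture modelling methods in five categories (statistical time-series, geostatistical, classical machine learning, deep learning, and probabilistic/Bayesian), which use multi-source data for regression or classification.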

Comments 14 pages, 3 figures, AIAI 2026 Conference

AI中文摘要

土壤湿度（SM）建模构成一个复杂的时空学习问题，其特点是非线性环境相互作用、异构数据源和有限的地面观测。基于物理的方法，如水量平衡模型，依赖于明确的水文方程和高质量的输入，但其计算成本和可扩展性限制阻碍了大规模部署。数据驱动的人工智能（AI）方法已成为灵活的替代方案，能够以较少的建模假设提取土壤湿度与环境变量之间的经验关系。本文对基于AI的土壤湿度估计和分类模型进行了结构化综述。现有方法被组织为五类：（a）统计时间序列模型，（b）地统计方法，（c）经典机器学习（ML）模型，（d）深度学习（DL）模型和（e）概率/贝叶斯方法。这些模型利用历史土壤湿度记录、气象变量、植被指数、地形、土壤特征和地理位置数据来执行回归或分类任务。

英文摘要

Soil Moisture (SM) modelling constitutes a complex spatiotemporal learning problem characterised by nonlinear environmental interactions, heterogeneous data sources, and limited ground observations. Physics-based approaches, such as water balance models, rely on explicit hydrological equations and high-quality inputs, but their computational cost and scalability limitations restrict large-scale deployment. Data-driven artificial intelligence (AI) methods have emerged as flexible alternatives, enabling the extraction of empirical relationships between soil moisture and environmental variables with reduced modelling assumptions. This work presents a structured survey of AI-based models for soil moisture estimation and classification. Existing approaches are organized into five categories: (a) statistical time-series models, (b) geostatistical methods, (c) classical machine learning (ML) models, (d) deep learning (DL) models, and (e) probabilistic/Bayesian methods. These models leverage historical soil moisture records, meteorological variables, vegetation indices, topography, soil characteristics, and geolocation data to perform regression or classification tasks.

2606.18319 2026-06-18 cs.LG cs.AI cs.HC cs.SE 新提交

可训练光子测量用于物理信息偏微分方程学习

Jiale Linghu, Hao Dong, Yangshuai Wang

发表机构 * Xidian University（西安电子科技大学）； National University of Singapore（新加坡国立大学）

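AI summary: Proposes a photonic quantum neural field that encodes coordinates as trainable optical phases, mixes them through multi-photon Fock-space interference, and decodes them from photon-number measurements, serving as a trainable representation for physics-informed residual minimization; across seven PDE benchmarks it shows a phase-complexity transition, with errors up to an order of magnitude lower in the hardest regimes and about one quarter of the trainable parameters of classical baselines.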

AI中文摘要

光子量子机器学习提供了一条从相位、干涉和测量构建可训练物理表示的途径。然而，其在科学机器学习中的作用仍基本未被探索。物理信息神经场提供了一个自然设置，因为微分方程需要保留相位、频率和导数结构的试验空间。这里我们引入一种光子量子神经场，其中坐标成为可训练光学相位，通过多光子Fock空间干涉混合，并从光子数测量解码。光子电路本身作为神经场表示进行优化，而非固定特征图或硬件加速器。因此，光子测量是一种可训练表示，在此基础上最小化物理信息残差。在七个椭圆、波动、非线性色散和逆PDE基准测试中，我们观察到相位复杂度转变：经典坐标和傅里叶特征网络在平滑区域足够，而光子场在残差导数放大相位失配时最准确。在最困难区域，它给出最低误差，差距达一个数量级，且可训练参数约为经典基线的四分之一。冻结和打乱控制以及噪声压力测试将这一增益归因于学习到的干涉和在复合扰动下稳定的Fock概率读出。这些结果将光子量子测量识别为科学机器学习的一种表示学习原理。

英文摘要

Photonic quantum machine learning offers a route to trainable physical representations built from phase, interference and measurement. However, its role in scientific machine learning remains largely unexplored. Physics-informed neural fields provide a natural setting, because differential equations require trial spaces that preserve phase, frequency and derivative structure. Here we introduce a photonic quantum neural field in which coordinates become trainable optical phases, are mixed by multi-photon Fock-space interference and are decoded from photon-number measurements. The photonic circuit is optimized as the neural-field representation itself, not as a fixed feature map or hardware accelerator. Photonic measurement is therefore a trainable representation on which the physics-informed residual is minimized. Across seven elliptic, wave, nonlinear dispersive and inverse PDE benchmarks, we observe a phase-complexity transition: classical coordinate and Fourier-feature networks suffice in smooth regimes, whereas the photonic field is most accurate when residual derivatives amplify phase mismatch. In the hardest regimes it gives the lowest errors, with margins reaching an order of magnitude and about one quarter of the trainable parameters of classical baselines. Frozen and shuffled controls, together with noise stress tests, attribute this gain to learned interference and stable Fock-probability readout under compound perturbations. These results identify photonic quantum measurement as a representation-learning principle for scientific machine learning.

2606.18726 2026-06-18 cs.LG cs.AI 新提交

Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring

基于图锚定交叉注意力Transformer神经网络的预测过程监控中结构约束完整事件序列生成

Fang Wang, Ernesto Damiani

发表机构 * Department of Computer Science, University of Milan（米兰大学计算机科学系）

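AI summary: Proposes the Graph Grounded Cross Attention Transformer (GGATN), which uses a global process graph as structured memory, encodes sequence positions with Transformer self-attention, injects process topology through graph-grounded cross-attention, and applies Viterbi-style graph-constrained decoding to generate full event sequences in a single pass, outperforming LLM baselines on six benchmark logs.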

Comments 40 pages

AI中文摘要

结构约束的事件序列生成仍然具有挑战性，因为生成的路径必须保持转移可行性、时间顺序、终止和属性一致性。在预测过程监控（PPM）中，这一挑战表现为完整事件序列生成，而现有工作主要处理子任务，如下一个活动、剩余时间、结果和属性预测。本文提出了图锚定交叉注意力Transformer神经网络（GGATN）用于这一统一的PPM任务。GGATN使用全局过程图作为结构化活动记忆，通过Transformer自注意力对序列位置进行上下文化，并通过图锚定交叉注意力注入过程拓扑。与自回归解码不同，GGATN一次性生成活动、时间戳、长度以及事件级和序列级属性，随后进行维特比风格的图约束解码以获得可行路径和显式终止。在六个基准事件日志上的实验表明，其生成质量优于局部指令提示的LLM基线。GGATN在序列相似性、Damerau-Levenshtein相似性、基于二元组的控制流相似性和持续时间分布方面取得了强劲性能，同时保持零幻觉活动和零序列级属性不一致。消融分析证实了全局图编码器作为稳定的结构先验。可解释性分析展示了图结构、序列上下文、反馈细化和约束解码如何塑造生成过程。

英文摘要

Structurally constrained event sequence generation remains challenging because generated paths must preserve transition feasibility, temporal order, termination, and attribute consistency. In predictive process monitoring (PPM), this challenge appears as full event sequence generation, whereas existing work mainly addresses component tasks such as next activity, remaining time, outcome, and attribute prediction. This paper proposes the Graph Grounded Cross Attention Transformer Neural Network (GGATN) for this unified PPM task. GGATN uses a global process graph as structured activity memory, contextualizes sequence positions through Transformer self attention, and injects process topology through graph grounded cross attention. Unlike autoregressive decoding, GGATN generates activities, timestamps, length, and event level and sequence level attributes in a single pass, followed by Viterbi style graph constrained decoding for feasible paths and explicit termination. Experiments on six benchmark event logs show more reliable generation quality than local instruction prompted LLM baselines. GGATN achieves strong performance on sequence similarity, Damerau Levenshtein similarity, bigram based control flow similarity, and duration distribution, while maintaining zero hallucinated activities and zero sequence level attribute inconsistency. Ablation analyses confirm the global graph encoder as a stable structural prior. Interpretability analyses show how graph structure, sequence context, feedback refinement, and constrained decoding shape generation.
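
The "Viterbi style graph constrained decoding" step can be illustrated with a minimal sketch. It assumes the model has already produced one log-score per activity at each position and that the process graph is given as a set of allowed (previous, next) transitions. The function name, the toy activities, and the fixed required end activity are illustrative assumptions, not the GGATN implementation.

```python
NEG_INF = float("-inf")


def constrained_viterbi(scores, allowed, start, end):
    """Return the best-scoring activity path that respects the process graph.

    scores:  list of dicts {activity: log-score}, one dict per sequence position
    allowed: set of (previous_activity, next_activity) edges of the process graph
    start, end: required first and last activities (hypothetical convention)
    Returns None when no feasible path exists.
    """
    # best[a] = (score, path) of the best feasible path that ends in activity a.
    best = {start: (scores[0].get(start, NEG_INF), [start])}
    for step in scores[1:]:
        nxt = {}
        for act, s in step.items():
            candidates = [
                (prev_score + s, path + [act])
                for prev, (prev_score, path) in best.items()
                if (prev, act) in allowed and prev_score > NEG_INF
            ]
            if candidates:
                nxt[act] = max(candidates, key=lambda c: c[0])
        best = nxt
    return best[end][1] if end in best else None


# Toy example: per-position argmax would pick "pay" at position 2,
# but the graph has no edge check -> pay, so decoding routes through "approve".
scores = [
    {"start": 0.0},
    {"check": 0.0},
    {"approve": -1.0, "pay": -0.1},
    {"pay": -0.2, "approve": -3.0},
]
allowed = {("start", "check"), ("check", "approve"), ("approve", "pay")}
path = constrained_viterbi(scores, allowed, "start", "pay")
```

Because infeasible transitions are never extended, every returned path uses only activities and edges present in the graph, which is the property behind the abstract's "zero hallucinated activities" claim for the decoding stage.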

2606.18732 2026-06-18 cs.LG cs.CV 新提交

Low-Cost Neuromorphic Fall Detection Using Synthetic Event Data and Hybrid SNNs

低成本神经形态跌倒检测：使用合成事件数据和混合SNN

Guillermo Rojas, Gonzalo Soto, Daniel Yunge

发表机构 * School of Electrical Engineering Pontificia Universidad Católica de Valparaíso, Chile（瓦尔帕莱索天主教大学电气工程学院）

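AI summary: Proposes a hybrid SNN-CNN model and synthesizes event-camera data from smartphone video to achieve efficient, accurate fall detection.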

Comments 4 pages, 6 figures, presented at ICONS 2025 during the Poster Session, but not published

2606.18857 2026-06-18 cs.LG physics.ao-ph 新提交

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

研究理想化Isca模拟中平流层突然增温的机器学习模拟的归纳偏差

Oskar Bohn Lassen, Simon Driscoll, Stephen I. Thomson, Sebastian Schemm, Francisco C. Pereira

发表机构 * Technical University of Denmark（丹麦技术大学）； University of Cambridge（剑桥大学）； University of Exeter（埃克塞特大学）

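AI summary: Tests how the inductive biases of different architectures affect emulation of sudden stratospheric warming dynamics, finding that three-dimensional vertical coupling is key but that low forecast error does not guarantee physical consistency.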

AI中文摘要

机器学习模拟器越来越多地用于天气预报，并有可能通过学习动态重要的可预测性来源，将技能扩展到次季节到季节时间尺度。一个关键挑战是模型能否利用可预测性锚点，例如平流层变率，这些锚点在超出短期超前时间时影响对流层环流。我们使用配对的理想化Isca模拟测试架构归纳偏差如何影响对平流层突然增温（SSW）动力学的模拟，这些模拟仅在施加的波-2加热扰动上有所不同。在用于一步预测的卷积、变换器和基于图的架构中，当平流层动态安静时，模型差异不大，但当类似SSW的变率活跃时，差异显著扩大。我们的结果确定显式三维垂直耦合是机器学习模拟平流层动力学的关键归纳偏差。然而，Eliassen-Palm通量诊断表明，低预测误差并不能保证物理上真实的波-平均流相互作用，平流层波驱动结构中仍存在相干误差。

英文摘要

Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.

2606.18864 2026-06-18 cs.LG cs.AI 新提交

Scaling Learning-based AEB with Massive Unlabeled Data

基于大规模无标签数据的可扩展学习型自动紧急制动

Xiangyu Wang, Yang Zhan, Mengxiang Hao, Chuanchuan Zhong, Yansong Jia, Junjie Zhang, Yu Han, Xin Jiang, Zhen Cao, Ying Wang, Yulun Song, Zhitao Xu

发表机构 * Li Auto

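AI summary: Proposes a stabilized meta-feedback semi-supervised learning framework that uses noise-aware decoupling and kinematics-gated pseudo-labeling to exploit massive unlabeled data for automatic emergency braking, achieving a positive-to-false activation ratio above 100:1 and a 35% improvement in accident-free mileage.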

Comments Accepted for presentation at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

AI中文摘要

本文研究如何在生产约束下，利用大规模无标签车队数据扩展基于学习的自动紧急制动（AEB）。我们的方法基于元反馈半监督学习（MF-SSL），其中教师模型为无标签驾驶数据生成伪标签，并使用小型有标签锚定集作为安全关键反馈进行更新。在生产中，锚定歧义和有标签-无标签不匹配会放大系统性的伪标签错误，导致误触发。我们提出了一种稳定的MF-SSL框架，包括：(i) 噪声感知解耦，从教师监督更新路径中移除易产生歧义的锚定；(ii) 运动学门控伪标签，结合教师冲突惩罚，抑制无标签数据上由不匹配引起的风险幻觉，同时保持广泛覆盖。大量实验表明，随着无标签数据从1M扩展到1B窗口，模型性能持续提升，在保持舒适性的同时提高了安全性。经过1B数据训练的学生模型已部署到数十万辆车辆上，并在超过10^9公里的行驶中得到验证，实现了超过100:1的正误触发比，且相比仅基于规则的基线，无事故行驶里程提升了35%。

英文摘要

This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over $10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.

2606.18882 2026-06-18 cs.LG cs.AI eess.SP 新提交

Domain-Shift Aware Neural Networks for Unbalance Characterization in Rotating Systems

面向旋转系统不平衡表征的域偏移感知神经网络

Bernardo Feijó Junqueira, Claudio Kiyoshi Umezu, Bruno Bilhar Karaziack, Tomaz Junior, Daniel Alves Castello

发表机构 * Springer Nature

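AI summary: Proposes a domain-shift aware neural network that aligns source and target features with a maximum mean discrepancy strategy for regression of unbalance mass in rotating shafts under varying operating conditions; experiments show improved prediction accuracy when the sources of domain shift are not fully known.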

AI中文摘要

本文研究了域偏移感知神经网络在回归任务中的应用，旨在估计不同运行条件下旋转轴的不平衡质量。实验数据来自一个测试台，其中主轴上安装有带不平衡质量的法兰，在不同转速下驱动，同时可选择性地激活副轴以引入域差异。不平衡质量固定在径向距离上，使用三轴加速度计记录系统的动态响应。质量估计的逆问题在域自适应框架中提出，网络采用最大均值差异策略进行训练，以对齐源域和目标域的特征表示。结果表明，显式处理域偏移能有效提高预测精度，尤其是在系统的物理行为和域偏移来源不完全已知且超出训练条件的情况下。这些发现凸显了域偏移感知模型在结构健康监测回归任务中的潜力。

英文摘要

This work investigates the application of a domain-shift aware neural network for regression tasks aimed at estimating unbalance masses in rotating shafts under varying operating conditions. Experimental data were collected from a test rig in which a primary shaft, equipped with a flange carrying unbalanced masses, was driven at different rotational speeds, while a secondary shaft could be optionally activated to introduce domain discrepancy. The unbalance masses were positioned at a fixed radial distance, and the dynamic response of the system was recorded using triaxial accelerometers. The inverse problem of mass estimation is formulated within a domain adaptation framework, where the network is trained with a maximum mean discrepancy strategy to align feature representations across source and target distributions. The results demonstrate the effectiveness of explicitly addressing domain shift in improving prediction accuracy, especially when the system's physical behavior and sources of domain discrepancy are not fully known and fall outside the training conditions. These findings highlight the potential of domain-shift aware models for regression tasks in Structural Health Monitoring.
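
The maximum mean discrepancy (MMD) term used to align source and target features can be sketched as follows. This is a minimal pure-Python illustration of the biased squared-MMD estimator with an RBF kernel. The kernel choice, the `gamma` value, and the function names are assumptions, since the abstract does not specify them.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""

    def mean_kernel(xs, ys):
        return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Features from the same operating condition give (near) zero discrepancy;
# a shifted target distribution gives a positive value.
src = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
tgt_shifted = [[1.0, 1.1], [1.2, 1.0], [1.1, 1.1]]
same = mmd2(src, src)
shifted = mmd2(src, tgt_shifted)
```

In a domain adaptation setup of this kind, such a term is typically added to the regression loss with a weighting factor, so that the feature extractor is penalized when source and target feature distributions drift apart.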

2606.18933 2026-06-18 cs.LG cs.IR stat.ME 新提交

一种面向约束感知的生物过程开发的人机协同贝叶斯优化框架

Samuel Stricker, Claus Wirnsperger, Alessandro Butté, Laura Helleckes, Gonzalo Guillén Gosálbez, Antonio del Rio Chanona, Mehmet Mercangöz

发表机构 * Imperial College London（伦敦帝国理工学院）； DataHow AG ； ETH Zurich（苏黎世联邦理工学院）

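AI summary: Proposes an extended Pareto Front Guided Sampling framework that treats constraint-satisfaction probability and robustness derived from a Gaussian process surrogate as objectives of a multi-objective optimization, combined with an interactive dashboard for human-in-the-loop, constraint-aware bioprocess optimization.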

AI中文摘要

本文提出了帕累托前沿引导采样（PFGS）的一种扩展，这是一种人机协同（HitL）贝叶斯优化（BO）框架，其中高斯过程（GP）代理导出的量被重新表述为多目标优化问题的目标，得到的帕累托前沿暴露给领域专家进行交互式候选选择，而不是返回单一的自动推荐。该框架在两个方向上进行了扩展：约束优化通过将满足输出规格限的后验概率作为显式的帕累托目标来处理，该概率从GP后验分布解析计算得到；鲁棒优化通过蒙特卡洛采样策略来处理，该策略估计在用户定义的输入扰动变异性下的期望下置信性能，捕捉在可能的实现偏差下的性能退化。由此产生的多维帕累托表示通过交互式仪表盘上的成对二维投影同时显示预测性能、模型不确定性、概率约束满足和输入鲁棒性之间的权衡，使得选择标准能够随着代理模型的改进和开发目标的演变而迭代细化。该框架在一个八维的补料分批中国仓鼠卵巢（CHO）细胞培养模拟器上进行了展示，证明了系统性地识别高性能、满足可行性且对扰动具有鲁棒性的操作条件，并说明了专家定义的需求如何提供原则性的停止标准并支持实验资源的明智分配。

英文摘要

This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are reformulated as objectives of a multi-objective optimization problem, and the resulting Pareto front is exposed to a domain expert for interactive candidate selection rather than returning a single automated recommendation. The framework is extended in two directions: constrained optimization is addressed by incorporating the posterior probability of satisfying output specification limits as an explicit Pareto objective, computed analytically from the GP posterior distribution; robust optimization is addressed by a Monte Carlo sampling strategy that estimates expected lower-confidence performance over a user-defined variability of input perturbations, capturing performance degradation under likely implementation deviations. The resulting multi-dimensional Pareto representation renders trade-offs between predicted performance, model uncertainty, probabilistic constraint satisfaction, and input robustness simultaneously visible through pairwise two-dimensional projections on an interactive dashboard, enabling selection criteria to be iteratively refined as the surrogate model improves and development objectives evolve. The framework is showcased on an eight-dimensional fed-batch Chinese Hamster Ovary (CHO) cell culture simulator demonstrating systematic identification of high-performing, feasibility-compliant, and perturbation-resilient operating conditions, and illustrating how expert-defined requirements provide a principled stopping criterion and support informed allocation of experimental resources.
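
The two added Pareto objectives can be sketched in a few lines, under simplifying assumptions. The first function computes the analytic probability that a Gaussian posterior prediction falls within specification limits. The second estimates an expected lower-confidence bound under Gaussian input perturbations by Monte Carlo. The `posterior` callable, the `kappa` multiplier, and the Gaussian perturbation model are hypothetical stand-ins, not details from the paper.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within_limits(mean, std, lower=-math.inf, upper=math.inf):
    """P(lower <= y <= upper) for y ~ N(mean, std^2), std > 0.

    mean and std stand for the GP posterior at one candidate input.
    """
    return normal_cdf((upper - mean) / std) - normal_cdf((lower - mean) / std)


def robust_lcb(posterior, x, input_std, kappa=2.0, n_samples=256, seed=0):
    """Monte Carlo estimate of E[mean - kappa * std] under input perturbations.

    posterior: hypothetical callable mapping an input vector to (mean, std)
    input_std: assumed per-dimension standard deviation of implementation error
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perturbed = [xi + rng.gauss(0.0, s) for xi, s in zip(x, input_std)]
        m, s = posterior(perturbed)
        total += m - kappa * s
    return total / n_samples


# Probability that a prediction of 5.0 +/- 1.0 stays inside the limits [3.5, 7.0].
p_feasible = prob_within_limits(5.0, 1.0, lower=3.5, upper=7.0)
```

Both quantities are scalars per candidate, so they can be placed alongside predicted performance and model uncertainty as axes of the Pareto front shown to the expert.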

2606.19255 2026-06-18 cs.LG 新提交

SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering

SCAN: 通过多尺度邻域中心聚类增强时间序列异常检测

Xingze Zheng, Hanyin Cheng, Siyuan Wang, Yiting Hao, Peng Chen, Yuan Jun, Yang Shu

发表机构 * East China Normal University（华东师范大学）； APPLab, Huawei（华为2012应用实验室）； Huawei（华为）

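AI summary: Proposes SCAN, which enhances reconstruction-based anomaly detection with multi-scale clustering: at the representation level it integrates cluster-center representations of normal patterns to constrain reconstruction, at the anomaly-criterion level it combines cluster membership probability with reconstruction error, and it uses neighborhood-centered representations to improve clustering, reaching state-of-the-art results on multiple real-world datasets.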

AI中文摘要

时间序列异常检测在广泛的现实应用中扮演着关键角色。基于重建的方法已成为主流范式，但它们面临过度泛化和欠泛化问题，且难以平衡。为了解决这一问题，我们引入多尺度聚类来增强基于重建的方法。在表示层面，我们整合正常模式的聚类中心表示，以约束模型针对代表性正常模式进行重建，防止强大能力和表示能力的主导。在异常判据层面，我们基于聚类成员概率推导异常置信度分数，并将其与重建误差结合，提供双重检测标准。此外，聚类中心表示和异常置信度分数的有效性取决于聚类性能。因此，我们提取邻域中心表示用于多视图聚类，以提高聚类性能。在来自不同应用领域的多个真实数据集上的大量实验表明，SCAN达到了最先进的性能。

英文摘要

Time series anomaly detection plays a crucial role in a wide range of real-world applications. Reconstruction-based methods have become the mainstream paradigm, but they suffer from over-generalization and under-generalization problems, which are challenging to balance. To address this, we introduce multi-scale clustering to enhance reconstruction-based methods. At the representation level, we integrate the cluster center representations of normal patterns to constrain the model to target representative normal patterns for reconstruction, preventing the model's strong capacity and representation capability from dominating. At the anomaly criterion level, we derive an anomaly confidence score based on cluster membership probability and combine it with reconstruction error, providing dual criteria for detection. Furthermore, the effectiveness of the cluster center representations and anomaly confidence score depends on the clustering performance. Accordingly, we extract neighborhood-centered representations for multi-view clustering to improve clustering performance. Extensive experiments on multiple real-world datasets from diverse application domains demonstrate the state-of-the-art performance of SCAN.

2606.19292 2026-06-18 cs.LG 新提交

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

使用普适环境感知信息进行ICU谵妄风险分层

Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras, Jessica Sena, Yuanfang Ren, Andrea Davidson, Ziyuan Guan, Tezcan Ozrazgat-Baslanti, Subhash Nerella, Azra Bihorac, Parisa Rashidi

发表机构 * University of Florida（佛罗里达大学）； Stanford University（斯坦福大学）

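AI summary: Uses ambient sound and light-intensity data with efficient sequential neural network models to predict delirium risk in ICU patients, finding that sound is the dominant predictor, that adding light improves short-term prediction, and that AUC reaches 0.80.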

AI中文摘要

谵妄是重症监护室（ICU）中常见且严重的并发症，与发病率增加、住院时间延长和医疗成本升高相关。尽管其普遍存在，早期预测和预防仍具挑战性。环境因素如环境声音和光照可能影响谵妄的发生，但在风险评估中常被忽视。在本研究中，我们检验了光照强度和声压级是否能在多个预测时间窗口内独立预测谵妄。我们评估了四种高效的序列神经网络模型，这些模型基于来自9个ICU的309名患者的数据，用于预测10种预测窗口大小的谵妄。我们使用Shapley Additive Explanations分析报告了特征重要性和影响方向。卷积模型实现了最强的区分能力，在声音数据和组合数据上的AUC均为0.80。声音特征是整体上的主要预测因子。将声音与光照结合改善了短期（<1周）预测，组合模型在感知期后立即分配最高风险。这些发现表明，被动环境感知，尤其是声音，可以为谵妄风险评估增加临床上有意义、可解释的信号，并为丰富多模态ICU预测和预防策略提供实用途径。

英文摘要

Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the onset of delirium, yet they are often overlooked in risk assessments. In this study, we examined whether light intensity and sound pressure levels can independently predict delirium across multiple prediction horizons. We evaluated four efficient sequential neural network models on data collected from 9 ICUs across 309 patients to predict delirium for 10 prediction-window sizes. We reported feature importance and direction of influence using Shapley Additive Explanations analysis. The convolutional model achieved the strongest discrimination, with AUC = 0.80 on sound data and on combined data. Sound features were the dominant predictors overall. Integrating sound with light improved short-term ($<1$ week) prediction, with the combined model assigning the highest risk immediately after the sensing period. These findings suggest that passive ambient sensing, especially sound, can add a clinically meaningful, interpretable signal for delirium risk estimation and offer a practical pathway to enrich multimodal ICU prediction and prevention strategies.

2606.17077 2026-06-18 physics.chem-ph cs.AI cs.LG quant-ph 交叉投稿

通过ASR自验证与蒸馏实现可靠的神经编解码文本转语音：跨模型与编解码器的近零灾难性失败

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

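AI summary: Targets stochastic catastrophic failures (silence, early termination, repetition, or hallucination) in open autoregressive neural-codec TTS models; under a format-robust metric based on an ASR round-trip, best-of-N self-verification drives the failure rate to near zero, and distillation transfers the robustness to single-shot decoding, closing about 52-58% of failures on hard inputs at no test-time cost.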

AI中文摘要

开放自回归神经编解码文本转语音（TTS）模型在典型输入上表现优异，但会出现随机灾难性失败：在相当一部分话语中，它们会发出静音、提前终止或陷入重复或幻觉内容。我们表明这种失败模式可以廉价地消除。在单一格式鲁棒度量（通过ASR往返的灾难性失败率）下，最佳N ASR自验证将失败率降至近零：在标准语料库（LibriSpeech）上N=2时未观察到失败，在困难提示集上N=4时也未观察到。这不是单一模型的假象：该减少在四个开放编解码TTS系统和三个神经编解码器（XCodec2、SNAC、Mimi）上复现，其中三个系统在N=2时达到近零下限。然后，通过将自验证行为蒸馏到模型中，我们在推理时免费实现了修复，这恢复了单次解码中的大部分鲁棒性，在无测试代价下关闭了困难输入上约52-58%的失败。蒸馏增益集中在需要的地方（困难输入）；在已经可靠的散文上，没有改进空间且无检测到变化。一项受控比较添加了一个干净的负面结果：离线直接偏好优化（DPO/IPO）并未优于普通监督蒸馏，而在线迭代变体虽有前景但在我们的评估规模下统计上不显著。我们诚实地报告了唯一抵抗的模型（一个更大的Llasa，其中规模并未明显帮助）以及一个罕见词能力上限，该上限无法通过任何自蒸馏方法克服。

英文摘要

Open autoregressive neural-codec text-to-speech (TTS) models sound excellent on typical inputs yet suffer stochastic catastrophic failures: on a meaningful fraction of utterances they emit silence, terminate early, or collapse into repetitive or hallucinated content. We show this failure mode is cheap to remove. Under a single format-robust metric (a catastrophic-failure rate via an ASR round-trip), best-of-N ASR self-verification drives failures to near-zero: no observed failures remain by N=2 on a standard corpus (LibriSpeech) and by N=4 on a hard prompt set. This is not an artifact of one model: the reduction replicates across four open codec-TTS systems and three neural codecs (XCodec2, SNAC, Mimi), reaching the near-zero floor by N=2 on three of the four. We then make the fix free at inference time by distilling the self-verified behaviour into the model, which recovers much of the robustness in single-shot decoding, closing ~52-58% of the failure mass on hard inputs at no test-time cost. The distillation gain concentrates where it is needed (hard inputs); on already-reliable prose there is no headroom and no detectable change. A controlled comparison adds a clean negative: offline direct preference optimization (DPO/IPO) does not beat plain supervised distillation, and an online iterative variant is promising but not statistically separable at our evaluation size. We report honestly the one model that resists (a larger Llasa where scale did not obviously help) and a rare-word capability ceiling that no self-distillation method overcomes.
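
The best-of-N ASR self-verification loop can be sketched as below. This is a minimal illustration, not the authors' pipeline: `synthesize` and `transcribe` are hypothetical callables standing in for a TTS model and an ASR model, and the word-error-rate threshold `max_wer` is an assumed stand-in for the paper's catastrophic-failure criterion.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution (cost 0 when the words match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1] / max(len(ref), 1)


def best_of_n(text, synthesize, transcribe, n=4, max_wer=0.5):
    """Sample up to n candidates and keep the one whose ASR transcript fits best.

    Stops early at the first candidate with WER <= max_wer; otherwise returns
    the lowest-WER candidate seen. Returns (audio, wer).
    """
    best_audio, best_wer = None, float("inf")
    for attempt in range(n):
        audio = synthesize(text, seed=attempt)
        wer = word_error_rate(text, transcribe(audio))
        if wer < best_wer:
            best_audio, best_wer = audio, wer
        if wer <= max_wer:
            break
    return best_audio, best_wer


# Toy stand-ins: the first sample "fails" (silence), the second is correct.
def fake_tts(text, seed):
    return "" if seed == 0 else text


def fake_asr(audio):
    return audio


audio, wer = best_of_n("hello world", fake_tts, fake_asr, n=2)
```

Pairs of (failed sample, verified sample) collected this way are also the kind of data one could use for the distillation step the abstract describes, although the exact training recipe is not given there.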

2606.18429 2026-06-18 cs.CV cs.AI cs.LG 交叉投稿

CAOA -- Completion-Assisted Object-CAD Alignment

CAOA -- 补全辅助的物体-CAD对齐

Hiranya Garbha Kumar, Minhas Kamal, Balakrishnan Prabhakaran

发表机构 * University at Albany（奥尔巴尼大学）

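AI summary: Proposes CAOA, which combines semantically aware point cloud completion with symmetry-aware relative pose estimation, achieves a 17% accuracy improvement on Scan2CAD, and releases the S2C-Completion dataset.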

Comments GitHub: https://github.com/MinhasKamal/CAOA

DOI: 10.1109/3DV69130.2026.00047
Journal ref: Thirteenth International Conference on 3D Vision (3DV), 2026

AI中文摘要

准确地将CAD模型与室内RGB-D扫描中的对应物体对齐是3D语义重建的核心挑战。该任务需要估计9自由度（DoF）位姿——位置、旋转和三轴尺度——但受到噪声和不完整扫描以及导致几何畸变的分割误差的阻碍。我们提出补全辅助的物体-CAD对齐（CAOA），该方法将语义和上下文感知的点云补全模块与对称感知的相对位姿估计算法相结合，实现CAD模型与扫描物体的精确对齐。现有的补全方法通常在合成数据集上训练和评估，往往难以泛化到真实扫描。为弥合这一差距，我们引入了一种针对室内场景的合成数据生成策略，通过与广泛使用的补全数据集进行定量比较，验证了其显著减小合成到真实领域差距的效果。此外，我们发布了S2C-Completion，一个来自Scan2CAD的超过8500个物体-CAD对的专家标注数据集，用于真实室内单物体补全，并作为该任务的新基准。对于物体-CAD对齐，我们通过对称感知损失融入对称信息，提高了对对称模糊的鲁棒性。在Scan2CAD基准上，CAOA相比最先进方法实现了17%的精度提升。

英文摘要

Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central challenge in 3D semantic reconstruction. The task requires estimating a 9-Degree-of-Freedom (DoF) pose (position, rotation, and scale along three axes) but is hindered by noisy and incomplete scans, as well as segmentation errors that cause geometric distortions. We present Completion-Assisted Object-CAD Alignment (CAOA), a method that integrates a semantically and contextually aware point cloud completion module with a symmetry-aware relative pose estimation algorithm, enabling precise alignment of CAD models to scanned objects. Existing completion methods are typically trained and evaluated on synthetic datasets, which often fail to generalize to real-world scans. To bridge this gap, we introduce a synthetic data generation strategy tailored to indoor scenes, significantly reducing the synthetic-to-real domain gap, as validated through quantitative comparisons with widely used completion datasets. In addition, we release S2C-Completion, an expert-annotated dataset of over 8,500 object-CAD pairs from Scan2CAD, created for real-world indoor single-object completion and intended as a new benchmark for this task. For object-CAD alignment, we incorporate symmetry information via a symmetry-aware loss, improving robustness to symmetric ambiguities. On the Scan2CAD benchmark, CAOA achieves a 17% accuracy improvement over state-of-the-art methods.

2606.18464 2026-06-18 astro-ph.IM astro-ph.EP cs.LG 交叉投稿

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

利用深度学习建模径向速度数据中的多普勒频移以探测地球质量系外行星

Isidro Gómez-Vargas, Xavier Dumusque, Yinan Zhao, Khaled Al Moulla, Michael Cretignier

发表机构 * Department of Astronomy, University of Geneva 51 chemin de Pegasi, 1290 Versoix, Switzerland. Instituto de Astrofísica de Andalucía (CSIC), Glorieta de la Astronomía s/n, E-18008 Granada, Spain. Institute of Space Sciences (CSIC), Carrer de Can Magrans s/n, E-08193 Barcelona, Spain. Department of Astronomy, University of Texas at Austin, 2515 Speedway, Austin, TX 78712, USA. Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, 4150-762 Porto, Portugal. Department of Physics, University of Oxford, OX13RH Oxford, UK.

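AI summary: To counter stellar activity, proposes a framework combining physics-motivated spectral representations with deep learning, tuned through cross-validation and a genetic algorithm, that reliably recovers planetary signals with amplitudes ≥ 25 cm/s and periods of 10-550 days, and releases the Python package doppleriann.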

Comments 20 pages, 14 figures. Accepted for publication in Astronomy & Astrophysics

AI中文摘要

由于恒星活动的影响，在恒星径向速度测量中探测由地球质量行星引起的微小多普勒频移仍然极具挑战性。许多在模拟数据上表现良好的深度学习方法难以可靠地应用于真实恒星光谱。本工作的目标是开发一种深度学习框架，使其能够泛化到真实、未见过的光谱，并提高径向速度数据中地球质量行星的可探测性。我们在注入行星信号的HARPS-N太阳光谱上训练人工神经网络，使用基于通量和谱线形成温度的物理驱动光谱表示，以及它们的速度梯度。探索了两种训练策略：留出测试和交叉验证。通过基于遗传算法的超参数优化增强模型鲁棒性，并使用蒙特卡洛dropout量化预测不确定性。在交叉验证策略下，我们最精确的神经网络模型能够可靠地恢复振幅≥25 cm/s、周期在10到550天之间的行星信号的振幅、相位和轨道周期。此外，在所有测试案例中，成功恢复的信号对应于多普勒频移预测周期图中最显著的峰值。基于温度的光谱壳表示始终优于基于通量的壳。我们还发布了实现该框架的Python包doppleriann。我们的结果表明，将物理驱动的光谱表示与深度学习相结合，为从真实观测的径向速度数据中探测地球质量行星提供了一条有前景的途径，该建模框架既具有物理基础又具有统计严谨性，并包含了不确定性量化和优化的训练策略。

英文摘要

Detecting the tiny Doppler shifts induced by Earth-mass planets in stellar radial-velocity measurements remains extremely challenging due to stellar activity. Many deep-learning methods performing well on simulated data remain difficult to apply reliably on real stellar spectra. The aim of this work is to develop a deep-learning framework that generalizes to real, unseen spectra and improves the detectability of Earth-mass planets in radial-velocity data. We train artificial neural networks on HARPS-N solar spectra with injected planetary signals, using physics-motivated spectral representations based on flux and line-formation temperature, together with their velocity gradients. Two training strategies are explored: hold-out testing and cross-validation. Model robustness is enhanced through genetic-algorithm-based hyperparameter optimization, and predictive uncertainty is quantified using Monte Carlo dropout. Our most precise neural network model reliably retrieves, under the cross-validation strategy, the amplitudes, phases, and orbital periods of planetary signals with amplitudes greater than or equal to 25 cm/s and periods between 10 and 550 days. In addition, in all cases tested here, the successfully recovered signals correspond to the most significant peaks in the periodograms of the Doppler-shift predictions. Temperature-based spectral-shell representations consistently outperform flux-based shells. We also release doppleriann, a Python package implementing the proposed framework. Our results demonstrate that combining physically motivated spectral representations with deep learning provides a promising pathway toward the detection of Earth-mass planets in radial-velocity data from real observations, supported by a modeling framework that is both physically grounded and statistically rigorous, incorporating uncertainty quantification and optimized training strategies.

2606.18698 2026-06-18 cs.RO cs.AI cs.LG 交叉投稿

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

利用能量特征进行基于深度学习的表面分类：三个独立数据集的比较分析

Alexander Belyaev, Oleg Kushnarev

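AI summary: Evaluates the viability of energy features as a standalone or supplementary modality for surface classification, comparing several deep learning architectures on three datasets. CNNs perform best; energy features alone reach 85-90% accuracy, combining them with inertial features reaches 96-99%, and adding energy features yields a consistent 1-2% accuracy gain.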

AI中文摘要

基于能量的方法在移动机器人表面分类中仍是一个相对未被充分研究的途径，尽管在受限环境中取得了有希望的结果。本研究评估了使用能量衍生特征作为独立分类模态或作为惯性数据补充输入的可行性。在三个公开数据集上进行了全面评估，比较了现代深度学习架构（包括循环神经网络、卷积神经网络、仅编码器Transformer和Mamba状态空间模型）在自动超参数调整和输入序列长度优化下的性能。模型在所有评估数据集上均实现了比先前报道值更高的准确率，其中卷积神经网络取得了最高的整体性能。当仅依赖基于能量的特征时，模型分类准确率在85-90%范围内，比与惯性特征结合时（96-99%）低约5-10%。用能量特征增强惯性数据导致平均准确率持续提高1-2%。这些发现表明，仅依赖能量特征的分类器为独立部署提供了足够的准确性，同时在与其它感知模态结合使用时也提供了一致的增益。

英文摘要

The energy-based method remains a comparatively underexamined approach for surface classification in mobile robotics, despite promising results in constrained environments. This study evaluated the viability of using energy-derived features as either a standalone classification modality or as supplementary input to inertial data. A comprehensive evaluation was conducted across three publicly available datasets, comparing the performance of modern deep learning architectures including recurrent neural networks, convolutional neural networks, encoder-only transformers, and Mamba state-space models, under automated hyperparameter tuning and input sequence length optimization. The models achieved higher accuracy than previously reported values on all evaluated datasets, with the convolutional neural network yielding the highest overall performance. When relying exclusively on energy-based features, the models attained classification accuracies in the range of 85-90%, approximately 5-10% lower than those achieved when combined with inertial features (96-99%). Augmenting inertial data with energy features resulted in a consistent mean accuracy improvement of 1-2%. These findings indicate that classifiers relying solely on energy features offer sufficient accuracy for standalone deployment, while also providing a consistent gain when used in combination with other sensing modalities.

2606.18723 2026-06-18 cs.CV cs.LG 交叉投稿

Clinically Aligned Geometry Constraints for Robust IVUS Vessel Boundary Segmentation

临床对齐的几何约束用于鲁棒的IVUS血管边界分割

Yunshu Chen, Litao Yang, Giuseppe Di Giovanni, Jordan Tan, Deval Mehta, Andrew Lin, Derek Chew, Masasi Fujino, Julie Butters, Stephen Nicholls, Zongyuan Ge, Kyung Hoon Cho

发表机构 * AIM For Health Lab, Monash University（莫纳什大学AIM健康实验室）； Department of Data Science and Artificial Intelligence, Faculty of IT, Monash University（莫纳什大学信息技术学院数据科学与人工智能系）； Monash University Victorian Heart Institute（莫纳什大学维多利亚心脏研究所）； School of Computing Technologies, RMIT University（皇家墨尔本理工大学计算技术学院）； National Cerebral and Cardiovascular Center（国立循环器病研究中心）； Department of Cardiology, Chonnam National University Hospital and Medical School（全南大学医院和医学院心脏病学系）

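AI summary: Proposes GeoCat, a network with dual encoders and a differentiable geometry consistency loss that reduces boundary drift and topology errors in IVUS segmentation and improves the accuracy of clinical geometric measurements.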

Comments MICCAI2026 Accepted

AI中文摘要

血管内超声（IVUS）管腔和外弹性膜（EEM）分割对于定量评估冠状动脉斑块负荷至关重要。管腔或EEM勾画的误差会直接传播到斑块面积、斑块负荷和几何测量中。然而，优先考虑重叠分数的标准方法常常遭受边界漂移和拓扑错误，导致临床测量不准确。我们提出GeoCat，一个几何一致性网络，使用双笛卡尔-极坐标编码器，结合跨域注意力和时间融合，处理5帧IVUS片段。可微的几何一致性损失直接监督临床相关描述符，包括直径、方向和横截面积。该模型在来自146名患者的12,242张标注帧上训练，这些帧使用两种商用IVUS系统采集。我们使用分割准确性和斑块相关临床指标评估性能，包括Dice/IoU、边界测量（95HD（mm）、ASSD）、拓扑违规率和临床几何误差（dmax/dmin、角度和面积）。在我们的数据集上，GeoCat实现了0.93的Dice，将95HD降低到0.14 mm，并将拓扑违规率降低到1.0%。重要的是，它显著提高了几何保真度，产生0.13-0.16 mm的直径误差和约8度的角度误差，支持可靠的斑块负荷量化。

英文摘要

Intravascular ultrasound (IVUS) lumen and external elastic membrane (EEM) segmentation is important for quantitative coronary plaque burden assessment. Errors in lumen or EEM delineation directly propagate to plaque area, plaque burden and geometric measurements. However, standard methods prioritising overlap scores often suffer from boundary drift and topology errors, leading to inaccurate clinical measurements. We present GeoCat, a geometry-consistent network that processes 5-frame IVUS clips using dual Cartesian-polar encoders with cross-domain attention and temporal fusion. A differentiable geometry consistency loss directly supervises clinically relevant descriptors including diameters, orientations, and cross-sectional areas. The model is trained on 12,242 annotated frames from 146 patients acquired with two commercial IVUS systems. We evaluate performance using both segmentation accuracy and plaque-relevant clinical metrics, including Dice/IoU, boundary measures (95HD in mm, ASSD), topology violation rate, and clinical geometry errors (dmax/dmin, angles, and areas). On our dataset, GeoCat achieves a Dice of 0.93, reduces 95HD to 0.14 mm, and lowers topology violations to 1.0%. Importantly, it significantly improves geometric fidelity, yielding diameter errors of 0.13-0.16 mm and angular errors of ~8 degrees, supporting reliable plaque burden quantification.
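The overlap metrics named in the abstract, and the way delineation errors propagate into plaque burden, can be illustrated with a small sketch. It assumes binary masks stored as nested lists and the standard definition of plaque burden as (EEM area minus lumen area) divided by EEM area; the function names and toy masks are hypothetical and do not come from GeoCat.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two binary masks given as nested lists of 0/1."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    pred_sum = sum(sum(row) for row in pred)
    target_sum = sum(sum(row) for row in target)
    if pred_sum + target_sum == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    dice = 2.0 * inter / (pred_sum + target_sum)
    iou = inter / (pred_sum + target_sum - inter)
    return dice, iou


def plaque_burden(eem_area, lumen_area):
    """Plaque burden in percent: plaque area (EEM minus lumen) relative to EEM area."""
    if eem_area <= 0:
        raise ValueError("EEM area must be positive")
    return 100.0 * (eem_area - lumen_area) / eem_area


# Hypothetical 2x3 masks for illustration.
pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
print(f"Plaque burden = {plaque_burden(eem_area=16.0, lumen_area=6.0):.1f}%")
```

Because plaque burden is a ratio of two segmented areas, a small boundary shift in either contour changes the result even when Dice stays high, which is the motivation the abstract gives for supervising geometry directly.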

2606.18734 2026-06-18 eess.SP cs.LG 交叉投稿

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

点云辅助的切线高斯溅射局部统计信道预测

Ye Xue, Yiheng Wang, Xinhua Shao, Qi Yan, Shutao Zhang, Tsung-Hui Chang

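AI summary: Proposes the point-cloud-assisted tangent Gaussian splatting (PC-TGS) framework, which fuses sparse radio measurements with dense LiDAR geometry to extrapolate the angular power spectrum to unmeasured grids, enabling efficient channel prediction in large-scale wireless digital twins.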

AI中文摘要

准确、特定地点的信道信息对于优化下一代无线网络至关重要。在各种方法中，局部统计信道建模（LSCM）通过从参考信号接收功率（RSRP）测量中建模信道多径角功率谱（APS），已成为一种针对高效网络优化的最先进方法。然而，尽管其有效性，LSCM无法在绝大多数没有测量值的位置预测APS，这严重限制了其在大规模真实场景中的适用性。为了解决这一挑战，我们提出了点云辅助切线高斯溅射（PC-TGS），这是第一个通过将稀疏无线电测量与密集的基于LiDAR的几何信息相结合，将APS外推到未测量室外网格的框架。PC-TGS将环境散射体表示为各向异性的3D高斯分布，通过原始点云的松弛均值重新参数化进行初始化和细化。切线平面投影将每个高斯分布精确映射到局部角度域，而深度感知的电磁溅射过程聚合它们的贡献。为了确保实际部署，我们推导了用于APS bin积分的闭式高斯加权平均（GWA），并提供了可证明的误差界。在LiDAR扫描的城市规模数据集（500万个点，6310个RSRP样本）上的评估表明，与最先进的基线相比，PC-TGS在APS和RSRP预测性能上更优，并且在外推APS任务中推理时间更快。这些结果突显了PC-TGS在大规模无线数字孪生中实现几何感知和数据高效信道预测的潜力。

英文摘要

Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from reference signal received power (RSRP) measurements, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present point-cloud-assisted tangent Gaussian splatting (PC-TGS), the first framework to extrapolate APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance than state-of-the-art baselines and faster inference for the APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.

2606.18824 2026-06-18 cs.CV cs.LG 交叉投稿

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

他们将去哪里？从自我中心视频建模多模态行人机动

Yuxuan Xie, Nicolas Pugeault, Chongfeng Wei, Hubert P. H. Shum, Edmond S. L. Ho

发表机构 * School of Computing Science, University of Glasgow（格拉斯哥大学计算机科学学院）； James Watt School of Engineering, University of Glasgow（格拉斯哥大学詹姆斯·瓦特工程学院）； Department of Computer Science, Durham University（杜伦大学计算机科学系）

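AI summary: Proposes the MMPM framework, which uses a behavior-aware interaction module and a CVAE-based mode-aware trajectory predictor to model crossing and non-crossing pedestrian modes separately, improving multimodal trajectory prediction accuracy from an ego-centric view.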

Comments Accepted at The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

AI中文摘要

从自我中心摄像头进行行人轨迹预测具有挑战性，因为它依赖于与车辆和场景上下文的复杂交互以及行人的意图。通过建模行人历史与未来轨迹的相关性和意图，通常会产生多模态（即多个模式）分布。现有的随机预测器通常从单一单峰分布中采样多个未来轨迹，这可能导致次优的“混合模式”轨迹，这些轨迹位于不同的运动模式之间，并在真实场景中变得不合理。在本文中，我们提出MMPM，一种模态感知框架，基于行人的过马路行为将未来轨迹分布分别建模为语义上有意义的模式。MMPM由两个模块组成：行为感知行人交互模块（PIM），通过引入注视、头部和手势来联合捕捉行人-车辆和行人-环境交互；以及基于CVAE的模态感知轨迹预测器（MTP）模块，分别对过马路和不过马路两种模式的未来轨迹分布进行建模。基于查询的解码器进一步在解码过程中强制执行模态一致性。在PIE和JAAD数据集上的实验表明，我们的方法超越了最先进的基线。我们提出的MTP是模型无关的，可以集成到现有框架如BiTrap-NP和SGNet-ED中，以进一步提高未来轨迹预测性能。我们还引入了一种数据驱动的验证协议，将预测与时空一致的真实轨迹匹配，展示了相比先前工作改进的逐帧位移误差。

英文摘要

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. Modelling the correlation between a pedestrian's historical and future trajectories, together with intent, usually results in a multimodal (i.e., multi-mode) distribution. Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions as semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: a behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head and hand gesture cues, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module that models the future trajectory distributions of two modes, crossing and not crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on the PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic and can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

2606.18876 2026-06-18 cs.CV cs.LG 交叉投稿

Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow

光学相干断层扫描中基于轨迹对齐的时间无关流的测试时自适应

Veit Hucke, Thomas Pinetz, Gregor Reiter, Ursula Schmidt-Erfurth, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria（人工智能研究所、医学数据科学中心、维也纳医学大学，奥地利）； Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna, Austria（医学人工智能综合中心、维也纳医学大学，奥地利）； Department of Ophthalmology and Optometry, Medical University of Vienna, Austria（眼科与视光学部、维也纳医学大学，奥地利）； Laboratory for Ophthalmic Image Analysis, Medical University of Vienna, Austria（眼科图像分析实验室、维也纳医学大学，奥地利）

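AI summary: Proposes a flow-matching-based test-time adaptation method that uses histogram matching and removes time conditioning to generate high-quality surrogate images, achieving the best performance in AMD segmentation.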

Comments Accepted in MICCAI

2606.18932 2026-06-18 astro-ph.EP astro-ph.IM cs.AI cs.LG 交叉投稿

TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches

TransitNet: 一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架

Xingchen Yan, Jian Ge, Qingtian Liu, Kevin Willis, Quanquan Hu, Jiapeng Zhu

发表机构 * Shanghai Astronomical Observatory, Shanghai 200030, China（上海天文台，上海200030，中国）； University of Chinese Academy of Sciences, Yanqi Lake Campus, East Road 1, Huairou, Beijing 101408, China（中国科学院大学，雁栖湖校区，东路1号，北京101408，中国）； Science Talent Training Center, Gainesville, FL, 32606 USA（科学人才培训中心，佛罗里达州盖恩斯维尔，32606美国）

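AI summary: Proposes TransitNet, a compact attention-augmented deep learning framework for low-SNR transit blind searches. It reaches 95.2% accuracy in the SNR 6-8 range and a 93.0% recovery rate, well above TLS and BLS; the model is only about 1.5 MB and runs about 12-25 times faster than CPU-TLS.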

Comments 24 pages, 23 figures, 3 tables, submitted to MNRAS

AI中文摘要

受中长周期地球大小行星观测不完整性的启发，我们提出了TransitNet，一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架。为了实现盲搜索条件下现实的方法开发和客观的阈值校准，我们开发了一个统一的数据集构建、基准测试和阈值选择框架。在由未见过的Kepler目标构建的恢复基准测试中，TransitNet在具有挑战性的信噪比6-8范围内达到了95.2%的准确率，并优于TLS和BLS，ROC-AUC和PR-AP值分别为0.974和0.982。在一次注入的地球大小和亚地球大小凌星恢复实验中，TransitNet实现了93.0%的恢复率，显著超过TLS（63.1%）和BLS（60.0%）。除了检测，TransitNet还提供了基于注意力的凌星窗口和中点估计。在一个独立评估集上，97.4%的注入凌星被估计的凌星窗口完全覆盖。应用于真实的Kepler观测，该模型成功恢复了所有34个选定的已确认Kepler行星，平均绝对凌星中点误差为1.24小时。该模型结合了约1.5 MB的紧凑体积和高推理效率，相对于CPU-TLS加速约12-25倍，相对于CPU-BLS加速约4-5倍。这些结果表明，TransitNet在测试范围内为低信噪比凌星盲搜索提供了一个准确、可扩展且计算高效的框架，并激励其扩展到更长周期的地球大小行星搜索。

英文摘要

Motivated by the observational incompleteness of intermediate-to-long-period Earth-size planets, we present TransitNet, a compact attention-augmented deep-learning framework for low-SNR transit blind searches. To enable realistic method development and objective threshold calibration under blind-search conditions, we develop a unified dataset construction, benchmarking, and threshold-selection framework. On recovery benchmarks constructed from unseen Kepler targets, TransitNet attains 95.2 percent accuracy in the challenging SNR range of 6 to 8 and outperforms both TLS and BLS, achieving ROC-AUC and PR-AP values of 0.974 and 0.982, respectively. In an injected Earth-size and sub-Earth-size transit recovery experiment, TransitNet achieves a recovery rate of 93.0 percent, substantially exceeding those of TLS (63.1 percent) and BLS (60.0 percent). In addition to detection, TransitNet provides attention-based estimates of transit windows and midpoints. On an independent evaluation set, 97.4 percent of injected transits are fully covered by the estimated transit window. Applied to real Kepler observations, the model successfully recovers all 34 selected confirmed Kepler planets, with a mean absolute transit midpoint error of 1.24 hours. The model combines a compact footprint of about 1.5 MB with high inference efficiency, yielding speed-ups of about 12 to 25 times relative to CPU-TLS and about 4 to 5 times relative to CPU-BLS. These results demonstrate that TransitNet provides an accurate, scalable, and computationally efficient framework for low-SNR transit blind searches in the tested regime and motivate its extension to longer-period Earth-size planet searches.

2606.19092 2026-06-18 stat.AP cs.LG 交叉投稿

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

使用马尔可夫决策过程对2型糖尿病护理随访间隔进行上下文感知优化

Parisa Lotfibagha, Kristen Miller, William J. Gallagher, Elizabeth B. Selden, Muge Capan

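AI summary: Proposes a contextual Markov decision process model that uses electronic health record data to optimize personalized follow-up intervals for patients with type 2 diabetes, identifies lower- and higher-risk subpopulations, and substantially reduces expected cumulative cost relative to a fixed-interval policy.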

AI中文摘要

慢性病管理依赖于定期的医患互动来跟踪疾病进展和控制。对于2型糖尿病，当前指南对所有患者规定固定的初级保健随访间隔，忽略了临床轨迹和患者特征的异质性。本研究引入上下文马尔可夫决策过程模型，利用来自10个初级保健诊所的22,154名2型糖尿病患者的电子健康记录数据，优化亚群特定的随访间隔决策。上下文通过以下方式识别：i) 利用主成分分析对代表个体健康轨迹的变量进行降维，以及ii) 通过主成分和额外的患者层面特征使用聚类将患者分配到上下文中。出现了两个不同的上下文，分别代表低风险和高风险亚群。CMDP导出的策略建议：(i) 如果当前就诊的实验室值未测量，则在1个月内随访；(ii) 对于实验室值升高或近期住院，最多3个月；(iii) 对于持续血糖控制，6至12个月，高风险上下文患者的随访间隔更短。最优策略实现了比基准更低的预期累积成本（例如，在高共病上下文中，相对于美国糖尿病协会类似的固定间隔随访策略，CMDP策略降低了约34.8%的成本；在低共病上下文中降低了约6.4%）。这些发现展示了上下文感知方法如何为适应性随访策略提供信息，并有可能通过综合机器学习和概率决策模型来推进初级保健中的慢性病管理。

英文摘要

Chronic disease management relies on regular patient-provider interactions to follow up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower- and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if the lab value at the current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in the high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy). These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.
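The idea of solving one cost-minimizing MDP per patient context can be sketched with plain value iteration. Everything numeric below is hypothetical: the two states, two follow-up actions, transition probabilities, costs, and discount factor are invented for illustration and are not the model or data from the study.

```python
def value_iteration(states, actions, transition, cost, gamma=0.95, tol=1e-8):
    """Cost-minimizing value iteration; returns (values, policy) as dicts."""
    values = {s: 0.0 for s in states}
    while True:
        new_values, policy = {}, {}
        for s in states:
            # Expected discounted cost of each action from state s.
            q = {
                a: cost[s][a]
                + gamma * sum(p * values[s2] for s2, p in transition[s][a].items())
                for a in actions
            }
            best = min(q, key=q.get)
            policy[s] = best
            new_values[s] = q[best]
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            return values, policy


STATES = ["controlled", "elevated"]
ACTIONS = ["short_interval", "long_interval"]
COST = {  # hypothetical per-step costs (visit burden plus health cost)
    "controlled": {"short_interval": 2.0, "long_interval": 1.0},
    "elevated": {"short_interval": 6.0, "long_interval": 5.0},
}
CONTEXTS = {  # hypothetical transition probabilities, one MDP per context
    "lower_risk": {
        "controlled": {"short_interval": {"controlled": 0.95, "elevated": 0.05},
                       "long_interval": {"controlled": 0.90, "elevated": 0.10}},
        "elevated": {"short_interval": {"controlled": 0.60, "elevated": 0.40},
                     "long_interval": {"controlled": 0.30, "elevated": 0.70}},
    },
    "higher_risk": {
        "controlled": {"short_interval": {"controlled": 0.90, "elevated": 0.10},
                       "long_interval": {"controlled": 0.60, "elevated": 0.40}},
        "elevated": {"short_interval": {"controlled": 0.50, "elevated": 0.50},
                     "long_interval": {"controlled": 0.10, "elevated": 0.90}},
    },
}

for name, transition in CONTEXTS.items():
    values, policy = value_iteration(STATES, ACTIONS, transition, COST)
    print(name, policy)
```

Keeping the solver generic and swapping only the transition table per context mirrors the contextual setup: the decision problem is the same, but each subpopulation gets its own dynamics and therefore its own policy.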

2606.19118 2026-06-18 cs.AI cs.LG econ.GN q-fin.EC 交叉投稿

Analysing drivers and interdependencies in European electricity markets using XAI

使用XAI分析欧洲电力市场的驱动因素与相互依赖性

Antoine Pesenti, Aidan O'Sullivan

发表机构 * UCL Energy Institute, University College London, UK（伦敦大学学院能源研究所，英国）

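AI summary: Combines deep neural networks with explainable AI (XAI) techniques, using the SHAP and SSHAP frameworks to analyse the determinants of electricity prices across 39 European bidding zones. Renewables, particularly solar, play an important role in price formation; gas prices remain the dominant driver; and interconnections significantly shape price dynamics.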

Comments 12 pages

AI中文摘要

电力市场本质上是复杂系统，具有强非线性、高维交互以及跨区域日益增长的相互依赖性。虽然深度神经网络（DNN）在电价预测方面表现出强大的能力，但其缺乏可解释性限制了其在理解电价形成潜在驱动因素方面的实用性。本文通过将DNN模型与可解释人工智能（XAI）技术相结合，分析了39个欧洲竞价区电价的决定因素，填补了这一空白。我们采用SHAP（SHapley Additive exPlanations）量化特征贡献，并应用和扩展了SSHAP（一种聚合框架）以提高高维设置下的可解释性。分析表明，可再生能源（尤其是太阳能）在电价形成中发挥着不成比例的重要作用，尽管其在总发电量中占比较低。天然气价格仍然是跨电力市场的主导且一致的驱动因素，而互联互通显著影响价格动态，凸显了欧洲电力系统的强相互依赖性。此外，我们构建了一个合成性的全欧盟电力市场，以探索完全一体化单一价格市场的反事实情景。

英文摘要

Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions and apply and extend SSHAP, an aggregation framework to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.
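SHAP attributions are based on Shapley values. As a minimal sketch of what "quantifying feature contributions" means, the code below computes exact Shapley values by enumerating all feature coalitions, with absent features replaced by a baseline value. This brute-force approach is only feasible for a handful of features, and the SHAP library relies on faster estimators in practice. The toy price model, feature names, and numbers are hypothetical, and the SSHAP aggregation step is not shown.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_price_model(z):
    """Hypothetical price model with inputs [gas_price, solar_output, load]."""
    gas, solar, load = z
    return 20.0 + 1.5 * gas - 0.4 * solar + 0.02 * gas * load


x = [30.0, 10.0, 50.0]         # hypothetical observation
baseline = [20.0, 5.0, 40.0]   # hypothetical reference point
phi = shapley_values(toy_price_model, x, baseline)
for name, contribution in zip(["gas_price", "solar_output", "load"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction at `x` and at the baseline, which is the additivity property that makes these attributions comparable across features and zones.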

2606.19149 2026-06-18 cs.CR cs.LG 交叉投稿

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

OpenAnt：通过代码分解、对抗性验证和动态测试实现LLM驱动的漏洞发现

Nahum Korda, Gadi Evron

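AI summary: Proposes OpenAnt, a system that combines static analysis with LLM reasoning in a three-stage pipeline of code decomposition, adversarial verification, and dynamic testing, discovering previously unknown vulnerabilities while reducing false positives.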

AI中文摘要

在大型代码库中自动发现漏洞仍然具有挑战性：传统静态分析误报率高，而模糊测试等动态方法需要大量基础设施且通常针对狭窄的漏洞类别。大型语言模型（LLM）的最新进展使得对程序行为进行语义推理成为可能，但将LLM应用于仓库级安全分析会引入上下文管理、成本和验证方面的挑战。我们提出了OpenAnt，一个开源漏洞发现系统，它在多阶段流水线中集成了静态程序分析与基于LLM的推理。OpenAnt引入了三种关键技术。首先，代码库被分解为自包含的分析单元，并通过从外部入口点的可达性进行过滤，将分析面减少高达97%，同时保留与攻击相关的代码。其次，候选漏洞通过受限攻击者模拟进行对抗性验证，其中模型在现实攻击者能力下评估可利用性。第三，通过动态验证确认发现结果，其中自动生成利用环境，在沙箱容器中执行，并在使用后丢弃。在包括OpenSSL、WordPress和Flowise在内的广泛使用的开源项目上的评估表明，这种架构可以识别先前未知的漏洞，同时保持可管理的分析成本并大幅减少误报。我们的结果表明，结合语义推理与利用验证的闭环漏洞发现流水线，为可扩展的自动化安全分析提供了一条实用路径。OpenAnt已在Apache 2.0许可下开源，网址为https://github.com/knostic/OpenAnt。

英文摘要

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.

2606.19186 2026-06-18 cs.RO cs.LG 交叉投稿

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

学习标注延迟和误报AEB事件：针对极端类别不平衡和非对称标签噪声的实用系统

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

发表机构 * Li Auto（理想汽车）

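AI summary: Proposes the first automated AEB annotation framework, which uses specific data augmentation and noise suppression to address extreme class imbalance and asymmetric label noise, improving recall of delayed/false triggers by 80% and cutting manual workload by 50%.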

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

Journal ref: 2026 IEEE International Conference on Robotics and Automation (ICRA)

AI中文摘要

自主紧急制动（AEB）优化依赖于准确标注的真实世界触发事件，特别是揭示系统缺陷的罕见但关键的延迟和误报AEB触发事件。然而，这些少数样本在每天数千次触发事件中占比不到5%，使得大规模人工标注成本过高。我们提出了首个自动化AEB标注框架来解决这一问题。在开发过程中，我们识别出两个严重损害延迟/误报触发标注准确性的基本挑战：（1）极端类别不平衡，其中延迟/误报触发被真实触发淹没；（2）非对称标签噪声，其中误标注的多数样本（真实触发）抑制了少数样本（延迟/误报触发）的学习。为克服这些挑战，我们提出两项关键创新：（1）特定数据增强，通过操纵焦点目标属性、移植自车动态和掩蔽非焦点代理来合成逼真样本；（2）噪声抑制，使用稳定硬度估计和探针引导的自适应阈值来清理误标注的真实触发样本。关键的是，我们将模型部署为具有全栈架构的实用标注系统，从每天数千个AEB事件中高效识别关键的延迟/误报触发。生产结果表明，延迟/误报触发的召回率提高了80%，人工工作量减少了50%。除了直接收益，该系统通过积累高质量标注实现持续自我改进，为车载AEB系统优化奠定了必要的数据基础。

英文摘要

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) extreme class imbalance, where delayed/false triggers are overwhelmed by true triggers; (2) asymmetric label noise, where mislabeled majority samples (true triggers) suppress the learning of minority samples (delayed/false triggers). To overcome these challenges, we propose two key innovations: (1) specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and a probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with a full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate an 80% improvement in recall of delayed/false triggers and a 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization.

2606.19251 2026-06-18 physics.comp-ph cs.LG physics.flu-dyn 交叉投稿

Acceleration of an algebraic multigrid pressure solver using graph neural networks

使用图神经网络加速代数多重网格压力求解器

Eric Chillón, Artur K. Lidtke, Nguyen Anh Khoa Doan, Bernat Font

发表机构 * Faculty of Mechanical Engineering, Delft University of Technology, The Netherlands（荷兰代尔夫特理工大学机械工程学院）； Maritime Research Institute Netherlands, The Netherlands（荷兰海事研究院）； Department of Aeronautics, Imperial College London, United Kingdom（英国伦敦帝国理工学院航空系）

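AI summary: Proposes an algebraic multigrid smoother based on a graph convolutional isomorphism network that predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator, reducing the number of V-cycle iterations, achieving 4%-37% speedups on unstructured meshes, and generalizing to meshes much larger than those seen in training.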

Comments 23 pages, 11 figures

详情

AI中文摘要

求解压力-泊松方程仍然是非结构化不可压缩流求解器的主要计算瓶颈，这主要是由于传统线性求解器对网格不规则性固有的敏感性。本文引入了一种数据驱动的代数多重网格（AMG）平滑器，该平滑器使用改进的图卷积同构网络（GCIN）。图神经网络预测最优多项式系数，以在不同网格拓扑上构造稀疏伪逆算子。优化系数以减少每次V-cycle迭代后的残差。通过直接从稀疏系数矩阵捕获系统的代数结构，所提出的方法在适应非结构化网格中的局部各向异性的同时，保持了求解器的线性性。我们的框架通过减少达到给定容差所需的V-cycle次数，并在不同基准测试中实现4%到37%的墙钟加速，展示了显著的性能提升。值得注意的是，该模型在比训练时所见大128倍的网格上保持效率，并在未见过的工业相关问题上（如AirfRANS数据集）加速求解器收敛，表现出鲁棒的泛化能力。

英文摘要

Solving the pressure-Poisson equation remains the primary computational bottleneck in incompressible unstructured flow solvers primarily due to the inherent sensitivity of traditional linear solvers to mesh irregularities. This work introduces a data-driven algebraic multigrid (AMG) smoother that uses a modified graph convolutional isomorphism network (GCIN). The graph neural network predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator across diverse grid topologies. The coefficients are optimized to reduce the residual after each V-cycle iteration. By directly capturing the algebraic structure of the system from the sparse coefficient matrix, the proposed method maintains the solver's linearity while adapting to local anisotropies in unstructured grids. Our framework demonstrates significant performance gains by reducing the number of V-cycles required for a given tolerance and delivering wall-clock speedups from 4% to 37% across diverse benchmarks. Notably, the model exhibits robust generalization by maintaining efficiency on meshes up to 128 times larger than those seen in training, and by accelerating the solver's convergence on unseen industry-relevant problems such as the AirfRANS dataset.
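A polynomial smoother of the kind described applies a correction x <- x + p(A) r, where r = b - A x and p(A) = c_0 I + c_1 A + ... is a matrix polynomial that acts as an approximate inverse. The sketch below shows only that update on a small dense 1D Poisson matrix. The coefficients are fixed by hand here, whereas in the paper they are predicted by the graph neural network; the dense list-of-lists storage and function names are illustrative assumptions.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def residual(a, x, b):
    """Residual r = b - A x."""
    return [bi - axi for bi, axi in zip(b, matvec(a, x))]


def polynomial_smooth(a, x, b, coeffs):
    """One smoothing sweep: x + (c0 I + c1 A + c2 A^2 + ...) r."""
    term = residual(a, x, b)  # holds A^k r, starting at k = 0
    correction = [0.0] * len(x)
    for c in coeffs:
        correction = [ci + c * ti for ci, ti in zip(correction, term)]
        term = matvec(a, term)
    return [xi + ci for xi, ci in zip(x, correction)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def poisson_1d(n):
    """Tridiagonal [-1, 2, -1] matrix of size n."""
    return [
        [2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


A = poisson_1d(5)
b = [1.0] * 5
x = [0.0] * 5
COEFFS = [0.6, -0.08]  # hand-picked for this toy matrix, not learned

history = [norm(residual(A, x, b))]
for _ in range(3):
    x = polynomial_smooth(A, x, b, COEFFS)
    history.append(norm(residual(A, x, b)))
print(["%.4f" % h for h in history])
```

Since the update is a fixed polynomial in A applied to the residual, the smoother remains a linear operator, which is the property the abstract highlights as preserved.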

2606.19253 2026-06-18 cs.CV cs.AI cs.LG cs.RO 交叉投稿

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas: 通过全景重投影实现3D场景理解

Bartłomiej Baranowski, Dave Zhenyu Chen, Matthias Nießner

发表机构 * Technical University of Munich（慕尼黑工业大学）； Huawei（华为）

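AI summary: Proposes OneCanvas, which aggregates multi-view patch features onto a panoramic canvas by reprojecting them with depth and camera pose, reaching state-of-the-art accuracy on benchmarks such as SQA3D without complex geometry encoders or large training budgets.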

Comments Project page: https://baranowskibrt.github.io/onecanvas/

AI中文摘要

现有的视觉语言模型（VLM）中的3D场景理解方法要么依赖复杂的、模型特定的几何编码器，要么为了追求空间推理而需要大量的训练预算。相反，OneCanvas将所有视图的补丁特征聚合到一个单一的等距柱状全景画布上。具体来说，每个补丁利用其深度和相机位姿被反投影到3D世界坐标，然后根据从画布原点看到的该点的连续经度和纬度放置在画布上，无需对重叠视图进行光栅化或聚合。补丁的度量坐标的3D位置嵌入被添加到其特征中，从而恢复了将世界位置压缩到角度画布坐标时丢失的深度。因此，来自所有帧的补丁共享一个空间坐标系，无需融合或对主干网络进行重大架构修改。预训练的VLM将此表示视为普通图像。由于画布可以以任何感兴趣的姿态为中心，相同的表示直接支持从特定视角进行情境推理，这是机器人和具身AI中的常见需求。得益于这种表示，我们还可以引入空间预训练课程：通过程序化地将从真实图像中提取的对象的补丁特征放置在原本空白的画布上的选定3D世界位置，我们生成了涵盖广泛空间推理任务的即时监督，并控制答案分布以减少空间推理捷径。OneCanvas在SQA3D和VSI-Bench上达到了最先进的准确率，并在SPBench上泛化到分布外数据，其训练计算量比最强竞争方法少一个数量级。

英文摘要

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch features from all views onto a single equirectangular panoramic canvas. Namely, each patch is unprojected to a 3D world coordinate using its depth and camera pose, then placed on the canvas at the continuous longitude and latitude of that point as seen from the canvas origin, with no rasterization or aggregation across overlapping views. A 3D position embedding of the patch's metric coordinates is added to its feature, restoring the depth lost when collapsing the world position to an angular canvas coordinate. Patches from all frames thus share one spatial coordinate system with no fusion or major architectural modifications of the backbone. The pretrained VLM consumes this representation as if it were an ordinary image. Because the canvas can be centered on any pose of interest, the same representation directly supports situated reasoning from a specific viewpoint, a common requirement in robotics and embodied AI. Thanks to this representation, we can also introduce a spatial pretraining curriculum: by procedurally placing patch features of objects, drawn from real images, at chosen 3D world positions on an otherwise empty canvas, we generate on-the-fly supervision spanning a broad range of spatial reasoning tasks, with answer distributions controlled to reduce spatial reasoning shortcuts. OneCanvas achieves state-of-the-art accuracy on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using an order of magnitude less training compute than the strongest competing methods.
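The geometric step described above (unproject a patch with its depth and camera pose, then place it at the longitude and latitude seen from the canvas origin) can be sketched as follows. The pinhole intrinsics, the camera-to-world pose format, the axis convention (longitude measured from +z toward +x, latitude from the y component), and the canvas size are all assumptions made for illustration, not details confirmed by the paper.

```python
import math


def unproject(u, v, depth, fx, fy, cx, cy, rotation, translation):
    """Pixel (u, v) with depth -> 3D world point, using a camera-to-world pose."""
    cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    return tuple(
        sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def to_canvas(point, origin, width, height):
    """World point -> continuous equirectangular (u, v) plus range from origin."""
    dx, dy, dz = (p - o for p, o in zip(point, origin))
    rng = math.sqrt(dx * dx + dy * dy + dz * dz)
    if rng == 0.0:
        raise ValueError("point coincides with the canvas origin")
    lon = math.atan2(dx, dz)   # in (-pi, pi], zero along +z
    lat = math.asin(dy / rng)  # in [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v, rng


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A patch at the principal point, 2 m away, seen by a camera at the world origin.
world = unproject(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0,
                  IDENTITY, (0.0, 0.0, 0.0))
print(world, to_canvas(world, (0.0, 0.0, 0.0), 1024, 512))
```

The returned range is what the angular coordinates discard; the abstract describes restoring that information through a 3D position embedding added to each patch feature. Changing `origin` recenters the canvas on a different pose without touching the patches themselves.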

2606.19302 2026-06-18 physics.ao-ph cs.LG 交叉投稿

Optimal scenario design for climate emulation

气候模拟的最优情景设计

Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

发表机构 * Department of Aeronautics and Astronautics, Massachusetts Institute of Technology（航空与航天系，麻省理工学院）； Center for Sustainability Science and Strategy, Massachusetts Institute of Technology（可持续科学与战略中心，麻省理工学院）； Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology（地球、大气与行星科学系，麻省理工学院）； Brahmal Vasudevan Institute for Sustainable Aviation, Department of Aeronautics, Imperial College London（可持续航空研究所，帝国理工学院伦敦校区）； Institute for Data, Systems, and Society, Massachusetts Institute of Technology（数据、系统与社会研究所，麻省理工学院）

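AI summary: To address the limited generalization of climate emulators, proposes optimizing training-data scenarios through a differentiable simple climate model, so that emulators trained on small datasets outperform those trained on the standard scenario set.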

AI中文摘要

随着深度学习在物理系统中的普及，改进泛化性的努力主要集中在设计嵌入物理约束的架构上。然而，对于机器学习替代气候模型（模拟器），我们表明现有情景中用于生成训练数据的低结构多样性限制了预测能力。在此，我们研究是否可以优化训练数据集本身以提高泛化性。我们引入一种方法创建数据集，使模拟器能够泛化到训练数据中未出现的新结构情景。我们使用可微简单气候模型（SCM）计算模拟器损失对训练数据扰动的敏感性，迭代更新训练数据以最大化模拟器技能。对于SCM，以这种方式优化的一个情景训练出的模拟器优于在六个标准ScenarioMIP路径上训练的模拟器。尽管训练数据集更小，但我们实现了更高的预测技能，发现我们的模拟器成功隔离了不同气候强迫因子（如温室气体与气溶胶）的独特物理行为，而无需单强迫运行。然后我们证明，使用SCM优化的情景驱动中等复杂度气候模型时，产生的训练数据集比在ScenarioMIP输出上训练得到更熟练的模拟器。我们的结果表明，在运行全尺度气候模型的计算受限环境中，生成少量动态丰富的情景比扩展传统排放路径集对模拟和表征系统响应具有更大的边际价值。

英文摘要

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.


2606.19329 2026-06-18 astro-ph.IM cs.LG 交叉投稿

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

钱德拉-盖亚对应体星表：利用机器学习解决钱德拉源星表中X射线源与盖亚源的多重匹配歧义

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo

发表机构 * Center for Astrophysics Harvard & Smithsonian, 60 Garden St, Cambridge MA 02138, USA ； Harvard John A. Paulson School of Engineering ； Universidad del Rosario, School of Engineering, Science ； The NSF AI Institute for Artificial Intelligence ； New York University, Courant Institute, 60 5th Avenue, New York NY, USA ； Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 ； New College of Florida, 5800 Bayshore Road, Sarasota, FL 34243, USA ； Astrophysics Laboratory, 3251 Hanover St, Palo Alto, CA 94304, USA

AI总结提出结合源属性（星等、颜色、距离）的机器学习框架，解决钱德拉源星表与盖亚源星表的交叉匹配歧义，为约11.3万个X射线源找到对应体，并识别约2万个假匹配。

Comments Accepted to The Astrophysical Journal. Website: https://www.samuelperezdi.com/chandragaia/

详情

AI中文摘要

我们提出了一个框架，用于将钱德拉源星表（CSC v2.1）中的源与盖亚数据发布3中的光学源进行交叉匹配。与纯空间方法不同，我们使用源属性（如星等、颜色和距离）来识别真实对应体、检测偶然重合，并在存在多个合理候选者时解决歧义。我们使用NWAY（一种考虑位置误差和源密度的贝叶斯交叉匹配框架）定义高置信度匹配的训练集。我们在两个星表的多种特征上训练梯度提升分类器（LightGBM）。在约25.4万个独特X射线源中，我们为约11.3万个源找到了对应体，其中约7000个源存在多个合理对应体。对于约2万个基于分离的交叉匹配能找到匹配的源，我们未找到对应体，并将其中的一半归因于偶然重合。我们在钱德拉猎户座超深项目（COUP）上验证了该流程，机器学习匹配在不使用任何位置信息的情况下再现了NWAY交叉匹配的95%。我们发布了约11.3万个钱德拉-盖亚对应体的星表，以及约7000个替代匹配和约2万个歧义NWAY关联，以支持未来对钱德拉和盖亚均可探测到的源进行种群研究。我们讨论了局限性，并提供了该框架的泛化版本，适用于其他交叉匹配场景。

英文摘要

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~254k unique X-ray sources, we find counterparts for ~113k sources, of which plausible multiple counterparts are found for ~7k. We find no counterparts for ~20k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~113k Chandra-Gaia counterparts, together with ~7k alternative matches and ~20k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.


2509.24725 2026-06-18 cs.LG cs.AI 版本更新

Q-Net: Queue Length Estimation via Kalman-based Neural Networks

Q-Net：基于卡尔曼神经网络的队列长度估计

Ting Gao, Elvin Isufi, Winnie Daamen, Erik-Sander Smits, Serge Hoogendoorn

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Delft University of Technology（代尔夫特理工大学）

AI总结本文提出Q-Net框架，通过结合卡尔曼滤波与神经网络，解决信号交叉口队列长度估计中的数据融合问题，提升空间转移性和实时性，实现无需昂贵传感设备的准确队列估计。

详情

DOI: 10.1016/j.trc.2026.105809

AI中文摘要

估计信号交叉口的队列长度一直是交通管理中的长期挑战。尽管有两类隐私保护的数据源可用，车辆流的部分可观测性仍使这一任务变得复杂：(i) 停止线附近环形检测器提供的汇总车辆计数，以及 (ii) 提供路段平均速度测量的汇总浮动车数据 (aFCD)。然而，如何将这些空间和时间分辨率不同的数据源整合用于队列长度估计仍不清楚。为此，本文提出Q-Net：一种基于状态空间形式的队列估计框架。该设计解决了队列建模中的关键挑战，如违反交通守恒假设。Q-Net遵循卡尔曼预测-更新结构，并在状态演变和测量模型中保持物理可解释性。Q-Net使用AI增强的卡尔曼滤波器从数据中学习时变的增益动态。该框架支持实时实现，并通过将aFCD测量分组为固定大小的局部组来提高空间可迁移性，使可学习参数的数量与路段长度无关。在荷兰鹿特丹城市主干道上的评估显示，Q-Net优于基线方法，能够准确追踪队列的形成和消散，并缓解aFCD引起的延迟。通过结合数据效率、可解释性、实时适用性和空间可迁移性，Q-Net在无需昂贵的传感基础设施（如摄像头或雷达）的情况下实现了准确的队列长度估计。

英文摘要

Estimating queue lengths at signalized intersections is a long-standing challenge in traffic management. Partial observability of vehicle flows complicates this task despite the availability of two privacy-preserving data sources: (i) aggregated vehicle counts from loop detectors near stop lines, and (ii) aggregated floating car data (aFCD) that provide segment-wise average speed measurements. However, how to integrate these sources with differing spatial and temporal resolutions for queue length estimation is rather unclear. Addressing this question, we present Q-Net: a queue estimation framework built upon a state-space formulation. This design addresses key challenges in queue modeling, such as violations of traffic conservation assumptions. Q-Net follows the Kalman predict-update structure and maintains physical interpretability in both the state evolution and measurement models. Q-Net uses an AI-augmented Kalman filter to learn time-varying gain dynamics from data. The framework supports real-time implementation and improves spatial transferability by grouping aFCD measurements into fixed-size local groups, making the number of learnable parameters independent of section length. Evaluations on urban main roads in Rotterdam, the Netherlands, show that Q-Net outperforms baseline methods, tracks queue formation and dissipation accurately, and mitigates aFCD-induced delays. By combining data efficiency, interpretability, real-time applicability, and spatial transferability, Q-Net makes accurate queue length estimation possible without costly sensing infrastructure like cameras or radar.
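To make the "Kalman predict-update structure" in the abstract above concrete, here is a minimal scalar sketch. It is not Q-Net: the abstract says Q-Net learns time-varying gain dynamics from data, whereas this sketch computes the classical Kalman gain from fixed noise variances. The function name `kalman_step`, the noise values `q` and `r`, and the toy measurements are all assumptions made for illustration.

```python
def kalman_step(x, p, inflow, outflow, z, q=1.0, r=4.0):
    """One predict-update cycle of a scalar Kalman filter for queue length.

    x, p            : prior queue estimate (vehicles) and its variance
    inflow, outflow : vehicle counts over the interval (e.g. loop detectors)
    z               : noisy queue-length measurement (hypothetical input)
    q, r            : process / measurement noise variances (assumed values)
    """
    # Predict: vehicle conservation, with the queue clipped at zero.
    x_pred = max(x + inflow - outflow, 0.0)
    p_pred = p + q
    # Update: blend prediction and measurement through the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new, k


# Toy run over three intervals: (inflow, outflow, measurement).
x, p = 0.0, 10.0
for inflow, outflow, z in [(5, 1, 3.0), (6, 2, 9.0), (2, 6, 4.0)]:
    x, p, k = kalman_step(x, p, inflow, outflow, z)
    print(f"queue={x:.2f} var={p:.2f} gain={k:.2f}")
```

A learned-gain variant would replace the line computing `k` with the output of a trained model, which is where conservation violations could be absorbed; the surrounding predict and update equations would stay the same.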


2307.05623 2026-06-18 cs.LG cs.AI 版本更新

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

一种用于动态估计起点-终点序列的深度学习框架

Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng

发表机构 * School of Data Science University of Science（数据科学学院中国科学技术大学）； Yangtze River Delta Information Intelligence Innovation Research Institute, China（长江三角洲信息智能创新研究院）

AI总结针对OD矩阵估计中的欠定性和滞后性问题，提出集成深度学习方法，利用神经网络推断OD序列结构并引导数值优化，实验证明能有效提供时空约束。

Comments 11 pages,25 figures

详情

AI中文摘要

OD矩阵估计是交通领域的一个关键问题。主要方法利用交通传感器测量信息（如交通计数）来估计由OD矩阵表示的交通需求。该问题分为两类：静态OD矩阵估计和动态OD矩阵序列（简称OD序列）估计。上述两类都面临由大量待估参数和不足的约束信息引起的欠定性问题。此外，OD序列估计还面临滞后挑战：由于拥堵等不同交通状况，同一车辆在相同观测时段内会出现在不同路段，导致相同的OD需求对应不同的行程。为此，本文提出一种集成方法，利用深度学习方法推断OD序列的结构，并利用结构约束指导传统数值优化。实验表明，神经网络能有效推断OD序列的结构，并为数值优化提供实用的约束以获得更好的结果。此外，实验表明，所提供的结构信息不仅包含对OD矩阵空间结构的约束，还提供了对OD序列时间结构的约束，很好地解决了滞后问题的影响。

英文摘要

OD matrix estimation is a critical problem in the transportation domain. The principal method uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrix sequence (OD sequence for short) estimation. Both face the underdetermination problem caused by the large number of parameters to be estimated and insufficient constraint information. In addition, OD sequence estimation faces the lag challenge: due to different traffic conditions such as congestion, the same vehicle will appear on different road sections during the same observation period, so that identical OD demands correspond to different trips. To this end, this paper proposes an integrated method that uses deep learning to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the provided structural information contains constraints not only on the spatial structure of OD matrices but also on the temporal structure of the OD sequence, which mitigates the effect of the lag problem well.


2506.13196 2026-06-18 cs.LG 版本更新

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

KEPLA：一种用于精确预测蛋白质-配体结合亲和力的知识增强深度学习框架

Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

发表机构 * Department of Computer Science, City University of Hong Kong（香港城市大学计算机科学系）； ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University（浙江大学杭州国际科技创新中心）； School of Software, Shandong University（山东大学软件学院）； College of Informatics, Harbin Institute of Technology (Shenzhen)（哈尔滨工业大学（深圳）计算机学院）

AI总结提出KEPLA框架，通过整合基因本体和配体属性的先验知识，利用全局表示对齐与局部交叉注意力，提升蛋白质-配体结合亲和力预测的准确性，在多个基准数据集上超越现有方法。

详情

AI中文摘要

准确预测蛋白质-配体结合亲和力对药物发现至关重要。尽管最近的深度学习方法已展现出有希望的结果，但它们通常仅依赖蛋白质和配体的结构特征，忽略了与结合亲和力相关的宝贵生化知识。为解决这一局限，我们提出KEPLA，一种新颖的深度学习框架，明确整合来自基因本体和配体属性的先验知识以增强预测性能。KEPLA以蛋白质序列和配体分子图作为输入，并优化两个互补目标：（1）将全局表示与知识图谱关系对齐，以捕获领域特定的生化见解；（2）利用局部表示之间的交叉注意力构建细粒度联合嵌入用于预测。在两个基准数据集上的域内和跨域场景实验表明，KEPLA始终优于最先进的基线方法。此外，基于知识图谱关系和交叉注意力图的可解释性分析为潜在的预测机制提供了有价值的见解。

英文摘要

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking their valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.


2508.09191 2026-06-18 cs.LG cs.AI 版本更新

From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization

从数值到标记：一种基于符号离散化的LLM驱动上下文感知时间序列预测框架

Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang

发表机构 * State Key Laboratory of Cognitive Intelligence（认知智能国家重点实验室）； University of Science and Technology of China（中国科学技术大学）； College of Intelligence and Computing（智能科学与计算学院）； iFLYTEK Research（iFLYTEK研究院）

AI总结提出TokenCast框架，利用大语言模型通过符号离散化将连续时间序列转化为标记，与上下文文本对齐，实现上下文感知的预测，实验证明有效。

详情

AI中文摘要

时间序列预测在能源、医疗和金融等关键应用领域支持决策中起着重要作用。尽管近期取得了进展，但由于将历史数值序列与通常包含非结构化文本数据的上下文特征整合的挑战，预测精度仍然有限。为了解决这一挑战，我们提出了TokenCast，一个由大语言模型（LLM）驱动的框架，利用基于语言的符号表示作为上下文感知时间序列预测的统一中介。具体来说，TokenCast采用离散分词器将连续数值序列转化为时间标记，实现与基于语言输入的结构对齐。为了有效弥合模态之间的语义差距，时间和上下文标记通过预训练的LLM嵌入到共享表示空间中，并通过生成目标进一步优化。基于这一统一语义空间，对齐的LLM随后以监督方式进行微调，以预测未来的时间标记，然后解码回原始数值空间。在真实世界数据集上的大量实验证明了我们框架的有效性，并突显了其作为上下文感知时间序列预测生成框架的潜力。代码可在 https://github.com/Xiaoyu-Tao/TokenCast 获取。

英文摘要

Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
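The idea of turning continuous values into discrete tokens and decoding them back can be illustrated with a deliberately simple stand-in. The sketch below uses uniform binning; the abstract does not say how TokenCast's discrete tokenizer works, so this should not be read as its method. The helper names `fit_bins`, `encode`, and `decode` are hypothetical.

```python
def fit_bins(values, n_tokens):
    """Return (lower bound, bin width) for uniform binning of the series."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_tokens or 1.0  # fall back to 1.0 for a constant series
    return lo, width


def encode(values, lo, width, n_tokens):
    """Map each value to a token id in [0, n_tokens - 1], clamping outliers."""
    return [max(0, min(int((v - lo) / width), n_tokens - 1)) for v in values]


def decode(tokens, lo, width):
    """Map each token id back to the centre of its bin."""
    return [lo + (t + 0.5) * width for t in tokens]


series = [0.0, 1.0, 2.0, 3.0, 4.0]
lo, width = fit_bins(series, n_tokens=4)
tokens = encode(series, lo, width, n_tokens=4)   # [0, 1, 2, 3, 3]
recovered = decode(tokens, lo, width)            # [0.5, 1.5, 2.5, 3.5, 3.5]
print(tokens, recovered)
```

The round trip is lossy by at most half a bin width here, which is the basic trade-off any discretization of a numerical series has to manage: more tokens give finer reconstruction but a larger vocabulary for the language model to predict.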


2511.05221 2026-06-18 cs.LG q-bio.NC 版本更新

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

ActiTect：通过标准化体动记录进行REM睡眠行为障碍筛查的通用机器学习流程

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

发表机构 * Faculty of Mathematics and Natural Sciences, University of Cologne, Germany（科隆大学数学与自然科学学院，德国）； Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆大学医学院与科隆大学医院生物医学信息学研究所，德国）； Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆分子医学中心（CMMC），科隆大学医学院与科隆大学医院，德国）； Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆大学医学院与科隆大学医院医学心理学 | 神经心理学与性别研究，德国）； Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany（认知神经科学，神经科学与医学研究所（INM-3），于利希研究中心，德国）； Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany（科隆大学医学院与科隆大学医院神经内科，德国）； Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany（神经病学中心，帕金森、睡眠与运动障碍科，波恩大学医院，波恩大学，德国）； German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany（德国神经退行性疾病研究中心（DZNE），波恩，德国）； Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany（衰老与衰老相关疾病卓越集群（CECAD），科隆大学，德国）； Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany（神经内科，石勒苏益格-荷尔斯泰因大学医学中心基尔校区与基尔大学，德国）； Department of Informatics, Technical University of Munich, Germany（信息学系，慕尼黑工业大学，德国）； Institute for Digital Medicine, University Hospital Bonn, Germany（数字医学研究所，波恩大学医院，德国）； Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark（灵北基金会帕金森病研究中心（PACE），奥胡斯大学，丹麦）； Department of Nuclear Medicine, Aarhus University Hospital, Denmark（核医学科，奥胡斯大学医院，丹麦）； Department of Electrical and Computer Engineering, Aarhus University, Denmark（电气与计算机工程系，奥胡斯大学，丹麦）； Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK（牛津帕金森病中心与神经病学部，纳菲尔德临床神经科学系，牛津大学，英国）

AI总结提出ActiTect，一个全自动开源机器学习工具，通过标准化预处理和睡眠-觉醒检测，从体动记录中识别RBD，在多个独立队列中验证了泛化能力（AUROC 0.84-0.94）。

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

详情

DOI: 10.1038/s41746-026-02738-8
Journal ref: npj Digital Medicine (2026)

AI中文摘要

孤立性快速眼动睡眠行为障碍（iRBD）是α-突触核蛋白病的主要前驱标志，通常先于帕金森病、路易体痴呆或多系统萎缩的临床发作。虽然腕戴式体动记录仪通过捕捉异常夜间运动在大规模筛查中具有检测RBD的巨大潜力，但缺乏可靠高效的分析流程则无法使用。本研究提出了ActiTect，一个全自动开源机器学习工具，用于从体动记录中识别RBD。为确保跨异构采集设置的泛化能力，我们的流程包括稳健的预处理和自动睡眠-觉醒检测，以协调多设备数据并提取表征活动模式的生理可解释运动特征。模型开发基于78名个体的队列，在嵌套交叉验证下表现出强大的区分能力（AUROC = 0.95）。在盲法本地测试集（n = 31，AUROC = 0.86）和两个独立外部队列（n = 113，AUROC = 0.84；n = 57，AUROC = 0.94）上验证了泛化性。为评估现实世界鲁棒性，跨内部和外部队列的留一数据集交叉验证显示出一致的性能（AUROC范围 = 0.84-0.89）。补充稳定性分析表明，关键预测特征在数据集中保持可重复性，支持最终合并的多中心模型作为更广泛部署的稳健预训练资源。通过开源且易于使用，我们的工具促进了广泛采用，并促进了独立验证和协作改进，从而推动该领域向使用可穿戴设备的统一且可泛化的RBD检测模型发展。

英文摘要

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $α$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.


2602.19591 2026-06-18 cs.LG cs.AI 版本更新

Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

使用异构图神经网络检测高潜力中小企业

Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi

发表机构 * University of Michigan（密歇根大学）； The University of Hong Kong（香港大学）

AI总结提出SME-HGT异构图Transformer框架，利用公开数据构建包含公司、研究主题和政府机构的异构图，预测SBIR第一阶段获奖者能否进入第二阶段，AUPRC达0.621，优于基线模型。

Comments accepted by (ICIIS 2026)

详情

AI中文摘要

中小企业占美国企业的99.9%，贡献44%的经济活动，但系统性地识别高潜力中小企业仍是一个开放挑战。我们提出了SME-HGT，一个异构图Transformer框架，仅使用公开数据预测哪些SBIR第一阶段获奖者将进入第二阶段资助。我们构建了一个异构图，包含32,268个公司节点、124个研究主题节点和13个政府机构节点，通过约99,000条边连接三种语义关系类型。SME-HGT在时间分割测试集上达到0.621±0.003的AUPRC，在五个随机种子上优于MLP基线（0.590±0.002）和R-GCN（0.608±0.013）。在筛选深度为100家公司时，SME-HGT达到89.6%的精确率，比随机选择提升2.14倍。我们的时间评估协议防止信息泄露，对公开数据的依赖确保了可重复性。这些结果表明，公司、研究主题和资助机构之间的关系结构为中小企业潜力评估提供了有意义的信号，对政策制定者和早期投资者具有启示意义。

英文摘要

Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally-split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
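The screening metrics quoted above (precision at a depth of 100 and lift over random selection) can be sketched as follows. The abstract does not define lift, so this assumes the common definition, precision at k divided by the positive base rate; under that assumption the reported 89.6% precision and 2.14 lift would imply a base rate of roughly 42%. The function names and the toy scores are hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scoring items."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift(precision, base_rate):
    """Assumed definition: precision relative to random selection."""
    return precision / base_rate


# Toy ranking: six firms, three of which advance (label 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
p_at_3 = precision_at_k(scores, labels, 3)
base_rate = sum(labels) / len(labels)
print(f"precision@3={p_at_3:.3f} lift={lift(p_at_3, base_rate):.3f}")

# Base rate implied by the reported figures, under the assumed definition.
implied_base_rate = 0.896 / 2.14
print(f"implied base rate ~ {implied_base_rate:.3f}")
```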


2605.10083 2026-06-18 cs.LG 版本更新

Unlocking air traffic flow prediction through microscopic aircraft-state modeling

通过微观飞机状态建模解锁空中交通流量预测

Bin Wang, Anqi Liu, Jiangtao Zhao, Hina Birahmani, Yanyong Huang, Peilan He, Guiyuan Jiang, Feng Hong, Yanwei Yu, Yuanyuan Hou, Tianrui Li

发表机构 * Faculty of Information Science and Engineering（信息科学与工程学院）； Ocean University of China（中国海洋大学）； Sanya Oceanographic Institution（三亚海洋研究所）； Joint Laboratory of Data Science and Business Intelligence（数据科学与商务智能联合实验室）； Southwestern University of Finance and Economics（西南财经大学）； The Affiliated Hospital of Qingdao University（青岛大学附属医院）； School of Computing and Artificial Intelligence（计算机与人工智能学院）

AI总结本文提出AeroSense模型，通过微观飞机状态直接预测未来区域交通流量，提升高密度交通下的预测精度，替代传统时间序列方法。

详情

AI中文摘要

终端空域短期空中交通流量预测对主动空中交通管理至关重要。现有方法主要将交通流量建模为聚合时间序列，尽管交通动态由飞机状态和连续空域中的相互作用决定。此类聚合掩盖了包括飞机运动学、边界相互作用和控制意图在内的细粒度信息。本文提出AeroSense，一种从即时空域情况中的动态飞机状态集直接预测未来交通流量的状态到流量建模框架。通过建立从微观飞机状态到未来区域交通流量的端到端映射，AeroSense在保持飞机级动态的同时，自然适应变化的交通密度，而无需依赖历史回溯窗口。在大规模真实数据集上的实验表明，AeroSense在高密度交通期间比基于聚合的预测方法具有持续的预测精度提升。这些发现表明，即时空域情况为传统基于时间序列的交通预测范式提供了有效的替代方案。

英文摘要

Short-term air traffic flow prediction in terminal airspace is essential for proactive air traffic management. Existing approaches predominantly model traffic flow as aggregated time series. However, traffic dynamics are governed by aircraft states and their interactions in continuous airspace. Such aggregation obscures fine-grained information, including aircraft kinematics, boundary interactions, and control intent. Here we present AeroSense, a state-to-flow modeling paradigm that predicts future traffic flow directly from instantaneous airspace situations represented as dynamic sets of aircraft states derived from ADS-B trajectories. By establishing an end-to-end mapping from microscopic aircraft states to future regional traffic flow, AeroSense preserves aircraft-level dynamics while naturally accommodating varying traffic density without relying on historical look-back windows. Experiments on a large-scale real-world dataset show that AeroSense exhibits admirable predictive accuracy and robustness over aggregation-based forecasting approaches, particularly during high-density traffic periods. These findings suggest that aircraft-state situation modeling provides a promising alternative to conventional time-series forecasting in air traffic flow management.


2605.13566 2026-06-18 cs.LG 版本更新

Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks

基于深度神经网络的城市地表温度时空降尺度与临近预报

Solomiia Kurchaba, Angela Meyer

发表机构 * Department of Geoscience and Remote Sensing（地球科学与遥感系）； Delft University of Technology（代尔夫特理工大学）； School of Engineering and Computer Science（工程与计算机科学学院）； Bern University of Applied Sciences（伯尔尼应用科学大学）

AI总结本文提出利用深度神经网络结合静止和极轨卫星数据，实现高时空分辨率的城市地表温度场估计与临近预报，提升城市气候与生态研究的精度与时效性。

Comments Paper after publication in IEEE Access

详情

DOI: 10.1109/ACCESS.2026.3700054
Journal ref: IEEE Access, vol. 14, pp. 85134-85151, 2026

AI中文摘要

地表温度（LST）是多种应用的关键变量，如城市气候和生态研究。然而，现有卫星衍生的LST产品只能提供高空间分辨率或高时间分辨率之一，导致两者之间存在根本性权衡。为解决这一权衡，我们结合静止卫星和极轨卫星的观测数据，提供高空间和高时间分辨率（1公里，15分钟间隔）的LST场。我们展示了其在日内LST预报中的应用。为了估计高时空分辨率的LST场，我们训练了一个U-Net模型，将SEVIRI/MSG（3公里，15分钟分辨率）的LST场映射到在空间和时间上与之匹配的Terra/Aqua MODIS（1公里，每天4次过境）的LST场。该模型在人口超过100万的欧洲大城市的LST数据上训练，在留出测试集上达到RMSE=1.92°C和接近零的偏差MBE=0.01°C。作为第二步，我们提出基于ConvLSTM架构的LST临近预报模型，在降尺度后的LST场上训练，预报时效为15至75分钟。该临近预报模型优于持续性基准和气候滚动中位数基准，在所考虑的预报时效上RMSE为0.57至1.15°C，偏差范围为-0.1至0.14°C。此外，针对独立MODIS过境的额外验证确认了其稳健性能。我们的高时空分辨率LST预报模型可直接应用于基于卫星的业务化LST监测。

英文摘要

Land Surface Temperature (LST) is a key variable for various applications, such as urban climate and ecology studies. Yet, existing satellite-derived LST products provide either high spatial or high temporal resolution, resulting in a fundamental trade-off between the two. To address this trade-off, we combine observations from a geostationary and a polar orbiting satellite and provide LST fields at high spatial and high temporal resolution (1 km at 15-min intervals). We demonstrate their application for intraday forecasting of LSTs. To estimate LST fields at high spatiotemporal resolution, a U-Net model is trained to map LST fields from SEVIRI/MSG (3 km and 15 min resolution) to LST fields from Terra/Aqua MODIS (1 km, 4 overpasses per day) that are collocated in space and time. The presented model has been trained on LSTs across large European cities with a population exceeding 1 million inhabitants, and achieves an RMSE = 1.92°C and near-zero bias MBE = 0.01°C on the hold-out test set. As a second step, we present an LST nowcasting model based on the ConvLSTM architecture, trained on downscaled LST fields with forecast lead times of 15 to 75 minutes. The nowcasting model outperforms both a persistence benchmark and a Climatological Rolling Median benchmark, with RMSEs of 0.57 to 1.15°C for the considered lead times and biases ranging from -0.1 to 0.14°C. An additional validation conducted against independent MODIS overpasses confirms robust performance. Our LST forecast model at high spatiotemporal resolution is directly applicable to operational satellite-based LST monitoring.
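The two error measures reported above answer different questions, which a short sketch may help separate. RMSE measures the typical size of the error, while MBE (mean bias error) measures its systematic sign, so a model can have near-zero MBE and still a sizeable RMSE. The sign convention used here (prediction minus observation) is an assumption, and the temperature values are invented for illustration.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error: typical error magnitude."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mbe(pred, obs):
    """Mean bias error, assumed here as mean(prediction - observation)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


# Invented LST values in degrees Celsius: errors of +1, -1, +1, -1 cancel in
# the bias but not in the RMSE.
predicted = [21.0, 19.0, 25.0, 23.0]
observed = [20.0, 20.0, 24.0, 24.0]
print(f"RMSE={rmse(predicted, observed):.2f} MBE={mbe(predicted, observed):.2f}")
```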


2605.21528 2026-06-18 cs.LG cs.AI 版本更新

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

可重复的基于日志的自动机器学习框架用于医疗风险预测中的可解释流水线优化

Rui Huang, Lican Huang

发表机构 * School of Basic Medicine, Hangzhou Normal University（杭州师范大学基础医学院）； Research Department, Hangzhou Domain Zones Technology Co.Ltd.（杭州域区技术有限公司）

AI总结本文提出了一种可重复的基于日志的自动机器学习框架，用于医疗风险预测中的可解释流水线优化，通过分析组件属性、交互和冗余性，提高了模型性能和稳定性。

详情

AI中文摘要

准确且可重复的疾病风险预测仍然具有挑战性，由于异质特征、有限样本和严重的类别不平衡。本研究引入了yvsoucom-iterkit，一种确定性和基于日志的自动化机器学习框架，将流水线优化完全可重复地建模为配置级系统。每个流水线被编码为可追溯的日志实体，使能够分析组件属性、交互、相似性和跨种子鲁棒性。在超过18,000个流水线配置上对Pima Indians糖尿病和中风数据集的实验揭示了一个结构化且部分冗余的搜索空间，其中性能由一小部分相互作用的组件决定。随机森林重要性分析显示，增强（0.454）、模型选择（0.198）和不平衡处理（0.101）是Pima数据集的关键驱动因素，而不平衡处理主导中风（0.406）。组件相似性分析显示强冗余性，特征选择变体（biMax-biMean）表现出低RMS距离（0.0252），混合匹配无增强（0.0279），TomekLinks与无不平衡处理对齐（0.0325），而高斯噪声与无增强的差异更大（0.10）。该框架使用集成模型（加权F1 0.89，宏F1 0.88在Pima；加权F1 0.94在中风）实现了强且稳定的性能，而宏F1在中风上较低（0.67）由于类别不平衡。跨种子分析揭示了性能-鲁棒性权衡，集成模型的变异性低于SVM。这些结果表明，有效的AutoML优化可以聚焦于一组高影响的组件。

英文摘要

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.
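The gap between Weighted-F1 and Macro-F1 on an imbalanced dataset, as described in the abstract above, follows from how the two averages are formed. The sketch below uses the standard definitions (per-class F1 averaged equally for macro, averaged by class support for weighted) on invented labels; it does not reproduce the paper's data or pipelines, and the function names are hypothetical.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(y_true, y_pred):
    """Macro-F1 weights classes equally; Weighted-F1 weights by support."""
    classes = sorted(set(y_true))
    scores = {c: f1_for_class(y_true, y_pred, c) for c in classes}
    macro = sum(scores.values()) / len(classes)
    weighted = sum(scores[c] * y_true.count(c) for c in classes) / len(y_true)
    return macro, weighted


# Invented example: 95 negatives, 5 positives, only 1 positive detected.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1, 0, 0, 0, 0]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro-F1={macro:.3f} weighted-F1={weighted:.3f}")
```

Because the majority class dominates the weighted average, a classifier that misses most minority cases can still score well on Weighted-F1, which is why Macro-F1 is the more informative figure when the minority class is the one that matters.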


2606.07622 2026-06-18 cs.LG stat.AP 版本更新

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

发表机构 * Korea Aerospace University（韩国航空大学）； Korea Airports Corporation（韩国机场公社）

AI总结提出基于Transformer的框架，利用历史队列长度、等待时间和旅客吞吐量数据，预测登机口和安检点未来两小时的队列长度与等待时间，支持主动排队管理。

Comments 10 pages, 6 figures, accepted at DASC 2026

详情

AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要，因为它能够实现主动的拥堵管理。然而，时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中，我们提出了一种旅客排队预测框架，该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构，利用过去登机口和安检点的队列长度和等待时间，以及值机岛的旅客吞吐量，来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部，以预测登机口和安检点的队列长度和等待时间。实验结果表明，该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.


2204.14224 2026-06-18 cs.CV cs.LG eess.IV 版本更新

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

不完全信息条件下纹理图像重建与分类的神经网络方法研究

Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Darkhan Kurmangaliyev, Daniyar Nurseitov, Tatyana Dedova, Larissa Balakay, Serik Nurakynov

发表机构 * Satbayev University（萨特巴耶夫大学）； Institute of Ionosphere LLP（电离层研究所）； Information Technology Department（信息技术部门）； Assiut University（阿西乌特大学）

AI总结提出结合目标检测、GAN（CRA）修复和Transformer/CNN分类的端到端框架，发现重建质量高（PSNR 28.7dB）但分类准确率仅53%，通过置信度混合集成将MCA从48%提升至58%，揭示生成模型产生语义模糊特征的问题。

Comments IEEE ACCESS

详情

DOI: 10.1109/ACCESS.2026.3705029

AI中文摘要

异质自然纹理的自动化分析常因物理损伤和数据丢失而受阻，这对计算机视觉构成了重大挑战。虽然深度学习在受控环境中已显示出成功，但其在信息不完全条件下对复杂地质材料的应用仍未被充分探索。本研究提出了一个用于高分辨率岩心样本图像修复和分类的集成框架。我们设计了一个端到端流水线，利用目标检测进行样本分割，随后使用具有上下文残差聚合（CRA）的生成对抗网络（GAN）进行图像修复，以重建缺失的高频细节。接着，我们在重建数据上评估了现代基于Transformer（Swin、ViT）和CNN架构的性能。实验揭示了重建质量与下游效用之间的关键分歧：尽管结构保真度高（PSNR 28.7 dB，FID 74.01），分类准确率却停滞在53%。为了改善少数类检测，我们提出了一种基于置信度的混合集成方法，将MCA从48%提升至58%。这些结果凸显了当前最先进生成模型的局限性，它们可能产生视觉上合理但语义模糊的特征（“幻觉”），从而混淆分类器。本工作深入探讨了图像重建质量与分类性能之间的依赖关系，为无损检测和材料科学领域的未来研究提供了可复现的基线。鉴于井间准确率仍处于49-53%范围，我们将所得到的系统定位为岩相解释的决策支持和筛选工具，而非完全自主的分类器。代码可在以下网址获取：https://github.com/GalymzhanAbdimanap/Lithology_recognition

英文摘要

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7 dB, FID 74.01), classification accuracy plateaued at 53%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48% to 58%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49-53% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition

2508.10178 2026-06-18 q-bio.QM cs.LG 版本更新

Estimating carbon pools in the European Shelf sea environment: replacing reanalysis by model-informed machine learning?

估算欧洲陆架海环境中的碳库：用模型指导的机器学习替代再分析？

Jozef Skakala

发表机构 * Plymouth Marine Laboratory（普利茅斯海洋实验室）； National Centre for Earth Observation（国家地球观测中心）

AI总结提出用深度集成神经网络学习可观测变量与海洋碳库的关系，以低成本替代昂贵再分析，在西北欧陆架海实现高效碳库预测并提供不确定性。

Comments 37 pages, 9 figures (+ 3 in the appendix), v3 - published version

DOI: 10.1029/2026JH001326
Journal ref: JGR - Machine Learning and Computation 3 (2026)

AI中文摘要

陆架海对经济和碳循环至关重要，但碳库观测往往稀疏或高度不确定。碳再分析（无论是同化叶绿素a等代理变量还是直接同化碳）可提供替代方案，但运行成本高昂。我们提出使用计算成本低的神经网络集成（即深度集成）来学习直接可观测（大气、河流和海洋）变量与海洋碳库之间的关系，该关系来自一个物理-生物地球化学耦合模型。深度集成在西北欧陆架海（NWES）物理-生物地球化学模型自由运行模拟上训练。训练后，使用来自NWES再分析的输入而非自由运行来运行深度集成，证明它能高效预测多个NWES碳库（如碎屑、浮游动物、异养细菌），且与再分析的一致性远优于自由运行，同时提供不确定性信息。我们进一步表明，当深度集成直接由同化到再分析中的观测驱动时，其表现同样良好，但碳库只能预测在观测位置和时间。我们关注结果的可解释性，并展示了深度集成在未来气候假设情景中的潜在应用。我们认为，模型指导的机器学习为昂贵的再分析提供了可行的替代方案，并可在观测缺失和/或高度不确定的地方补充观测。

英文摘要

Shelf seas are important for the economy and the carbon cycle, but shelf sea observations for carbon pools are often sparse, or highly uncertain. An alternative can be provided by carbon reanalyses (whether assimilating proxy variables, such as chlorophyll-$a$, or directly carbon), but these are often expensive to run. We propose to use a computationally cheap ensemble of neural networks (i.e. deep ensemble) to learn the relationship between the directly observable (atmospheric, riverine and ocean) variables and marine carbon pools from a coupled physics-biogeochemistry model. The deep ensemble was trained on a North-West European Shelf (NWES) physical-biogeochemistry model free run simulation. After training, the deep ensemble was run using inputs from the NWES reanalysis instead of the free run, demonstrating that it can efficiently predict several NWES carbon pools (e.g., detritus, zooplankton, heterotrophic bacteria) in much better agreement with the reanalysis than the free run, while also providing uncertainty information. We further show that the deep ensemble performs similarly well when it is driven directly by the observations assimilated into the reanalysis, with the limitation that carbon pools can then be predicted only at the observed locations and times. We focus on explainability of the results and demonstrate potential use of the deep ensembles for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observations, wherever they are missing and/or highly uncertain.
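
The uncertainty information mentioned above comes from the spread between ensemble members. A minimal sketch of that idea follows, assuming bootstrap-trained linear models as stand-ins for the neural networks; the data, feature count, and member count are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_member(X, y, rng):
    """Fit one ensemble member by least squares on a bootstrap resample.

    Assumption: a linear model replaces the neural network of the paper.
    """
    idx = rng.integers(0, len(X), size=len(X))
    Xb = np.column_stack([X[idx], np.ones(len(idx))])  # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y[idx], rcond=None)
    return coef

def ensemble_predict(members, X):
    """Return the ensemble mean and the member-to-member standard deviation."""
    Xb = np.column_stack([X, np.ones(len(X))])
    preds = np.stack([Xb @ coef for coef in members])  # (n_members, n_samples)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # synthetic "observable" inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)  # synthetic target

members = [fit_member(X, y, rng) for _ in range(10)]
mean, spread = ensemble_predict(members, X[:5])
```

The mean plays the role of the prediction and the spread the role of the uncertainty estimate; a real deep ensemble would typically vary the random initialization of each network instead of resampling the data.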

2511.00366 2026-06-18 stat.ML cs.CE cs.LG 版本更新

IPSL-AID：用于从全球到区域尺度气候降尺度的生成扩散模型

Kishanthan Kingston, Olivier Boucher, Freddy Bouchet, Pierre Chapel, Rosemary Eade, Jean-Francois Lamarque, Redouane Lguensat, Kazem Ardaneh

发表机构 * Climate Modeling Center（气候建模中心）； Sorbonne University（索邦大学）； CNRS（法国国家科学研究中心）； IPSL ； Paris（巴黎）； France（法国）

AI总结提出基于去噪扩散概率模型的IPSL-AID工具，利用ERA5再分析数据从粗分辨率输入生成0.25°温度、风和降水场，并建模细尺度特征概率分布以量化不确定性，准确重建统计分布、极端事件和空间结构。

Comments 17 pages, 12 figures, submitted to Climate Informatics 2026, to appear in Environmental Data Science

2604.14906 2026-06-18 physics.bio-ph cs.LG 版本更新

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

用热力学驱动的机器学习揭示药物与SARS-CoV-2 RNA假结的结合机制

Mariia Ivonina, Jakub Rydzewski

发表机构 * Platform of Inter/Transdisciplinary Energy Research, Kyushu University（九州大学跨学科/超学科能源研究平台）； Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University（尼古拉·哥白尼大学物理、天文学与信息学学院物理研究所）

AI总结本研究利用热力学驱动的机器学习方法（谱映射）从全原子分子动力学轨迹中学习集体变量，揭示了配体结合对SARS-CoV-2 RNA假结拓扑选择性去稳定化的机制，并发现质子化状态是模拟RNA靶向药物作用的关键因素。

AI中文摘要

SARS-CoV-2 RNA中的假结二级结构通过$-1$程序性核糖体移码（$-1$ PRF）调控蛋白质合成，该机制使病毒能从重叠阅读框产生结构蛋白和非结构蛋白。该假结表现出穿线和非穿线两种长寿命拓扑结构。配体结合对其折叠的影响是开发$-1$ PRF小分子抑制剂的关键过程。通过引入捕捉相应最慢动力学模式的集体变量（CVs），可以促进通过无偏分子动力学（MD）模拟理解这一过程。这里，我们使用谱映射（spectral map，SM），一种热力学驱动的机器学习技术，直接从SARS-CoV-2 RNA假结与$-1$ PRF抑制剂莫拉沙星（merafloxacin）及其两种结构类似物（中性和离子化形式）复合物的全原子MD轨迹中学习这样的CVs。从学习到的CVs导出的自由能景观（FELs）表明，配体诱导的去稳定化是拓扑选择性的。在穿线假结中，抑制剂去稳定化S2茎，而在非穿线假结中，去稳定化发生在S1和S3茎。此外，每个配体重塑FEL的程度与实验报道的抗病毒效力相匹配，而质子化状态在相同RNA拓扑内定性地改变动力学。总体而言，我们的结果显示了假结拓扑、配体类型和质子化状态如何共同影响病毒RNA的慢构象动力学，并确立了生理质子化作为模拟RNA靶向药物作用的关键因素。

英文摘要

The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-1$ PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.

2604.22476 2026-06-18 cs.CV cs.LG 版本更新

All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams

全神贯注于工作流：从视频流中自动高效发现事件

Marco Pegoraro, Jonas Seng, Dustin Heller, Wil M. P. van der Aalst, Kristian Kersting

发表机构 * Chair of Process and Data Science, RWTH Aachen University（亚琛工业大学过程与数据科学讲席）； Artificial Intelligence & Machine Learning Lab, Technical University of Darmstadt（达姆施塔特工业大学人工智能与机器学习实验室）

AI总结提出SnapLog方法，利用图像嵌入和帧间相似矩阵进行时间分割，结合广义少样本分类从视频中提取事件数据，生成可解释的带标签时间戳帧序列。

Comments 18 pages, 6 figures, 1 table, 27 references

AI中文摘要

业务流程管理和流程挖掘等学科通过基于记录的事件数据发现流程见解来帮助组织。然而，流程分析的一个障碍是数据多模态性：例如，视频形式的数据不能直接解释为事件。现有方法依赖于活动标签字典作为输入，无法提供逐帧标签解释，或依赖于过时的计算机视觉技术。在这项工作中，我们提出了SnapLog，一种通过使用图像嵌入将帧转换为特征向量，并通过帧间相似矩阵进行时间分割来从视频中提取事件数据的方法。然后使用广义少样本分类为视频片段分配标签，生成可解释为事件的带标签、带时间戳的帧子序列。传统的流程挖掘技术可用于分析结果数据。我们表明，我们的方法生成的日志准确反映了视频中的流程。

英文摘要

Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. Existing approaches rely on a dictionary of activity labels as input, cannot provide frame-by-frame labeling explanations, or rely on superseded computer vision techniques. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.
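
The segmentation step described above can be sketched roughly as follows. This is an illustrative reading and not SnapLog itself: it assumes cosine similarity between frame embeddings and a fixed cut threshold on adjacent frames, and the toy 2-D "embeddings" and the function names are hypothetical.

```python
import numpy as np

def similarity_matrix(emb):
    """Cosine similarity between every pair of frame embeddings."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def segment(emb, threshold=0.5):
    """Split the frame sequence wherever adjacent frames stop looking alike.

    Returns half-open (start, end) index pairs, one per segment.
    """
    sim = similarity_matrix(emb)
    adjacent = np.diagonal(sim, offset=1)          # sim[i, i + 1] for each i
    cuts = np.flatnonzero(adjacent < threshold) + 1
    bounds = [0, *cuts.tolist(), len(emb)]
    return list(zip(bounds[:-1], bounds[1:]))

# Toy embeddings: two frames of one activity, then three of another.
frames = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])
segments = segment(frames)
```

Each resulting segment would then be passed to a classifier for labeling and paired with its frame timestamps to form an event.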

2605.22845 2026-06-18 cs.CE cs.LG 版本更新

Adv-TGD：面向人脸识别冒充攻击的对抗性文本引导扩散

Omid Ahmadieh, Nima Karimian

发表机构 * University of South Florida, Bellini College of Artificial Intelligence, Cybersecurity and Computing（南佛罗里达大学贝利尼人工智能、网络安全与计算学院）

AI总结提出Adv-TGD框架，利用Stable Diffusion和LoRA微调生成逼真对抗人脸，在保持视觉质量的同时实现高成功率身份冒充攻击，平均ASR达85.90%。

AI中文摘要

人脸识别（FR）技术的广泛普及引发了严重的隐私担忧，因为面部数据可能在未经同意的情况下被利用。为了解决这一挑战，我们提出了Adv-TGD，一个生成式对抗攻击框架，能够合成逼真的人脸，冒充目标身份并欺骗人脸识别系统。基于Stable Diffusion v2.1，Adv-TGD对每个样本进行LoRA微调，以简洁的文本提示为条件，生成自然但具有对抗性操控的身份。与传统的身份攻击方法不同，我们的方法在固定时间步的去噪过程中为每个源-目标对优化轻量级交叉注意力适配器。潜在混合受到面部局部热图掩码的约束，以确保空间精确的身份操控，同时保留非敏感区域。我们引入了一个复合目标，结合了掩码epsilon-MSE重建、FR嵌入空间中的阈值化身份差异、方向特征对齐和源相似性抑制，以平衡对抗攻击和视觉真实性。可选地，LLaVA生成的属性提示增强了细粒度语义细节，而不会重新引入身份线索。在黑盒评估协议下，Adv-TGD在IR152、IRSE50、MobileFace和FaceNet上平均攻击成功率（ASR）达到85.90%，超过语义SOTA基线Adv-CPG 6.25个百分点、基于扩散的化妆方法DiffAIM 3个百分点以及基于噪声的P3-Mask 16个百分点。尽管攻击效果强劲，Adv-TGD仍保持了高视觉保真度（PSNR = 28.18 dB，SSIM = 0.981）。此外，我们通过成功将其扩展到野外数据集（LADN）、通用对象分类（ImageNet）和基于Transformer的扩散模型（FLUX.1），展示了我们框架的灵活性。

英文摘要

The widespread adoption of face recognition (FR) technologies raises serious privacy concerns, as facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces capable of impersonating target identities and deceiving face recognition systems. Built upon Stable Diffusion v2.1, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise textual prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity attack approaches, our method optimizes lightweight cross-attention adapters for each source-target pair within a fixed-timestep denoising process. Latent blending is constrained by a face-local heatmap mask to ensure spatially precise identity manipulation while preserving non-sensitive regions. We introduce a composite objective that integrates masked epsilon-MSE reconstruction, thresholded identity divergence in FR embedding space, directional feature alignment, and source-similarity suppression to balance adversarial attack and visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic details without reintroducing identity cues. Under the black-box evaluation protocol, Adv-TGD attains an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, surpassing the semantic SOTA baseline Adv-CPG by 6.25 points, the diffusion-based makeup method DiffAIM by 3 points, and the noise-based P3-Mask by 16 points. Despite its strong attack efficacy, Adv-TGD preserves high visual fidelity (PSNR = 28.18 dB, SSIM = 0.981). Furthermore, we demonstrate the flexibility of our framework by successfully extending it to in-the-wild datasets (LADN), general object classification (ImageNet), and transformer-based diffusion models (FLUX.1).

2606.12816 2026-06-18 quant-ph cs.ET cs.LG 版本更新

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

图强化学习用于校准感知的量子电路路由

Yash Vardhan Tomar, Dheeraj Peddireddy

发表机构 * University of California, Berkeley（加州大学伯克利分校）； National Institute of Standards and Technology（国家标准与技术研究院）

AI总结提出一种利用图强化学习进行校准感知的量子电路路由方法，通过IBM Heron r2校准数据选择SWAP操作，在MQT Bench电路上平均保真度达0.727，优于SABRE-best20的0.440。

AI中文摘要

量子电路路由是在为噪声中等规模量子处理器编译程序时的关键步骤。通过标准开销指标看似高效的路由，在通过校准不良的耦合器时仍可能损失保真度。我们研究了一种校准感知的图强化学习路由器，该路由器使用当天的IBM Heron r2校准数据来选择硬件边缘SWAP。我们使用近端策略优化训练策略，并通过九个慕尼黑量子工具包（MQT）基准电路和三个校准快照的精确模拟保真度进行评估。在这些评估中，合并的平均精确保真度为$0.727$，而SABRE-best20为$0.440$，目标感知SABRE为$0.481$。保真度增益伴随着更高的路由双量子比特计数，并集中在5q和8q电路系列中；在固定树动作图下，所有10q系列都倾向于SABRE-best20。总体而言，我们的结果表明，校准感知的学习路由可以超越基于门计数的编译，提高保真度。

英文摘要

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. We observed that fidelity gains came with higher routed two-qubit counts and were concentrated in 5-qubit and 8-qubit circuit families; under the fixed tree action graph, all 10-qubit families favored SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.
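
Why a route with more hops can still win on fidelity is easy to show with a toy coupling graph. The sketch below is not the reinforcement-learning router from the abstract: it is a plain Dijkstra search under the simplifying assumption that path fidelity is the product of (1 - error) over the couplers used, and the error rates are invented.

```python
import heapq
import math

def best_fidelity_path(edge_error, src, dst):
    """Find the path maximizing the product of (1 - error) along its edges.

    Uses Dijkstra with weights -log(1 - error), so the shortest weighted
    path is the highest-fidelity one. Assumes dst is reachable from src.
    """
    graph = {}
    for (a, b), err in edge_error.items():
        w = -math.log(1.0 - err)
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))

    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))

    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[dst])

# Hypothetical calibration snapshot: a short noisy route and a longer clean one.
errors = {(0, 1): 0.30, (1, 3): 0.30, (0, 2): 0.01, (2, 4): 0.01, (4, 3): 0.01}
path, fidelity = best_fidelity_path(errors, 0, 3)
```

Here the two-hop route through qubit 1 has fidelity 0.49 under this model, while the three-hop route keeps about 0.97, which mirrors the trade-off between gate count and fidelity that the abstract reports.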

2606.17276 2026-06-18 cs.IR cs.LG 版本更新

用程序合成解释注意力机制

Amiri Hayes, Belinda Li, Jacob Andreas

发表机构 * NJIT（新泽西理工学院）； MIT EECS（麻省理工学院电气工程与计算机科学系）； MIT CSAIL（麻省理工学院计算机科学与人工智能实验室）

AI总结提出用可执行程序近似深度网络组件行为的方法，针对Transformer注意力头，通过生成Python程序再现注意力模式，实现可解释性。

AI中文摘要

可解释深度学习研究的一个长期目标是，用人类可理解的符号描述取代不透明的神经计算。本文提出了一种用可执行程序近似深度网络组件行为的方法。我们专注于Transformer语言模型中的注意力头。对于给定的注意力头，我们首先在一组随机选择的训练样本上计算其关联的注意力矩阵。接着，我们向预训练语言模型提供这些矩阵的摘要，并指示它生成一组Python程序，这些程序仅根据输入句子中的文本即可再现相关的注意力模式。最后，我们根据最终程序集在保留输入上预测行为的效果对程序进行重新排序。我们证明，少于1000个这样的生成程序即可再现GPT-2、TinyLlama-1.1B和Llama-3B中注意力头的注意力模式，在TinyStories上平均交并比相似度超过75%。此外，最佳匹配程序可以替代神经注意力头而不会显著影响模型行为：在三个模型中用程序替代25%的注意力头仅导致平均困惑度增加16%，同时在各种下游问答基准上保持性能。这项工作为使用人类可读、可执行的代码逆向工程Transformer模型中的注意力头提供了一个可扩展的流程，推动了神经模型向符号透明性的发展。

英文摘要

A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention heads in transformer language models. For a given head, we first compute its associated attention matrices on a collection of randomly selected training examples. Next, we prompt a pre-trained language model with a summary of these matrices, and instruct it to generate a set of Python programs that can reproduce the associated attention patterns given only text from the input sentence. Finally, we re-rank programs according to how well our final set of programs predict behavior on held-out inputs. We demonstrate that a set of fewer than 1,000 such generated programs can reproduce the attention patterns of heads in GPT-2, TinyLlama-1.1B, and Llama-3B, achieving an average Intersection-over-Union similarity above 75% on TinyStories. Moreover, the best-fit programs can replace neural attention heads without substantially affecting model behavior: replacing 25% of attention heads with programmatic surrogates across the three models incurs only a 16% average perplexity increase, while maintaining performance on a variety of downstream question answering benchmarks. This work contributes a scalable pipeline for reverse-engineering attention heads in transformer models using human-readable, executable code, advancing a path toward symbolic transparency in neural models.
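
To make the idea of a program that reproduces an attention pattern concrete, here is a small sketch. The "previous token" rule, the 0.5 binarization threshold, and the attention values are all assumptions chosen for illustration; the abstract does not specify how the Intersection-over-Union score is computed.

```python
import numpy as np

def previous_token_program(tokens):
    """Hypothetical generated program: each token attends to its predecessor.

    The first token has no predecessor, so it attends to itself.
    """
    n = len(tokens)
    pattern = np.zeros((n, n), dtype=bool)
    pattern[0, 0] = True
    for i in range(1, n):
        pattern[i, i - 1] = True
    return pattern

def iou(predicted, attention, threshold=0.5):
    """Intersection-over-Union between a predicted mask and thresholded attention."""
    observed = attention >= threshold
    union = np.logical_or(predicted, observed).sum()
    inter = np.logical_and(predicted, observed).sum()
    return float(inter / union) if union else 1.0

tokens = ["The", "cat", "sat"]
# Made-up causal attention matrix for one head (rows sum to 1).
attention = np.array([[1.0, 0.0, 0.0],
                      [0.9, 0.1, 0.0],
                      [0.2, 0.8, 0.0]])
score = iou(previous_token_program(tokens), attention)
```

The program sees only the input tokens, never the model's activations, which is the property that makes it usable as a readable stand-in for the head.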

2606.18535 2026-06-18 stat.ME cs.LG math.ST stat.TH 交叉投稿

Shrinkage priors for Bayesian Substitute Confounders

贝叶斯替代混杂因子的收缩先验

Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh

发表机构 * School of Mathematical Sciences, University of Nottingham, Nottingham, UK（诺丁汉大学数学科学学院）； Department of Statistics, Rice University, USA（莱斯大学统计学系）； Lawrence Berkeley National Laboratory, USA（劳伦斯伯克利国家实验室）； Statistical Sciences Group, Los Alamos National Laboratory, USA（洛斯阿拉莫斯国家实验室统计科学组）

AI总结针对多原因观察研究中替代混杂因子过度编码问题，提出贝叶斯因子分配框架，利用收缩先验学习稀疏替代混杂因子，保持粗粒度多原因依赖，并证明后验集中性和重叠保持几何性质，实现潜在结果的一致性估计。

AI中文摘要

多原因观察研究通过原因间的依赖结构包含关于未测量混杂的信息。然而，对未观测混杂的直接插补通常比学习一个低维替代得分更复杂，该得分保留了稳定因果调整所需的共享分配变异。去混杂因子（Wang and Blei, 2019）及相关替代混杂因子方法利用了这一思想，但灵活的分配模型可以拟合原因的联合分布，同时产生过度编码处理向量、破坏重叠或捕获单原因变异的得分。我们开发了一个贝叶斯因子分配框架，用于学习稀疏替代混杂因子，该框架通过收缩先验保留粗粒度的多原因依赖。该理论在后验集中性、因子得分收缩和保留重叠的分配几何层面进行阐述，因此不依赖于特定的收缩先验。在这些条件下，当相应的潜变量识别假设成立时，所提出的回归调整估计量对平均潜在结果是一致的。收缩先验为潜在结构学习提供了自然工具：它们倾向于由多个原因支持的低维因子，抑制实质上的单原因因子，并通过渐进收缩诱导潜在因子的排序。合成实验说明了信号强度、结果有效性和几何感知正则化的作用。在阿尔茨海默病神经影像学倡议（ADNI）基线分析中，稀疏替代得分恢复了通过直接以侵入性脑脊液生物标志物为条件所获得的大部分调整效果，而坍缩诊断则识别出拟合因子何时退化为单个观测测量。

英文摘要

Multi-cause observational studies contain information about unmeasured confounding through the dependence structure among causes. However, literal imputation of the unobserved confounder is often more complex than learning a lower-dimensional substitute score that preserves the shared assignment variation needed for stable causal adjustment. The deconfounder (Wang and Blei, 2019) and related substitute confounder methods exploit this idea, but flexible assignment models can fit the joint distribution of the causes while producing scores that over-encode the treatment vector, collapse overlap, or capture single-cause variation. We develop a Bayesian factor assignment framework for learning sparse substitute confounders that retain coarse multi-cause dependence with shrinkage priors. The theory is stated at the level of posterior concentration, factor score contraction, and overlap-preserving assignment geometry and therefore does not rely on a particular shrinkage prior. Under these conditions, the proposed regression-adjusted estimators are consistent for mean potential outcomes when the corresponding latent variable identification assumptions hold. Shrinkage priors provide a natural tool for latent structural learning: they favour low-dimensional factors supported by multiple causes, discourage effectively single-cause factors, and induce an ordering of the latent factors through progressive shrinkage. Synthetic experiments illustrate the roles of signal strength, outcome validity, and geometry-aware regularization. In an Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline analysis, sparse substitute scores recover much of the adjustment obtained by directly conditioning on invasive cerebrospinal-fluid biomarkers, while collapse diagnostics identify when fitted factors reduce to individual observed measurements.

2606.19270 2026-06-18 eess.IV cs.LG physics.med-ph 交叉投稿

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

超越算法：医学影像人工智能中的概念创新

Mark A. Anastasio

发表机构 * Mallinckrodt Institute of Radiology and Department of Electrical & Systems Engineering, Washington University in St. Louis（马林克罗德特放射医学研究所和电气与系统工程系，华盛顿大学圣路易斯分校）

AI总结本文区分算法创新与概念创新，指出当前激励结构过度奖励算法新颖性而忽视概念贡献，通过医学影像AI案例展示概念不足导致的错位目标与有限临床影响，并提出促进概念创新的建议。

AI中文摘要

人工智能推动了医学影像研究的快速发展，产生了日益复杂的算法，并在基准任务上稳步改进。然而，这种以算法为中心的发展轨迹也揭示了一个日益加剧的不平衡：虽然计算方法快速进步，但定义成像任务、评估指标和临床意义的概念基础有时仍未得到充分审视。在这篇观点文章中，我们区分了算法创新（专注于在固定问题定义内改进计算实现和性能）与概念创新（重新定义提出的问题、衡量成功的方式以及方法在临床上的相关性）。我们认为，当前的激励结构、培训路径和发表规范不成比例地奖励算法新颖性，尤其是对早期职业研究者而言，而有时低估了对科学成熟和临床转化至关重要的概念贡献。通过医学影像AI的代表性例子，我们展示了概念基础不足如何导致目标错位、泛化脆弱以及现实世界影响有限。最后，我们为研究者、导师、审稿人和期刊提出了可操作的建议，以更好地识别、支持和整合概念创新与算法进步。

英文摘要

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

2412.16468 2026-06-18 cs.LG 版本更新

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

通往人工超级智能之路：超级对齐的全面综述

HyunJin Kim, DongHyun Ryu, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

发表机构 * Microsoft Research Asia（微软亚洲研究院）； Sungkyunkwan University（成均馆大学）； Stanford University（斯坦福大学）； Fudan University（复旦大学）

AI总结本文综述了超级对齐问题，通过分析可扩展监督范式（夹层、自我增强和弱到强泛化）及其局限性，探讨了监督、控制和管理人工超级智能的挑战与路径。

Comments 24 pages

AI中文摘要

大型语言模型（LLMs）的出现引发了关于人工超级智能（ASI）的讨论，这是一种假设性的、超越人类智能的AI系统。尽管ASI仍处于假设阶段且远超出当前AI能力，但讨论其潜力、探索其可行性和潜在风险对于未来AI系统的发展至关重要。超级对齐的概念源于可扩展监督，后者研究当直接人类监督不足时如何监督日益强大的AI系统。本文聚焦于超级对齐问题：“监督、控制和管理人工超级智能的过程”。我们首先回顾可扩展监督范式——夹层、自我增强和弱到强泛化，然后通过可能性和不可能性的视角分析当前范式的局限性，讨论关键挑战，并提出未来AI系统安全持续改进的路径。

英文摘要

The emergence of large language models (LLMs) has sparked discussion on Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, discussing its potential and exploring its feasibility and potential risks is critical for the development of future AI systems. The idea of superalignment originates from scalable oversight, which studies how to supervise increasingly capable AI systems when direct human supervision becomes insufficient. In this paper, we focus on the superalignment problem: "The process of supervising, controlling, and governing artificial superintelligence." We first review scalable oversight paradigms (Sandwiching, Self-Enhancement, and Weak-to-Strong Generalization), then analyze the limitations of current paradigms through the lens of possibility and impossibility, discuss key challenges, and propose pathways for the safe and continual improvement of future AI systems.

2605.08934 2026-06-18 cs.LG 版本更新

From Mechanistic to Compositional Interpretability

从机制到组合可解释性

Ward Gauderis, Thomas Dooms, Steven T. Homer, Kola Ayonrinde, Geraint A. Wiggins

发表机构 * UK AI Security Institute（英国人工智能安全研究所）

AI总结本文提出组合可解释性框架，通过范畴论原理解决机制可解释性无法客观验证的问题，将解释质量分解为忠实度和复杂度，引入压缩细化方法实现模型简化，理论证明简洁性准则保障人类对齐的解释。

AI中文摘要

机制可解释性旨在通过逆向工程神经模型的行为来解释其计算结构，但缺乏正式框架导致无法客观验证。本文引入组合可解释性，基于组合性和最小描述长度原则的范畴论框架。组合解释是语法和语义映射的对，必须满足一致性。将解释质量分解为忠实度和复杂度，将其视为约束优化问题，并引入压缩细化方法系统地重构模型为更简单的部分。最后证明了在简洁性准则下，语法压缩理论上能保证更简洁的人类对齐解释。该框架将主流的机制方法归为细化的子类，澄清了为何其压缩性启发式方法与人类可解释性一致。本文为自动化发现和评估机制解释提供了可测量、可优化的基础。

英文摘要

Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We introduce compositional interpretability, a category-theoretic framework grounded in the principles of compositionality and minimum description length. Compositional interpretations are pairs of syntactic and semantic mappings that must commute to enforce consistency between a model's decomposition and its observed behaviour. We deconstruct explanation quality into measures of faithfulness and complexity to cast interpretability as a constrained optimisation problem, and introduce compressive refinement to systematically restructure models into simpler parts without altering their function. Finally, we derive a parsimony criterion under which syntactic compression theoretically guarantees more concise, human-aligned explanations. Our framework situates prominent mechanistic methods as subclasses of refinement, and clarifies why their compressibility heuristics tend to align with human interpretability. Our work provides a measurable, optimisable blueprint for automating the discovery and evaluation of mechanistic explanations.

2410.21258 2026-06-18 quant-ph cs.CC cs.LG 版本更新

Provable quantum speedups for computing persistence in topological data analysis

可证明的量子加速用于拓扑数据分析中的持久性计算

Casper Gyurik, Alexander Schmidhuber, Robbie King, Vedran Dunjko, Ryu Hayakawa

发表机构 * applied Quantum algorithms (aQa), Leiden University, 2300 RA Leiden, The Netherlands； Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, USA； Department of Computing； Yukawa Institute for Theoretical Physics & The Hakubi Center, Kyoto University, Japan

AI总结提出一种高效量子算法，用于判断拓扑数据分析中洞的持久性，并证明该问题为BQP_1-hard，暗示在标准复杂性假设下存在指数级量子加速。

Comments 17 pages

DOI: 10.1103/gvys-hl8h
Journal ref: PRX Quantum 7, 020361 (2026)

AI中文摘要

拓扑数据分析（TDA）旨在通过检查数据拓扑中空洞的数量和持久性，从数据集中提取对噪声鲁棒的特征。我们为与TDA核心任务密切相关的一个计算问题提供了高效的量子算法——判断给定空洞是否在不同长度尺度上持续存在。此外，我们证明该问题本身是$\mathsf{BQP}_1$-hard的，意味着经典解决方案极不可能；这与所有先前的TDA量子方法形成对比，在这些方法中，问题对于量子计算机也是难解的，或者严格的经典困难性证明仍然悬而未决。这一结果表明，在标准复杂性理论假设下，该问题存在指数级的量子加速。我们的方法依赖于将空洞的持久性编码到引导稀疏哈密顿量问题的一个变体中，其中引导态由空洞的调和代表元构造而成。

英文摘要

Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We provide an efficient quantum algorithm for a computational problem closely related to a core task in TDA: determining whether a given hole persists across different length scales. Further, we prove the problem itself is $\mathsf{BQP}_1$-hard, implying that a classical solution is extremely unlikely; this stands in contrast to all previous quantum approaches to TDA, where the problems were also intractable for quantum computers, or where a rigorous proof of classical hardness still remains open. This result implies an exponential quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach relies on encoding the persistence of a hole in a variant of the guided sparse Hamiltonian problem, where the guiding state is constructed from a harmonic representative of the hole.

2604.23716 2026-06-18 cs.AI cs.IT cs.LG cs.MA math.IT 版本更新

Information-Theoretic Measures in AI: A Practical Decision Guide

人工智能中的信息论度量：实用决策指南

Nikolaos Al. Papadopoulos, Konstantinos E. Psannis

发表机构 * Department of Applied Informatics, University of Macedonia（马其顿大学应用信息系）

AI总结本文为七种信息论度量提供实用决策框架，围绕每个度量的三个关键问题：回答的问题与AI场景、适合的估计器、最危险的误用，并附有流程图和决策表。

Comments 25 pages, 2 tables, 1 figure. Submitted to Entropy (MDPI)

AI中文摘要

信息论（IT）度量在人工智能中无处不在：熵驱动决策树分裂和不确定性量化，交叉熵是默认的分类损失，互信息支撑表示学习和特征选择，转移熵揭示动态系统中的有向影响。第二类较不成熟的度量——整合信息（Phi）、有效信息（EI）和自主性——已出现用于表征智能体复杂性。尽管被广泛采用，度量选择常常与估计器假设、失败模式和安全的推断主张脱节。本文为所有七种度量提供了一个实用决策框架，围绕每个度量的三个指导性问题组织：（i）该度量回答什么问题，在何种AI背景下；（ii）哪种估计器适合数据类型和维度；（iii）最危险的误用是什么。该框架通过两个互补的人工制品实现：度量选择流程图和主决策表。我们涵盖每个度量的AI/ML和决策智能体应用领域，并使用标准化桥接框将IT量与认知构造联系起来。三个工作示例展示了该框架在具体从业者场景中的应用，涵盖表示学习、时间影响分析和进化智能体复杂性。

英文摘要

Information-theoretic (IT) measures are ubiquitous in artificial intelligence: entropy drives decision-tree splits and uncertainty quantification, cross-entropy is the default classification loss, mutual information underpins representation learning and feature selection, and transfer entropy reveals directed influence in dynamical systems. A second, less consolidated family of measures, integrated information (Phi), effective information (EI), and autonomy, has emerged for characterizing agent complexity. Despite wide adoption, measure selection is often decoupled from estimator assumptions, failure modes, and safe inferential claims. This paper provides a practical decision framework for all seven measures, organized around three prescriptive questions for each: (i) what question does the measure answer and in which AI context; (ii) which estimator is appropriate for the data type and dimensionality; and (iii) what is the most dangerous misuse. The framework is operationalized in two complementary artifacts: a measure-selection flowchart and a master decision table. We cover both AI/ML and decision-making agent application domains per measure, with standardized Bridge Boxes linking IT quantities to cognitive constructs. Three worked examples illustrate the framework on concrete practitioner scenarios spanning representation learning, temporal influence analysis, and evolved agent complexity.
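
Three of the measures named above have short closed forms for discrete distributions. The sketch below computes them from known probability tables (in bits); it is a plug-in calculation only and does not address the estimator-selection questions the paper raises for sampled or continuous data. The function names are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in bits; zero-probability cells contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log2(p[nz])).sum())

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log2(q[nz])).sum())

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.ravel())

coin = [0.5, 0.5]
copied = [[0.5, 0.0], [0.0, 0.5]]          # Y is an exact copy of X
independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y share no information

h = entropy(coin)
mi_copied = mutual_information(copied)
mi_independent = mutual_information(independent)
```

A fair coin carries one bit, a perfect copy shares that full bit, and independent variables share none, which gives a quick sanity check before applying any of these measures to real data.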

2605.17131 2026-06-18 cs.CV cs.AI cs.LG 版本更新

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

针对点云分类和分割的深度学习架构系统性调研

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

发表机构 * State University of New York at Albany（纽约州立大学奥尔巴尼分校）

AI总结本文系统性地探讨了点云分类和分割中的深度学习架构，分析了点云数据的结构特性，分类了不同架构的工作，并评估了其在主流基准上的性能，同时指出了开放挑战和未来方向。

Comments We reviewed a decade of advancements in point cloud processing: trace the evolution of the field from its foundational roots to the modern SOTA, analyze how diverse architectures overcome the inherent geometric challenges of 3D data, and map out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

DOI: 10.1145/3815180
Journal ref: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026

AI中文摘要

点云因其简洁性和几何保真度而成为表示3D形状和场景最广泛采用的格式。然而，其固有的无序和不规则性质，加剧了传感器噪声和遮挡的影响，给基于机器学习的方法带来了独特的挑战。为应对这些问题，已开发出多种策略，包括转换为有序格式、提取局部几何特征以及基于排列不变或自注意力的处理方法。在本文中，我们的重点是深度学习模型在3D视觉三个基本任务中的应用：点云分类、部分分割和语义分割。我们首先正式定义点云数据，然后深入讨论其结构特性。接着，我们根据其骨干结构对重要工作进行分类，并评估其在流行基准上的性能。除了经验比较外，我们还提供了架构创新和局限性的见解。我们还概述了3D点云理解中的开放挑战和有前途的未来方向。

英文摘要

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
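
The permutation-invariant processing mentioned above can be illustrated with a shared per-point map followed by a symmetric pooling operation. This is a minimal sketch in the spirit of that family of methods, not any specific architecture from the survey; the layer size and random weights are arbitrary.

```python
import numpy as np

def global_feature(points, weights):
    """Apply the same linear map + ReLU to every point, then max-pool.

    Max over the point axis is symmetric, so the output does not depend
    on the order in which the points are listed.
    """
    per_point = np.maximum(points @ weights, 0.0)  # (n_points, n_features)
    return per_point.max(axis=0)                   # (n_features,)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))      # 128 points with x, y, z coordinates
weights = rng.normal(size=(3, 16))     # hypothetical shared layer weights

shuffled = cloud[rng.permutation(len(cloud))]
original_feature = global_feature(cloud, weights)
shuffled_feature = global_feature(shuffled, weights)
```

Reordering the input rows leaves the pooled feature unchanged, which is the property that lets such models consume unordered point sets directly.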

2605.25929 2026-06-18 cs.MA cs.LG 版本更新

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

多智能体系统是专家混合：谁成为影响者？

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

发表机构 * CISPA Helmholtz Center for Information Security（CISPA亥姆霍兹信息安全中心）； Faculty of Computer Science, University of Vienna（维也纳大学计算机科学系）

AI总结本文通过Friedkin-Johnsen意见动力学模型分析多智能体LLM协商机制，揭示输入依赖的FJ参数使系统成为专家混合，并探讨基于自信度、感知自信度和初始观点对齐的影响者形成机制。

Comments Accepted at the 2nd Workshop on Compositional Learning at ICML 2026

2606.17454 2026-06-18 cs.AI cs.LG 版本更新

Dissecting model behavior through agent trajectories

通过智能体轨迹剖析模型行为

Gaurav Gupta, Vatshank Chaturvedi, Jun Huan, Anoop Deoras

发表机构 * AWS AI Labs（AWS人工智能实验室）

AI总结本文提出“意图-执行差距”概念，并设计Simple Strands Agent（SSA）框架，通过分析138k条轨迹揭示模型在自主问题解决中的行为差异。

Comments 106 pages, 50 Figures, 16 Tables

AI中文摘要

AI智能体性能不仅仅是一个建模问题，它本质上是一个系统问题。模型的高级能力通过智能体框架（harness）实现。因此，模型假设与框架行为之间的差距很容易阻止模型的全部能力转化为智能体性能。我们将此形式化为“意图-执行差距”：模型意图与框架执行之间的不匹配，反之亦然。我们认为，最小化这种意图-执行差距与框架设计的其他方面（如工具和执行循环）同样重要。为了说明这种框架-模型对齐的影响，我们开发了一个简单且可定制的框架，称为“Simple Strands Agent”（SSA）。SSA旨在找到跨不同模型家族（如Claude、Gemini、GPT、Grok、Qwen）通用的常见模式，以及少量模型特定的偏好。我们做出两个贡献：（i）我们在流行的智能体基准测试（SWE-Pro、SWE-Verified和Terminal-Bench-2）上复现或改进了不同模型提供商家族报告的pass@1性能；（ii）基于对SSA生成的138k条轨迹的分析，我们超越了前沿模型之间通常相对均匀的pass@1数字。通过在代码状态空间中表示智能体轨迹，我们观察到问题解决行为中的模型级差异。更细粒度的指标，如编辑频率、测试活动和阶段转换，揭示了单个模型如何在自主问题解决的不同阶段分配努力。

英文摘要

AI agent performance is not just a modeling problem; it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the "intent-execution" gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called "Simple Strands Agent" (SSA). SSA aims to find the bulk of common patterns that generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we reproduce or improve on the pass@1 performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an analysis of 138k trajectories generated by SSA, we look beyond the pass@1 numbers, which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.


2510.15300 2026-06-18 cs.LG Version update

DFCA: Decentralized Federated Clustering Algorithm

Jonas Kirch, Sebastian Becker, Tiago Koketsu Rodrigues, Stefan Harmeling

Affiliations: Fraunhofer Institute for Software and Systems Engineering; Lamarr Institute for Machine Learning and AI

2601.18637 2026-06-18 quant-ph cs.LG stat.ML Version update

Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

Affiliations: Quantum Laboratory, Fujitsu Research, Fujitsu Limited, Kawasaki, Kanagawa 211-8588, Japan

Comments 21 pages, 6 figures (added Github repository)

2405.14273 2026-06-18 cs.LG cs.AI math.OC Version update

Exact Solution to Data-Driven Inverse Optimization of MILPs in Finite Time via Gradient-Based Methods

Akira Kitaoka

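Affiliations: NEC Corporation

AI summary: This paper studies data-driven inverse optimization for mixed integer linear programs, reveals the geometric structure of the suboptimality loss, proves that gradient-based optimization methods reach consistency with the observed data in finitely many iterations, and gives an upper bound on the iteration count for projected subgradient descent.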

Comments 66 pages; comments are welcome

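Details

AI abstract (translated from Chinese)

The data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data, and it is widely applied in mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous with respect to the weights, which makes direct application of gradient-based optimization challenging. This paper focuses on the suboptimality loss, which attains its minimum value of zero when the weights are exactly consistent with the observed data. We reveal the geometric structure of this loss: it is convex and piecewise linear, and the set of weights exactly consistent with the observed data has positive "thickness" rather than being a single point or a thin boundary. Using this structure, we show the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent, we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we express this upper bound as a fully explicit iteration count determined solely by the number of samples, the feature dimension, and the structure of the constraint coefficient matrix (for example, if the coefficient matrix is totally unimodular, the iteration count is explicitly bounded by a polynomial in the square of the number of samples and the dimension). Numerical experiments confirm this finite-step attainment behavior.

Abstract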

A data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data, and it arises in many applications, including mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous with respect to the weights, so applying gradient-based optimization directly is difficult. In this paper we focus on the suboptimality loss. This loss attains its minimum value, zero, if and only if the weights are exactly consistent with the observed data. We reveal a geometric structure of this loss (it is convex and piecewise linear, and moreover the set of weights that are exactly consistent with the observed data has a positive "thickness" rather than being a single point or a thin boundary) and use it to show the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches exact consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we give this upper bound as a fully explicit iteration count determined solely by the number of samples, the dimension of the features, and the structure of the constraint coefficient matrix. Through numerical experiments, we confirm this finite-step attainment behavior.
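
To make the suboptimality loss in the abstract above more concrete, here is a minimal sketch under strong simplifying assumptions; it is not the paper's implementation. The forward problem is assumed to be a maximization over a small, explicitly enumerated set of integer points, the weights are assumed to live on the probability simplex (which rules out the trivial solution w = 0), and the step size is fixed. All function names and the toy data are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def solve_forward(w, feasible):
    """Assumed forward problem: maximize w.x over an enumerated feasible set."""
    return max(feasible, key=lambda x: dot(w, x))


def suboptimality_loss(w, data):
    """Sum over samples of (optimal value under w) - (value of the observed x).

    Each term is a max of linear functions of w minus a linear function,
    so the loss is convex and piecewise linear, and it is zero exactly
    when every observed x is optimal under w.
    """
    return sum(dot(w, solve_forward(w, X)) - dot(w, x_obs) for X, x_obs in data)


def subgradient(w, data):
    """A subgradient: sum over samples of x*(w) - x_obs."""
    g = [0.0] * len(w)
    for X, x_obs in data:
        x_star = solve_forward(w, X)
        for k in range(len(w)):
            g[k] += x_star[k] - x_obs[k]
    return g


def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def fit(data, dim, step=0.1, max_iter=1000):
    """Projected subgradient descent; stops once the loss is exactly zero."""
    w = [1.0 / dim] * dim
    for it in range(max_iter):
        if suboptimality_loss(w, data) == 0.0:
            return w, it
        g = subgradient(w, data)
        w = project_simplex([wk - step * gk for wk, gk in zip(w, g)])
    return w, max_iter


# Assumed toy instance: one sample, four feasible integer points.
X = [(0, 0), (3, 0), (0, 2), (2, 1)]
data = [(X, (0, 2))]
w, iters = fit(data, dim=2)
print(w, iters, solve_forward(w, X))
```

On this toy instance the uniform starting weights do not explain the observed point, and the iteration stops as soon as the observed point becomes optimal, which mirrors the finite-step behavior the abstract describes.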


2407.00449 2026-06-18 cs.LG cs.AI cs.NE Version update

Fully tensorial approach to hypercomplex-valued neural networks

Agnieszka Niemczynowicz, Radosław Antoni Kycia

Affiliations: Faculty of Computer Science and Mathematics, Cracow University of Technology

Comments 23 pages, 3 figures

2512.17696 2026-06-18 cs.LG stat.ME stat.ML Version update

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting

Yuri Calleo

Affiliations: Unimercatorum

Details

DOI: 10.1007/s11135-026-02743-9

Abstract

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
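
One way to picture the "stationary prior plus data-driven residual" idea in the abstract above is to add a distance-based covariance term to the attention scores before the softmax. The sketch below does exactly that; the additive form, the exponential kernel, the fixed `length_scale`, and all function names are assumptions made for illustration, and the paper's actual parameterization may differ.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def exp_covariance(coords, length_scale):
    """Stationary exponential kernel exp(-distance / length_scale)."""
    return [
        [math.exp(-math.dist(a, b) / length_scale) for b in coords]
        for a in coords
    ]


def spatial_attention(scores, coords, length_scale, strength=1.0):
    """Row-wise softmax over (data-driven scores + distance-based prior).

    `scores` stands in for the usual query-key logits; `length_scale`
    would be a learnable parameter in a trained model but is a plain
    float in this sketch.
    """
    prior = exp_covariance(coords, length_scale)
    return [
        softmax([s + strength * p for s, p in zip(score_row, prior_row)])
        for score_row, prior_row in zip(scores, prior)
    ]


# Assumed toy layout: three sensors on a line, no data-driven preference.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
scores = [[0.0, 0.0, 0.0] for _ in coords]
for row in spatial_attention(scores, coords, length_scale=2.0):
    print([round(a, 3) for a in row])
```

With all data-driven scores set to zero, each sensor attends most strongly to itself and its nearest neighbours, which is the soft preference for spatially proximal interactions that the abstract describes.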


2508.06406 2026-06-18 cs.DC cs.LG Version update

Blockchain-Enabled Federated Learning

Murtaza Rangwala, KR Venugopal, Rajkumar Buyya

Affiliations: Quantum Cloud and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia; Department of Computer Science and Engineering, University of Visvesvaraya College of Engineering, Bangalore University, India

Comments 32 pages, 6 figures, chapter for edited book (Federated Learning: Foundations and Applications)

Details

DOI: 10.1016/B978-0-44-344433-3.00018-6

Abstract

Blockchain-enabled federated learning (BCFL) addresses fundamental challenges of trust, privacy, and coordination in collaborative AI systems. This chapter provides comprehensive architectural analysis of BCFL systems through a systematic four-dimensional taxonomy examining coordination structures, consensus mechanisms, storage architectures, and trust models. We analyze design patterns from blockchain-verified centralized coordination to fully decentralized peer-to-peer networks, evaluating trade-offs in scalability, security, and performance. Through detailed examination of consensus mechanisms designed for federated learning contexts, including Proof of Quality and Proof of Federated Learning, we demonstrate how computational work can be repurposed from arbitrary cryptographic puzzles to productive machine learning tasks. The chapter addresses critical storage challenges by examining multi-tier architectures that balance blockchain's transaction constraints with neural networks' large parameter requirements while maintaining cryptographic integrity. A technical case study of the TrustMesh framework illustrates practical implementation considerations in BCFL systems through distributed image classification training, demonstrating effective collaborative learning across IoT devices with highly non-IID data distributions while maintaining complete transparency and fault tolerance. Analysis of real-world deployments across healthcare consortiums, financial services, and IoT security applications validates the practical viability of BCFL systems, achieving performance comparable to centralized approaches while providing enhanced security guarantees and enabling new models of trustless collaborative intelligence.


2508.20275 2026-06-18 cs.LG cs.CL q-bio.QM Version update

A Systematic Review on the Generative AI Applications in Human Medical Genomics

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

Affiliations: Dpt. of Genomic Medicine; D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology

Comments 31 pages, 5 figures

Details

DOI: 10.3389/fgene.2025.1694070
Journal ref: Frontiers in Genetics 16 (2026) 1694070

Abstract

Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.


2503.01163 2026-06-18 cs.AI cs.CL cs.HC cs.LG cs.NE Version update

Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers

Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

Comments Accepted to ACL 2025 Findings

2502.15376 2026-06-18 cs.LG cond-mat.mes-hall Version update

Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks

Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

Affiliations: Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg; Department of Physics, Stockholm University, AlbaNova University Center; VERSES AI Research Lab, Los Angeles, USA; Department of Physics, University of Gothenburg

2410.23503 2026-06-18 cs.LG Version update

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

Santino Nanini, Mariem Abid, Yassir Mamouni, Arnaud Wiedemann, Philippe Jouvet, Stephane Bourassa

Affiliations: SADC-CDSS IA PEDIATRICS, CHU Sainte-Justine, Montreal, Canada; Solutions Applicare AI Inc., Montreal, Canada; Université de Montréal, Canada; MEDINT CBRNE Group, Montreal, Canada

Comments 12 figures, 12 tables and 39 pages

Details

DOI: 10.3390/diagnostics14232763
Journal ref: Diagnostics 14 (2024) 2763

Abstract

This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data, class imbalances, and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as Gradient Boosting Models were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.


2211.01960 2026-06-18 q-bio.NC cs.HC cs.LG Version update

FingerFlex: Inferring Finger Trajectories from ECoG signals

Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko

Affiliations: Bauman Moscow State Technical University; ALVI Labs; Brain Dynamics Group, Higher School of Economics; University of Tuebingen

Comments 6 pages, 3 figures, 4 tables. Preprint. Under review

1909.13203 2026-06-18 cs.LG stat.ML Version update

Learning transport cost from subset correspondence

Ruishan Liu, Akshay Balsubramani, James Zou

Affiliations: Department of Electrical Engineering; Department of Genetics; Stanford University; Department of Biomedical Data Science

1. Deep Learning Architectures and Training Methods (39 papers)

Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Why SWAVE May Not Be All You Need: A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

LLMZero: Discovering Adaptive Training Strategies for RL Post-Training via LLM Agents

Task-Restricted Symmetries in Recurrent Weight Space

SFT Overtraining Predicts Rank Inversion via Entropy Collapse Under RLVR

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

On the Residual Scaling of Looped Transformers: Stability and Transferability

Hierarchical Attention via Domain Decomposition

PACT: Preserving Anchored Cores in Task-vectors for Model Merging

InTrain: Intrinsic Trainability for Zero-Cost Neural Architecture Search

Attention as Frustrated Synchronization

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

GrapNet: A Programmable Dynamic-Architecture Neural Graph Substrate

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

INDEQS: Informed Neural controlled Differential EQuationS

A physical adaptive material motor unit neural network: a hygromorph composite material machine

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

RNN(p) for Power Consumption Forecasting

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Generalized Kullback-Leibler Divergence Loss

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Decomposing Prediction Mechanisms for In-Context Recall

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

Trust Region On-Policy Distillation

HAARES Half-Split Residual Basis Routing for Deep Transformers

Cosmos 3: Omnimodal World Models for Physical AI

2. Representation Learning, Self-Supervised and Contrastive Learning (11 papers)

From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

Be Your Own Teacher: Steering Protein Language Models via Unsupervised Reward Optimization

Compact Geometric Representations of Hierarchies

Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

3. Reinforcement Learning and Sequential Decision-Making (26 papers)

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning

Self-CTRL: Self-Consistency Training with Reinforcement Learning

Structured Representation Learning with Locally Linear Embeddings and Adaptive Feature Fusion

Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction

Do as the Romans Do: Learning Universal Behaviors from Heterogeneous Agents

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

Reinforcement Learning Foundation Models Should Already Be A Thing

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

Pareto Q-Learning with Reward Machines

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

Model-Free Reinforcement Learning Control for Resilient Cyber-Physical Systems

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

Hierarchical Planning with Latent World Models

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

4. Generative Models and Probabilistic Modeling (23 papers)