
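2606.18326 2026-06-18 cs.LG New submission

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

Evgeny Nikulchev, Dmitry Ilin

Affiliations: MIREA – Russian Technological University

AI summary: Proposes RGNet, a neural network architecture based on renormalization group concepts that handles class imbalance and multidimensional noise through hierarchical coarse-graining of the feature space; its effectiveness is validated on the AI4I dataset.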

Comments 8 pages

2606.18388 2026-06-18 cs.LG cs.AI cs.CL cs.MA New submission

InTrain: Intrinsic Trainability for Zero-Cost Neural Architecture Search

Qinqin Zhou, Fuhai Chen, Jipeng Wu, Zhiwei Chen, Zhikai Hu, Weiwei Cai

Affiliations: School of Computer and Data Science, Fuzhou University; School of Computer and Data Science, Minjiang University; School of Artificial Intelligence, Nanchang University; Department of Computer Science, Hong Kong Baptist University; School of Interdisciplinary Medicine and Engineering, Harbin Medical University

AI summary: Proposes InTrain, a unified theoretical proxy that formalizes architectural trainability through two synergistic components, geometric capacity and optimization resilience, and achieves ranking correlations on NAS benchmarks comparable to ensemble-based methods.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Abstract

Training-free neural architecture search promises efficient discovery of high-performance networks without costly training. However, existing zero-cost proxies rely on fragmented heuristics that fail to capture the fundamental question: what makes an architecture trainable? This paper introduces Intrinsic Trainability (InTrain), a unified theoretical proxy that formalizes trainability as an architectural invariant emerging from two synergistic components: geometric capacity and optimization resilience. We operationalize intrinsic trainability through analysis of neural information processing. Geometric capacity is quantified via the participation ratio of activation covariance eigenspectrum, capturing the effective dimensionality of representation manifolds. Optimization resilience is measured through cumulative gradient health, assessing the robustness of backpropagation across network depth. InTrain synthesizes these dimensions through a scale-invariant multiplicative coupling, which we hypothesize is essential for capturing their synergistic, non-additive relationship. Extensive experiments on standard NAS benchmarks and search spaces demonstrate that InTrain achieves ranking correlations on par with state-of-the-art ensemble-based proxies and outperforms other single-metric methods.
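
The abstract does not spell out the formula for geometric capacity. A minimal sketch, assuming the standard participation ratio PR = (Σλ)² / Σλ² over the eigenvalues λ of the activation covariance C: because Σλ = tr(C) and Σλ² equals the squared Frobenius norm of a symmetric C, no eigendecomposition is needed. The function names are illustrative, and the gradient-health term and the multiplicative coupling used by InTrain are not reproduced here.

```python
def covariance(acts):
    """Sample covariance of activations.

    acts is a list of samples, each a list of feature values (needs >= 2 samples).
    """
    n = len(acts)
    d = len(acts[0])
    means = [sum(row[j] for row in acts) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in acts]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def participation_ratio(cov):
    """PR = tr(C)^2 / ||C||_F^2, equal to (sum lambda)^2 / sum lambda^2 for symmetric C."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    frob_sq = sum(x * x for row in cov for x in row)
    return trace * trace / frob_sq if frob_sq > 0 else 0.0
```

A PR close to the feature dimension indicates activations spread evenly over many directions; a PR close to 1 indicates a nearly one-dimensional representation.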


2606.18694 2026-06-18 cs.LG cond-mat.dis-nn cs.CL cs.NE nlin.AO New submission

Attention as Frustrated Synchronization

Joshua Nunley

Affiliations: Cognitive Science Program; Luddy School of Informatics, Computing, and Engineering; Indiana University Bloomington

AI summary: Proposes the Frustrated Synchronization Network (FSN), which implements synchronization-based attention through a complex-valued coupling kernel and a delay term, and outperforms a tuned RoPE-SwiGLU transformer on character-level text and code tasks at the one-million-parameter scale.

Comments 25 pages, 4 figures. Preliminary report at the 1-10M parameter scale

Abstract

A network of oscillators that synchronizes perfectly computes nothing further, so an attention architecture built from synchronization must locate its computation in structured departures from agreement. We introduce the Frustrated Synchronization Network (FSN), whose token states are phases on a torus and whose entire value pathway is one learned complex coupling kernel over harmonics and a one-step delay. Each component of the kernel is a frustration in the sense of the synchronization literature. The complex phases are static Kuramoto-Sakaguchi frustration angles, the signed harmonics are repulsive Daido components, and the delay term, which couples each token to the successors of the tokens it attends to, is algebraically identical to Kuramoto-Sakaguchi coupling whose frustration angle is the data's own transition, so next-token prediction is implemented as synchronization frustrated by the data. At matched one-million-parameter and training budgets on character-level text and code, the FSN's validation loss is below a tuned RoPE-SwiGLU transformer's at every epoch measured, and the comparison survives training the baseline to convergence: every thirty-epoch enwik8 seed finishes below the transformer's converged fifty-epoch loss of 1.611, and the FSN's completed fifty-epoch runs converge to 1.5953 +/- 0.0014. A variant with every feed-forward block replaced by mean-field coupling to learned collective modes, leaving no multilayer perceptron in the stack, tracks the transformer. On natural text the unfrustrated base layer falls behind the converged transformer at every copy depth, worst on long-range copy events; the kernel reverses the deficit at every depth of four and beyond. Headline comparisons are at the one-million-parameter scale; a scale ladder is complete through four million parameters with the advantage persisting, and remaining arms are marked as in progress.
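
For readers unfamiliar with the frustration angle mentioned above, here is a minimal sketch of the textbook Kuramoto-Sakaguchi model, dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i − α), integrated with a plain Euler step. It is not the FSN layer: the learned complex kernel, the signed harmonics, and the delay term are not reproduced, and the function names are illustrative.

```python
import cmath
import math


def ks_step(theta, omega, K, alpha, dt):
    """One Euler step of all-to-all Kuramoto-Sakaguchi dynamics with frustration angle alpha."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        coupling = sum(math.sin(theta[j] - theta[i] - alpha) for j in range(n)) / n
        new_theta.append(theta[i] + dt * (omega[i] + K * coupling))
    return new_theta


def order_parameter(theta):
    """Magnitude of the mean phasor: 1 means perfect synchrony, 0 means incoherence."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)
```

With α = 0 and identical natural frequencies, phases that start within a half circle collapse to a single value, which is the "computes nothing further" regime the abstract describes. A nonzero α keeps a systematic phase offset in every pairwise interaction.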


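2606.18844 2026-06-18 cs.LG New submission

Dual-Channel Grounded World Modeling (DCGWM): Structurally Preventing Objective Interference Collapse via Heterogeneous External Grounding and Inward-Only Gradient Flow

Akshay Hazare

Affiliations: Independent Researcher

AI summary: Proposes Dual-Channel Grounded World Modeling (DCGWM), which uses a partitioned latent space and inward-only gradient flow to structurally prevent the objective interference collapse caused by multi-objective grounding in joint embedding predictive architectures.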

Comments Position paper. Experimental validation in progress

Abstract

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to world model representation learning. We identify a failure mode in JEPA-based world models grounded against two qualitatively distinct external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We term this Objective Interference Collapse (OIC): we argue that joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, in a manner not resolvable by loss weighting alone. We propose Dual-Channel Grounded World Modeling (DCGWM), designed to structurally prevent OIC through a partitioned latent space (physical subspace Z_p, behavioral subspace Z_b) with inward-only gradient flow. A Physical Grounding Channel updates only Z_p via VICReg-style alignment to physical measurements; a Social-Behavioral Grounding Channel updates only Z_b via alignment to trajectories from an emergent multi-agent simulation. An Inter-Channel Interface Module couples the subspaces at the task level without cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model. We present three theoretical results: the partition removes the gradient-interference pathway implicated in OIC; each grounded subspace inherits anti-collapse guarantees from its alignment objective; and generative isolation is necessary under a stated assumption on the generative objective's geometry. This manuscript establishes the problem formulation and architecture; experimental validation is ongoing and will be reported in a future revision.
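
The asymmetric loss is described only in words, so the following is a rough sketch under stated assumptions rather than the paper's definition: physical violations are given as signed scalars (positive means a constraint is broken), behavioral rollouts and references are given as discrete distributions, the KL direction is KL(rollout || reference), and the weights `w_phys` and `w_behav` are hypothetical.

```python
import math


def hinge(violation, margin=0.0):
    """Zero while the constraint holds, linear in the size of the violation otherwise."""
    return max(0.0, violation - margin)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; q is clamped at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def adherence_loss(violations, rollout_dist, reference_dist, w_phys=1.0, w_behav=1.0):
    """Hard hinge on physical violations plus a soft KL term on behavioral divergence."""
    physical = sum(hinge(v) for v in violations)
    behavioral = kl_divergence(rollout_dist, reference_dist)
    return w_phys * physical + w_behav * behavioral
```

The asymmetry is that the hinge term is exactly zero whenever every constraint holds, while the KL term shrinks smoothly as the rollout distribution approaches the reference.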


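2606.18703 2026-06-18 cs.LG q-bio.QM New submission

When in Rome: Learning General Behaviors from Heterogeneous Agents

Caleb Chang, Davin Win Kyi, Natasha Jaques, Karen Leung

Affiliations: University of Washington; NVIDIA

AI summary: Proposes GRID, which extracts a general reward from heterogeneous demonstrators pursuing different goals and trains a generalist agent on it to learn universal environmental competencies, avoiding mode-averaging bias and improving fine-tuning efficiency on downstream tasks.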

Abstract

Humans often acquire new skills by observing others, since observed behaviors implicitly reveal how to act in an environment. However, observations drawn from a heterogeneous population introduce conflicting behavioral signals, making it difficult to determine which behaviors are worth imitating. We address this challenge with General Reward Inference and Disentanglement (GRID), a social learning method that extracts universally useful behaviors from a heterogeneous population of demonstrators pursuing different goals. GRID decomposes per-agent reward functions into a general reward, capturing behaviors shared across all agents, and specific rewards, capturing individual preferences and objectives. Training exclusively on the general reward provides a new paradigm of generalist pretraining. It yields a generalist agent that internalizes universal environmental competencies, such as safety and basic task proficiency, without the mode-averaging bias that afflicts standard learning from demonstration techniques. This generalist serves as a superior prior for fine-tuning to downstream tasks, including preferences unseen during training. Experiments across a synthetic basis function decomposition, multi-agent Craftax, and a continuous autonomous driving simulator (Highway-Env) confirm that GRID successfully disentangles reward structure in a semantically meaningful way, outperforms standard learning from demonstration baselines, and enables more efficient and stable specialization.


2606.18785 2026-06-18 cs.LG cs.AI New submission

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

Lennert Saerens, Bram Silue, Eleni Litsa, Peter Vrancx, Pieter Libin

Affiliations: imec; Data Science Institute, Interuniversity Institute of Biostatistics and Statistical Bioinformatics, UHasselt

AI summary: Proposes Top-Two Pareto Front Thompson Sampling (TTPFTS), the first anytime multi-objective multi-armed bandit algorithm for Pareto set identification, validates it on synthetic environments and an ultra-large molecular library, and introduces an uncertainty quantification metric.

Comments 26 pages, 13 figures

Abstract

Identifying Pareto optimal solutions is critical to support multi-objective decision-making. We introduce the first anytime Multi-Objective Multi-Armed Bandit algorithm for the Pareto Set Identification problem, taking a Bayesian approach: Top-Two Pareto Front Thompson Sampling (TTPFTS). We benchmark TTPFTS against state-of-the-art fixed-budget Pareto Set Identification algorithms on synthetic environments. Next, we demonstrate its practical utility in a challenging multi-objective molecular discovery setting by efficiently exploring an ultra-large synthesis-on-demand molecular library. Furthermore, we introduce a novel uncertainty quantification metric that estimates our algorithm's confidence in the predicted Pareto set. We demonstrate that this metric effectively proxies true performance, yielding a robust methodology for monitoring learning progress in complex settings. Finally, we complement these empirical findings with a theoretical proof of the algorithm's asymptotic correctness.
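
The object being identified can be made concrete with a small dominance filter. This sketch assumes all objectives are maximized and takes a list of per-arm mean vectors. It shows only how a Pareto set is read off from given means, not the TTPFTS sampling rule, which the abstract does not detail.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, m in enumerate(means)
        if not any(dominates(other, m) for j, other in enumerate(means) if j != i)
    ]
```

In a Bayesian setting, one plausible use is to apply `pareto_set` to mean vectors drawn from the posterior, so that the variation of the result across draws reflects uncertainty about the true Pareto set.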


2606.18810 2026-06-18 cs.LG cs.AI New submission

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

Yingyu Shan, Yuhang Guo, Zihao Cheng, Zeming Liu, Xiangrong Zhu, Xinyi Wang, Jiashu Yao, Wei Lin, Hongru Wang, Heyan Huang

Affiliations: Beijing Institute of Technology; Beihang University; Independent Researcher

AI summary: Proposes SC-GRPO, which uses the KL divergence between the original and self-conditioned distributions as a multiplicative weight on GRPO gradients for fine-grained credit assignment, improving by 8.1% on average across math, code, and agentic tasks.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts. GRPO variants rely on process reward models or ground-truth answers. Knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation, OPD) or privileged information (On-Policy Self Distillation). These dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and prove that distilling from a self-teacher constructed from verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses this per-token KL divergence as a multiplicative weight on GRPO gradients. Across five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms GRPO by 8.1% and DAPO by 5.9%, with stronger OOD performance. Moreover, SC-GRPO achieves higher performance than OPD.
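
The weighting idea can be sketched in a few lines. The abstract does not state the KL direction or any normalization, so both are assumptions here: the code computes KL(conditioned || original) per token from raw logits and rescales the weights to mean 1 so the overall gradient scale is unchanged. All function names are illustrative.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_from_logits(p_logits, q_logits):
    """KL(p || q) where p and q are the softmax distributions of the given logits."""
    lp, lq = log_softmax(p_logits), log_softmax(q_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def token_weights(cond_logits, base_logits):
    """Per-token KL between conditioned and original distributions, rescaled to mean 1."""
    kls = [kl_from_logits(c, b) for c, b in zip(cond_logits, base_logits)]
    mean = sum(kls) / len(kls)
    return [k / mean for k in kls] if mean > 0 else [1.0] * len(kls)


def weighted_advantages(advantage, weights):
    """Scale one trajectory-level advantage by each token's weight."""
    return [advantage * w for w in weights]
```

Tokens where conditioning on a verified trajectory barely changes the prediction get a weight near zero, while tokens where it shifts the distribution sharply receive most of the credit.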


2606.18812 2026-06-18 cs.LG cs.AI New submission

Reinforcement Learning Foundation Models Should Already Be A Thing

Abdelrahman Zighem, Jill-Jênn Vie

Affiliations: École normale supérieure de Paris, PSL University, Paris, France; Soda team, Inria Saclay, Palaiseau, France

AI summary: Proposes building a reinforcement learning foundation model from synthetic MDPs, using a fixed-size sufficient statistic that makes attention-based architectures applicable; online it needs far fewer episodes than classical algorithms, and offline it is competitive with them.

Abstract

Foundation models for language and vision are powered by internet-scale data, while structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The substitute is synthetic data, which shifts the burden from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. First, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. Second, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train one model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.
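
One way to picture a fixed-size, tabular statistic is the usual count-based summary of a finite MDP. The abstract does not say which statistic the authors use, so this is an assumption: transition counts N(s, a, s') plus reward sums per (s, a), flattened to one row per state-action pair. The size depends only on the numbers of states and actions, never on how many episodes were observed.

```python
def sufficient_statistic(transitions, n_states, n_actions):
    """Accumulate transition counts and reward sums from (s, a, r, s_next) tuples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in transitions:
        counts[s][a][s_next] += 1
        reward_sum[s][a] += r
    return counts, reward_sum


def as_table(counts, reward_sum):
    """One row per (state, action): s, a, visit count, mean reward, next-state frequencies."""
    rows = []
    for s, per_action in enumerate(counts):
        for a, next_counts in enumerate(per_action):
            n = sum(next_counts)
            mean_reward = reward_sum[s][a] / n if n else 0.0
            freqs = [c / n if n else 0.0 for c in next_counts]
            rows.append([s, a, n, mean_reward] + freqs)
    return rows
```

Feeding five times as many transitions changes the entries of the table but not its shape, which is the property that lets a tabular attention model consume it.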


2606.18820 2026-06-18 cs.LG cs.AI New submission

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

Jiaxi Liu, Aiping Yang, Yuhang Yang, Shuqi Zhang, Zewei Dong, Jiangming Yang, Xuebin Chen

Affiliations: Ant International; School of Economics, Sichuan University; School of Economics, Fudan University

AI summary: Addresses the asymmetry between increasing information and shrinking action sets in decision processes by proposing the Maturing Markov Decision Process (MMDP) framework and a structure-aware reinforcement learning method built on an expiring-action priority principle; experiments show improved learning efficiency.

Comments 25 pages, 9 figures

Abstract

Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information-action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information-action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.
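
The shrinking action set can be illustrated with a toy representation. This is an assumption made for illustration, not the paper's formalism: each action carries the last stage at which it may still be taken, and the action names in the usage note are hypothetical.

```python
def feasible_actions(deadlines, stage):
    """Actions that can still be taken at this stage (deadline not yet passed)."""
    return sorted(a for a, last_stage in deadlines.items() if last_stage >= stage)


def expiring_actions(deadlines, stage):
    """Actions that disappear after this stage, so they must be decided now."""
    return sorted(a for a, last_stage in deadlines.items() if last_stage == stage)
```

For example, with `deadlines = {"order_A": 1, "order_B": 2, "transfer": 3}`, stage 1 offers all three actions but only `order_A` is expiring. Decisions about the other two can wait for the richer information available at later stages.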


2606.18910 2026-06-18 cs.LG cs.CL New submission

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

Yuanxin Liu, Ruida Zhou, Xinyan Zhao, Amr Sharaf, Hongzhou Lin, Arijit Biswas, Mohammad Ghavamzadeh, Zhaoran Wang, Mingyi Hong

Affiliations: Northwestern University; Amazon AGI; Qualcomm AI Research; University of Minnesota

AI summary: Proposes the REVES framework, which converts near-miss answers from intermediate steps into decoupled revision and verification prompts, enabling efficient off-policy data generation and improving the multi-step reasoning of large language models; it scores 6.5 points above the RL baseline on LiveCodeBench.

Abstract

Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a fundamental misalignment with multi-step inference dynamics. While recent work treats this as multi-turn reinforcement learning (RL), conventional approaches optimize over the multi-step trajectories directly, failing to exploit the high-quality mistakes in intermediate steps that the model can learn from by correcting them. We propose a two-stage iterative framework that alternates between online data/prompt augmentation and policy optimization. By converting the intermediate steps ("near-miss" answers) in the successful recovery trajectories into decoupled revision and verification prompts, our approach concentrates training on both effective answer transformation and error identification. This approach enables efficient off-policy data generation and reduces the computational overhead of long-horizon sampling compared to standard multi-turn RL. On LiveCodeBench, using publicly available test cases as feedback, we observe gains of +6.5 points over the RL baseline and +4.0 points over standard multi-turn training. Beyond coding, our approach matches the previously reported SOTA result on circle packing while using the smallest base model (4B) and far fewer rollouts than the much larger evolutionary search systems. Math results under ground-truth verification further confirm improved correction ability. It also generalizes to out-of-distribution constraint-satisfaction puzzles such as n_queens and mini_sudoku, where correctness is defined entirely by problem constraints. Code is available at https://github.com/yxliu02/REVES.git.


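2606.18963 2026-06-18 cs.LG New submission

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

Zirong Li

Affiliations: Zirong Li

AI summary: Proposes the OHIRL framework for online reward-punishment learning from fixed-channel perceptual streams without scalar rewards, using an internal trajectory evaluator to infer the valence of perceptual dimensions; it reaches high accuracy on an XOR packet task and is further tested on hidden-reward CartPole and Taxi controls.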

Comments 9 pages, 5 figures, 6 tables; 13-page technical supplement

Abstract

We study online reward-punishment learning when the environment provides no scalar reward or evaluative label. At each step the agent receives only a fixed-channel perceptual packet, and quantities such as pain, energy, contact, damage, or cognitive error are treated as perceptual dimensions whose valence must be inferred from transition consequences. OHIRL separates four roles: M_psi learns next-packet prediction, D_omega models residual dynamics, C_eta is a fixed internal post-transition trajectory evaluator, and B_xi learns to use the resulting value evidence for later policy updates and action scoring. C_eta uses a recovery-positive and persistence/growth-negative residual-regulation orientation; a coefficient-origin audit shows that equal-unit, raw-equal, and random monotone variants preserve more than 92% of the released top-action rankings, while sign inversion preserves 0%. The reward-free protocol exposes observation transitions while withholding environment rewards, delayed external evaluators, success labels, and action-goodness labels. A conditional error decomposition separates B_xi evidence-estimation error from residual policy-optimization error. In a 2x2-XOR packet task, medicine and chili acquire opposite value under visual XOR contexts, and the same pain or spice increase can be positive or negative depending on consequence structure; B_xi reaches 0.952 balanced reward-sign accuracy. In a full online-interleaved audit, M_psi reaches holdout R2=0.907, B_xi reaches 0.940 sign accuracy, and the policy reaches 0.979 optimal-action accuracy, while immediate packet scores, prediction-error rewards, shuffled targets, zero reward, and error-reduction controls collapse. Hidden-reward CartPole and Taxi controls, public-context no-leakage audits, and module-role ablations further test information boundaries and component necessity.


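2606.19134 2026-06-18 cs.LG cs.AI New submission

Pareto Q-Learning with Reward Machines

Arnaud Lequen, Clément Legrand-Lixon, Léo Saulières

AI summary: Proposes the PQLRM algorithm, which combines Pareto Q-learning with reward machines to efficiently approximate the Pareto front in multi-objective reinforcement learning while handling non-Markovian rewards.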

Comments Accepted at the ICAPS 2026 Workshop on Bridging the Gap Between AI Planning and (Reinforcement) Learning (PRL)

2606.19199 2026-06-18 cs.LG cs.AI New submission

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

Giuseppe Gabriele, Fabio Pavirani, Seyed Soroush Karimi Madahi, Chris Develder

Affiliations: Ghent University - imec

AI summary: To address the poor performance of reinforcement learning policies when EV departure times are unknown, proposes a decision-focused reinforcement learning framework that jointly trains the forecaster and controller end to end, improving total reward by up to 14% and reducing unsupplied energy by 55%.

Comments ACM e-Energy 2026; 5 pages, 1 figure, 1 table

DOI: 10.1145/3744255.3811736

Abstract

The recent growth of EV adoption poses challenges for power systems, including increased peak demand and potential grid instability. Smart control of EV charging, e.g., based on reinforcement learning (RL), can alleviate these issues by learning temporal and contextual patterns from historical data. Yet, in real-world scenarios, key features, such as departure time, are often unavailable. This, in turn, makes it harder for an RL agent to learn and execute an effective charging policy. To mitigate this uncertainty, a trained forecaster can approximate the unknown features from available data. However, since these forecasting models are typically trained for accuracy (rather than their impact on a downstream agent's decision quality), their errors may propagate and hinder the overall performance of a controller that is using the forecasts. To avoid this, we propose a decision-focused RL (DF-RL) framework in which the forecaster is trained end-to-end, i.e., with feedback from the charging policy actions taken by the RL agent. Such joint training of both the forecaster and controller ultimately results in higher-quality actions: our proposed DF-RL method yields superior charging decisions compared to other baselines, achieving up to a 14% improvement in total reward and a 55% reduction of unsupplied energy (i.e., charging that failed to happen because the EV already left), relative to the RL method without departure time forecasting.


2606.19236 2026-06-18 cs.LG cs.AI cs.CL New submission

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Haipeng Luo, Qingfeng Sun, Songli Wu, Can Xu, Wenfeng Deng, Han Hu, Yansong Tang

Affiliations: Shenzhen International Graduate School, Tsinghua University; Tencent Hunyuan

AI summary: To address policy entropy collapse in RL algorithms such as GRPO, proposes STARE, which identifies entropy-critical tokens via surprisal quantiles and reweights their advantages, combined with a target-entropy closed-loop gate to stabilize entropy; it trains stably on 1.5B-32B models across multiple tasks and improves AIME24/25 accuracy by 4%-8%.

Comments LLM, Reinforcement Learning

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) algorithms such as GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by this analysis, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating sustained exploration-exploitation balance that further unlocks RL training potential. Code is available at https://github.com/hp-luo/STARE.
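
The quantile-based selection step can be sketched as follows. Surprisal here is −log p of the sampled token. Which quadrant is reweighted, by how much, and how the closed-loop gate adjusts it are not given in the abstract, so the threshold `q` and the factor `scale` are hypothetical placeholders and the gate is omitted.

```python
import math


def surprisals(token_probs):
    """Surprisal of each sampled token: -log of the probability the policy assigned to it."""
    return [-math.log(p) for p in token_probs]


def quantile(values, q):
    """Linear-interpolation quantile for q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def reweight(advantages, token_probs, q=0.8, scale=0.5):
    """Scale the advantage of tokens whose surprisal is at or above the batch q-quantile."""
    s = surprisals(token_probs)
    threshold = quantile(s, q)
    return [a * scale if si >= threshold else a for a, si in zip(advantages, s)]
```

Because the threshold is a quantile computed inside the batch, the fraction of tokens touched stays roughly constant even as the policy's overall confidence drifts during training.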


2606.19328 2026-06-18 cs.LG cs.AI cs.RO New submission

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

Mohamed Nabail, Leo Cheng, Jingmin Wang, Nicholas Rhinehart

Affiliations: Learning, Embodied Autonomy, and Forecasting (LEAF) Lab, University of Toronto

AI summary: Proposes UBP2, which actively directs exploration by jointly reasoning over uncertainties in the reward, dynamics, and value functions, and substantially improves sample efficiency on the Meta-World benchmark.

Abstract

Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introduce a model-based approach that actively directs exploration by jointly reasoning over uncertainties in the reward, dynamics, and value functions. Our method, Uncertainty-Balanced Preference Planning (UBP2), uses ensembles of reward, dynamics, and value function models to evaluate candidate trajectories according to a unified score that combines expected reward, terminal value, and epistemic uncertainty. Planning under this objective yields an explicit tradeoff between exploitation and information acquisition without requiring ad hoc exploration heuristics. Under standard regularity assumptions, we establish sublinear regret guarantees for both finite-horizon and infinite-horizon settings. Empirically, experiments on the Meta-World benchmark show UBP2 achieves substantially higher sample efficiency than model-free preference-based methods and non-optimistic model-based baselines.
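
A unified score of this kind might look like the sketch below. The exact form is not given in the abstract, so several things are assumed: each ensemble member supplies per-step reward predictions and a terminal value for a candidate trajectory, epistemic uncertainty is measured as the spread of the members' returns, and it enters as an additive bonus with a hypothetical coefficient `beta`. Dynamics-model disagreement is left out for brevity.

```python
import statistics


def trajectory_score(reward_preds, value_preds, beta=1.0):
    """Mean ensemble return plus beta times the ensemble's disagreement.

    reward_preds[k] is the list of per-step rewards predicted by member k;
    value_preds[k] is member k's terminal value estimate.
    """
    returns = [sum(rewards) + value for rewards, value in zip(reward_preds, value_preds)]
    return statistics.fmean(returns) + beta * statistics.pstdev(returns)


def best_candidate(candidates, beta=1.0):
    """Index of the candidate (reward_preds, value_preds) pair with the highest score."""
    scores = [trajectory_score(r, v, beta) for r, v in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

Setting `beta` to 0 recovers purely greedy planning on the ensemble mean; larger values favor trajectories the models disagree about, which is where a new preference query is most informative.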


2606.18509 2026-06-18 cs.LG stat.ML 新提交

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

概念调制模型：可识别性与外推的统一框架

Soheun Yi, Yizhou Lu, Chandler Squires, Pradeep Ravikumar

发表机构 * Department of Statistics and Data Science, Carnegie Mellon University（卡内基梅隆大学统计与数据科学系）； Machine Learning Department, Carnegie Mellon University（卡内基梅隆大学机器学习系）

AI总结提出概念调制模型（CMMs），通过属性势统一条件潜变量模型的可识别性与外推分析，将基于转移的可识别性提升至条件设置，并导出代数外推准则。


AI中文摘要

条件潜变量模型中的可靠泛化需要理解可识别性和外推：观测属性间的变化如何决定潜在结构，以及该结构如何决定未见属性上的分布。然而，现有的可识别性和外推保证大多是模型特定的，在非线性ICA、因果表示学习、扰动建模及相关条件潜变量模型中分别进行分析。我们引入概念调制模型（CMMs），这是一类属性索引的条件生成模型，其结构为$A\to \Lambda \to C\to X$，其中属性选择调制器，调制器诱导潜在概念法则，概念生成观测特征。CMMs通过展示观测属性上的特征一致性诱导受CMM类约束的潜在概念转移，将基于转移的可识别性提升至条件设置。我们通过属性势（属性条件概念法则之间的对数密度比）表达这些约束，将通用提升步骤与模型特定的刚性论证分离。相同的势控制外推：当且仅当传输的属性势恒等式扩展到这些属性时，未见属性上的一致性成立。这导出了代数外推准则，识别出几个现有可识别性和外推结果背后的共同基于势的证明对象，并且当与这些工作中的模型特定刚性论证结合时，恢复了它们所述的结论。

英文摘要

Reliable generalization in conditional latent variable models requires understanding both identifiability and extrapolation: how observed variation across attributes determines latent structure, and how that structure determines distributions at unseen attributes. However, existing identifiability and extrapolation guarantees are largely model-specific, with separate analyses in nonlinear ICA, causal representation learning, perturbation modeling, and related conditional latent variable models. We introduce concept modulation models (CMMs), an attribute-indexed class of conditional generative models with structure $A\to Λ\to C\to X$, where attributes select modulators, modulators induce latent concept laws, and concepts generate observed features. CMMs lift transition-based identifiability to conditional settings by showing that feature agreement on observed attributes induces a latent concept transition constrained by the CMM class. We express these constraints through attribute potentials, log-density ratios between attribute-conditioned concept laws, separating the generic lifting step from model-specific rigidity arguments. The same potentials control extrapolation: agreement at unseen attributes holds exactly when the transported attribute-potential identities extend to those attributes. This yields algebraic extrapolation criteria, identifies the common potential-based proof objects behind several existing identifiability and extrapolation results, and, when combined with the model-specific rigidity arguments in those works, recovers their stated conclusions.


2606.18898 2026-06-18 cs.LG 新提交

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

基于潜在随机微分方程的稀疏不规则多元时间序列异常检测

Martin Uray, Dominik Geng, Florian Graf, Stefan Huber, Roland Kwitt

发表机构 * Josef Ressel Centre for Intelligent and Secure Industrial Automation, University of Applied Sciences, Salzburg, Austria（约瑟夫·雷斯尔智能与安全工业自动化中心，应用科学大学，萨尔茨堡，奥地利）； University of Salzburg, Austria（萨尔茨堡大学，奥地利）

AI总结针对现实世界中稀疏、不规则采样的多元时间序列，提出基于潜在随机微分方程的生成方法，将观测投影到连续时间随机动力系统，处理缺失和不规则采样，并捕获循环行为，在六个基准数据集上取得最优结果。

Comments Preprint


AI中文摘要

多元时间序列异常检测（MTSAD）在工业监控、网络安全或医疗保健等广泛应用领域至关重要。现实世界的数据通常是稀疏的、不规则采样的或部分观测的，但现有方法假设时间序列均匀采样。我们提出了一种基于潜在随机微分方程的生成方法，将观测到的时间序列投影到一个连续时间随机动力系统上，能够直接处理缺失观测和不规则采样，同时自然捕获许多现实世界用例固有的可能循环行为。在六个异常基准数据集上的实验表明，我们提出的方法在现有最先进基线中排名第一。我们进一步证明，在严重数据稀疏性下，我们的方法保持鲁棒性，而测试的基线方法性能显著下降。这些结果突显了潜在随机微分方程作为多元时间序列异常检测的自然归纳偏置，尤其是在存在现实世界不规则性的情况下。

英文摘要

Multivariate time series anomaly detection (MTSAD) is critical for a wide range of application areas, such as industrial monitoring, cybersecurity, or healthcare. Real-world data is often sparse, irregularly sampled or partially observed, yet existing methods assume uniformly sampled time series. We propose a generative approach based on Latent SDEs that projects the observed time series on a continuous-time stochastic dynamical system, directly being able to handle missing observations and irregular sampling, while also naturally capturing possible cyclic behavior that many real-world use cases inherently possess. Experiments on six anomaly benchmark datasets show that our proposed method ranks first among state-of-the-art baselines. We further demonstrate that our method remains robust under severe data sparsity, while performance significantly degrades for the tested baseline methods. These results highlight latent SDEs as a natural inductive bias for anomaly detection in multivariate time series, especially in presence of real-world irregularities.
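
To illustrate why a continuous-time stochastic model copes naturally with irregular sampling, here is a minimal Euler-Maruyama simulator that steps over an arbitrary time grid. It is only a sketch: in a latent SDE the drift and diffusion would be learned networks acting on a latent state, whereas the hand-written scalar functions here are stand-ins.

```python
import math
import random

def euler_maruyama(z0, times, drift, diffusion, rng):
    """Simulate a scalar SDE dz = drift(z, t) dt + diffusion(z, t) dW
    on an arbitrary (possibly irregular) increasing time grid."""
    path = [z0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        z = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        path.append(z + drift(z, t_prev) * dt + diffusion(z, t_prev) * dw)
    return path

# Irregular observation times: the step size is simply whatever gap occurs.
rng = random.Random(0)
path = euler_maruyama(1.0, [0.0, 0.1, 0.5, 0.55, 2.0],
                      drift=lambda z, t: -z,
                      diffusion=lambda z, t: 0.2,
                      rng=rng)
```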


2606.18997 2026-06-18 cs.LG 新提交

DIPHINE: Diffusion-based $Φ$-ID Neural Estimator

DIPHINE: 基于扩散的 $\Phi$-ID 神经估计器

Simon Pedro Galeano Munoz, Mustapha Bounoua, Giulio Franzese, Pietro Michiardi, Maurizio Filippone

发表机构 * KAUST（阿卜杜拉国王科技大学）； EURECOM（欧雷康）

AI总结提出首个基于扩散模型的神经估计器 DIPHINE，用于计算连续非高斯动力系统的集成信息分解（$\Phi$ID），通过单个摊销网络联合估计所有互信息项，并利用 Möbius 逆变换恢复十六个原子。


AI中文摘要

揭示真实世界复杂系统的真实信息架构需要厘清其组件如何随时间独特存储、冗余共享和协同整合信息。集成信息分解（$\Phi$ID）是一个框架，用于将多变量系统的信息动态分解为十六个非重叠原子，这些原子表征冗余、独特和协同的信息存储、传输和整合模式。现有的计算 $\Phi$ID 的方法仅限于高斯或离散系统，阻碍了其在连续非高斯动力系统中的应用。我们通过提出 DIPHINE（基于扩散的 $\Phi$ID 神经估计器）来解决这一限制，这是首个利用基于分数的扩散模型从单个摊销网络中联合估计 $\Phi$ID 所需的所有互信息项的神经估计器，并通过 Möbius 逆变换恢复十六个原子。我们提供了通过逆变换的误差传播的理论分析，表明从互信息到原子的映射的雅可比矩阵是整数值的，并且协同到协同原子被证明是最难估计的。我们在合成基准上展示了准确恢复真实原子，与已建立的互信息估计器相比具有优越性能，并在涉及真实数据的应用中无需任何分布假设即可提取生理上可解释的信息动态结构。

英文摘要

Uncovering the true informational architecture of real-world complex systems requires disentangling how their components uniquely store, redundantly share, and synergistically integrate information over time. Integrated Information Decomposition ($Φ$ID) is a framework for decomposing the information dynamics of multivariate systems into sixteen non-overlapping atoms that characterize redundant, unique, and synergistic modes of information storage, transfer, and integration. Existing methods to compute $Φ$ID are restricted to Gaussian or discrete systems, preventing its application to continuous non-Gaussian dynamical systems. We address this limitation by proposing DIPHINE (Diffusion-based $Φ$-ID Neural Estimator), the first neural estimator that leverages score-based diffusion models to jointly estimate all the mutual information terms required by $Φ$ID from a single amortized network, recovering the sixteen atoms through Möbius inversion. We provide a theoretical analysis of error propagation through the inversion, showing that the Jacobian of the mapping from mutual informations to atoms is integer-valued and that the synergy-to-synergy atom is provably the hardest to estimate. We demonstrate accurate recovery of ground-truth atoms on synthetic benchmarks, superior performance compared to established mutual information estimators, and the ability to extract physiologically interpretable information-dynamic structure on an application involving real data without any distributional assumptions.
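
The Möbius inversion step can be illustrated on the simplest possible lattice. The sketch below uses the Boolean lattice of subsets, which is an assumption for illustration only: the actual $\Phi$ID lattice is a different and larger partial order. It does show why the map from cumulative quantities to atoms has integer coefficients (here $\pm 1$).

```python
from itertools import combinations

def subsets(s):
    """Yield every subset of s as a frozenset."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def mobius_invert(cumulative):
    """Given f(S) = sum of g(T) over all T that are subsets of S,
    recover g(S) for every S.

    `cumulative` must contain an entry for every subset of each key.
    """
    atoms = {}
    for s in cumulative:
        atoms[s] = sum((-1) ** (len(s) - len(t)) * cumulative[t]
                       for t in subsets(s))
    return atoms
```

Because every coefficient is an integer, estimation errors in the cumulative terms add up linearly in the atoms, so an atom that depends on many terms inherits many error contributions.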


2606.19162 2026-06-18 cs.LG cs.CV 新提交

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

奖励一直就在你的数据中：用判别器引导的强化学习纠正流匹配

Nicolas Beltran-Velez, Felix Friedrich, Zhang Xiaofeng, Reyhane Askari-Hemmat, Xiaochuang Han, Adriana Romero-Soriano, Michal Drozdzal

发表机构 * FAIR at Meta（Meta FAIR）； Columbia University（哥伦比亚大学）； McGill University（麦吉尔大学）； Canada CIFAR AI Chair（加拿大CIFAR人工智能主席）

AI总结针对流匹配模型因损失函数与样本质量不匹配导致的视觉缺陷，提出判别器引导的强化学习（DRL），利用预训练空间中判别器的logit作为奖励，显著提升无引导FID和语义FD，并改善偏好对齐。

Comments 84 pages, including appendices


AI中文摘要

得分匹配和流匹配模型通常依赖基于偏好的强化学习来实现两个目的：与主观偏好对齐，以及令人惊讶地恢复视觉真实性和连贯对象结构等属性——而这些属性本应通过匹配训练从数据本身学习。我们认为这反映了结构上的不匹配。匹配损失衡量训练时边缘分布下速度或得分场的$\ell_2$回归误差，这一代理指标与决定推理时样本质量的视觉和语义属性对齐不良。给定一个与这些属性对齐的奖励，强化学习通过评估模型自身生成的样本并直接遵循奖励景观来规避不匹配。挑战在于如何在不依赖人类偏好的情况下获得这样的奖励，因为人类偏好昂贵且会将数据真实性与标注者倾向混为一谈。我们提出判别器引导的强化学习（DRL）。DRL训练一个判别器，在预训练表示空间中区分数据样本和基础模型样本，并将其logit作为KL正则化强化学习中的奖励。预训练空间将判别器限制在感知有意义的方向上，而logit估计数据与模型之间的对数似然比，这是针对数据分布的最优奖励。在SiT、JiT、REPA和RAE上，DRL降低了无引导FID（例如，SiT上从9.38降至2.62）和语义空间FD（例如，SiT上DINOv3从88.2降至19.3），在所有骨干网络上均有一致提升，并且在没有经过偏好奖励训练的情况下改善了人类偏好奖励。在后续基于偏好的后训练中，DRL还在偏好奖励与图像保真度之间产生了更好的帕累托前沿，在提高对齐度的同时减少了过饱和和过亮等低级伪影。

英文摘要

Score- and flow-matching models often rely on preference-based reinforcement learning for two purposes: aligning with subjective preferences and, surprisingly, recovering properties such as visual realism and coherent object structure that matching-based training is intended to learn from the data itself. We argue that this reflects a structural mismatch. Matching losses measure $\ell_2$ regression error on the velocity or score field under training-time marginals, a proxy poorly aligned with the visual and semantic properties that determine sample quality at inference. Given a reward aligned with these properties, RL sidesteps the mismatch by evaluating the model on its own samples and following the reward landscape directly. The challenge is to obtain such a reward without relying on human preferences, which are expensive and conflate data realism with annotator inclinations. We propose Discriminator-Guided RL (DRL). DRL trains a discriminator to separate data from base-model samples in a pretrained representation space and uses its logit as the reward in KL-regularized RL. The pretrained space restricts the discriminator to perceptually meaningful directions, and the logit estimates the log-likelihood ratio between data and model, which is the optimal reward for targeting the data distribution. Across SiT, JiT, REPA, and RAE, DRL reduces guidance-free FID (e.g., $9.38 \to 2.62$ on SiT) and semantic-space FD (e.g., $88.2 \to 19.3$ on DINOv3 for SiT), with consistent gains across all backbones, and improves human-preference rewards without training on them. It also yields a better Pareto frontier between preference reward and image fidelity under subsequent preference-based post-training, increasing alignment while reducing low-level artifacts such as oversaturation and excessive brightness.
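
The claim that a discriminator logit estimates the data-to-model log-likelihood ratio follows from the standard form of the Bayes-optimal discriminator for balanced classes, $D(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_{\text{model}}(x))$. The sketch below checks this numerically on two 1-D Gaussians, which are purely illustrative stand-ins; a trained discriminator in a pretrained representation space would only approximate this quantity.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def optimal_discriminator(x, data, model):
    """Bayes-optimal D(x) = p_data / (p_data + p_model) for balanced classes.
    `data` and `model` are (mean, std) pairs of toy Gaussian densities."""
    p = math.exp(gaussian_logpdf(x, *data))
    q = math.exp(gaussian_logpdf(x, *model))
    return p / (p + q)

def logit(d):
    return math.log(d) - math.log1p(-d)

x = 0.3
data, model = (0.0, 1.0), (1.0, 2.0)
reward = logit(optimal_discriminator(x, data, model))
log_ratio = gaussian_logpdf(x, *data) - gaussian_logpdf(x, *model)
# reward and log_ratio agree up to floating-point error.
```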


2606.19264 2026-06-18 cs.LG cs.CL 新提交

Structured Inference with Large Language Gibbs

大语言吉布斯结构化推理

Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer

AI总结提出大语言吉布斯方法，利用大语言模型的条件分布作为转移算子进行结构化概率推理，通过迭代重采样变量避免顺序偏差，在合成分布、一致性推理和贝叶斯结构学习中验证有效性。

Comments Code: https://github.com/hyeok9855/large-language-gibbs


AI中文摘要

大型语言模型（LLMs）中编码的知识可以作为描述复杂世界变量的结构化推理的基础，但以概率一致的方式访问这些知识构成了一个困难的推理问题。我们提出了大语言吉布斯，一种结构化概率推理方案，它使用LLM的条件分布作为转移算子。不是通过单次自回归生成来采样结构化对象，而是利用LLM的下一个标记条件分布，在给定其他变量的条件下迭代地重采样单个变量。这种方法避免了顺序依赖偏差，并产生一个反映所有局部条件分布之间折衷的平稳分布。我们将这种方法应用于从合成分布中采样、一致性推理任务和贝叶斯结构学习。结果表明，在通过噪声LLM条件分布可访问的世界先验下，MCMC中使用LLM条件分布是用于结构化概率推理的一次性生成的实际替代方案。

英文摘要

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult inference problem. We propose Large Language Gibbs, a scheme for structured probabilistic inference that uses conditional distributions of an LLM as transition operators. Rather than sampling structured objects through single-pass autoregressive generation, we iteratively resample individual variables conditioned on others using an LLM's next-token conditionals. This approach avoids order-dependent biases and produces a stationary distribution that reflects a compromise between all local conditionals. We apply this approach to sampling from synthetic distributions, consistent reasoning tasks, and Bayesian structure learning. The results suggest that the use of LLM conditionals in MCMC is a practical alternative to one-pass generation for structured probabilistic inference under a world prior accessible through noisy LLM conditionals.
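
The resampling loop described above can be sketched as an ordinary systematic-scan Gibbs sampler. The `conditional` callback is a hypothetical stand-in for querying an LLM's next-token probabilities; here it is a hand-written function over two binary variables so that the example stays self-contained.

```python
import random

def gibbs(conditional, init, n_sweeps, rng):
    """Systematic-scan Gibbs sampling over binary variables.

    conditional(i, state) returns P(x_i = 1 | all other variables).
    In the paper's setting this probability would come from an LLM;
    here it is an ordinary Python function (an assumption).
    """
    state = list(init)
    samples = []
    for _ in range(n_sweeps):
        for i in range(len(state)):
            state[i] = 1 if rng.random() < conditional(i, state) else 0
        samples.append(tuple(state))
    return samples

def toy_conditional(i, state):
    # Two variables that tend to agree: P(x_i = 1 | x_j) is 0.9 or 0.1.
    return 0.9 if state[1 - i] == 1 else 0.1

samples = gibbs(toy_conditional, [0, 0], 20000, random.Random(0))
agreement = sum(1 for a, b in samples if a == b) / len(samples)
```

These toy conditionals are mutually consistent with a joint that puts probability 0.45 on each agreeing state, so `agreement` comes out close to 0.9. With conditionals that are not mutually consistent, as noisy LLM outputs may be, the chain still has a stationary distribution, but it no longer corresponds to a single joint that reproduces every conditional.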


2606.19315 2026-06-18 cs.LG 新提交

测量噪声限制了非线性模型在生物医学预测中相对于线性模型的优势

Marc-Andre Schulz, Kerstin Ritter

发表机构 * Hertie Institute for AI in Brain Health, University of Tübingen（赫蒂人工智能脑健康研究所，图宾根大学）； Tübingen AI Center, University of Tübingen（图宾根人工智能中心，图宾根大学）； Department of Psychiatry and Neurosciences, Charité – Universitätsmedizin Berlin（精神病学与神经科学系，柏林夏里特医学院）； Bernstein Center for Computational Neuroscience, Berlin（伯恩斯坦计算神经科学中心，柏林）； German Center for Mental Health (DZPG), partner site Tübingen（德国心理健康中心（DZPG），图宾根合作站点）

AI总结本文指出，在生物医学表格数据中，测量噪声会削弱非线性结构，导致非线性模型与线性模型性能相当，并提出了一个精确的超额风险恒等式，揭示了测量可靠性、样本量和特征表示三个条件必须同时满足才能体现非线性优势。


AI中文摘要

在生物医学表格数据上，诸如深度网络、梯度提升树和核方法等灵活模型，在给定相同特征的情况下，反复被线性回归和逻辑回归匹配或击败。通常的反应是将其视为模型方面的不足，需要通过更多数据、更好的架构或调参来修复，假设非线性结构存在而模型未能捕捉到。我们认为，当限制因素是测量而非模型时（这在生物医学中经常发生），这些修复无法奏效。加性噪声模糊了群体最优预测器，并且由于模糊在去除函数的广泛形状之前先去除精细、快速变化的细节，它比线性结构更快地抹去非线性结构。一个k阶交互作用被特征可靠性的k次幂衰减，而线性部分只衰减一次。在生物医学测量典型的可靠性下，即使底层生物学是强非线性的，非线性优势也可能消失，并且噪声所移除的部分无法通过更大的队列或更灵活的模型恢复，只能通过更好的测量。非线性是隐藏的，而非缺失，线性模型与灵活模型之间的平局本身并不能对生物学做出定论。这些片段是经典的，来自测量误差统计、心理测量学和高斯分析，我们将它们组合成一个精确的超额风险恒等式。测量可靠性是与样本量和特征表示并列的三个条件之一，必须对齐才能使灵活模型发挥作用，而它们共同只留下一个狭窄的窗口，大多数生物医学任务落在此窗口之外。在140个英国生物银行任务中，灵活模型与线性模型之间的差距（如果存在）带有预测的噪声特征，并且这三个条件可以通过干预而非仅通过基准测试来分离。

英文摘要

On biomedical tabular data, flexible models such as deep networks, gradient-boosted trees, and kernel methods are repeatedly matched or beaten by linear and logistic regression given the same features. The usual reaction is to treat this as a model-side shortfall, to be fixed with more data, a better architecture, or tuning, on the assumption that the nonlinear structure is there and the model has failed to capture it. We argue that these fixes cannot help when the binding limit is the measurement rather than the model, as it frequently is in biomedicine. Additive noise blurs the population-optimal predictor, and because blurring removes a function's fine, rapidly varying detail before its broad shape, it erases nonlinear structure faster than linear structure. A degree-$k$ interaction is attenuated by the $k$-th power of feature reliability, while the linear part is attenuated only once. At the reliabilities typical of biomedical measurement, the nonlinear advantage can vanish even when the underlying biology is strongly nonlinear, and what the noise removes cannot be recovered by a larger cohort or a more flexible model, only by better measurement. The nonlinearity is hidden, not absent, and a tie between linear and flexible models is not by itself a verdict on the biology. These pieces are classical, drawn from measurement-error statistics, psychometrics, and Gaussian analysis, and we assemble them into an exact excess-risk identity. Measurement reliability is one of three conditions, alongside sample size and feature representation, that must align for a flexible model to help, and together they leave only a narrow window that most biomedical tasks fall outside. Across 140 UK Biobank tasks, the gap between flexible and linear models, where it exists, carries the predicted noise signature, and the three conditions can be separated by intervention but not by a benchmark alone.
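
The attenuation rule can be checked with a small simulation under simplifying assumptions that are ours, not necessarily the paper's: independent standard-Gaussian features, independent additive Gaussian noise, and a pure product interaction $x_1 \cdots x_k$. In that setting $E[x_i \mid w_i] = r\,w_i$ with reliability $r = \mathrm{Var}(x)/\mathrm{Var}(w)$, so the best predictor of the product from the noisy features has slope $r^k$.

```python
import math
import random

def attenuated_slope(k, reliability, n, rng):
    """Least-squares slope (through the origin) of a clean degree-k product
    x_1 * ... * x_k on the same product of noisy measurements w_i = x_i + e_i."""
    noise_sd = math.sqrt(1.0 / reliability - 1.0)  # gives Var(x)/Var(w) = reliability
    num = den = 0.0
    for _ in range(n):
        y = p = 1.0
        for _ in range(k):
            x = rng.gauss(0.0, 1.0)
            y *= x
            p *= x + rng.gauss(0.0, noise_sd)
        num += y * p
        den += p * p
    return num / den

rng = random.Random(0)
slope_linear = attenuated_slope(1, 0.7, 50000, rng)       # theory: 0.7
slope_interaction = attenuated_slope(2, 0.7, 50000, rng)  # theory: 0.7 ** 2 = 0.49
```

At a reliability of 0.7 the linear term keeps 70% of its slope while a two-way interaction keeps about half, and the gap widens with every additional factor.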


2606.18465 2026-06-18 cs.LG cs.AI 新提交

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

权重范数在Grokking中控制什么？交叉熵下的logit尺度中介作用

Truong Xuan Khanh

发表机构 * H&K Research Studio, Clevix LLC

AI总结本文通过固定权重范数并改变输出温度，发现Grokking延迟主要由logit尺度（logit scale）决定，权重范数仅通过影响logit尺度间接起作用。

Comments 16 pages, 10 tables and 4 figures. Code and data to reproduce all numbers, tables, and figures: https://github.com/ClevixLab/grokking-logit-scale


AI中文摘要

Grokking，即从记忆到泛化的延迟跳跃，通常与权重范数相关：范数越小，泛化越早。我们探究范数实际控制什么。通过钳位固定权重范数并仅改变输出温度，我们在交叉熵下将Grokking延迟滑动到其整个范数诱导范围；将有效logit尺度匹配回基线可恢复两个模数下约85%的延迟。在范数和温度的网格上，延迟仅由logit尺度决定（$R^2 = 0.97$），范数仅额外贡献1-2%。该效应依赖于损失函数：在均方误差下，logit尺度被固定，范数通过不同路径起作用。记忆对照实验、float64 softmax崩溃审计和无LayerNorm的Transformer均指向同一通道。从同一状态分叉，延迟遵循钳位的范数值而非钳位操作本身，这排除了重缩放伪影。近端变量是logit尺度及其驱动的softmax饱和；权重范数仅是上游手柄。所有数字、表格和图表均可从发布的代码和数据中复现。

英文摘要

Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying only an output temperature, we slide the grokking delay across its entire norm-induced range under cross-entropy; matching the effective logit scale back to baseline recovers about 85% of the delay at two moduli. Across a grid of norms and temperatures the delay collapses onto the logit scale alone ($R^2 = 0.97$), with the norm adding 1-2% beyond it. The effect is loss-dependent: under mean-squared error the logit scale is pinned and the norm acts through a different route. A memorization control, a float64 softmax-collapse audit, and a no-LayerNorm transformer point to the same channel. Forking arms from one identical state, the delay follows the held norm value and not the clamp operation, which closes a rescaling-artifact concern. The proximal variable is the logit scale and the softmax saturation it drives; the weight norm is only an upstream handle. All numbers, tables, and figures reproduce from released code and data.
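
A small sketch of why an output temperature can stand in for the logit scale under cross-entropy: dividing the logits by a temperature $T$ exactly undoes a multiplication of the logits by $T$, and a larger effective scale pushes the softmax toward saturation. How a change in weight norm translates into a logit multiplier depends on the architecture and is not modeled here.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target, temperature=1.0):
    return -math.log(softmax(logits, temperature)[target])

base = softmax([2.0, 1.0, 0.0])
# Doubling the logits and doubling the temperature leaves the output unchanged.
matched = softmax([4.0, 2.0, 0.0], temperature=2.0)
# A lower temperature (larger effective logit scale) saturates the softmax.
saturated = softmax([2.0, 1.0, 0.0], temperature=0.25)
```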


2606.18538 2026-06-18 cs.LG stat.ML 新提交

Effects of sparsity and superposition on loss in simple autoencoders

稀疏性与叠加对简单自编码器损失的影响

Mriganka Basu Roy Chowdhury, Eric McLaughlin Weiner

发表机构 * Department of Statistics, UC Berkeley（加州大学伯克利分校统计学系）； Department of Materials Science, UC Berkeley（加州大学伯克利分校材料科学系）

AI总结研究神经网络中多语义性源于叠加现象，通过数学分析稀疏输入下自编码器的L2重构损失上下界，验证并扩展了Elhage等人的实证结果。

Comments 16 pages, 3 figures


AI中文摘要

神经网络机械可解释性的主要困难之一是出现多语义性，即每个神经元通常负责多个不同任务，阻碍了对其功能的清晰解释。Elhage等人（2022）的开创性论文认为，这是由于叠加现象，即神经网络将不同特征表示为低维空间中的非正交方向，这种策略可以在不牺牲保真度的情况下实现更大的数据压缩，因为输入向量具有特征稀疏性。Elhage等人（2022）在一个相当自然且简单的具有稀疏输入的自编码器中实证验证了这些假设。本文的贡献在于分析叠加现象发生和最优性的数学基础，同时严格证实了他们的一些发现。特别地，我们为幂激活函数提供了L2重构损失的上界和下界，在非常稀疏的情况下是紧的。文末还包含一个简短的开放问题列表。

英文摘要

One of the major difficulties in the mechanistic interpretability of neural networks is the occurrence of polysemanticity, in which each neuron is typically responsible for multiple different tasks, impeding a clean interpretation of its function. The seminal paper of Elhage et al. (2022) argues that this occurs due to superposition, a phenomenon where the neural network represents distinct features as non-orthogonal directions in a lower-dimensional space, a strategy that, thanks to the feature sparsity of input vectors, allows much greater compression of the data without sacrificing fidelity. Elhage et al. (2022) empirically validate these hypotheses in a rather natural and simple autoencoder with sparse inputs. The contribution of the present work is to analyze the mathematical basis for the occurrence and optimality of superposition, while rigorously corroborating some of their findings. In particular, we provide upper and lower bounds for the L2 reconstruction loss, tight in the very sparse regime, for power activation functions. A short list of interesting open problems is also included at the end.


2606.18778 2026-06-18 cs.LG stat.ML 新提交

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

漂移与数据污染下基于潜在簇几何的在线分布预测

Navyansh Mahla, Prateek Chanda, Ganesh Ramakrishnan

发表机构 * Indian Institute of Technology, Bombay（印度理工学院，孟买）

AI总结针对非平稳流中的在线分布预测问题，提出一种基于潜在簇几何的吉布斯准后验方法，通过可逆跳跃MCMC采样变维后验，并引入重启变体应对漂移，在亚线性污染预算和亚线性传输作用量下实现亚线性Wasserstein遗憾。


AI中文摘要

非平稳流中的在线学习通常被表述为跟踪点估计，但许多应用需要预测完整的数据生成分布。我们研究漂移和对抗性污染下的在线分布预测。我们的方法通过潜在簇几何表示每个候选律：一个可变大小的中心配置，组织概率质量并诱导预测分布。这些配置上的吉布斯准后验通过后验平均产生在线预测器，所得变维后验可通过可逆跳跃MCMC采样。因此，该方法避免了指定参数化流律，同时保留了用于不确定性、正则化和比较的结构化潜在空间。我们通过相对于时变真实律的累积Wasserstein-1遗憾来评估性能。分析分离了两种效应：污染扰动基于损失的后验更新，而漂移使长时域后验记忆过时。我们通过一个重启变体来解决后者，该变体在时间上局部化相同的准贝叶斯更新。所得的高概率界分解为PAC-Bayesian复杂度项、对污染敏感的后验扰动项以及由$A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)$驱动的动态最优传输项。在有界支撑、稳定潜在几何、预测映射正则性、预言可实现性、局部化重启窗口、亚线性传输作用和亚线性污染预算下，重启预测器实现了亚线性累积Wasserstein遗憾。这些保证不需要对流、漂移机制或污染过程进行参数化建模。

英文摘要

Online learning in non-stationary streams is often formulated as tracking a point estimate, but many applications require predicting the full data-generating distribution. We study online distributional prediction under drift and adversarial corruption. Our approach represents each candidate law through a latent cluster geometry: a variable-size configuration of centers that organizes probability mass and induces a predictive distribution. A Gibbs quasi-posterior over these configurations yields an online predictor by posterior averaging, and the resulting variable-dimensional posterior can be sampled with reversible-jump MCMC. The method therefore avoids specifying a parametric streaming law while retaining a structured latent space for uncertainty, regularization, and comparison. We evaluate performance by cumulative Wasserstein-1 regret against the time-varying true law. The analysis separates two effects: corruption perturbs the loss-based posterior update, whereas drift makes long-horizon posterior memory stale. We address the latter with a restarted variant that temporally localizes the same quasi-Bayesian update. The resulting high-probability bounds decompose into a PAC-Bayesian complexity term, a corruption-sensitive posterior perturbation term, and a dynamic optimal-transport term driven by $A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)$. Under bounded support, stable latent geometry, predictive-map regularity, oracle realizability, localized restart windows, sublinear transport action, and sublinear corruption budget, the restarted predictor achieves sublinear cumulative Wasserstein regret. These guarantees require no parametric model for the stream, drift mechanism, or corruption process.
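
To make the drift term $A_T^{\mathrm{OT}}$ concrete, the sketch below evaluates it for a sequence of 1-D empirical distributions. Restricting to one dimension and to equal-sized, equally weighted sample sets is an assumption chosen because the optimal coupling is then simply the sorted matching; the paper's setting is more general.

```python
def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two 1-D empirical distributions
    with the same number of equally weighted samples (sorted matching)."""
    if len(xs) != len(ys):
        raise ValueError("sample sets must have equal size")
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def transport_action(laws):
    """Sum over t >= 2 of W_2^2(p_{t-1}, p_t) for a sequence of sample sets."""
    return sum(w2_squared_1d(p, q) for p, q in zip(laws, laws[1:]))

# One unit shift followed by no change: the total action is 1.0.
action = transport_action([[0.0, 1.0], [1.0, 2.0], [1.0, 2.0]])
```

A stream whose law stops moving contributes nothing further to the sum, which is the sense in which a sublinear transport action describes drift that slows down on average.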


2606.18834 2026-06-18 cs.LG 新提交

Identifying Structural Biases from Causal Mechanism Shifts

从因果机制变化中识别结构性偏差

Praharsh Nanavati, Jilles Vreeken, David Kaltenpoth

发表机构 * CISPA Helmholtz Center for Information Security（CISPA赫尔姆霍茨信息安全中心）

AI总结提出利用环境间机制变化识别隐藏混淆和选择偏差，基于互信息构建可检验准则，并设计StruBI算法，在合成和真实数据上显著优于现有方法。


AI中文摘要

因果发现方法通常假设所有数据独立同分布（i.i.d.），且系统中没有未测量的变量影响。在实践中，这些假设经常被违反，导致推断不准确。在本文中，我们研究如何从因果机制变化中识别隐藏混淆和选择偏差。特别地，我们表明结构性偏差会导致依赖的机制变化。也就是说，通过考虑在不同环境下的数据中哪些变量的机制发生了变化，我们可以判断哪些变量是无偏的，哪些受到隐藏混淆的影响，哪些正在经历选择偏差。我们将此形式化为一个基于互信息的经验可检验准则，并展示在哪些条件下它能识别结构性偏差。为了判断哪些节点受到何种偏差的影响，我们引入了StruBI算法。在合成和真实数据上的实验表明，StruBI在实践中表现良好，准确恢复了受影响的变量集和偏差类型，以较大优势超越了现有技术水平。

英文摘要

Causal discovery methods commonly assume that all data is independently and identically distributed (i.i.d.) and that there are no unmeasured variables affecting the system. In practice, these assumptions are often violated, leading to inaccurate inference. In this paper, we study how to identify hidden confounding and selection biases from causal mechanism shifts. In particular, we show that structural biases lead to dependent mechanism shifts. That is, by considering for which variables the mechanisms change given data from different environments, we can tell which variables are unbiased, which are subject to hidden confounding, and which are undergoing selection bias. We formalize this into an empirically testable criterion based on mutual information, and show under which conditions it identifies structural biases. To tell which nodes are subject to what kind of bias, we introduce the StruBI algorithm. Experiments on synthetic and real-world data show that StruBI works well in practice, accurately recovering affected variable sets and types of biases, outperforming the state-of-the-art by a wide margin.


2606.18918 2026-06-18 cs.LG cs.CC 新提交

Some Complexity Results for Robustness Verification for Binarized Neural Networks

二值化神经网络鲁棒性验证的一些复杂性结果

Harshit Goyal, Sudakshina Dutta

发表机构 * Indian Institute of Technology Goa（印度理工学院果阿分校）

AI总结本文通过从布尔可满足性问题归约证明二值化神经网络的可满足性是NP完全的，并利用均匀遮挡导致的网络输出分段常数结构，提出多项式时间鲁棒性检查算法。

2606.19036 2026-06-18 cs.LG 新提交

Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

稀疏混合专家模型中不连续性的几何与随机分析

Tho Tran Huu, Huu-Tuan Nguyen, Thien-Hai Nguyen, Nhat-Tri Ho, Viet-Hoang Tran, Tho Quan, Tan Minh Nguyen

发表机构 * Department of Mathematics, National University of Singapore, Singapore（新加坡国立大学数学系）； Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, Ho Chi Minh City, Vietnam（胡志明市技术大学计算机科学与工程学院）

AI总结本文对稀疏混合专家模型中的不连续性进行几何与随机分析，分类不连续阶数，建立渐近体积估计，证明随机路径几乎必然击中一阶不连续，并提出低开销平滑机制以提升性能。

Comments ICML 2026 Spotlight


AI中文摘要

稀疏混合专家（SMoE）架构现已广泛应用于最先进的语言和视觉模型中，其中条件路由允许扩展到非常大的网络。然而，正是这种Top-$k$专家选择使得条件路由成为可能，同时也导致SMoE映射本质上不连续。在这些不连续曲面附近，即使任意接近的输入也可能激活截然不同的专家集，从而产生显著不同的输出。本文对这些不连续性进行了严格的几何和随机分析。首先，我们根据切换事件中并列专家的数量对不连续性进行阶数分类。利用测度论切片论证，我们建立了加厚不连续曲面的渐近体积估计，表明低阶不连续集占主导地位，而高阶不连续集占据的体积相对极小。接着，通过扩散过程对输入空间中的随机扰动建模，我们证明路径最终会遇到不连续，并且首次击中几乎必然发生在阶数为1的不连续上，同时给出了显式的有限时间概率界。我们进一步推导了占据时间界，量化了随机路径在每个不连续阶数邻域内停留的时长。这些理论结果表明输入更可能位于低阶不连续附近。受此启发，我们提出一种简单的平滑机制，可直接应用于现有SMoE，在接近不连续处软性地整合专家；我们的分析保证增加的额外计算开销很小，同时在不连续附近提供局部平滑，跨语言和视觉任务的实验表明，平滑不仅增强了SMoE映射的连续性，还提升了经验性能。

英文摘要

Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in state-of-the-art language and vision models, where conditional routing allows scaling to very large networks. However, this very Top-$k$ expert selection that enables conditional routing also renders the SMoE map inherently discontinuous. In the vicinity of these discontinuity surfaces, even inputs that are arbitrarily close may activate substantially different sets of experts, resulting in significantly different outputs. In this work, we give a rigorous geometric and stochastic analysis of these discontinuities. We first classify them by order, determined by the number of tied experts at a switching event. Using measure-theoretic slicing arguments, we establish asymptotic volume estimates for the thickened discontinuity surfaces, showing that lower-order discontinuity sets dominate, whereas higher-order ones occupy a vanishingly small relative volume. Next, modeling random perturbations in the input space via a diffusion process, we prove that the path eventually encounters a discontinuity, and moreover that the first hit almost surely occurs on an order-1 discontinuity with explicit finite-time probability bounds. We further derive occupation-time bounds that quantify the duration the random path spends in the neighborhood of each discontinuity order. These theoretical results imply that inputs are more likely to lie near lower-order discontinuities. Motivated by this insight, we propose a simple smoothing mechanism that can be directly applied to existing SMoEs, softly incorporating experts near discontinuities; our analysis guarantees that the added computational overhead remains small while providing localized smoothing near discontinuities, and experiments across language and vision tasks show that smoothing not only enforces continuity of the SMoE map but also enhances empirical performance.
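
The discontinuity itself is easy to reproduce in a toy router. In the sketch below, the scalar input, the two hand-written router scores, the constant experts, and the uniform averaging of selected experts are all simplifications chosen for illustration; the proposed smoothing mechanism is not shown because the abstract does not describe its form.

```python
def top_k_route(scores, k):
    """Indices of the k largest router scores (ties broken by lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])

def smoe_output(x, router, experts, k):
    """Average the outputs of the Top-k experts selected for scalar input x.
    Uniform averaging is a simplification; real SMoEs use gate weights."""
    chosen = top_k_route([r(x) for r in router], k)
    return sum(experts[i](x) for i in chosen) / k

router = [lambda x: x, lambda x: -x]        # the two scores tie at x = 0
experts = [lambda x: 1.0, lambda x: -1.0]
right = smoe_output(1e-9, router, experts, k=1)
left = smoe_output(-1e-9, router, experts, k=1)
# right - left is 2.0 even though the inputs differ by only 2e-9.
```

The tie at `x = 0` involves exactly two experts exchanging places in the Top-1 selection, the simplest kind of switching event in this toy setting.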


2606.19105 2026-06-18 cs.LG stat.ML 新提交

Smoothness-Based Derandomization of PAC-Bayes Bounds

基于光滑性的PAC-Bayes去随机化

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

发表机构 * Department of Computer Science and Software Engineering（计算机科学与软件工程系）； Université Laval（拉瓦尔大学）

AI总结利用损失和预测器的光滑性，将Gibbs预测器去随机化为后验均值处的确定性预测器，通过Jensen间隙类的Rademacher复杂度控制泛化界，并导出涉及参数雅可比和海森矩阵的正则化器。


AI中文摘要

我们研究光滑损失函数的PAC-Bayes去随机化。我们的目标是通过利用损失和预测器类的光滑性，获得对确定性预测器以高概率成立的泛化界。我们表明，从Gibbs预测器到后验均值处的确定性预测器的转换有一个精确的代价，由Jensen间隙类的泛化间隙给出。我们通过其Rademacher复杂度控制该类，从而得到涉及以参数雅可比和得分图的海森矩阵表示的平坦度量的确定性预测器界。该框架适用于有界和无界光滑损失函数，并将结果专门应用于线性预测器和光滑神经网络。最后，理论中出现的雅可比和海森矩阵量激发了一个实用的正则化器。对于BatchNorm网络，我们通过将BatchNorm变换折叠到相邻的仿射权重中，相对于有效的BatchNorm权重计算该正则化器。在CIFAR-10上的实验说明了该正则化器在不同批量大小下的行为。

英文摘要

We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the predictor class. We show that passing from the Gibbs predictor to the deterministic predictor at the posterior mean has a precise cost, given by the generalization gap of the Jensen gap class. We control this class through its Rademacher complexity, leading to bounds for deterministic predictors that involve flatness quantities expressed in terms of parameter Jacobians and Hessians of the score map. The framework applies to both bounded and unbounded smooth loss functions, and we specialize the results to linear predictors and smooth neural networks. Finally, the Jacobian and Hessian quantities appearing in the theory motivate a practical regularizer. For BatchNorm networks, we compute this regularizer with respect to effective BatchNorm weights obtained by folding the BatchNorm transformation into the adjacent affine weights. Experiments on CIFAR-10 illustrate the behavior of this regularizer under different batch sizes.


2606.19145 2026-06-18 cs.LG cs.AI cs.SY eess.SY 新提交

OrthoReg: Orthogonal Regularization for Hybrid Symbolic-Neural Dynamical Systems

OrthoReg：混合符号-神经动力系统的正交正则化

Till Richter, Niki Kilbertus

发表机构 * Technical University of Munich（慕尼黑工业大学）； Helmholtz Munich（亥姆霍兹慕尼黑中心）

AI总结针对混合建模中神经部分可能重复学习符号结构导致模型冗余的问题，提出正交正则化方法OrthoReg，直接惩罚符号与神经组件间的重叠，实现互补分解，提升符号恢复和分布外行为。


AI中文摘要

动力系统是建模自然世界的基础，然而建模过程中存在持续的权衡：手动指定的机械模型设计上可解释但通常过于简单且设定错误；相反，灵活的数据驱动神经方法缺乏物理洞察。混合建模旨在通过结合指定的或基于符号的物理组件与灵活的神经网络来兼顾两者优势。然而，一个关键挑战是神经组件可能重新学习机械部分，产生冗余且不可解释的模型，特别是当符号结构本身是从数据中发现时。基于标准$L^2$正则化的现有方法依赖于投影论证，但当符号组件通过稀疏发现学习时，该论证失效，允许神经增强与符号结构重叠。我们引入\textbf{OrthoReg}（正交正则化），直接惩罚符号与神经组件之间的重叠，防止符号结构被神经残差吸收。这产生互补分解：符号部分捕捉库能表达的内容，神经部分捕捉剩余内容。在存在部分库不匹配的基准动力系统上，OrthoReg改善了符号恢复和分布外行为。

英文摘要

Dynamical systems are fundamental to modeling the natural world, yet modeling them involves a persistent trade-off: manually prescribed mechanistic models are interpretable by design but often overly simplistic and misspecified; in contrast, flexible data-driven neural methods lack physical insight. Hybrid modeling aims for the best of both worlds by combining a prescribed or symbolic, physics-based component with a flexible neural network. A critical challenge, however, is that the neural component may relearn mechanistic parts, yielding redundant and uninterpretable models, especially when the symbolic structure itself is discovered from data. Existing methods based on standard $L^2$ regularization rely on a projection argument that breaks when the symbolic component is learned through sparse discovery, allowing the neural augmentation to overlap with symbolic structure. We introduce \textbf{OrthoReg} (Orthogonal Regularization), which directly penalizes overlap between the symbolic and neural components, preventing symbolic structure from being absorbed by the neural residual. This yields a complementary decomposition: the symbolic part captures what the library can express, and the neural part captures what remains. On benchmark dynamical systems with partial library mismatch, OrthoReg improves symbolic recovery and out-of-distribution behavior.
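
One way to picture an overlap penalty between the two components is the squared cosine similarity of their outputs over a batch. This particular form is a hypothetical choice for illustration and should not be read as the OrthoReg loss, which the abstract does not spell out.

```python
def overlap_penalty(symbolic_out, neural_out, eps=1e-12):
    """Squared cosine similarity between the two components' outputs on a batch.
    Zero when the outputs are orthogonal over the batch, one when collinear."""
    dot = sum(s * n for s, n in zip(symbolic_out, neural_out))
    norm_s = sum(s * s for s in symbolic_out)
    norm_n = sum(n * n for n in neural_out)
    return dot * dot / (norm_s * norm_n + eps)

# The neural residual duplicates the symbolic term: maximal penalty.
redundant = overlap_penalty([1.0, 2.0], [2.0, 4.0])
# The neural residual is orthogonal to the symbolic term: no penalty.
complementary = overlap_penalty([1.0, 0.0], [0.0, 1.0])
```

Added to the data-fitting loss with some weight, a term like this discourages the network from reproducing what the symbolic part already explains.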


2606.19179 2026-06-18 cs.LG cs.AI math.OC stat.ML 新提交

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

随机动量方法的计算效率与串行运行时间权衡

Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade

发表机构 * Harvard University（哈佛大学）； Kempner Institute at Harvard University（哈佛大学凯普纳研究所）

AI总结研究随机动量方法（如重球法和加速SGD）在一致线性回归中的批次大小权衡，证明重球法不改善SGD的计算效率前沿但允许更大批次减少串行运行时间，而加速SGD的计算效率与串行运行时间权衡依赖于谱衰减。


AI中文摘要

随机动量方法，如重球法（HB）、Nesterov动量以及加速SGD（ASGD）的变体[Kidambi等人，2018]，在现代训练中被广泛使用，但其随机优势取决于两个不同的量：串行运行时间（达到目标精度所需的迭代次数）和计算效率（CE，总梯度查询或FLOP成本的倒数）。更大的批次在不损害CE的情况下减少串行运行时间，仅当收缩间隙随批次大小线性增长时。我们研究了一致线性回归（具有高斯协变量）的随机HB和ASGD，并证明了其批次大小权衡的有限维离散时间下界。我们的第一个结果表明，HB不会改善任意谱下SGD的CE前沿；相反，它在更大的批次大小窗口内保持SGD级别的CE，允许更大的批次减少串行运行时间，直到HB达到其确定性加速尺度。这个窗口可能比SGD临界批次大小大$\sqrt{\kappa}$倍。对于ASGD，情况更依赖于谱：对于快速衰减的幂律谱，ASGD改善了小批次下的CE（相对于HB/SGD），但随着批次大小增加，它牺牲了这种CE优势以换取改进的串行运行时间。合成线性回归实验验证了这些定性区域，包括慢衰减谱下ASGD和HB的近乎重叠，以及快速衰减谱下预测的CE-串行权衡。

英文摘要

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits depend on two distinct quantities: serial runtime, the number of iterations needed to reach a target accuracy, and compute efficiency (CE), the inverse total gradient-query or FLOP cost. Larger batches reduce serial runtime without hurting CE only when the contraction gap grows linearly with batch size. We study stochastic HB and ASGD for consistent linear regression with Gaussian covariates and prove finite-dimensional, discrete-time lower bounds on their batch-size tradeoffs. Our first result shows that HB does not improve the CE frontier over SGD for arbitrary spectra; rather, it preserves SGD-level CE over a larger batch-size window, allowing larger batches to reduce serial runtime until HB reaches its deterministic accelerated scale. This window can be a factor $\sqrt{\kappa}$ larger than the SGD critical batch size. For ASGD, the picture is more spectrum-dependent: for rapidly decaying power-law spectra, ASGD improves small-batch CE over HB/SGD, but as batch size grows it trades this CE advantage for improved serial runtime. Synthetic linear-regression experiments verify these qualitative regimes, including near-overlap of ASGD and HB for slowly decaying spectra and the predicted CE--serial tradeoff for rapidly decaying spectra.
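
For reference, a minimal minibatch heavy-ball loop on noiseless ("consistent") linear regression with Gaussian covariates. The step size, momentum, batch size, and problem dimensions are arbitrary illustrative values, not settings from the paper.

```python
import random

def heavy_ball_regression(xs, ys, lr, momentum, batch_size, steps, rng):
    """Minibatch heavy-ball SGD for least squares y ~ <w, x>."""
    dim = len(xs[0])
    w = [0.0] * dim
    velocity = [0.0] * dim
    for _ in range(steps):
        batch = [rng.randrange(len(xs)) for _ in range(batch_size)]
        grad = [0.0] * dim
        for i in batch:
            err = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
            for j in range(dim):
                grad[j] += err * xs[i][j] / batch_size
        # Heavy ball: w_{t+1} = w_t - lr * grad + momentum * (w_t - w_{t-1})
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        w = [wj + v for wj, v in zip(w, velocity)]
    return w

rng = random.Random(0)
true_w = [1.0, -2.0]
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
ys = [sum(a * b for a, b in zip(true_w, x)) for x in xs]
w = heavy_ball_regression(xs, ys, lr=0.05, momentum=0.5,
                          batch_size=8, steps=2000, rng=rng)
```

In the abstract's terms, serial runtime is `steps`, and the total gradient-query cost whose inverse defines CE is `steps * batch_size`; the tradeoff concerns how far `batch_size` can grow while `steps` shrinks proportionally.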

2606.18286 2026-06-18 cs.LG 新提交

CODEBLOCK: Learning to Supervise Code at the Right Granularity

CODEBLOCK: 学习在正确的粒度上监督代码

Zhijie Deng, Ling Li, Jinlong Pang, Kaiqin Hu, Qi Xuan, Zhaowei Zhu, Jiaheng Wei

发表机构 * Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； UC Santa Cruz（加州大学圣克鲁兹分校）； Ant Group（蚂蚁集团）； BAIA, ZJUT（浙江工业大学智能信息处理实验室）； D5Data.ai

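AI summary: Proposes the CodeBlock framework, which applies sparse supervision by selecting structure-complete code blocks rather than isolated tokens, and outperforms full-token fine-tuning on six code-generation benchmarks while using only 1.9% of the supervised tokens.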

AI中文摘要

代码大语言模型的监督微调通常对所有响应token应用统一的交叉熵损失，隐含假设每个token提供同等有用的学习信号。最近的token级选择方法通过仅监督高价值token挑战了自然语言SFT中的这一假设。然而，直接将token级掩码迁移到代码可能会破坏语法和语义连贯的程序单元，因为代码依赖于结构完整性和定义-使用关系。因此，我们提出CodeBlock，一个结构感知的稀疏监督框架，选择结构完整的代码证据而非孤立token。CodeBlock首先选择高质量的指令-响应对，然后将代码响应划分为语法连贯的编码项，通过聚合核心逻辑token上的广义交叉熵来估计其效用，并使用数据流可达性和桥接信号重新排序，以优先传播或连接重要程序依赖的块。在训练期间，完整响应仍作为上下文可用，但损失仅应用于选定的代码项和信息性自然语言token。在六个代码生成基准上的实验表明，CodeBlock在仅使用1.9%的监督响应token的情况下，实现了比全tokenSFT和竞争性选择基线更强的平均pass@1。

英文摘要

Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge this assumption in natural-language SFT by supervising only high-value tokens. However, directly transferring token-level masking to code can break syntactically and semantically coherent program units, because code depends on structural completeness and definition-use relations. We therefore propose CodeBlock, a structure-aware sparse supervision framework that selects structure-complete code evidence rather than isolated tokens. CodeBlock first selects high-quality instruction-response pairs, then partitions code responses into syntactically coherent coding items, estimates their utility by aggregating generalized cross-entropy over core logic tokens, and reranks them with data-flow reach and bridge signals to prioritize blocks that propagate or connect important program dependencies. During training, the full response remains available as context, while loss is applied only to selected code items and informative natural-language tokens. Experiments on six code-generation benchmarks show that CodeBlock achieves stronger average pass@1 than full-token SFT and competitive selection baselines, while using only 1.9% of supervised response tokens.
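The idea of applying the loss only to whole blocks, while the full response stays in context, can be sketched as a mask over per-token losses. This is a simplified illustration, not the CodeBlock pipeline: the utility score here is a plain number per block, the greedy budgeted selection is an assumption, and the data-flow reranking step is omitted. The names `select_blocks`, `block_mask`, and `masked_loss` are hypothetical.

```python
def select_blocks(blocks, token_budget):
    """Greedily keep the highest-utility blocks that fit in the token budget.

    blocks: list of (start, end, utility) with half-open token spans [start, end).
    """
    chosen, used = [], 0
    for start, end, _utility in sorted(blocks, key=lambda b: b[2], reverse=True):
        size = end - start
        if used + size <= token_budget:
            chosen.append((start, end))
            used += size
    return sorted(chosen)


def block_mask(num_tokens, chosen):
    """Boolean mask that is True only inside the chosen blocks."""
    mask = [False] * num_tokens
    for start, end in chosen:
        for i in range(start, end):
            mask[i] = True
    return mask


def masked_loss(token_nll, mask):
    """Mean negative log-likelihood over supervised tokens only."""
    kept = [nll for nll, keep in zip(token_nll, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


# Toy response of six tokens split into three two-token blocks.
token_nll = [0.1, 0.2, 2.0, 1.5, 0.3, 0.4]
blocks = [(0, 2, 0.15), (2, 4, 1.75), (4, 6, 0.35)]
mask = block_mask(len(token_nll), select_blocks(blocks, token_budget=2))
loss = masked_loss(token_nll, mask)
```

Because whole spans are selected, a supervised block is never cut in the middle of a statement, which is the property the abstract contrasts with token-level masking.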

2606.18304 2026-06-18 cs.LG cs.AI 新提交

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

基于归因引导和覆盖最大化的结构MoE剪枝

Yifu Ding, Jiacheng Wang, Ge Yang, Yongcheng Jing, Jinyang Guo, Xianglong Liu, Dacheng Tao

发表机构 * School of Computer Science and Engineering, Beihang University（北京航空航天大学计算机科学与工程学院）； School of Artificial Intelligence, Beihang University（北京航空航天大学人工智能学院）； Nanyang Technological University（南洋理工大学）

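AI summary: To address the coarse granularity and insufficient redundancy identification of expert-level pruning in MoE models, proposes an attribution-guided, coverage-maximized structural pruning framework that casts pruning allocation as a channel-score coverage optimization problem; it preserves accuracy at a 50% pruning ratio combined with 4-bit quantization and reduces memory by 5.27x.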

Comments 9 pages, 5 figures. Submitted to ICML 2026

AI中文摘要

混合专家（MoE）模型在计算上高效扩展，但由于其巨大的内存占用和推理开销，部署成本仍然很高。先前的压缩方法主要在专家级别操作，要么移除整个专家，要么通过粗粒度的重要性分数对专家进行排序。然而，这种专家级别的决策通常过于粗糙，无法捕捉细粒度的冗余，导致剪枝预算分配不当和压缩效果有限。为了解决这个问题，我们观察到MoE专家内的信息高度集中在一小部分通道中，即使在被认为重要的专家中也存在大量冗余。基于这一观察，我们提出了一种针对MoE模型量身定制的结构剪枝框架。我们的方法将剪枝比例分配重新表述为通道分数覆盖最大化问题，并使用基于归因的近似方法高效求解。在DeepSeek和Qwen MoE模型上的实验表明，我们的方法在结合4位量化时，在50%或25%的结构化剪枝下仍能保持模型精度。在Qwen3-30B-A3B上，我们的方法将内存占用减少了5.27倍，并在各种基准测试中持续优于最先进的基线方法。

英文摘要

Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.

2606.18431 2026-06-18 cs.LG cs.DC 新提交

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

超越预测：面向LLM推理的尾延迟感知调度

Yueying Li, Yuanfan Chen, Jiayang Chen, Esha Choukse, Haoran Qiu, G. Edward Suh, Rodrigo Fonseca, Ziv Scully, Udit Gupta

发表机构 * Cornell University, Computer Science Department（康奈尔大学计算机科学系）； Cornell University, Electrical and Computer Engineering Department（康奈尔大学电气与计算机工程系）； Cornell University, Operations Research and Information Engineering Department（康奈尔大学运筹学与信息工程系）； Microsoft Azure System Research（微软Azure系统研究）； NVIDIA Corporation（英伟达公司）

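AI summary: To address the fragility of length-prediction-based scheduling under distribution shift and its weak control of tail latency in LLM inference, proposes a prediction-free, distribution-aware scheduling framework that applies soft priority boosting driven by lightweight statistical signals together with cache-aware preemption, reducing P99 TTLT by 35-50% and TTFT by 34-47% across a range of workloads.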

Journal ref: Forty-Third International Conference on Machine Learning (2026)

AI中文摘要

LLM服务表现出极端的长度可变性，使得基于大小的调度在实践中变得困难。最近的LLM调度器使用预测的解码长度或排名来近似SJF/SRPT，并主要报告均值中心指标如TTFT和TBT。我们表明，这些预测驱动的策略在分布偏移、突发到达和GPU内存压力下可能脆弱，同时对主导用户体验的尾延迟（P90-P99）控制有限，即使拥有完美的解码长度知识。我们引入了一个分布感知、无预测的调度框架，用由轻量统计信号驱动的软优先级提升取代显式长度预测。我们的设计协同优化调度和缓存感知抢占，以考虑跨工作负载混合的内存耦合解码动态。在生产环境和开源轨迹上的评估表明，相对于具有完美长度知识的SRPT，我们的方法将P99 TTLT降低了高达35-50%，并在各种工作负载（包括推理密集型和聊天密集型任务）上将TTFT降低了34-47%。这些结果证明了在在线LLM服务中优化尾延迟的稳健替代方案。

英文摘要

LLM serving exhibits extreme length variability, making size-based scheduling difficult in practice. Recent LLM schedulers approximate SJF/SRPT using predicted decode lengths or ranks and primarily report mean-centric metrics such as TTFT and TBT. We show that these prediction-driven policies can be fragile under distribution shifts, bursty arrivals, and GPU memory pressure, while offering limited control over the tail latency (P90-P99) that dominates user experience, even with perfect decode-length knowledge. We introduce a distribution-aware, prediction-free scheduling framework that replaces explicit length prediction with soft priority boosting driven by lightweight statistical signals. Our design co-optimizes scheduling and cache-aware preemption to account for memory-coupled decode dynamics across workload mixes. Evaluated on production and open-source traces, our method reduces P99 TTLT by up to 35-50% relative to SRPT with perfect length knowledge and reduces TTFT by 34-47% across workloads, including reasoning-heavy and chat-heavy tasks. These results demonstrate a robust alternative for optimizing tail latency in online LLM serving.

2606.18650 2026-06-18 cs.LG 新提交

FoMoE: A Federated Mixture-of-Experts System That Breaks the Full-Replica Barrier

Lorenzo Sani, Zeyu Cao, Meghdad Kurmanji, Alex Iacob, Andrej Jovanovic, Yan Gao, Wanru Zhao, Nicholas D. Lane

发表机构 * DeepSeek-AI

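AI summary: Proposes the FoMoE system, which breaks the full-replica paradigm by partitioning expert layers across workers and combines partial expert replication with a skip-token mechanism, significantly lowering communication overhead and raising throughput.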

AI中文摘要

预训练大型语言模型（LLMs）通常需要大规模基础设施，配备紧密耦合的硬件加速器。虽然增加模型和数据集规模仍是性能的主要驱动力，但专家混合（MoE）架构最近通过将参数数量与计算成本解耦，取得了最先进的结果。这种效率使得在受限计算预算下训练大规模模型成为可能，但通常需要单个数据中心的高速互连。为了克服这些物理限制，最近的方法如DiLoCo和Photon使用低通信数据并行方法，使得能够在地理分布、弱连接的数据中心之间进行扩展。然而，这些方法存在根本性的低效问题：它们需要在每个站点拥有完整的模型副本，这带来了高昂的内存约束和通信开销。在这项工作中，我们引入了FoMoE，一个通过跨工作节点分区专家层来打破全副本范式的系统。我们证明FoMoE：（I）通过部分专家复制，在所研究的场景中，相比高效基线降低了高达1.42倍的通信成本，相比DDP降低了45.44倍；（II）通过一种新颖的跳跃令牌机制，实现了高达1.4倍的经验吞吐量加速；（III）在训练代理场景中展示了稳定的路由，并通过系统建模将通信/内存优势推广到100B规模的配置。

英文摘要

Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance, Mixture-of-Experts (MoE) architectures have recently achieved state-of-the-art results by decoupling parameter count from computational cost. This efficiency enables training massive models on constrained compute budgets, yet it typically requires the high-speed interconnects of a single datacenter. To overcome these physical limits, recent approaches such as DiLoCo and Photon use low-communication data-parallel methods to enable scaling across geographically distributed, weakly connected data centers. However, these methods suffer from a fundamental inefficiency: they require full model replicas at every site, which imposes prohibitive memory constraints and communication overheads. In this work, we introduce FoMoE, a system that breaks the full-replica paradigm by partitioning expert layers across workers. We demonstrate that FoMoE: (I) reduces communication costs by up to 1.42x over efficient baselines and 45.44x over DDP via partial expert replication in the studied regimes; (II) achieves empirical throughput speedups of up to 1.4x through a novel skip-token mechanism; and (III) shows stable routing in the trained proxy regimes and projects the communication/memory benefits to 100B-scale configurations through system modelling.

2606.19150 2026-06-18 cs.LG 新提交

Complementary Attention Head Pruning for Efficient Transformers

互补注意力头剪枝用于高效Transformer

Yaniv Livertovsky, Shahar Somin, Gonen Singer

发表机构 * Bar-Ilan University（巴伊兰大学）

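AI summary: Proposes the CAHP framework, which models attention-head selection as a global graph-theoretic problem, preserves complementary heads through graph clustering and information-theoretic distances, determines the number of heads to prune automatically, and outperforms existing methods on SST-5 and MNLI.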

Comments 9 pages, 4 figures, 3 tables. Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2026

AI中文摘要

基于Transformer的模型在自然语言处理中的显著成功源于架构的规模化，这导致大量参数并阻碍了在资源受限环境中的部署。虽然结构化剪枝提供了一条压缩路径，但现有的最先进方法通常依赖于基于梯度的重要性排序或随机门控，这些方法存在不稳定性、结构退化以及需要大量手动超参数调整的问题。在本文中，我们引入了CAHP（互补注意力头剪枝），一种新颖的事后框架，将头选择重新定义为全局图论问题。CAHP不是孤立地评估头，而是利用基于图的聚类结合信息论距离度量来识别并保留一组拓扑多样化的互补注意力头。无需预定义稀疏度或剪枝比例，该框架通过识别递减的边际性能曲线自动确定各层中保留的注意力头数量，其中根据所选多项式次数，剪除额外头会导致性能急剧下降。在SST-5和MNLI基准上跨不同Transformer模型规模的广泛评估表明，CAHP始终优于竞争基线，特别是在高压缩率情况下。此外，我们的结构分析表明，CAHP避免了基于梯度的剪枝方法的“邻近偏差”（倾向于主要保留靠近输出层的头），而是保留了模型中间层中功能关键的注意力头集合。

英文摘要

The remarkable success of Transformer-based models in natural language processing stems from architectural scaling, which leads to a large number of parameters and hinders deployment in resource-constrained environments. While structured pruning offers a pathway to compression, existing state-of-the-art methods often rely on gradient-based importance ranking or stochastic gating, which suffer from instability, structural degeneration, and the need for extensive manual hyperparameter tuning. In this paper, we introduce CAHP (Complementary Attention Head Pruning), a novel post-hoc framework that redefines head selection as a global graph-theoretical problem. Rather than evaluating heads in isolation, CAHP utilizes graph-based clustering combined with information-theoretic distance measures to identify and preserve a topologically diverse subset of complementary attention heads. Without requiring a predefined sparsity level or pruning ratio, the framework automatically determines the number of selected attention heads across layers by identifying a diminishing marginal performance curve, where pruning additional heads leads to a sharp degradation in performance, as determined by the chosen polynomial degree. Extensive evaluations on the SST-5 and MNLI benchmarks, across different Transformer model scales, demonstrate that CAHP consistently outperforms competitive baselines, particularly in high-compression regimes. Furthermore, our structural analysis shows that CAHP avoids the "proximity bias" of gradient-based pruning methods, which tend to preserve heads mainly in layers close to the output, and instead retains a functionally critical set of attention heads in the model's intermediate layers.

2606.16290 2026-06-18 cs.LG cs.AI 新提交

An affordable hardware-aware neural architecture search for deploying convolutional neural networks on ultra-low-power computing platforms

一种经济实惠的硬件感知神经架构搜索，用于在超低功耗计算平台上部署卷积神经网络

Andrea Mattia Garavagno, Edoardo Ragusa, Antonio Frisoli, Paolo Gastaldo

发表机构 * University of Genoa（热那亚大学）； Scuola Superiore Sant’Anna（圣安娜高等研究学院）

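AI summary: Proposes a lightweight hardware-aware neural architecture search method that generates tiny CNNs able to run on ultra-low-power microcontrollers, lowering search cost while preserving classification accuracy.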

DOI: 10.1109/LSENS.2024.3387056
Journal ref: IEEE Sensors Letters, vol. 8, no. 5, pp. 1-4, May 2024

AI中文摘要

硬件感知神经架构搜索（HW-NAS）通过自动设计能够满足预置硬件约束的神经架构，使得卷积神经网络（CNN）能够集成到微控制器设备中。然而，最先进的HW-NAS针对的是高性能微控制器，其功耗无法满足传感节点的要求。本文提出了一种HW-NAS方法，生成可在超低功耗微控制器上运行的微型CNN，其搜索过程轻量级，甚至可以在嵌入式设备上执行。在三个著名的微型计算机视觉基准测试上的实证结果表明，所提出的HW-NAS能够在保持最先进分类精度的同时生成微型CNN。

英文摘要

Hardware-aware neural architecture search (HW-NAS) allows the integration of Convolutional Neural Networks (CNNs) in microcontroller devices by automatically designing neural architectures that fit prearranged hardware constraints. However, state-of-the-art HW-NAS methods target high-performance microcontrollers, whose power consumption does not meet the requirements of sensing nodes. This work presents an HW-NAS that generates tiny CNNs able to run on ultra-low-power microcontrollers, with a lightweight search procedure that can be executed even on embedded devices. Empirical results on three well-known benchmarks for tiny computer vision show that the proposed HW-NAS generates tiny CNNs while preserving state-of-the-art classification accuracy.

2606.18309 2026-06-18 cs.LG cs.AI 新提交

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

SAGE: 保留感知的最终遗忘向量事后净化

Jingyuan Zhang, Yucheng Bai, Peixi Wen, Zhehao Huang, Zhengbao He, Hanling Tian, Xinwen Cheng, Haiyin Ran, Xiaolin Huang

发表机构 * Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University（上海交通大学图像处理与模式识别研究所）

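AI summary: Proposes SAGE, which sanitizes the final update vector post hoc to ease the tradeoff between forgetting and retention in LLM unlearning without rerunning the original unlearning pipeline.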

AI中文摘要

大语言模型（LLM）遗忘旨在移除不良知识或行为，同时保留已有能力。当前的遗忘方法都涉及遗忘与保留之间的权衡。我们发现，保留激活偏差也可用于量化遗忘方法对保留造成的损害，而无需考虑遗忘过程的具体实现。这使得我们能够通过事后方法恢复任何遗忘方法的保留性能。因此，我们提出一种互补的事后设置，在不重新运行原始遗忘流程的情况下净化最终更新向量。在该设置中，我们设计了SAGE（光谱激活-几何净化），一种对最终遗忘更新的源无关修正。SAGE从一个小型保留代理收集真实模块输入，提取其主导激活几何结构，并求解一个闭式源锚定优化目标，该目标抑制与高能保留方向对齐的更新分量，同时保留源方法的遗忘载体。在多种遗忘方法、模型规模和基准测试中，SAGE持续缓解保留-遗忘权衡，将最终向量的事后净化识别为机器遗忘中一个实用且未被充分探索的维度。

英文摘要

Large Language Model (LLM) unlearning aims to remove undesirable knowledge or behaviors while preserving retained capabilities. Current unlearning methods all involve a trade-off between unlearning and retention. We have found that the retention activation bias can also be used to quantify the damage an unlearning method inflicts on retention, without considering the specific implementation of the unlearning process. This allows us to restore retention performance for any unlearning method using a post-hoc approach. Therefore, we propose a complementary post-hoc setting to sanitize the final update vector without rerunning the original unlearning pipeline. In this setting, we design SAGE, Spectral Activation-GEometry Sanitization, a source-agnostic correction for final unlearning updates. SAGE collects real module inputs from a small retain proxy, extracts their dominant activation geometry, and solves a source-anchored optimization objective in closed form, which suppresses update components aligned with high-energy retained directions while preserving the source method's forgetting carrier. Across multiple unlearning methods, model scales, and benchmarks, SAGE consistently relieves the retain-forget trade-off, identifying post-hoc sanitization of final vectors as a practical and underexplored axis for machine unlearning.

2606.18384 2026-06-18 cs.LG cs.DC 新提交

Machine Unlearning of XGBoost Models for Network Intrusion Datasets

Diana Magalhães, Eva Maia, João Vitorino, Isabel Praça

发表机构 * GECAD, ISEP, Polytechnic of Porto（波尔图理工学院工程学院GECAD研究所）

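AI summary: Proposes XGBoost-Forget, an unlearning method for XGBoost models that achieves efficient unlearning on tabular network-intrusion datasets, substantially speeding up unlearning while maintaining model performance.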

Comments 12 pages, 7 tables, WorldCist'26 Conference

2606.19222 2026-06-18 cs.LG cs.AI 新提交

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

机制引导的选择性遗忘：针对RLVR诱导的推理

Chenyu Zhou, Qiliang Jiang, Shuning Wu, Xu Zhou

发表机构 * School of Engineering, Institute of Science Tokyo, Japan（东京科学大学工学院）； College of Control Science and Engineering, Zhejiang University, China（浙江大学控制科学与工程学院）； Department of Electrical and Computer Engineering, National University of Singapore, Singapore（新加坡国立大学电气与计算机工程系）

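AI summary: Proposes MAST, which uses mechanism guidance to update parameters selectively and significantly reduces collateral damage to retain performance when unlearning RLVR-induced reasoning behavior.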

Comments 15 pages, 4 figures, 7 tables

AI中文摘要

我们提出MAST（机制对齐选择性目标），一种机制引导的方法，用于遗忘RLVR诱导的推理，其附带损害远低于标准全参数更新。在Qwen2.5-Math-1.5B和Qwen3-1.7B-Base的匹配SFT/RLVR检查点上，SFT到RLVR的增量在token级delta-log-probability上与SFT更新显著不同，而全参数梯度上升仅通过破坏保留的MATH和GSM8K来实现遗忘。MAST根据离主能量、更新幅度和遗忘梯度耦合幅度对注意力投影张量进行排序，然后仅更新排名最高的子集。在主模型上，MAST诱导了统计上显著的目标遗忘（MATH遗忘从45/150降至37/150；McNemar p=0.0078），同时保留了GSM8K（+0.8个百分点）和MATH保留（-0.5个百分点）。该优势在不同种子、NPO/SimNPO目标以及Qwen3上均得到复现，在Qwen3上MAST保留了GSM8K，而全参数遗忘导致其崩溃。

英文摘要

We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent forgets only by damaging retain MATH and GSM8K. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, NPO/SimNPO objectives, and Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
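The reported McNemar p-value can be sanity-checked with the exact (binomial) form of the test, which depends only on the discordant pairs. The abstract gives the marginal counts (45/150 to 37/150, a net change of 8) but not the discordant counts, so the sketch below assumes, hypothetically, that 8 items flipped from correct to incorrect and none flipped back. The function name `mcnemar_exact` is illustrative.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant-pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Assumed discordant counts: 8 items newly forgotten, 0 newly recovered.
p = mcnemar_exact(8, 0)  # 2 * 0.5**8 = 0.0078125
```

Under that assumption the exact test gives 0.0078125, which agrees with the reported p=0.0078; other discordant splits with the same net change would give a different value.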

2606.19262 2026-06-18 cs.LG 新提交

Detecting Hidden ML Training With Zero-Overhead Telemetry

使用零开销遥测检测隐藏的机器学习训练

Robi Rahman, Sabiha Tajdari

发表机构 * Machine Intelligence Research Institute（机器智能研究所）； University of Virginia（弗吉尼亚大学）

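AI summary: Evaluates the adversarial robustness of GPU workload classification that uses only zero-overhead, privacy-preserving NVML telemetry (content-agnostic signals), and develops a classifier that reaches 98.2% binary accuracy at identifying training workloads and 43-87% accuracy on the most challenging unexpected workloads.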

Comments Technical AI Governance Research workshop at ICML 2026

2606.18322 2026-06-18 cs.LG cs.AI 新提交

SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

SAE干预不可靠：干预后抑制行为的恢复

Mingyue Cui, Linghui Shen, Xingyi Yang

发表机构 * The Hong Kong Polytechnic University（香港理工大学）

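AI summary: Finds that sparse autoencoder (SAE) feature interventions can suppress a behavior yet exhibit a recoverable failure mode: optimizing a residual perturbation restores the original behavior, revealing a gap between feature-level control and behavioral completeness.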

Comments Code: https://github.com/Mingyuee88/sae-post-intervention-recovery, Project page: https://mingyuee88.github.io/sae-post-intervention-recovery/

AI中文摘要

稀疏自编码器（SAE）将残差流激活分解为可解释特征。最近的潜在空间防御越来越依赖这些分解，假设识别出的“不安全”SAE特征可作为监控和干预的可操作手柄。在这种范式下，固定特定有害特征预期能可靠地防止模型不当行为。然而，我们表明这种成功可能隐藏一种可恢复的失败模式：固定可能阻止行为的一条可见路径，但并未消除行为本身。我们将这种脆弱性形式化为干预后恢复，这是一个受约束的残差空间优化问题。从干预后的残差状态开始，我们优化残差扰动以恢复干预前的行为，同时保持目标SAE特征的干预后值。即使在干预在优化和生成过程中保持活跃的强威胁模型下，恢复仍然可能。为了排除恢复仅仅是撤销干预的可能性，我们使用编码器正交更新进行单层干预，并在跨层设置中使用相应的特征图雅可比矩阵。在TPP、遗忘、IOI和拒绝引导实验中，这种压力测试揭示了尽管特征级干预成功，行为仍可恢复。特别是在安全关键的拒绝引导设置中，我们在有效样本上实现了95.8%的恢复率，同时将防御特征的相对漂移保持在0.131，远低于基于后缀的基线。恢复路径归因分析进一步将这种恢复定位到SAE重建残差，即SAE未解释的组件。这些结果暴露了特征级控制与行为完整性之间的差距：SAE特征可以支持因果干预，但控制它们并不能保证对底层行为的控制。

英文摘要

Sparse Autoencoders (SAEs) decompose residual-stream activations into interpretable features. Recent latent-space defenses increasingly rely on these decompositions, assuming that identified "unsafe" SAE features serve as actionable handles for monitoring and intervention. In this paradigm, clamping a specific harmful feature is expected to reliably prevent model misbehavior. However, we show that this success may hide a recoverable failure mode: the clamp may block one visible route to a behavior without eliminating the behavior itself. We formulate this vulnerability as post-intervention recovery, a constrained residual-space optimization problem. Starting from the post-intervention residual state, we optimize residual perturbations to recover the pre-intervention behavior while preserving the post-intervention values of the targeted SAE features. Even under a strong threat model where the intervention remains active throughout optimization and generation, recovery remains possible. To rule out that recovery simply undoes the intervention, we use encoder-orthogonal updates for single-layer interventions and the corresponding feature-map Jacobian in the cross-layer setting. Across TPP, unlearning, IOI, and refusal steering experiments, this stress test reveals recoverable behavior despite successful feature-level intervention. Especially in the safety-critical refusal-steering setting, we achieve a 95.8% recovery rate on valid samples while keeping defended-feature relative drift to 0.131, substantially below suffix-based baselines. A recovery-path attribution analysis further localizes this recovery to the SAE reconstruction residual, the component left unexplained by the SAE. These results expose a gap between feature-level control and behavioral completeness: SAE features can support causal intervention, but controlling them does not guarantee control over the underlying behavior.
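An encoder-orthogonal update can be illustrated for a single defended feature. The sketch assumes the feature's pre-activation is linear in the residual, `encoder_row · x + bias`; under that assumption, removing the component of a perturbation along `encoder_row` leaves the pre-activation, and therefore the clamped feature value, unchanged. This is a minimal single-feature illustration, not the paper's procedure; the cross-layer Jacobian variant is not shown, and all names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(update, encoder_row):
    """Remove the component of `update` that lies along `encoder_row`."""
    scale = dot(update, encoder_row) / dot(encoder_row, encoder_row)
    return [u - scale * e for u, e in zip(update, encoder_row)]


encoder_row = [1.0, 2.0, 2.0]   # assumed encoder direction of the defended feature
bias = -0.5
residual = [0.5, -1.0, 0.25]    # post-intervention residual state
raw_update = [0.3, 0.1, -0.4]   # unconstrained recovery step

safe_update = project_out(raw_update, encoder_row)
before = dot(encoder_row, residual) + bias
after = dot(encoder_row, [r + u for r, u in zip(residual, safe_update)]) + bias
```

With several defended features, the same idea would require projecting onto the orthogonal complement of the span of all their encoder rows rather than one row at a time.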

2606.18418 2026-06-18 cs.LG 新提交

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

P$^2$CE: 模型无关的可行帕累托最优反事实解释

Arthur Hendricks Mendes de Oliveira, Giovani Valdrighi, Marcos Medeiros Raimundo

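AI summary: Proposes the P$^2$CE algorithm, which uses isolation-forest outlier detection and SHAP values to generate plausible, Pareto-optimal counterfactual explanations that balance feasibility, plausibility, and computational efficiency.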

Comments Under review in the Machine Learning journal

AI中文摘要

机器学习算法在社会应用中的日益普及引发了对公平性和透明度的担忧，从而推动了反事实解释的发展。这些解释通过提供可操作的输入特征更改，帮助个人理解并可能改变在贷款申请、工作选择等领域的不利决策。现有方法往往难以平衡可行性、合理性和计算效率。为此，我们提出了P$^2$CE，一种生成可行帕累托最优反事实解释的算法，为用户提供不同可行性概念之间的多样化最优权衡。P$^2$CE使用辅助隔离森林异常检测器确保解释符合数据分布，并利用SHAP值在短时间内获得最优结果，与底层模型无关。我们在三个数据集上进行了实证评估，结果表明，与相关技术相比，该算法在解决方案质量和计算效率方面均表现出优越性能。

英文摘要

The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations help individuals understand and potentially alter unfavorable decisions in areas such as loan applications and job selection by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.

2606.18430 2026-06-18 cs.LG cs.CR 新提交

Target-Confidence Recourse Using Tsetlin Machines: TRUST

K. Darshana Abeyrathna, Sara El Mekkaoui, Nils Enric Canut Taugbøl, Anuja Vats

发表机构 * Group Research and Development Det Norske Veritas (DNV)（挪威船级社（DNV）集团研发部）

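AI summary: Proposes the TRUST framework, which uses a Probabilistic Tsetlin Machine and Bayesian optimization to search directly for the minimal input change that meets a user-specified confidence target, yielding more robust and interpretable counterfactual explanations.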

AI中文摘要

反事实解释被广泛用于高风险决策系统中的算法追索。大多数现有方法寻求最小化改变输入以翻转模型决策。然而，决策者通常不仅依赖预测标签，还依赖置信度阈值和风险边际。刚好越过决策边界的反事实在噪声或模型变化下可能脆弱且不稳定。本文提出使用Tsetlin机器的目标置信度追索（TRUST），一种用户明确指定追索所需预测置信度的框架。TRUST不是先生成反事实再评估置信度，而是直接搜索满足用户定义置信度目标的最小变化，从而在成本、置信度和鲁棒性方面比较追索选项。我们使用概率Tsetlin机器（PTM）结合贝叶斯优化实例化TRUST。PTM基于概率子句的结构将预测置信度与决策规则的稳定性联系起来。我们表明，满足相同规则的反事实在可靠性上可能差异很大，取决于它们满足这些规则的安全程度，揭示了决策是由稳健还是脆弱的子句激活支持的。在合成和真实数据集上的实验表明，目标置信度反事实比传统的基于边界的方法产生更稳健和可解释的追索。在多个基准测试中，TRUST实现了完美的鲁棒性，同时保持较低的追索成本，包括在Haberman数据集上以0.92置信度达到0.10的L2距离。通过显式控制置信度和暴露规则级稳定性，TRUST为高风险决策支持提供了可操作的追索。

英文摘要

Counterfactual explanations are widely used to provide algorithmic recourse in high-stakes decision-making systems. Most existing methods seek the smallest change to an input that flips a model's decision. However, decision-makers often rely not only on predicted labels but also on confidence thresholds and risk margins. Counterfactuals that barely cross a decision boundary can be fragile and unstable under noise or model variation. In this paper, we propose Target-confidence Recourse Using tSeTlin machines (TRUST), a framework in which users explicitly specify the desired prediction confidence for recourse. Rather than generating counterfactuals and evaluating confidence afterward, TRUST directly searches for minimal changes that satisfy a user-defined confidence target, enabling comparison of recourse options in terms of cost, confidence, and robustness. We instantiate TRUST using a Probabilistic Tsetlin Machine (PTM) combined with Bayesian optimization. The probabilistic clause-based structure of PTM links prediction confidence to the stability of decision rules. We show that counterfactuals satisfying the same rules can still differ substantially in reliability depending on how securely they satisfy those rules, revealing whether decisions are supported by robust or fragile clause activations. Experiments on synthetic and real-world datasets demonstrate that target-confidence counterfactuals produce more robust and interpretable recourse than conventional boundary-based approaches. Across multiple benchmarks, TRUST achieves perfect robustness while maintaining low recourse cost, including an L2 distance of 0.10 on the Haberman dataset at 0.92 confidence. By explicitly controlling confidence and exposing rule-level stability, TRUST provides actionable recourse for high-stakes decision support.

2606.18839 2026-06-18 cs.LG cs.CV 新提交

Semantic Robustness Certification for Vision-Language Models

视觉语言模型的语义鲁棒性认证

Peiyu Yang, Paul Montague, Feng Liu, Andrew C. Cullen, Amardeep Kaur, Christopher Leckie, Sarah M. Erfani

发表机构 * School of Computing & Information Systems, University of Melbourne, Australia

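AI summary: Proposes the first framework that certifies the robustness of vision-language models to semantic-level variation (such as shape, size, and style) without additional data, using text prompts as semantic proxies and quantifying the decision boundary to guarantee that the predicted class is unchanged under the semantic transformation.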

Comments Accepted to ICML

AI中文摘要

视觉语言模型（VLM）现在被广泛用于下游任务。然而，现实世界的应用常常使VLM面临由语义变化（例如形状、大小和风格）引起的分布偏移。鲁棒性认证确定当对输入应用变换时模型的预测是否改变。虽然大多数认证框架研究输入的几何或像素级变换，但本文提出了一种新颖的框架，能够在语义级变换下认证VLM的鲁棒性。利用VLM的开放词汇能力，我们使用文本提示作为语义代理来构建由控制语义变化程度的范围参数化的变换。通过以封闭形式表征VLM决策边界，我们的框架定量地认证了在语义变换下预测类别保持不变的范围区间。我们的框架是第一个在语义级变化下认证VLM鲁棒性而无需为每种变化提供额外数据的框架，使其易于应用。在合成数据和真实数据上的实验表明，我们的框架能够在各种场景下认证针对多种语义变化的鲁棒性。

英文摘要

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

2606.18867 2026-06-18 cs.LG cs.CY stat.ML 新提交

Strategic Feature Selection

战略特征选择

Jivat Neet Kaur, Pratik Patil, Divya Shanmugam, Emma Pierson, Michael I. Jordan, Nika Haghtalab, Meena Jagadeesan, Ahmed Alaa, Serena Wang

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Texas, Austin（德克萨斯大学奥斯汀分校）； Cornell Tech（康奈尔科技）； Stanford University（斯坦福大学）； University of Pennsylvania（宾夕法尼亚大学）； Harvard University（哈佛大学）； Inria, Paris（巴黎Inria）

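AI summary: Studies strategic classification through feature selection and ridge regularization, finds that excluding features based on manipulability alone is generally suboptimal, proposes an algorithm that jointly optimizes the feature set and the regularization level, and validates it on a healthcare payments benchmark.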

AI中文摘要

当算法预测器在高风险领域（如医疗）中指导资源分配时，这些预测器必须考虑输入特征的战略操纵。典型的解决方案是重新设计预测器本身以明确考虑战略互动。然而在实践中，决策者通常受限于调整现有预测管道中的粗粒度杠杆。例如，医疗组织通常根据感知的可操纵性选择排除哪些特征，同时使用标准正则化程序来收缩保留特征的系数。在这项工作中，我们通过特征选择及其与岭正则化的相互作用，启动了对战略分类的形式化研究。我们的主要发现是，仅基于可操纵性排除单个特征通常是次优的。我们提供了在最优正则化下特征子集性能的细粒度刻画，为政策设计提供了新的见解。受此刻画启发，我们开发了一种实用算法，用于联合选择特征集和岭正则化水平。通过一个关于医疗支付基准的真实世界案例研究，我们说明了我们的算法如何指导实践中粗粒度政策杠杆的设计。我们的结果为减轻算法决策系统中战略行为的影响提供了一个有原则的、实用的框架。

英文摘要

When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.

2606.18317 2026-06-18 cs.LG 新提交

Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion

使用K跳高斯扩散增强图神经网络

Xuling Zhang, Peng Wang, Daiyan Li, Aoran Huang, Zeiwei Chen, Yongkui Yang

发表机构 * Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences（中国科学院深圳先进技术研究院）； Southern University of Science and Technology（南方科技大学）

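AI summary: Proposes a K-hop Gaussian diffusion kernel as a preprocessing module that balances local and global information through multi-hop diffusion with Gaussian weights, outperforming conventional message passing and existing diffusion methods on noisy or structurally complex graphs.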

Comments 5 pages, 3 figures

DOI: 10.1109/ICASSP55912.2026.11462070

AI中文摘要

大多数图神经网络核心依赖于图卷积，通常实现为直接（单跳）邻居之间的消息传递。在许多现实世界的图中，边可能带有噪声或定义不明确，限制了信息传播到局部邻域。现有的扩散核，如个性化PageRank和热核，通过全局传播缓解了这个问题，但仍然难以处理复杂的局部结构和远距离节点噪声。为了解决这些限制，我们提出了一种K跳高斯扩散核作为图数据的预处理模块。KHG引入了多跳扩散，并对远程节点进行高斯加权，在应用标准GNN之前平衡局部和全局信息传播。在多个基准数据集上的实验表明，KHG显著优于传统的消息传递GNN，以及PPR和热核扩散，特别是在噪声或结构复杂的图中。

英文摘要

Most graph neural network (GNN) cores rely on graph convolutions, typically implemented as message passing between direct (single-hop) neighbors. In many real-world graphs, edges can be noisy or poorly defined, limiting information propagation to local neighborhoods. Existing diffusion kernels, such as Personalized PageRank (PPR) and Heat Kernel, alleviate this issue through global propagation, but still struggle with complex local structures and distant node noise. To address these limitations, we propose a K-Hop Gaussian (KHG) diffusion kernel as a preprocessing module for graph data. KHG introduces multi-hop diffusion with Gaussian weighting for remote nodes, balancing local and global information propagation before applying standard GNNs. Experiments on multiple benchmark datasets demonstrate that KHG significantly outperforms traditional message-passing GNNs, as well as PPR and Heat Kernel diffusion, particularly in noisy or structurally complex graphs.
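One plausible reading of a K-hop diffusion kernel with Gaussian weighting is a weighted sum of random-walk powers, where the weight of hop distance k decays as exp(-k^2 / (2 sigma^2)). The abstract does not give the kernel's formula, so the form below, the row normalization, the inclusion of hop 0, and the names `khop_gaussian_diffusion`, `num_hops`, and `sigma` are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def khop_gaussian_diffusion(adj, num_hops, sigma):
    """Return sum_k w_k * T^k for k = 0..num_hops, with Gaussian hop weights w_k."""
    n = len(adj)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Row-normalized random-walk matrix T; isolated nodes keep a self-loop.
    walk = []
    for i, row in enumerate(adj):
        deg = sum(row)
        walk.append([v / deg for v in row] if deg > 0 else identity[i])
    # Gaussian weight per hop distance, normalized to sum to 1.
    raw = [math.exp(-(k ** 2) / (2.0 * sigma ** 2)) for k in range(num_hops + 1)]
    weights = [r / sum(raw) for r in raw]

    diffusion = [[0.0] * n for _ in range(n)]
    power = identity  # T^0
    for w in weights:
        for i in range(n):
            for j in range(n):
                diffusion[i][j] += w * power[i][j]
        power = matmul(power, walk)
    return diffusion


# Path graph 0 - 1 - 2 - 3 as a dense adjacency matrix.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
smoothed = khop_gaussian_diffusion(path, num_hops=3, sigma=1.5)
```

The resulting dense matrix could replace the adjacency matrix as input to a standard GNN; a smaller `sigma` concentrates weight on near hops, a larger one spreads it toward the K-hop boundary.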

2606.18444 2026-06-18 cs.LG cs.AI New submission

TMR-GGNN: Credit Card Fraud Detection based on Time-Aware Multi-Relational Guided Graph Neural Network

Rohit Tewari, Shubhankar Shilpi, Navin Chhibber, Devendra Singh Parmar, Sunil Khemka, Piyush Ranjan

Affiliations: Unysis Truist Banks Infinity Tech Group Technical Product; Fairfax, USA; Atlanta, USA; Sunnyvale, USA; Persistent Systems IEEE Vice Chair AeroSpace Chapter; Discover Financial Services; Edison, USA

AI summary: Proposes the TMR-GGNN framework, which models heterogeneous entity interactions over time windows, constructs a dynamic multi-relational graph, and uses a time-aware attention mechanism and a contrastive-learning decoder, combined with a composite InfoNCE and Focal Loss objective, to address data imbalance and evolving fraud patterns.

Comments 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON), Pages 7

Abstract

In recent years, credit card fraud detection has faced significant challenges due to highly imbalanced data, evolving fraud patterns, and complex relational structures among transaction entities. To address these issues, this research proposes a novel framework called the Time-Aware Multi-Relational Guided Graph Neural Network (TMR-GGNN). In particular, TMR-GGNN extends the encoder-decoder Graph Neural Network (GNN) architecture by modeling heterogeneous interactions across customers, merchants, devices, and IPs over temporal windows. It then constructs a dynamic, multi-relational graph and incorporates a time-aware relational attention mechanism within the encoder to adaptively weigh transaction relevance based on temporal proximity and semantic context. The decoder employs a contrastive learning module to distinguish between real and synthesized transaction patterns while improving the model's generalization to rare fraud cases. Additionally, to effectively manage severe class imbalance and emphasize discriminative learning, a composite loss function combining an Information Noise Contrastive Estimation (InfoNCE) based contrastive loss with Focal Loss is introduced. This integration helps improve fraud identification while reducing false negatives.
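The composite loss named in the abstract can be illustrated with the standard textbook forms of InfoNCE and binary focal loss. This is a sketch under stated assumptions: the mixing weight `lam`, the temperature, and the `alpha`/`gamma` defaults are hypothetical, and the abstract does not say how the two terms are actually combined.

```python
import math

def info_nce(pos_sim, neg_sims, temperature=0.1):
    """InfoNCE: negative log-softmax of the positive pair among all candidates."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

def focal_loss(p_fraud, label, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p_fraud if label == 1 else 1.0 - p_fraud
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def composite_loss(pos_sim, neg_sims, p_fraud, label, lam=0.5):
    """Hypothetical combination: focal term plus lam times the contrastive term."""
    return focal_loss(p_fraud, label) + lam * info_nce(pos_sim, neg_sims)

# One fraud transaction (label 1) scored at 0.8, with one positive and two
# negative embedding similarities for the contrastive term.
loss = composite_loss(pos_sim=0.9, neg_sims=[0.1, 0.0], p_fraud=0.8, label=1)
```

The focal factor `(1 - p_t)^gamma` shrinks the contribution of easy, well-classified transactions, which is why it is a common choice for heavily imbalanced problems; the InfoNCE term rewards embeddings in which the positive pair scores higher than the negatives.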

2606.18621 2026-06-18 cs.LG New submission

Towards Anomaly Detection on Relational Data

Shiyuan Li, Yunfeng Zhao, Yue Tan, Qingfeng Chen, Yixin Liu, Shirui Pan

Affiliations: Griffith University; Guangxi University

AI summary: Proposes the RelAD framework, which detects attribute anomalies and abnormal connection patterns in relational data through conditional sparse-gated attribute reconstruction and dual-view multi-relational edge reconstruction, and outperforms existing methods on 6 benchmark datasets.

Abstract

Relational databases are widely used for managing structured data in real-world systems. Detecting anomalies from such relational data is crucial for identifying fraud, risks, and abnormal behaviors, yet remains under-explored. The key challenges lie in the intrinsic complexity of relational data: multi-table attributes are high-dimensional and heterogeneous, so sparse abnormal clues are easily overwhelmed by normal or irrelevant information; and anomalies may further manifest as abnormal connection patterns across different foreign-key relations, which existing tabular and graph anomaly detection methods are ill-suited to capture. To address these challenges, we propose RelAD, a reconstruction-based framework that captures anomalies from both attribute and relational edge reconstruction. RelAD contains two core modules: conditional sparse-gated attribute reconstruction, which suppresses redundant multi-table attributes and emphasizes abnormal semantic blocks, and dual-view multi-relational edge reconstruction, which detects relation-specific abnormal connections from both intrinsic and behavioral entity profiles. The resulting attribute and relational signals are integrated through a lightweight fusion module to produce the final anomaly score. We further construct 6 benchmark datasets with systematic anomalies, on which extensive experiments show that RelAD consistently outperforms other baselines while achieving competitive efficiency.

2606.19185 2026-06-18 cs.LG New submission

AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network

Bolin Shen, Ziwei Huang, Zhiguang Cao, Yushun Dong

Affiliations: Florida State University; Singapore Management University

AI summary: Proposes the Anisotropic Graph Diffusion Network (AGDN), which uses a MixScore transition matrix and an anisotropic diffusion strategy to exploit graph structure when solving the Traveling Salesman Problem, and outperforms existing methods across instance sizes and distributions.

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

DOI: 10.1145/3770855.3817789

Abstract

The Traveling Salesman Problem (TSP) is a cornerstone of combinatorial optimization and arises in many practical scenarios. Although graph-based learning approaches have been explored for TSP, the question of how to exploit graph structure more effectively remains open. We present the Anisotropic Graph Diffusion Network (AGDN), a new Graph Neural Network framework designed to solve TSP. Our method tackles two central difficulties: (1) the lack of an informative topological prior in fully connected TSP graphs, and (2) the loss of connections between nodes that are adjacent in the optimal solution when commonly used graph sparsification techniques are applied. To overcome these issues, we construct a MixScore transition matrix that merges node similarity with pairwise distance, and we develop an anisotropic graph diffusion strategy that supports efficient information exchange across multiple hops. Comprehensive experiments spanning diverse instance sizes and node distributions show that AGDN consistently outperforms existing methods while keeping computation time competitive. Furthermore, AGDN generalizes well to problem sizes and distributions beyond those seen during training. The implementation is publicly available at: https://github.com/LabRAI/AGDN.

2606.19303 2026-06-18 cs.LG New submission

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution

Xizhuo Zhang, Zekai Wang, Fei Liu, Bing Yao

Affiliations: Department of Industrial & Systems Engineering, The University of Tennessee, Knoxville; Charles F. Dolan School of Business, Fairfield University; Department of Electrical Engineering & Computer Science, The University of Tennessee, Knoxville

AI summary: Proposes P-K-GCN, which combines a spline-based GCN with Koopman operator theory for spatiotemporal super-resolution on irregular geometries, adds a physics-based loss, and provides a theoretical analysis showing that the error is reduced.

Abstract

High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, necessitating efficient super-resolution techniques to reconstruct high-resolution data from coarse-grained inputs. Traditional data-driven methods often lack physical constraints, and simple physics-informed learning struggles with irregular spatial geometries and intricately evolving temporal dynamics. To tackle these challenges, we propose a Physics-augmented Koopman-enhanced Graph Convolutional Network (P-K-GCN) for spatiotemporal super-resolution on irregular geometries. Specifically, a continuous spline-based GCN is first designed to extract spatial dependencies directly from the coarse graph, and Koopman operator theory is incorporated to project the nonlinear dynamics into a compact latent space where temporal progression is linearized. Second, we augment the optimization objective with a physics-based loss that forces the data-driven reconstructions to adhere to physical laws, improving predictive fidelity and robustness. Finally, we provide a rigorous theoretical analysis, establishing that the physics augmentation and Koopman regularization mathematically guarantee a reduction in super-resolution error by diminishing Rademacher complexity and tightening generalization bounds. We evaluate our framework on reconstructing spatially high-resolution cardiac electrodynamics across a 3D heart geometry from sparse low-resolution measurements. Numerical experiments demonstrate that our method achieves superior accuracy compared to baseline models.

2606.19164 2026-06-18 cs.LG cs.AI New submission

Essential Subspace Merging for Multi-Task Learning

Longhua Li, Lei Qi, Xin Geng, Qi Tian

Affiliations: School of Computer Science and Engineering, Southeast University; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education; Huawei Inc.

AI summary: Proposes Essential Subspace Decomposition (ESD) and Essential Subspace Merging (ESM/ESM++), which reduce interference in multi-task merging by orthogonalizing the principal components of task updates, enabling efficient training-free multi-task learning.

Abstract

Model merging aims to enable multi-task learning by integrating the capabilities of multiple models fine-tuned from the same pre-trained checkpoint into a single model. Its core challenge is inter-task interference among task-specific parameter updates. In this paper, we analyze the output shifts induced by task updates and observe that their energy is concentrated in a small number of principal directions. We call the subspace spanned by these directions the essential subspace. In contrast, most remaining directions carry little task-relevant energy, but their accumulation across multiple task updates can cause severe interference during merging. Motivated by this observation, we propose Essential Subspace Decomposition (ESD), which decomposes each task update according to the principal components of its activation shift. Based on ESD, we introduce Essential Subspace Merging (ESM), a training-free static merging method that orthogonalizes and fuses essential components into one compact multi-task model. We further extend ESM to ESM++, a training-free dynamic merging method that decomposes task-specific residuals into low-rank experts and selects the most relevant expert through prototype-based routing during forward inference. Extensive experiments across multiple task sets and model scales demonstrate that ESM and ESM++ effectively preserve task knowledge while reducing inter-task interference.

2606.18307 2026-06-18 cs.LG cs.AI New submission

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Are Not Enough)

Ali Asaria, Tony Salomone, Deep Gandhi

Affiliations: Transformer Lab

AI summary: Proposes a reproducible VLM-judge protocol for evaluating single-image 3D mesh quality, and finds that cheap proxies such as geometry validity and render-CLIP cannot substitute for VLM judging.

Abstract

Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP similarity and mesh geometry-validity statistics), yet how well these track perceived quality is unestablished. We make two contributions. First, we propose and validate a reproducible VLM-judge evaluation protocol: a fixed 24-view headless render rig, two independent vision-language judge families, and a mandatory position-bias correction that queries both presentation orders and keeps only order-consistent verdicts. The two judge families agree substantially with each other (Cohen's kappa = 0.66), well above the chance-agreement floor. Second, using this protocol as the reference, we show the cheap proxies do not substitute for it. Geometry validity is only a weak signal on average (because, as we show, it is bimodal) and stays below our pre-registered target, while render-CLIP is at chance. A learned Bradley-Terry head collapses onto a single manifoldness statistic (giving render-CLIP a negative weight) and matches geometry-only exactly, so learning the feature weights buys nothing. The proxy is also bimodal: it is significantly above chance on contrasts with visible geometric defects but at chance on ambiguous contrasts, consistent with geometry validity tracking the judge only when the defect is visually salient. We therefore recommend the VLM-judge protocol as a reliable, reproducible evaluator under the conditions tested (two feed-forward generators on Google Scanned Objects, with a face-drop degradation regime) and advise against geometry/CLIP proxies as optimization targets.
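Two steps of the protocol described above are easy to make concrete: keeping only order-consistent verdicts, and measuring agreement between two judges with Cohen's kappa. The sketch below assumes each judge call returns the string "first" or "second" for the mesh it prefers; that interface and the function names are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def order_consistent_verdict(verdict_ab, verdict_ba):
    """Combine two judge calls on the same pair shown in both orders.

    verdict_ab: "first" or "second" when mesh A is shown first.
    verdict_ba: "first" or "second" when mesh B is shown first.
    Returns "A", "B", or None when the verdict flips with presentation order.
    """
    winner_ab = "A" if verdict_ab == "first" else "B"
    winner_ba = "B" if verdict_ba == "first" else "A"
    return winner_ab if winner_ab == winner_ba else None

def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters on the same items.

    Undefined (division by zero) when both raters always give one identical label.
    """
    n = len(labels_1)
    p_o = sum(a == b for a, b in zip(labels_1, labels_2)) / n  # observed agreement
    c1, c2 = Counter(labels_1), Counter(labels_2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# A judge that always prefers whichever mesh is shown first is filtered out.
biased = order_consistent_verdict("first", "first")
stable = order_consistent_verdict("first", "second")
```

Dropping the flipped cases trades coverage for reliability: a verdict that depends on presentation order carries no information about mesh quality, so it is excluded before agreement is computed.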

2606.18539 2026-06-18 cs.LG stat.ML New submission

Seed-Guided Semi-Supervised Clustering via A-Contrario Anomaly Detection

Nassir Mohammad

Affiliations: Cyber Innovation Lab, Airbus, Newport, UK

AI summary: Proposes a semi-supervised clustering framework based on a statistical duality, using a-contrario reasoning and the Perception algorithm; clusters are initialised from seed labels and refined by iteratively excluding anomalies, giving robust clustering with strong performance from few seeds.

Abstract

This paper introduces a semi-supervised clustering framework grounded in the statistical duality between grouping principles and anomaly detection. We address the challenge of robust cluster definition in noisy environments, a task where partitioning algorithms often over-assign outliers and density-based methods remain sensitive to heuristic global parameters. Drawing on a-contrario statistical reasoning and Gestalt proximity principles, we define a cluster as a maximal subset of data points containing no anomalies relative to a null hypothesis of uniform randomness. Central to this approach is the Perception algorithm, which utilises a principled expectation-based threshold ($\mathbb{E} < 1$) to identify outliers without manual parameter tuning. By treating clustering as the dual of anomaly detection, we employ an iterative "clustering-by-exclusion" mechanism. The algorithm is seed-guided, leveraging minimal user-provided labels to initialise robust cluster medians and form initial groups, which are subsequently expanded by admitting non-anomalous points. This approach naturally isolates fringe points, isolated noise, and emerging unknown clusters. We evaluate the method on synthetic and real-world benchmarks, including image and text datasets represented through raw, linear-reduced, and neighbourhood-preserving embeddings. Results demonstrate that with as few as 10-30 seeds per cluster, the proposed method achieves competitive and often very strong performance under a practical low-tuning benchmarking protocol, while maintaining linear scalability with respect to both observations and dimensionality for a fixed number of seeded clusters and iterations.

2606.18970 2026-06-18 cs.LG cs.AI cs.CV New submission

A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI

Syed Mujtaba Haider, Silvia Figini

Affiliations: Department of Mathematics; Department of Political and Social Sciences

AI summary: Through a controlled benchmark, compares quantum and classical generators for brain MRI data augmentation, finding that neither significantly outperforms training on real data alone and that the quantum generator offers no additional advantage.

Comments This work has been submitted to the IEEE for possible publication.

Abstract

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion: synthetic samples are off-distribution and severely mode-collapsed precisely where data is scarce, and the quantum generator is no more diverse than its classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.
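The abstract mentions paired significance testing over eight seeds with a multiple-comparison correction but does not name the test or the correction. As a hedged illustration only, the sketch below uses an exact paired sign-flip permutation test and the Holm step-down procedure; both are assumptions chosen because they are standard and small enough to enumerate for eight seeds.

```python
from itertools import product

def sign_flip_pvalue(diffs):
    """Exact two-sided paired sign-flip test on per-seed score differences."""
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Under the null, each paired difference is equally likely to be +d or -d.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            count += 1
    return count / total

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the i-th smallest p-value with alpha / (m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one hypothesis survives, all larger p-values survive too
    return reject

# Hypothetical accuracy differences (augmented minus real-only) for eight seeds.
diffs = [0.4, -0.3, 0.1, -0.2, 0.3, -0.1, 0.2, -0.4]
p = sign_flip_pvalue(diffs)
```

With eight seeds there are only 256 sign patterns, so the permutation distribution can be enumerated exactly rather than sampled; the smallest achievable two-sided p-value is 2/256, which is one reason corrected comparisons across many data fractions are hard to pass.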

2606.19297 2026-06-18 cs.LG cs.RO New submission

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova, Mikhail Kolosov, Denis Shepelev, Andrey Kuznetsov, Elena Tutubalina, Aleksandr I. Panov, Alexey K. Kovalev, Vlad Shakhuro

Affiliations: CogAI Lab; FusionBrain Lab; IAI MSU; Lomonosov MSU; NUST MISIS; Applied AI Institute; HSE University; Generalizable AI Systems; ISP RAS; MIRAI; Domain-specific NLP Group

AI summary: Proposes the Act2Answer protocol, which evaluates knowledge retention in VLA models by having them answer through actions; finds that models perform well on simple concepts but show gaps on richer semantic categories, and that VQA co-training is associated with better knowledge retention.

Comments Project page: https://tttonyalpha.github.io/act2answer/

Abstract

Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-sensitive tasks are ambiguous, conflating missing knowledge with poor generalization of low-level control. We introduce Act2Answer, a lightweight protocol that adapts VLM knowledge benchmarks to VLA evaluation by requiring agents to answer through action. Each question becomes a short tabletop episode where the agent performs a single object-placement action to select among candidate answers, yielding an action-grounded success rate with reduced control confounds. We curate a test suite of such environments across diverse commonsense and world-knowledge categories and introduce layerwise intent probing to localize answer-relevant information across the VLM backbone and action head. In a large-scale study of 7 VLA models and 9 VLM baselines, we systematically rank models across categories, finding that VLAs show solid performance on simple concepts while exhibiting larger gaps on richer semantic categories relative to their source VLMs, that VQA co-training is associated with better knowledge retention, and that answer-relevant signals peak in middle VLA layers but attenuate in upper layers. Act2Answer is available at https://tttonyalpha.github.io/act2answer/.

2606.18287 2026-06-18 cs.LG New submission

Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS

Siyuan Dai, Yang Du, Kun Zhao, Zhusuyi Chen, Heng Huang, Paul Thompson, Chao Shi, Haoteng Tang, Liang Zhan

Affiliations: University of Pittsburgh; University of Maryland; University of Southern California; Binghamton University; University of Texas Rio Grande Valley

AI summary: Proposes the Artemis framework, which learns region-specific confounder representations through region-level causal intervention to remove the influence of demographic confounders on GNNs in multimodal (fMRI and DTI) neuroimaging, improving performance on three benchmarks.

Comments 11 pages, 8 figures

Abstract

Multimodal neuroimaging, integrating functional connectivity from fMRI and structural connectivity from DTI, enables non-invasive analysis of brain networks using graph neural networks. However, demographic factors such as age and sex systematically confound the relationship between brain connectivity and clinical outcomes, causing GNNs to exploit spurious shortcuts rather than learning causally invariant representations. While recent causal GNN methods introduce causality at the graph-modeling level, their causal mechanisms remain domain-agnostic and do not account for the real-world confounders inherent in clinical neuroimaging data. Moreover, brain networks are constructed from atlas-based parcellations where each region exhibits distinct sensitivity to demographic factors, necessitating region-aware adjustment. We propose Artemis, a region-level causal framework that bridges this gap by performing causal intervention at each brain region independently, learning region-specific confounder representations with lightweight parameters. Our adjustment uses both the functional and the structural features for graph reasoning and acts as a plug-in module compatible with arbitrary GNN backbones. Experiments on three benchmarks, ADNI for disease diagnosis, OASIS for dementia staging, and HCP for sex classification, demonstrate consistent improvements over representative GNN-based baselines. Multiple supporting experiments further demonstrate statistical significance and neuroscientific interpretability.

2606.18316 2026-06-18 cs.LG New submission

A Survey on Data-Driven Models for Soil Moisture Regression and Classification

Ilektra Tsimpidi, George Georgoulas, Vidya Sumathy, George Nikolakopoulos

Affiliations: Electrical Engineering, University of Technology, Sweden

AI summary: Surveys AI-based soil moisture modelling methods in five categories (statistical time-series, geostatistical, classical machine learning, deep learning, and probabilistic/Bayesian methods) that use multi-source data for regression or classification.

Comments 14 pages, 3 figures, AIAI 2026 Conference

Abstract

Soil moisture (SM) modelling constitutes a complex spatiotemporal learning problem characterised by nonlinear environmental interactions, heterogeneous data sources, and limited ground observations. Physics-based approaches, such as water balance models, rely on explicit hydrological equations and high-quality inputs, but their computational cost and scalability limitations restrict large-scale deployment. Data-driven artificial intelligence (AI) methods have emerged as flexible alternatives, enabling the extraction of empirical relationships between soil moisture and environmental variables with reduced modelling assumptions. This work presents a structured survey of AI-based models for soil moisture estimation and classification. Existing approaches are organised into five categories: (a) statistical time-series models, (b) geostatistical methods, (c) classical machine learning (ML) models, (d) deep learning (DL) models, and (e) probabilistic/Bayesian methods. These models leverage historical soil moisture records, meteorological variables, vegetation indices, topography, soil characteristics, and geolocation data to perform regression or classification tasks.

2606.18319 2026-06-18 cs.LG cs.AI cs.HC cs.SE New submission

Trainable Photonic Measurement for Physics-Informed PDE Learning

Jiale Linghu, Hao Dong, Yangshuai Wang

Affiliations: Xidian University; National University of Singapore

AI summary: Proposes a photonic quantum neural field that encodes coordinates as trainable optical phases, mixes them through multi-photon Fock-space interference, and decodes from photon-number measurements, serving as a trainable representation for physics-informed residual minimization; across seven PDE benchmarks it exhibits a phase-complexity transition, with errors up to an order of magnitude lower in the hardest regimes and about one quarter of the trainable parameters.

Abstract

Photonic quantum machine learning offers a route to trainable physical representations built from phase, interference and measurement. However, its role in scientific machine learning remains largely unexplored. Physics-informed neural fields provide a natural setting, because differential equations require trial spaces that preserve phase, frequency and derivative structure. Here we introduce a photonic quantum neural field in which coordinates become trainable optical phases, are mixed by multi-photon Fock-space interference and are decoded from photon-number measurements. The photonic circuit is optimized as the neural-field representation itself, not as a fixed feature map or hardware accelerator. Photonic measurement is therefore a trainable representation on which the physics-informed residual is minimized. Across seven elliptic, wave, nonlinear dispersive and inverse PDE benchmarks, we observe a phase-complexity transition: classical coordinate and Fourier-feature networks suffice in smooth regimes, whereas the photonic field is most accurate when residual derivatives amplify phase mismatch. In the hardest regimes it gives the lowest errors, with margins reaching an order of magnitude and about one quarter of the trainable parameters of classical baselines. Frozen and shuffled controls, together with noise stress tests, attribute this gain to learned interference and stable Fock-probability readout under compound perturbations. These results identify photonic quantum measurement as a representation-learning principle for scientific machine learning.

2606.18726 2026-06-18 cs.LG cs.AI New submission

Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring

Fang Wang, Ernesto Damiani

Affiliations: Department of Computer Science, University of Milan

AI summary: Proposes the Graph Grounded Cross Attention Transformer (GGATN), which uses a global process graph as structured memory, Transformer self-attention to encode sequence positions, and graph-grounded cross-attention to inject process topology, combined with Viterbi-style graph-constrained decoding to generate complete event sequences in a single pass; it outperforms LLM baselines on six benchmark logs.

Comments 40 pages

Abstract

Structurally constrained event sequence generation remains challenging because generated paths must preserve transition feasibility, temporal order, termination, and attribute consistency. In predictive process monitoring (PPM), this challenge appears as full event sequence generation, whereas existing work mainly addresses component tasks such as next-activity, remaining-time, outcome, and attribute prediction. This paper proposes the Graph Grounded Cross Attention Transformer Neural Network (GGATN) for this unified PPM task. GGATN uses a global process graph as structured activity memory, contextualizes sequence positions through Transformer self-attention, and injects process topology through graph-grounded cross-attention. Unlike autoregressive decoding, GGATN generates activities, timestamps, length, and event-level and sequence-level attributes in a single pass, followed by Viterbi-style graph-constrained decoding for feasible paths and explicit termination. Experiments on six benchmark event logs show more reliable generation quality than locally run, instruction-prompted LLM baselines. GGATN achieves strong performance on sequence similarity, Damerau-Levenshtein similarity, bigram-based control-flow similarity, and duration distribution, while maintaining zero hallucinated activities and zero sequence-level attribute inconsistency. Ablation analyses confirm the global graph encoder as a stable structural prior. Interpretability analyses show how graph structure, sequence context, feedback refinement, and constrained decoding shape generation.
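Viterbi-style graph-constrained decoding can be sketched as dynamic programming over per-position activity scores, where only transitions present in the process graph are allowed. The code below is a generic illustration, not GGATN's decoder: the fixed sequence length, the score format, and the toy process graph are all assumptions.

```python
import math

def constrained_viterbi(scores, activities, allowed, start, end):
    """Return the highest-scoring activity path that uses only allowed transitions.

    scores[t][a]: log-score of activity a at position t.
    allowed: set of (previous, next) pairs present in the process graph.
    Returns None when no feasible path from start to end has this length.
    """
    T = len(scores)
    NEG = -math.inf
    best = [{a: NEG for a in activities} for _ in range(T)]
    back = [{a: None for a in activities} for _ in range(T)]
    best[0][start] = scores[0][start]
    for t in range(1, T):
        for a in activities:
            for p in activities:
                if (p, a) in allowed and best[t - 1][p] > NEG:
                    cand = best[t - 1][p] + scores[t][a]
                    if cand > best[t][a]:
                        best[t][a] = cand
                        back[t][a] = p
    if best[T - 1][end] == NEG:
        return None
    path = [end]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the end activity
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy process graph (hypothetical): payment is only possible after a check.
activities = ["start", "check", "pay", "end"]
allowed = {("start", "check"), ("check", "pay"), ("check", "end"), ("pay", "end")}
scores = [
    {"start": 0.0, "check": -5.0, "pay": -5.0, "end": -5.0},
    {"start": -5.0, "check": -1.0, "pay": -0.1, "end": -5.0},
    {"start": -5.0, "check": -2.0, "pay": -0.3, "end": -3.0},
    {"start": -5.0, "check": -5.0, "pay": -5.0, "end": -0.1},
]
path = constrained_viterbi(scores, activities, allowed, "start", "end")
```

In this toy case a position-wise argmax would pick "pay" at the second position, which the graph forbids directly after "start"; the constrained search instead returns the best path that respects every transition and terminates explicitly at "end".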

URL PDF HTML ☆

赞 0 踩 0

2606.18732 2026-06-18 cs.LG cs.CV 新提交

Low-Cost Neuromorphic Fall Detection Using Synthetic Event Data and Hybrid SNNs

低成本神经形态跌倒检测：使用合成事件数据和混合SNN

Guillermo Rojas, Gonzalo Soto, Daniel Yunge

Affiliations: School of Electrical Engineering, Pontificia Universidad Católica de Valparaíso, Chile

AI summary: Proposes a hybrid SNN-CNN model that synthesizes event-camera data from smartphone video to achieve efficient and accurate fall detection.

Comments 4 pages, 6 figures, presented at ICONS 2025 during the Poster Session, but not published

2606.18857 2026-06-18 cs.LG physics.ao-ph New submission

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

研究理想化Isca模拟中平流层突然增温的机器学习模拟的归纳偏差

Oskar Bohn Lassen, Simon Driscoll, Stephen I. Thomson, Sebastian Schemm, Francisco C. Pereira

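Affiliations: Technical University of Denmark; University of Cambridge; University of Exeter

AI summary: Tests how the inductive biases of different architectures affect emulation of sudden stratospheric warming dynamics; finds that three-dimensional vertical coupling is key, but that low forecast error does not guarantee physical consistency.

AI Chinese abstract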

机器学习模拟器越来越多地用于天气预报，并有可能通过学习动态重要的可预测性来源，将技能扩展到次季节到季节时间尺度。一个关键挑战是模型能否利用可预测性锚点，例如平流层变率，这些锚点在超出短期超前时间时影响对流层环流。我们使用配对的理想化Isca模拟测试架构归纳偏差如何影响对平流层突然增温（SSW）动力学的模拟，这些模拟仅在施加的波-2加热扰动上有所不同。在用于一步预测的卷积、变换器和基于图的架构中，当平流层动态安静时，模型差异不大，但当类似SSW的变率活跃时，差异显著扩大。我们的结果确定显式三维垂直耦合是机器学习模拟平流层动力学的关键归纳偏差。然而，Eliassen-Palm通量诊断表明，低预测误差并不能保证物理上真实的波-平均流相互作用，平流层波驱动结构中仍存在相干误差。

English abstract

Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.


2606.18864 2026-06-18 cs.LG cs.AI New submission

Scaling Learning-based AEB with Massive Unlabeled Data

基于大规模无标签数据的可扩展学习型自动紧急制动

Xiangyu Wang, Yang Zhan, Mengxiang Hao, Chuanchuan Zhong, Yansong Jia, Junjie Zhang, Yu Han, Xin Jiang, Zhen Cao, Ying Wang, Yulun Song, Zhitao Xu

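Affiliations: Li Auto

AI summary: Proposes a stabilized meta-feedback semi-supervised learning framework that uses Noise-Aware Decoupling and kinematics-gated pseudo-labeling to exploit massive unlabeled data for automatic emergency braking, achieving a positive-to-false activation ratio above 100:1 and a 35% improvement in accident-free mileage.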

Comments Accepted for presentation at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

AI Chinese abstract

本文研究如何在生产约束下，利用大规模无标签车队数据扩展基于学习的自动紧急制动（AEB）。我们的方法基于元反馈半监督学习（MF-SSL），其中教师模型为无标签驾驶数据生成伪标签，并使用小型有标签锚定集作为安全关键反馈进行更新。在生产中，锚定歧义和有标签-无标签不匹配会放大系统性的伪标签错误，导致误触发。我们提出了一种稳定的MF-SSL框架，包括：(i) 噪声感知解耦，从教师监督更新路径中移除易产生歧义的锚定；(ii) 运动学门控伪标签，结合教师冲突惩罚，抑制无标签数据上由不匹配引起的风险幻觉，同时保持广泛覆盖。大量实验表明，随着无标签数据从1M扩展到1B窗口，模型性能持续提升，在保持舒适性的同时提高了安全性。经过1B数据训练的学生模型已部署到数十万辆车辆上，并在超过10^9公里的行驶中得到验证，实现了超过100:1的正误触发比，且相比仅基于规则的基线，无事故行驶里程提升了35%。

English abstract

This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over $10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.


2606.18882 2026-06-18 cs.LG cs.AI eess.SP New submission

Domain-Shift Aware Neural Networks for Unbalance Characterization in Rotating Systems

面向旋转系统不平衡表征的域偏移感知神经网络

Bernardo Feijó Junqueira, Claudio Kiyoshi Umezu, Bruno Bilhar Karaziack, Tomaz Junior, Daniel Alves Castello

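Affiliations: Springer Nature

AI summary: Proposes a domain-shift aware neural network that uses a maximum mean discrepancy strategy to align source-domain and target-domain features for the regression problem of estimating unbalance mass in rotating shafts under varying operating conditions; experiments show that the approach significantly improves prediction accuracy when the domain shift is unknown.

AI Chinese abstract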

本文研究了域偏移感知神经网络在回归任务中的应用，旨在估计不同运行条件下旋转轴的不平衡质量。实验数据来自一个测试台，其中主轴上安装有带不平衡质量的法兰，在不同转速下驱动，同时可选择性地激活副轴以引入域差异。不平衡质量固定在径向距离上，使用三轴加速度计记录系统的动态响应。质量估计的逆问题在域自适应框架中提出，网络采用最大均值差异策略进行训练，以对齐源域和目标域的特征表示。结果表明，显式处理域偏移能有效提高预测精度，尤其是在系统的物理行为和域偏移来源不完全已知且超出训练条件的情况下。这些发现凸显了域偏移感知模型在结构健康监测回归任务中的潜力。

English abstract

This work investigates the application of a domain-shift aware neural network for regression tasks aimed at estimating unbalance masses in rotating shafts under varying operating conditions. Experimental data were collected from a test rig in which a primary shaft, equipped with a flange carrying unbalanced masses, was driven at different rotational speeds, while a secondary shaft could be optionally activated to introduce domain discrepancy. The unbalance masses were positioned at a fixed radial distance, and the dynamic response of the system was recorded using triaxial accelerometers. The inverse problem of mass estimation is formulated within a domain adaptation framework, where the network is trained with a maximum mean discrepancy strategy to align feature representations across source and target distributions. The results demonstrate the effectiveness of explicitly addressing domain shift in improving prediction accuracy, especially when the system's physical behavior and sources of domain discrepancy are not fully known and fall outside the training conditions. These findings highlight the potential of domain-shift aware models for regression tasks in Structural Health Monitoring.
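
For readers unfamiliar with the maximum mean discrepancy (MMD) mentioned in the abstract, the following is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel. It is not the authors' implementation: the kernel choice, the `bandwidth` value, the function names, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples of feature vectors."""

    def mean_kernel(xs, ys):
        # Average kernel value over all pairs drawn from xs and ys.
        total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical 2-D feature vectors standing in for source and target domains.
source_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
shifted_features = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]

same_domain = mmd_squared(source_features, source_features)
shifted_domain = mmd_squared(source_features, shifted_features)
print(same_domain, shifted_domain)
```

In a domain adaptation setup, a term of this kind is typically added to the task loss so that training is penalized when source and target feature distributions drift apart; how the paper weights or computes it is not stated in the abstract.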


2606.18933 2026-06-18 cs.LG cs.IR stat.ME New submission

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

一种面向约束感知的生物过程开发的人机协同贝叶斯优化框架

Samuel Stricker, Claus Wirnsperger, Alessandro Butté, Laura Helleckes, Gonzalo Guillén Gosálbez, Antonio del Rio Chanona, Mehmet Mercangöz

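Affiliations: Imperial College London; DataHow AG; ETH Zurich

AI summary: Proposes an extended Pareto Front Guided Sampling framework that treats the constraint-satisfaction probability and robustness derived from a Gaussian process surrogate as objectives of a multi-objective optimization problem and, combined with an interactive dashboard, enables human-in-the-loop, constraint-aware bioprocess optimization.

AI Chinese abstract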

本文提出了帕累托前沿引导采样（PFGS）的一种扩展，这是一种人机协同（HitL）贝叶斯优化（BO）框架，其中高斯过程（GP）代理导出的量被重新表述为多目标优化问题的目标，得到的帕累托前沿暴露给领域专家进行交互式候选选择，而不是返回单一的自动推荐。该框架在两个方向上进行了扩展：约束优化通过将满足输出规格限的后验概率作为显式的帕累托目标来处理，该概率从GP后验分布解析计算得到；鲁棒优化通过蒙特卡洛采样策略来处理，该策略估计在用户定义的输入扰动变异性下的期望下置信性能，捕捉在可能的实现偏差下的性能退化。由此产生的多维帕累托表示通过交互式仪表盘上的成对二维投影同时显示预测性能、模型不确定性、概率约束满足和输入鲁棒性之间的权衡，使得选择标准能够随着代理模型的改进和开发目标的演变而迭代细化。该框架在一个八维的补料分批中国仓鼠卵巢（CHO）细胞培养模拟器上进行了展示，证明了系统性地识别高性能、满足可行性且对扰动具有鲁棒性的操作条件，并说明了专家定义的需求如何提供原则性的停止标准并支持实验资源的明智分配。

English abstract

This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are reformulated as objectives of a multi-objective optimization problem, and the resulting Pareto front is exposed to a domain expert for interactive candidate selection rather than returning a single automated recommendation. The framework is extended in two directions: constrained optimization is addressed by incorporating the posterior probability of satisfying output specification limits as an explicit Pareto objective, computed analytically from the GP posterior distribution; robust optimization is addressed by a Monte Carlo sampling strategy that estimates expected lower-confidence performance over a user-defined variability of input perturbations, capturing performance degradation under likely implementation deviations. The resulting multi-dimensional Pareto representation renders trade-offs between predicted performance, model uncertainty, probabilistic constraint satisfaction, and input robustness simultaneously visible through pairwise two-dimensional projections on an interactive dashboard, enabling selection criteria to be iteratively refined as the surrogate model improves and development objectives evolve. The framework is showcased on an eight-dimensional fed-batch Chinese Hamster Ovary (CHO) cell culture simulator demonstrating systematic identification of high-performing, feasibility-compliant, and perturbation-resilient operating conditions, and illustrating how expert-defined requirements provide a principled stopping criterion and support informed allocation of experimental resources.
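
The abstract says the probability of satisfying output specification limits is computed analytically from the GP posterior. Assuming the posterior at a candidate point is a univariate Gaussian with a known mean and standard deviation, that probability reduces to a difference of two normal CDF values. The sketch below illustrates the calculation under that assumption; the function names and the numbers are hypothetical and not taken from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within_spec(mean, std, lower=-math.inf, upper=math.inf):
    """P(lower <= y <= upper) for y ~ N(mean, std^2).

    mean and std stand in for the GP posterior mean and standard deviation
    at one candidate operating point (an assumption of this sketch).
    """
    if std <= 0.0:
        raise ValueError("std must be positive")
    return normal_cdf((upper - mean) / std) - normal_cdf((lower - mean) / std)


# Hypothetical candidate: predicted output 4.2 with posterior std 0.5,
# and a specification window of [3.5, 5.0].
p_feasible = prob_within_spec(mean=4.2, std=0.5, lower=3.5, upper=5.0)
print(round(p_feasible, 3))
```

A value like `p_feasible` could then serve as one axis of the Pareto front next to predicted performance and model uncertainty, which is how the abstract describes the constraint objective being used.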


2606.19255 2026-06-18 cs.LG New submission

SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering

SCAN: 通过多尺度邻域中心聚类增强时间序列异常检测

Xingze Zheng, Hanyin Cheng, Siyuan Wang, Yiting Hao, Peng Chen, Yuan Jun, Yang Shu

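Affiliations: East China Normal University; APPLab, Huawei; Huawei

AI summary: Proposes SCAN, which enhances reconstruction-based anomaly detection with multi-scale clustering: at the representation level it integrates cluster centers of normal patterns to constrain reconstruction, at the anomaly-criterion level it combines cluster probability with reconstruction error, and it uses neighborhood-centered representations to improve clustering performance, achieving state-of-the-art results on multiple real-world datasets.

AI Chinese abstract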

时间序列异常检测在广泛的现实应用中扮演着关键角色。基于重建的方法已成为主流范式，但它们面临过度泛化和欠泛化问题，且难以平衡。为了解决这一问题，我们引入多尺度聚类来增强基于重建的方法。在表示层面，我们整合正常模式的聚类中心表示，以约束模型针对代表性正常模式进行重建，防止强大能力和表示能力的主导。在异常判据层面，我们基于聚类成员概率推导异常置信度分数，并将其与重建误差结合，提供双重检测标准。此外，聚类中心表示和异常置信度分数的有效性取决于聚类性能。因此，我们提取邻域中心表示用于多视图聚类，以提高聚类性能。在来自不同应用领域的多个真实数据集上的大量实验表明，SCAN达到了最先进的性能。

English abstract

Time series anomaly detection plays a crucial role in a wide range of real-world applications. Reconstruction-based methods have become the mainstream paradigm, but they suffer from over-generalization and under-generalization problems, which are challenging to balance. To address this, we introduce multi-scale clustering to enhance reconstruction-based methods. At the representation level, we integrate the cluster center representations of normal patterns to constrain the model to target representative normal patterns for reconstruction, preventing dominance of powerful capacity and representation capability. At the anomaly criterion level, we derive anomaly confidence score based on cluster membership probability and combine it with reconstruction error, providing dual criteria for detection. Furthermore, the effectiveness of the cluster center representations and anomaly confidence score depends on the clustering performance. Accordingly, we extract neighborhood-centered representations for multi-view clustering to improve clustering performance. Extensive experiments on multiple real-world datasets from diverse application domains demonstrate the state-of-the-art performance of SCAN.


2606.19292 2026-06-18 cs.LG New submission

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

使用普适环境感知信息进行ICU谵妄风险分层

Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras, Jessica Sena, Yuanfang Ren, Andrea Davidson, Ziyuan Guan, Tezcan Ozrazgat-Baslanti, Subhash Nerella, Azra Bihorac, Parisa Rashidi

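Affiliations: University of Florida; Stanford University

AI summary: Uses ambient sound and light intensity data with efficient sequential neural network models to predict delirium risk in ICU patients; finds that sound is the dominant predictor and that adding light improves short-term prediction, with an AUC of 0.80.

AI Chinese abstract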

谵妄是重症监护室（ICU）中常见且严重的并发症，与发病率增加、住院时间延长和医疗成本升高相关。尽管其普遍存在，早期预测和预防仍具挑战性。环境因素如环境声音和光照可能影响谵妄的发生，但在风险评估中常被忽视。在本研究中，我们检验了光照强度和声压级是否能在多个预测时间窗口内独立预测谵妄。我们评估了四种高效的序列神经网络模型，这些模型基于来自9个ICU的309名患者的数据，用于预测10种预测窗口大小的谵妄。我们使用Shapley Additive Explanations分析报告了特征重要性和影响方向。卷积模型实现了最强的区分能力，在声音数据和组合数据上的AUC均为0.80。声音特征是整体上的主要预测因子。将声音与光照结合改善了短期（<1周）预测，组合模型在感知期后立即分配最高风险。这些发现表明，被动环境感知，尤其是声音，可以为谵妄风险评估增加临床上有意义、可解释的信号，并为丰富多模态ICU预测和预防策略提供实用途径。

English abstract

Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, early prediction and prevention remain challenging. Environmental factors such as ambient sound and light may influence the onset of delirium, yet they are often overlooked in risk assessments. In this study, we examined whether light intensity and sound pressure levels can independently predict delirium across multiple prediction horizons. We evaluated four efficient sequential neural network models on data collected from 9 ICUs across 309 patients to predict delirium for 10 prediction-window sizes. We reported feature importance and direction of influence using Shapley Additive Explanations analysis. The convolutional model achieved the strongest discrimination, with AUC = 0.80 on sound data and on combined data. Sound features were the dominant predictors overall. Integrating sound with light improved short-term ($<1$ week) prediction, with the combined model assigning the highest risk immediately after the sensing period. These findings suggest that passive ambient sensing, especially sound, can add a clinically meaningful, interpretable signal for delirium risk estimation and offer a practical pathway to enrich multimodal ICU prediction and prevention strategies.


2606.19317 2026-06-18 cs.LG cs.AI New submission

Explaining Attention with Program Synthesis

用程序合成解释注意力机制

Amiri Hayes, Belinda Li, Jacob Andreas

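Affiliations: NJIT (New Jersey Institute of Technology); MIT EECS (MIT Department of Electrical Engineering and Computer Science); MIT CSAIL (MIT Computer Science and Artificial Intelligence Laboratory)

AI summary: Proposes a method for approximating the behavior of deep network components with executable programs; for Transformer attention heads, it generates Python programs that reproduce attention patterns, making the heads interpretable.

AI Chinese abstract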

可解释深度学习研究的一个长期目标是，用人类可理解的符号描述取代不透明的神经计算。本文提出了一种用可执行程序近似深度网络组件行为的方法。我们专注于Transformer语言模型中的注意力头。对于给定的注意力头，我们首先在一组随机选择的训练样本上计算其关联的注意力矩阵。接着，我们向预训练语言模型提供这些矩阵的摘要，并指示它生成一组Python程序，这些程序仅根据输入句子中的文本即可再现相关的注意力模式。最后，我们根据最终程序集在保留输入上预测行为的效果对程序进行重新排序。我们证明，少于1000个这样的生成程序即可再现GPT-2、TinyLlama-1.1B和Llama-3B中注意力头的注意力模式，在TinyStories上平均交并比相似度超过75%。此外，最佳匹配程序可以替代神经注意力头而不会显著影响模型行为：在三个模型中用程序替代25%的注意力头仅导致平均困惑度增加16%，同时在各种下游问答基准上保持性能。这项工作为使用人类可读、可执行的代码逆向工程Transformer模型中的注意力头提供了一个可扩展的流程，推动了神经模型向符号透明性的发展。

English abstract

A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention heads in transformer language models. For a given head, we first compute its associated attention matrices on a collection of randomly selected training examples. Next, we prompt a pre-trained language model with a summary of these matrices, and instruct it to generate a set of Python programs that can reproduce the associated attention patterns given only text from the input sentence. Finally, we re-rank programs according to how well our final set of programs predict behavior on held-out inputs. We demonstrate that a set of fewer than 1,000 such generated programs can reproduce the attention patterns of heads in GPT-2, TinyLlama-1.1B, and Llama-3B, achieving an average Intersection-over-Union similarity above 75% on TinyStories. Moreover, the best-fit programs can replace neural attention heads without substantially affecting model behavior: replacing 25% of attention heads with programmatic surrogates across the three models incurs only a 16% average perplexity increase, while maintaining performance on a variety of downstream question answering benchmarks. This work contributes a scalable pipeline for reverse-engineering attention heads in transformer models using human-readable, executable code, advancing a path toward symbolic transparency in neural models.


1. Deep Learning Architectures and Training Methods (17 papers)

Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Why SWAVE May Not Be All You Need: A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

LLMZero: Discovering Adaptive Training Strategies for RL Post-Training via LLM Agents

Task-Restricted Symmetries in Recurrent Weight Space

SFT Overtraining Predicts Rank Inversion via Entropy Collapse Under RLVR

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

On the Residual Scaling of Looped Transformers: Stability and Transferability

Hierarchical Attention via Domain Decomposition

PACT: Preserving Anchored Cores in Task-vectors for Model Merging

InTrain: Intrinsic Trainability for Zero-Cost Neural Architecture Search

Attention as Frustrated Synchronization

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

GrapNet: A Programmable Dynamic-Architecture Neural Graph Substrate

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

INDEQS: Informed Neural controlled Differential EQuationS

2. Representation Learning, Self-Supervised and Contrastive Learning (5 papers)

From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

Be Your Own Teacher: Steering Protein Language Models via Unsupervised Reward Optimization

3. Reinforcement Learning and Sequential Decision Making (16 papers)

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning

Self-CTRL: Self-Consistency Training with Reinforcement Learning

Structured Representation Learning with Locally Linear Embeddings and Adaptive Feature Fusion

Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction

Do as the Romans Do: Learning Universal Behaviors from Heterogeneous Agents

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

Reinforcement Learning Foundation Models Should Already Be A Thing

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

REVES: REvision and VErification-Augmented Training for Test-Time Scaling

Online Reward-Punishment Learning from Fixed-Channel Perceptual Event Streams without Environment Rewards

Pareto Q-Learning with Reward Machines

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

4. Generative Models and Probabilistic Modeling (6 papers)

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

DIPHINE: Diffusion-based $Φ$-ID Neural Estimator

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

Structured Inference with Large Language Gibbs

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

5. Optimization, Generalization, and Theoretical Analysis (12 papers)

A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

Measurement noise limits the advantage of nonlinear models over linear models in biomedical prediction

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

Effects of sparsity and superposition on loss in simple autoencoders

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

Identifying Structural Biases from Causal Mechanism Shifts

Some Complexity Results for Robustness Verification for Binarized Neural Networks

Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

Smoothness-Based Derandomization of PAC-Bayes Bounds

OrthoReg: Orthogonal Regularization for Hybrid Symbolic-Neural Dynamical Systems

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

6. Efficient Learning, Compression, and Deployment (9 papers)

CODEBLOCK: Learning to Supervise Code at the Right Granularity

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

BLADE: Scalable Bi-level Adaptive Data Selection for LLM Training

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

Complementary Attention Head Pruning for Efficient Transformers

An affordable hardware-aware neural architecture search for deploying convolutional neural networks on ultra-low-power computing platforms

7. Federated Learning, Privacy, and Security (7 papers)

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

SCOPE-FL: A Strategy-proof Chain-based Optimal pareto efficient Federated Learning System

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

Private Learning with Public Feature Conditioning

Machine Unlearning for the XGBoost Model with Network Intrusion Datasets

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

Detecting Hidden ML Training With Zero-Overhead Telemetry

8. Robustness, Uncertainty, and Trustworthy Learning (8 papers)