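arXivDaily, a daily academic digest: full arXiv listings synced daily with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and other fields.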

2408.08399 2026-05-26 cs.LG cs.SY eess.SY

Transformer-based few-shot learning for modeling Electricity Consumption Profiles with minimal data across thousands of domains

Weijie Xia, Gao Peng, Chenguang Wang, Peter Palensky, Eric Pauwels, Pedro P. Vergara

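AI summary: Addresses data scarcity in electricity consumption profile (ECP) modeling with a fine-tuning-free few-shot learning framework that combines a Transformer with a Gaussian mixture model; it accurately recovers complex distributions from only 1.6% of the data and outperforms existing methods.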

DOI: 10.1016/j.ijepes.2026.111575
Journal ref: International Journal of Electrical Power & Energy Systems, Volume/Issue (February 2026), Article 111575

Abstract

Electricity Consumption Profiles (ECPs) are crucial for operating and planning power distribution systems, especially with the increasing number of low-carbon technologies such as solar panels and electric vehicles. Traditional ECP modeling methods typically assume the availability of sufficient ECP data. However, in practice, the accessibility of ECP data is limited due to privacy issues or the absence of metering devices. Few-shot learning (FSL) has emerged as a promising solution for ECP modeling in data-scarce scenarios. Nevertheless, standard FSL methods, such as those used for images, are unsuitable for ECP modeling because (1) these methods usually assume a few source domains with sufficient data and a few target domains, whereas in ECP modeling there may be thousands of source domains (e.g., households with a moderate amount of data) and thousands of target domains (e.g., households whose ECPs need to be modeled); and (2) standard FSL methods usually involve cumbersome knowledge transfer mechanisms, such as pre-training and fine-tuning. To address these limitations, this paper proposes a novel FSL framework that integrates Transformers with Gaussian Mixture Models (GMMs) for ECP modeling. The proposed approach is fine-tuning-free, computationally efficient, and robust even with extremely limited data. Results show that our method can accurately restore the complex ECP distribution with a minimal amount of ECP data (e.g., only 1.6% of the complete domain dataset) and outperforms state-of-the-art time series modeling methods in the context of ECP modeling.
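
The abstract does not say how the Transformer and the GMM are coupled. One common pattern, assumed here purely for illustration, is for the network to output the mixture weights, means, and variances of a density over daily load profiles. The sketch below covers only that GMM part: the log-likelihood of one profile under a diagonal-covariance mixture. The function names and all numbers are hypothetical.

```python
import math
from typing import Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_likelihood(
    x: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """Log-density of one profile x under a diagonal-covariance GMM.

    Hypothetical interface: in a Transformer+GMM setup these parameters would
    be produced by the network; here they are passed in directly.
    """
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        # Diagonal Gaussian = sum of independent 1-D Gaussian log-densities.
        log_gauss = sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mu, var)
        )
        component_logs.append(math.log(w) + log_gauss)
    return log_sum_exp(component_logs)


# Toy 4-step "profile" and a two-component mixture (made-up numbers).
profile = [0.4, 0.5, 1.2, 0.9]
weights = [0.7, 0.3]
means = [[0.4, 0.5, 1.1, 0.8], [0.2, 0.2, 0.3, 0.3]]
variances = [[0.05] * 4, [0.05] * 4]
print(gmm_log_likelihood(profile, weights, means, variances))
```

Under this assumed design, the few observed profiles of a target household would condition the network, and the resulting mixture could be scored or sampled without updating any weights. That is one way a framework can be fine-tuning-free, not a description of the paper's exact mechanism.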

2406.09079 2026-05-26 cs.LG

Hadamard Representation: Scaffolding Performance Across Model-free RL

Jacob E. Kooi, Zhao Yang, Mark Hoogendoorn, Vincent François-Lavet

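AI summary: Proposes the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers, reducing neuron dormancy and increasing effective rank, and thereby consistently improving performance across multiple reinforcement learning algorithms and domains.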

Comments 26 pages, 17 figures

Abstract

Deep reinforcement learning agents progressively lose representational capacity during training: neurons become dormant, removing active capacity from the network, and effective rank collapses, leaving surviving neurons redundant. Existing remedies, such as periodic resets and specialized neural network architectures, are largely algorithm- or domain-specific. We propose a simple architectural fix, the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers. HR operates through two complementary mechanisms. First, it reduces the probability of a neuron becoming dormant, which is particularly valuable for continuously differentiable activations such as tanh: unlike dormant ReLU neurons, which are effectively pruned, saturated tanh neurons silently corrupt downstream layers by turning their outgoing weights into fixed biases. Second, independently of dormancy, the multiplicative structure captures richer feature interactions and increases effective rank without widening the layer. We evaluate HR across five algorithms and three domains: DQN, PPO, and PQN on pixel-based discrete-action Atari, SimbaV2 on state-based continuous control, and MR.Q on visual continuous control. HR consistently improves performance over strong baselines without any hyperparameter tuning, and the gains persist against parameter-matched wider variants, ruling out parameter count as an alternative explanation.
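
A minimal sketch of the Hadamard Representation as the abstract describes it: two independently parameterized layers applied to the same input and combined by element-wise product. The tanh activation on both branches, the plain-list weights, and the names `hadamard_layer` and `dense_tanh` are illustrative assumptions; the paper's exact placement of activations and biases may differ.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dense_tanh(x: Vector, w: Matrix, b: Vector) -> Vector:
    """tanh(W x + b) for a single input vector."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def hadamard_layer(x: Vector, w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector) -> Vector:
    """Assumed HR form: element-wise product of two independent tanh layers."""
    h1 = dense_tanh(x, w1, b1)
    h2 = dense_tanh(x, w2, b2)
    return [a * c for a, c in zip(h1, h2)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian-initialized weights (toy initialization)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = [0.3, -1.2, 0.8]
w1, w2 = random_matrix(4, 3, rng), random_matrix(4, 3, rng)
b1, b2 = [0.0] * 4, [0.0] * 4
print(hadamard_layer(x, w1, b1, w2, b2))
```

Because |tanh| < 1, each output's magnitude is at most the smaller of its two factors, so an HR unit approaches ±1 only when both branches saturate together. This is offered only as an informal intuition for the reduced saturation the abstract describes.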

2404.10947 2026-05-26 cs.CV

Residual Connections Harm Generative Representation Learning

Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

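AI summary: Down-weighting the identity shortcuts in residual networks substantially improves the quality of semantic features learned by generative representation learning frameworks such as masked autoencoders and diffusion models.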

Comments accepted to CVPR 2026

Abstract

We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning in generative representation learning frameworks, such as masked autoencoders (MAEs) and diffusion models. Our modification notably improves feature quality, raising ImageNet-1K K-Nearest Neighbor accuracy from 27.4% to 63.9% and linear probing accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone, while also enhancing generation quality in diffusion models. This significant gap suggests that, while the residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula that monotonically decreases the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find a correlation between low effective feature rank and downstream task performance.
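
The abstract specifies only that the identity contribution decreases monotonically with depth via a fixed formula; it does not give the formula. The sketch below assumes a hypothetical linear schedule, `alpha_l = 1 - (l / (L - 1)) * (1 - alpha_min)`, applied as `x_{l+1} = alpha_l * x_l + f(x_l)`. The schedule, `alpha_min`, and the toy block are assumptions for illustration, not the paper's choices.

```python
from typing import Callable, List

Vector = List[float]


def identity_weights(num_layers: int, alpha_min: float) -> List[float]:
    """Hypothetical linear schedule from 1.0 (first layer) down to alpha_min (last)."""
    if num_layers == 1:
        return [1.0]
    return [1.0 - (l / (num_layers - 1)) * (1.0 - alpha_min)
            for l in range(num_layers)]


def weighted_residual_forward(x: Vector,
                              blocks: List[Callable[[Vector], Vector]],
                              alpha_min: float = 0.5) -> Vector:
    """x_{l+1} = alpha_l * x_l + f_l(x_l), with alpha_l shrinking with depth."""
    alphas = identity_weights(len(blocks), alpha_min)
    for alpha, f in zip(alphas, blocks):
        fx = f(x)
        x = [alpha * xi + fi for xi, fi in zip(x, fx)]
    return x


def toy_block(v: Vector) -> Vector:
    """Stand-in for an attention/MLP block: a fixed scaling."""
    return [0.1 * vi for vi in v]


print(identity_weights(5, 0.5))  # -> [1.0, 0.875, 0.75, 0.625, 0.5]
print(weighted_residual_forward([1.0, -2.0], [toy_block] * 5))
```

Setting `alpha_min=1.0` recovers an ordinary residual network, which makes this kind of schedule easy to ablate.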

2403.04545 2026-05-26 cs.LG math.ST stat.TH

Branch Scaling Manifests as Implicit Architectural Regularization for Improving Generalization in Overparameterized ResNets

Zixiong Yu, Guhan Chen, Jianfa Lai, Bohan Li, Songtao Tian

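AI summary: Studies how the scaling factor on residual branches affects the generalization of overparameterized ResNets, proves theoretically that a rapidly depth-decaying scaling factor combined with early stopping achieves minimax-optimal generalization rates, and explains the mechanism through a neural tangent kernel (NTK) approximation.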

Comments Accepted by ICML. This version incorporates content from the preprint arXiv:2305.18506. The contributors of the relevant content have consented to its inclusion and have been listed as authors

Abstract

Scaling factors in residual branches have emerged as a prevalent method for boosting neural network performance, especially in normalization-free architectures. While prior work has primarily examined scaling effects from an optimization perspective, this paper investigates their role in residual architectures through the lens of generalization theory. Specifically, we establish that wide residual networks (ResNets) with constant scaling factors become asymptotically unlearnable as depth increases. In contrast, when the scaling factor exhibits rapid depth-wise decay combined with early stopping, over-parameterized ResNets achieve minimax-optimal generalization rates. To establish this, we demonstrate that the generalization capability of wide ResNets can be approximated by kernel regression associated with the Neural Tangent Kernel (NTK). Our theoretical findings are validated through experiments on synthetic data and real-world classification tasks, including MNIST and CIFAR-100.
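
To make the constant-versus-decaying contrast concrete, the sketch below computes branch scales for a residual update `x_{l+1} = x_l + alpha_l * f(x_l)`. The schedule `alpha_l = alpha0 / l**p` is a hypothetical stand-in for "rapid depth-wise decay"; the abstract does not state the paper's rate. The snippet illustrates only an elementary fact about such schedules: with p = 2 the total branch scale stays below `alpha0 * pi**2 / 6` however deep the network, whereas a constant scale grows linearly with depth.

```python
import math
from typing import List


def constant_scales(depth: int, alpha0: float) -> List[float]:
    """Same branch scale alpha0 at every layer."""
    return [alpha0] * depth


def decaying_scales(depth: int, alpha0: float, p: float) -> List[float]:
    """Hypothetical rapid decay: alpha_l = alpha0 / l**p for layers l = 1..depth."""
    return [alpha0 / (l ** p) for l in range(1, depth + 1)]


def residual_forward(x: float, scales: List[float]) -> float:
    """Scalar ResNet x_{l+1} = x_l + alpha_l * f(x_l) with a toy branch f = tanh."""
    for alpha in scales:
        x = x + alpha * math.tanh(x)
    return x


for depth in (10, 100, 1000):
    total_const = sum(constant_scales(depth, 0.5))
    total_decay = sum(decaying_scales(depth, 0.5, 2.0))
    print(depth, total_const, round(total_decay, 4))
```

The paper's NTK-based analysis is what connects schedules like these to generalization; the snippet shows only the schedules themselves.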

2402.13791 2026-05-26 cs.LG

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

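AI summary: Through a systematic review, summarizes the use, objectives, findings, and challenges of explainable AI methods in remote sensing, identifies emerging directions, and reviews the approaches used for evaluation.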

DOI: 10.1109/MGRS.2024.3467001
Journal ref: published in IEEE Geoscience and Remote Sensing Magazine, vol. 12, no. 4, pp. 261-304, Dec. 2024

Abstract

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used and their objectives, findings, and challenges in remote sensing applications is still missing. In this paper, we address this gap by performing a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches and emerging directions that tackle specific remote sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights, and reflect on the approaches used for the evaluation of explainable AI methods. As such, our review provides a complete summary of the state-of-the-art of explainable AI in remote sensing. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field.

2311.11342 2026-05-26 cs.LG cs.DC math.OC

On the Communication Complexity of Decentralized Stochastic Bilevel Optimization

Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao

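AI summary: To address the slow convergence and high communication cost of existing decentralized stochastic bilevel optimization algorithms in heterogeneous settings, proposes two new algorithms based on simultaneous and alternating update strategies that converge faster with lower communication cost, and, for the first time under mild assumptions, reveals how the computation and communication of Hessian-inverse-vector products in heterogeneous settings affect the convergence rate.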

Abstract

Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several decentralized stochastic bilevel optimization algorithms have been developed. However, existing methods often suffer from slow convergence rates and high communication costs in heterogeneous settings, limiting their applicability to real-world tasks. To address these issues, we propose two novel decentralized stochastic bilevel gradient descent algorithms based on simultaneous and alternating update strategies. Our algorithms achieve faster convergence rates and lower communication costs than existing methods. Importantly, our convergence analyses do not rely on strong assumptions regarding heterogeneity. More importantly, our theoretical analyses clearly reveal how the computation and communication of the Hessian-inverse-vector product in the heterogeneous setting affect the convergence rate. To the best of our knowledge, this is the first time such favorable theoretical results have been achieved under mild assumptions in the heterogeneous setting. Furthermore, we demonstrate how to establish the convergence rate for the alternating update strategy when combined with the variance-reduced gradient. Finally, experimental results confirm the efficacy of our algorithms.
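
Bilevel gradient methods need the Hessian-inverse-vector product `H^{-1} v` of the lower-level problem. A widely used approximation, shown here as an assumption rather than as the paper's estimator, is the truncated Neumann series `H^{-1} v ≈ eta * sum_{k=0}^{K} (I - eta*H)^k v`, which converges for a positive definite H when `0 < eta < 2 / lambda_max(H)`. In a decentralized run, each node could compute such an estimate locally and then average it with its neighbours through a doubly stochastic mixing matrix `W`. Both helpers below (`neumann_hivp`, `gossip_average`) are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def neumann_hivp(h: Matrix, v: Vector, eta: float, k_terms: int) -> Vector:
    """Assumed estimator: H^{-1} v ~ eta * sum_{k=0}^{K} (I - eta H)^k v."""
    term = list(v)    # (I - eta H)^0 v
    total = list(v)
    for _ in range(k_terms):
        hv = matvec(h, term)
        term = [t - eta * x for t, x in zip(term, hv)]  # apply (I - eta H)
        total = [s + t for s, t in zip(total, term)]
    return [eta * s for s in total]


def gossip_average(w: Matrix, local: List[Vector]) -> List[Vector]:
    """Hypothetical communication round: node i receives sum_j W_ij * x_j."""
    dim = len(local[0])
    return [[sum(w[i][j] * local[j][d] for j in range(len(local)))
             for d in range(dim)]
            for i in range(len(local))]


h = [[2.0, 0.5], [0.5, 1.0]]  # toy positive definite lower-level Hessian
v = [1.0, -1.0]
print(neumann_hivp(h, v, eta=0.4, k_terms=50))
```

In the toy example the Hessian's eigenvalues are roughly 0.79 and 2.21, so eta = 0.4 satisfies the convergence condition and 50 terms already match the exact solve closely.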

2303.07863 2026-05-26 cs.CV cs.AI cs.MM

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan

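AI summary: Proposes a Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework that extracts I-frame, motion-vector, and residual features directly from compressed video for efficient and accurate temporal sentence grounding.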

Comments Accepted by CVPR 2023

Abstract

Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target moment semantically according to a sentence query. Although previous works have achieved decent success, they focus only on high-level visual features extracted from consecutive decoded frames and cannot handle compressed videos for query modeling, which leads to insufficient representation capability and significant computational complexity during training and testing. In this paper, we pose a new setting, compressed-domain TSG, which directly uses compressed videos rather than fully decompressed frames as the visual input. To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector, and residual features) for effective and efficient grounding. In particular, instead of encoding entire decoded frames as previous works do, we capture the appearance representation by learning only the I-frame feature, which reduces latency. Besides, we explore motion information not only by learning the motion vector feature but also by modeling the relations of neighboring frames via the residual feature. On this basis, we further design a three-branch spatial-temporal attention layer with an adaptive motion-appearance fusion module to extract and aggregate both appearance and motion information for the final grounding. Experiments on three challenging datasets show that our TCSF achieves better performance than other state-of-the-art methods with lower complexity.
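
The abstract names an adaptive motion-appearance fusion module without specifying it. A common way to build such a module, assumed here for illustration only, is a sigmoid gate that mixes appearance (I-frame) and motion (motion-vector/residual) features per dimension: `g = sigmoid(w_a*a + w_m*m + b)`, `fused = g*a + (1 - g)*m`. The diagonal gate weights and the name `gated_fusion` are hypothetical simplifications.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(appearance: Vector, motion: Vector,
                 w_a: Vector, w_m: Vector, bias: Vector) -> Vector:
    """Assumed gate: per-dimension g = sigmoid(w_a*a + w_m*m + b); out = g*a + (1-g)*m.

    Diagonal gate weights keep the sketch short; a real module would more
    likely use full learned projections over the concatenated features.
    """
    fused = []
    for a, m, wa, wm, b in zip(appearance, motion, w_a, w_m, bias):
        g = sigmoid(wa * a + wm * m + b)
        fused.append(g * a + (1.0 - g) * m)
    return fused


# Toy appearance (I-frame) and motion features for one clip.
appearance = [0.9, 0.1, -0.4]
motion = [0.2, 0.8, 0.0]
print(gated_fusion(appearance, motion, [1.0] * 3, [-1.0] * 3, [0.0] * 3))
```

Because the gate lies in (0, 1), every fused value is a convex combination of the two inputs, so the module can lean toward appearance or motion per dimension without changing the feature scale.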

2209.11572 2026-05-26 cs.CV cs.AI cs.IR cs.MM

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

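AI summary: Proposes a Multi-Modal Cross-Domain Alignment network whose three modules (domain alignment, cross-modal alignment, and specific alignment) address the domain discrepancy and the semantic gap in cross-domain video moment retrieval.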

Comments Accepted by IEEE Transactions on Multimedia

Abstract

As an increasingly popular task in multimedia information retrieval, video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query. Most previous methods depend heavily on numerous manual annotations (i.e., moment boundaries), which are extremely expensive to acquire in practice. In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop. In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain ("source domain"), but the domain of interest ("target domain") only contains unannotated datasets. As far as we know, we present the first study on cross-domain VMR. To address this new task, we propose a novel Multi-Modal Cross-Domain Alignment (MMCDA) network to transfer the annotation knowledge from the source domain to the target domain. However, due to the domain discrepancy between the source and target domains and the semantic gap between videos and queries, directly applying trained models to the target domain generally leads to a performance drop. To solve this problem, we develop three novel modules: (i) a domain alignment module is designed to align the feature distributions between different domains of each modality; (ii) a cross-modal alignment module aims to map both video and query features into a joint embedding space and to align the feature distributions between different modalities in the target domain; (iii) a specific alignment module tries to obtain the fine-grained similarity between a specific frame and the given query for optimal localization. By jointly training these three modules, our MMCDA can learn domain-invariant and semantic-aligned cross-modal representations.
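
The domain alignment module is described only as aligning feature distributions across domains. One standard discrepancy that such modules often minimize, used here as an illustrative assumption (the abstract does not say whether MMCDA uses it), is the biased squared maximum mean discrepancy (MMD) with a Gaussian kernel between source and target feature sets. `mmd2` and the default bandwidth are hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float) -> float:
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mean_kernel(xs: List[Vector], ys: List[Vector], sigma: float) -> float:
    return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(source: List[Vector], target: List[Vector], sigma: float = 1.0) -> float:
    """Biased squared MMD (illustrative choice): E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2.0 * mean_kernel(source, target, sigma))


source_feats = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
shifted_target = [[1.0, 1.1], [1.2, 0.9], [1.1, 1.0]]
print(mmd2(source_feats, source_feats))    # identical feature sets give 0.0
print(mmd2(source_feats, shifted_target))  # shifted domain gives a positive value
```

In a training loop, a term like this would be added to the loss for each modality so that source and target features are pulled toward the same distribution; an adversarial domain discriminator is a common alternative.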

2011.11194 2026-05-26 cs.LG cs.CV cs.NE

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

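AI summary: Proposes View Variation and View Heredity (V3H), a genetics-inspired method that decomposes each subspace into a variation matrix and a heredity matrix to learn, respectively, the unique information of each view and the consistent information shared by all views, and uses an adjustable low-rank representation to recover the underlying data structure; it captures both consistent and unique information in incomplete multi-view clustering and outperforms existing methods on 15 benchmark datasets.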

Comments Published in IEEE Transactions on Artificial Intelligence

DOI: 10.1109/TAI.2021.3052425
Journal ref: IEEE Transactions on Artificial Intelligence 2020

Abstract

Real data often appear in the form of multiple incomplete views. Incomplete multi-view clustering is an effective method to integrate these incomplete views. Previous methods learn only the consistent information between different views and ignore the unique information of each view, which limits their clustering performance and generalization. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V3H). Inspired by variation and heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views, representing the unique information and the consistent information, respectively. Then, by aligning different views based on their cluster indicator matrices, V3H integrates the unique information from different views to improve clustering performance. Finally, with the help of an adjustable low-rank representation based on the heredity matrix, V3H recovers the underlying true data structure to reduce the influence of large incompleteness. More importantly, V3H is possibly the first work to introduce genetics into clustering algorithms for simultaneously learning the consistent information and the unique information from incomplete multi-view data. Extensive experimental results on fifteen benchmark datasets validate its superiority over other state-of-the-art methods.

2011.10396 2026-05-26 cs.LG cs.AI

Double Self-weighted Multi-view Clustering via Adaptive View Fusion

Xiang Fang, Yuchong Hu

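AI summary: Proposes the Double Self-weighted Multi-view Clustering (DSMC) framework, which weights features and graphs with an adaptive weight matrix and adaptive weight factors, respectively, to remove redundancy and noise, and then fuses the multiple graphs for clustering.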

Comments Corresponding author: Xiang Fang

Abstract

Multi-view clustering has been applied in many real-world applications where the original data often contain noise. Some graph-based multi-view clustering methods have been proposed to reduce the negative influence of noise. However, previous graph-based multi-view clustering methods treat all features equally even when redundant features or noise are present, which is clearly unreasonable. In this paper, we propose a novel multi-view clustering framework, Double Self-weighted Multi-view Clustering (DSMC), to overcome this deficiency. DSMC performs two self-weighted operations to remove redundant features and noise from each graph, thereby obtaining robust graphs. The first self-weighted operation assigns different weights to different features by introducing an adaptive weight matrix, which reinforces the role of important features in the joint representation and makes each graph robust. The second self-weighted operation weights different graphs by imposing adaptive weight factors, which assign larger weights to more robust graphs. Furthermore, by designing an adaptive multiple-graph fusion, we fuse the features in the different graphs to integrate these graphs for clustering. Experiments on six real-world datasets demonstrate its advantages over other state-of-the-art multi-view clustering methods.
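
The second self-weighted operation gives larger weights to more robust graphs. A parameter-free rule that is common in the auto-weighted multi-graph literature, assumed here rather than taken from the paper, sets each view's weight to `1 / (2 * sqrt(r_v))`, where `r_v` is the squared Frobenius distance between that view's graph and the current consensus graph, and then re-fuses the graphs by a normalized weighted sum. `view_weights` and `fuse_graphs` are hypothetical names, and the three toy graphs are made up.

```python
import math
from typing import List

Graph = List[List[float]]


def frobenius_sq(a: Graph, b: Graph) -> float:
    """Squared Frobenius distance between two similarity graphs."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def view_weights(graphs: List[Graph], consensus: Graph, eps: float = 1e-12) -> List[float]:
    """Assumed rule w_v = 1 / (2 * sqrt(||S - A_v||_F^2)): closer views weigh more."""
    return [1.0 / (2.0 * math.sqrt(frobenius_sq(consensus, g) + eps)) for g in graphs]


def fuse_graphs(graphs: List[Graph], weights: List[float]) -> Graph:
    """Normalized weighted sum of per-view similarity graphs."""
    total = sum(weights)
    n = len(graphs[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, graphs)) / total
             for j in range(n)] for i in range(n)]


clean = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0]]
clean2 = [[1.0, 0.85, 0.05], [0.85, 1.0, 0.1], [0.05, 0.1, 1.0]]
noisy = [[1.0, 0.2, 0.7], [0.2, 1.0, 0.6], [0.7, 0.6, 1.0]]
graphs = [clean, clean2, noisy]

# Alternate between re-weighting the views and re-fusing the consensus graph.
consensus = fuse_graphs(graphs, [1.0, 1.0, 1.0])
for _ in range(10):
    weights = view_weights(graphs, consensus)
    consensus = fuse_graphs(graphs, weights)
print([round(w, 3) for w in weights])  # the noisy view ends up with the smallest weight
```

The `eps` term only guards against division by zero if the consensus coincides with one view's graph.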

2011.10331 2026-05-26 cs.CV cs.LG

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

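AI summary: Proposes the ANIMC framework, which handles missing instances and noise in multi-view clustering through a soft auto-weighted strategy and a doubly soft regularized regression model.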

Comments Published in IEEE Transactions on Artificial Intelligence

Journal ref: IEEE Transactions on Artificial Intelligence 2021

Abstract

Multi-view clustering has wide applications in many image processing scenarios. In these scenarios, the original image data often contain missing instances and noise, which most multi-view clustering methods ignore. However, missing instances can make these methods difficult to apply directly, and noise leads to unreliable clustering results. In this paper, we propose a novel Auto-weighted Noisy and Incomplete Multi-view Clustering framework (ANIMC) via a soft auto-weighted strategy and a doubly soft regularized regression model. First, by designing adaptive semi-regularized nonnegative matrix factorization (adaptive semi-RNMF), the soft auto-weighted strategy assigns a proper weight to each view and adds a soft boundary to balance the influence of noise and incompleteness. Second, by proposing the θ-norm, the doubly soft regularized regression model adjusts the sparsity of our model through the choice of θ. Compared with existing methods, ANIMC has three unique advantages: 1) it is a soft algorithm that can adapt our framework to different scenarios, thereby improving its generalization ability; 2) it automatically learns a proper weight for each view, thereby reducing the influence of noise; 3) it performs doubly soft regularized regression that aligns the same instances in different views, thereby decreasing the impact of missing instances. Extensive experimental results demonstrate its superior advantages over other state-of-the-art methods.

2011.10254 2026-05-26 cs.LG cs.AI stat.ML

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

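AI summary: To handle views with unbalanced degrees of incompleteness, proposes UIMC, an unbalanced incomplete multi-view clustering method based on view evolution and inspired by biological evolution theory, which integrates the views via weighted multi-view subspace clustering and recovers the data with a low-rank robust representation, substantially improving clustering performance.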

Comments Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

DOI: 10.1109/TETCI.2021.3077909
Journal ref: IEEE Transactions on Emerging Topics in Computational Intelligence 2021

Abstract

Incomplete multi-view clustering is an important technique for dealing with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views). Unbalanced incompleteness prevents us from directly using previous methods for clustering. In this paper, inspired by the effective biological evolution theory, we design a novel scheme of view evolution to cluster strong and weak views. Moreover, we propose an Unbalanced Incomplete Multi-view Clustering method (UIMC), which is the first effective method based on view evolution for unbalanced incomplete multi-view clustering. Compared with previous methods, UIMC has two unique advantages: 1) it proposes weighted multi-view subspace clustering to integrate these unbalanced incomplete views, which effectively solves the unbalanced incomplete multi-view problem; 2) it designs a low-rank and robust representation to recover the data, which diminishes the impact of incompleteness and noise. Extensive experimental results demonstrate that UIMC improves clustering performance by up to 40% on three evaluation metrics over other state-of-the-art methods.

2605.25250 2026-05-26 cs.AI

LipoAgent: Coordinating Fine-Tuned LLM Agents for Safer Lipid Design

Leshu Li, An Lu, Haiyu Wang, Zhibin Feng, Conghui Duan, Qing Bao, Zongmin Zhao, Sai Qian Zhang

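AI summary: Proposes LipoAgent, a safety-aware multi-agent LLM framework whose conditional prediction objective makes toxicity a prerequisite for efficiency prediction and which adds multi-agent verification, achieving an average 32% relative improvement in mRNA transfection efficiency prediction.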

Abstract

Lipid nanoparticles (LNPs) are among the most clinically mature platforms for nucleic acid delivery, yet designing lipids that are both effective and biologically safe remains a major bottleneck. In practical screening, toxicity is a decision-level constraint: if a lipid is toxic, its efficiency prediction is clinically irrelevant. We propose LipoAgent, a safety-aware multi-agent LLM framework for lipid discovery. LipoAgent combines domain-specific finetuning with a conditional prediction objective that enforces toxicity as a prerequisite for efficiency prediction, and further improves reliability via multi-agent verification with lightweight human oversight when disagreement persists. Across multiple foundation models, LipoAgent achieves an average 32% relative improvement in mRNA transfection efficiency prediction compared with other reported models for lipid design. Wet-lab validation confirms that virtual screening rankings reliably translate to biological transfection outcomes. The code is publicly available at https://github.com/SAI-Lab-NYU/LipoAgent.git.
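
The abstract describes two pieces of decision logic: efficiency is predicted only for lipids that pass a toxicity check, and several agents must agree before a prediction is accepted, with a human consulted when disagreement persists. The sketch below encodes that control flow with plain callables standing in for the fine-tuned LLM agents. The function names, the toxicity threshold, the strict-majority rule, and the toy lipid string are all hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for fine-tuned LLM agents: lipid string -> label.
Agent = Callable[[str], str]


def predict_efficiency_if_safe(lipid: str,
                               p_toxic: Callable[[str], float],
                               efficiency: Callable[[str], float],
                               max_toxicity: float = 0.5) -> Optional[float]:
    """Conditional prediction: efficiency is only reported for lipids judged non-toxic."""
    if p_toxic(lipid) >= max_toxicity:
        return None  # a toxic lipid's efficiency is treated as irrelevant
    return efficiency(lipid)


def verify(lipid: str, agents: Sequence[Agent], max_rounds: int = 2,
           ask_human: Callable[[str], str] = lambda _: "needs_review") -> str:
    """Accept a label once a strict majority of agents agree; otherwise escalate.

    Real agents would be stochastic or see each other's reasoning between
    rounds; these toy agents are deterministic, so later rounds simply re-vote.
    """
    for _ in range(max_rounds):
        votes = Counter(agent(lipid) for agent in agents)
        label, count = votes.most_common(1)[0]
        if 2 * count > len(agents):
            return label
    return ask_human(lipid)  # persistent disagreement: lightweight human oversight


agents = [lambda s: "active", lambda s: "active", lambda s: "inactive"]
print(verify("CCCC", agents))                                            # -> active
print(predict_efficiency_if_safe("CCCC", lambda s: 0.8, lambda s: 0.9))  # -> None
```

Putting the toxicity check before the efficiency call mirrors the abstract's point that an efficiency estimate for a toxic lipid is clinically irrelevant, so it is never produced.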

2605.25244 2026-05-26 cs.CL

Inference Time Optimization with Confidence Dynamics

Yu Wang, Minghao Liu, Jiayun Wang, Jinrui Huang, Ankit Shah, Wei Wei

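AI summary: Examining how confidence evolves along reasoning traces, finds that confidence tends to rise on correct traces and fall on incorrect ones, and builds on this to propose Confidence Dynamic Gain voting, which substantially improves LLM reasoning performance.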

Comments Published in ICML 2026

Abstract

Inference-time optimization techniques, such as repeated sampling, have significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, the critical role of model uncertainty remains largely underexplored in these optimization strategies. In this paper, we investigate the dynamics of confidence along reasoning trajectories and for the first time reveal a surprising and distinctive pattern: correct answer traces tend to exhibit confidence improvement over time (positive confidence gain), while incorrect traces show attenuated or declining confidence as reasoning proceeds. Based on this observation, we propose Confidence Dynamic Gain (CDG) based voting, which incorporates how the confidence trajectory of the response evolves along the reasoning chain. Experiments across four open-source architectures (DeepSeek-R1, gpt-oss, Gemma-3, Qwen-QwQ) on the AIME24/25, HMMT25, and BRUMO25 benchmarks demonstrate that CDG yields a significant performance boost over baselines. These results demonstrate that our method provides a robust discriminative signal for improving answer selection in LLM reasoning. We also provide theoretical insights into this phenomenon. Code will be released at https://github.com/Accenture/CDG.git.
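
The abstract does not give the CDG formula. As a hypothetical instantiation, the sketch below scores each sampled trace by its confidence gain (mean per-step confidence in the second half of the trace minus that in the first half), keeps only traces with positive gain, and takes a gain-weighted vote over final answers, falling back to plain majority voting if no trace qualifies. The gain definition, the filtering rule, and `cdg_vote` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A trace is (final_answer, per-step confidences along the reasoning chain).
Trace = Tuple[str, Sequence[float]]


def confidence_gain(confidences: Sequence[float]) -> float:
    """Hypothetical gain: mean confidence of the later half minus the earlier half."""
    mid = len(confidences) // 2
    early, late = confidences[:mid], confidences[mid:]
    if not early or not late:
        return 0.0
    return sum(late) / len(late) - sum(early) / len(early)


def cdg_vote(traces: List[Trace]) -> str:
    """Gain-weighted vote over answers; majority vote if no trace has positive gain."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, conf in traces:
        gain = confidence_gain(conf)
        if gain > 0:
            scores[answer] += gain
    if not scores:
        for answer, _ in traces:
            scores[answer] += 1.0
    return max(scores, key=scores.get)


traces = [
    ("42", [0.50, 0.55, 0.70, 0.85]),  # rising confidence
    ("17", [0.80, 0.70, 0.60, 0.50]),  # falling confidence
    ("17", [0.75, 0.70, 0.65, 0.60]),  # falling confidence
]
print(cdg_vote(traces))  # -> 42, although plain majority voting would pick 17
```

The example shows the intended contrast with majority voting: a single trace whose confidence grows can outweigh several traces whose confidence decays.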

2605.25239 2026-05-26 cs.RO eess.SP

FusionCore: A 23-State Unscented Kalman Filter for IMU, Wheel Encoder, GPS, and Visual SLAM Fusion in ROS 2

Manan Kharwar

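AI summary: Presents FusionCore, an open-source ROS 2 sensor-fusion package built on a 23-state unscented Kalman filter that estimates the wheel-encoder yaw-rate bias online, handles GPS natively in ECEF, adapts noise covariances, and fuses VSLAM poses; it achieves lower absolute trajectory error than robot_localization on 10 of 12 NCLT sequences.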

Comments 8 pages, 4 figures, 2 tables. Source code: https://github.com/manankharwar/fusioncore (Apache 2.0)

Abstract

We present FusionCore, an open-source ROS 2 sensor fusion package that fuses IMU, wheel encoder odometry, GPS, and Visual SLAM pose into a single 100 Hz odometry stream using a 23-state Unscented Kalman Filter (UKF). The 23rd state is an online estimate of the wheel encoder's systematic yaw rate bias, identified through GPS heading cross-covariance and subtracted during GPS blackouts to reduce heading drift in coast mode. FusionCore also estimates gyroscope and accelerometer biases as explicit filter states, handles GPS natively in ECEF without a separate coordinate projection node, applies per-sensor Mahalanobis chi-squared outlier gating calibrated to measurement degrees of freedom, and adapts sensor noise covariance automatically from the innovation sequence. VSLAM pose fusion enables GPS-denied operation with any visual odometry or SLAM system, including automatic recovery from map reinitialization. We evaluate against robot_localization on twelve full-length sequences (55-92 min each) from the NCLT public dataset. FusionCore achieves lower Absolute Trajectory Error (ATE) on ten of twelve sequences, with improvements ranging from 1.2x to 22.2x on winning sequences. The robot_localization UKF diverges numerically on all twelve sequences. FusionCore is available at https://github.com/manankharwar/fusioncore under the Apache 2.0 license.
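Per-sensor Mahalanobis gating of the kind described above can be sketched in a few lines. The squared Mahalanobis distance of an innovation `y` under its covariance `S` is compared against a chi-squared quantile whose degrees of freedom equal the measurement dimension. The 0.99 quantiles are hard-coded to keep the sketch dependency-free. The 0.99 level and the function names are assumptions, not FusionCore's actual parameters or API.

```python
# Upper 0.99 quantiles of the chi-squared distribution, indexed by degrees of freedom (1-6).
CHI2_99 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086, 6: 16.812}


def solve(S, y):
    """Solve S x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    A = [list(map(float, row)) + [float(v)] for row, v in zip(S, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


def mahalanobis_sq(y, S):
    """Squared Mahalanobis distance y^T S^{-1} y of an innovation y."""
    return sum(yi * xi for yi, xi in zip(y, solve(S, y)))


def accept_measurement(y, S):
    """Gate a measurement: reject it if its innovation is a chi-squared outlier."""
    return mahalanobis_sq(y, S) <= CHI2_99[len(y)]


S = [[4.0, 0.0], [0.0, 1.0]]              # 2-D position innovation covariance (m^2)
print(accept_measurement([2.0, 1.0], S))  # d^2 = 1 + 1 = 2, prints True
print(accept_measurement([8.0, 3.0], S))  # d^2 = 16 + 9 = 25, prints False
```

Tying the threshold to the measurement dimension means that a 3-D pose update and a 1-D heading update are held to the same false-rejection rate.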

2605.25235 2026-05-26 cs.LG cs.AI math.OC

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies

Sohaib Lafifi

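AI summary: Proposes an attribution method for neural combinatorial optimization policies. It decomposes decisions via LP-relaxation duals, certifies counterfactuals with a CSP feasibility model, and bounds the size of PAC explanations with a Bonferroni-corrected Hoeffding sufficient-subset test.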

Comments 4 pages, 1 figure, Reference implementation: https://github.com/sohaibafifi/neuro-co-cax (MIT)

Abstract

We give an attribution method for neural combinatorial-optimisation (CO) policies that (i) decomposes a decision by constraint families via LP-relaxation duals, (ii) certifies counterfactuals through a combinatorial feasibility model (implemented as a CSP feasibility-decision model), and (iii) bounds the size of a PAC-sufficient explanation with a Bonferroni-corrected Hoeffding sufficient-subset test along a greedy ordering. Across three CO problems and three seeds, our LP-anchored $\Lambda$-attribution matches the CF-derived signal at 96.5% on CVRPTW (n_cert=344) and 77.2% on the Orienteering Problem (n_cert=281) vs 75.0% and 35.2% for proxy gradient (paired diffs +0.215 and +0.420; McNemar exact $p \le 10^{-14}$). In the rank-aligned regime of the Flexible Job-Shop Scheduling Problem, both backends agree on every CSP-certified flip (n_cert=59), confirming the no-gain prediction. Bonferroni-PAC subsets average 5.0 nodes per step ($M=70$, $\varepsilon=\delta=0.2$, $k_{\max}=25$). Reference implementation: https://github.com/sohaibafifi/neuro-co-cax
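One reading of the Bonferroni-PAC test walks the greedy ordering. At each prefix size k, M perturbations are drawn that keep the top-k features fixed. The test accepts the first k whose Hoeffding lower confidence bound on the decision-preservation rate reaches 1 - ε, running each test at level δ / k_max (Bonferroni). The sketch assumes the per-k success counts are already available. The function names and the exact acceptance rule are illustrative assumptions, not the reference implementation's API.

```python
import math


def hoeffding_lower_bound(successes, m, delta):
    """One-sided Hoeffding lower confidence bound on a Bernoulli mean.

    With probability at least 1 - delta, p >= p_hat - sqrt(ln(1/delta) / (2m)).
    """
    p_hat = successes / m
    return p_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * m))


def smallest_sufficient_k(success_counts, m, eps, delta, k_max):
    """Return the first prefix size k (1-based) certified as (eps, delta)-sufficient.

    success_counts[k - 1] is how many of m perturbations kept the policy's
    decision when the top-k features of the greedy ordering were held fixed.
    Each of the up to k_max tests runs at level delta / k_max (Bonferroni), so
    the chance that any test wrongly certifies a prefix is at most delta.
    Returns None if no k <= k_max passes.
    """
    per_test_delta = delta / k_max
    for k, successes in enumerate(success_counts[:k_max], start=1):
        if hoeffding_lower_bound(successes, m, per_test_delta) >= 1.0 - eps:
            return k
    return None


# Settings quoted in the abstract: M = 70, eps = delta = 0.2, k_max = 25.
print(smallest_sufficient_k([40, 55, 64, 70, 70], m=70, eps=0.2, delta=0.2, k_max=25))  # prints 4
```

With these settings the Hoeffding margin is about 0.186, so a prefix passes only when nearly every perturbation preserves the decision.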

2605.25234 2026-05-26 cs.LG cs.AI stat.CO stat.ML

On the Epistemic Uncertainty of Overparametrized Neural Networks

David Rügamer

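AI summary: Analyzes the epistemic uncertainty of overparametrized neural networks through the lens of non-identifiability. It characterizes discrete and continuous sources of residual uncertainty and validates the theory on a single-hidden-layer ReLU network.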

Comments Accepted at ICML 2026 (Main Track)
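Two kinds of non-identifiability can be made concrete for a single-hidden-layer ReLU network, and neither changes the network function:

- Permuting hidden units is a discrete symmetry.
- Scaling a unit's input weight and bias by c > 0 while dividing its output weight by c is a continuous symmetry, because relu(c z) = c relu(z).

These are standard ReLU symmetries. Whether they coincide exactly with the paper's characterization of residual uncertainty is an assumption.

```python
def relu(z):
    return max(0.0, z)


def net(x, w, b, v):
    """f(x) = sum_k v_k * relu(w_k * x + b_k) for a scalar input x."""
    return sum(vk * relu(wk * x + bk) for wk, bk, vk in zip(w, b, v))


def permute(params, perm):
    """Reorder hidden units (discrete symmetry)."""
    return tuple([p[i] for i in perm] for p in params)


def rescale(params, unit, c):
    """Scale one unit's input weight and bias by c > 0 and its output weight by 1/c."""
    w, b, v = (list(p) for p in params)
    w[unit] *= c
    b[unit] *= c
    v[unit] /= c
    return w, b, v


params = ([1.0, -2.0], [0.5, 1.0], [3.0, 0.7])  # (input weights, biases, output weights)
for x in (-1.0, 0.0, 0.3, 2.0):
    print(net(x, *params), net(x, *permute(params, [1, 0])), net(x, *rescale(params, 0, 2.5)))
```

All three columns agree up to floating-point rounding, even though the parameter vectors differ.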

2605.25233 2026-05-26 cs.AI

Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems

Andy Xu, Yu-Wing Tai

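AI summary: Proposes Meta-Agent, a two-phase framework that uses task planning, web search, code generation, and verification to build and execute reliable multi-agent systems automatically from natural-language task descriptions. It improves success rate, error recovery, and workflow stability on coding, contextual learning, and open-ended reasoning tasks.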

Abstract

AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in scale and depth. Small errors at intermediate stages can propagate through agent interactions, while insufficient grounding and weak verification mechanisms further limit reliability. We present Meta-Agent, a two-phase framework that automatically constructs and executes specialized multi-agent systems from natural-language task descriptions. In the construction phase, a task planner decomposes a problem into a directed acyclic graph of agent specifications with explicit input/output contracts and verification criteria. A web search module grounds each specification with external evidence, and a code generation module produces system prompts and tool configurations. A construction-time verification stage then validates generated artifacts and triggers targeted regeneration when failures are detected. In the execution phase, a coordinator dispatches subtasks across the agent graph while execution-time verification gates intermediate outputs. We further introduce a three-level error attribution mechanism that distinguishes local, upstream, and structural failures, enabling targeted recovery strategies ranging from localized retries to partial re-execution and re-decomposition. We evaluate Meta-Agent across coding, contextual learning, and open-ended reasoning tasks. Experiments against strong multi-agent baselines and ablation studies demonstrate consistent improvements in task success rate, error recovery, and workflow stability. The results highlight the importance of tightly integrating planning, grounding, and verification for building reliable multi-agent systems.
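The three-level error attribution can be pictured as a small routing function over the agent DAG:

- A failure whose ancestors all passed verification is local: retry the node.
- A failure with a failed ancestor is upstream: re-execute from the earliest failed ancestor.
- A node that keeps failing after its retry budget is structural: re-decompose the task.

The graph, the verdict format and the retry budget are hypothetical. This is a sketch of the idea, not Meta-Agent's implementation.

```python
from graphlib import TopologicalSorter


def ancestors(graph, node):
    """All transitive dependencies of node; graph maps node -> set of parent nodes."""
    seen, stack = set(), list(graph.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def attribute_failure(graph, failed, verdicts, retries, max_retries=2):
    """Classify a failed node and choose a recovery action.

    verdicts maps node -> True/False (did its output pass verification);
    retries maps node -> how many times it has already been retried.
    Returns (level, action, target).
    """
    if retries.get(failed, 0) >= max_retries:
        return ("structural", "re-decompose", None)
    bad_upstream = [a for a in ancestors(graph, failed) if verdicts.get(a) is False]
    if bad_upstream:
        # Restart from the earliest failing ancestor in topological order.
        order = list(TopologicalSorter(graph).static_order())
        earliest = min(bad_upstream, key=order.index)
        return ("upstream", "re-execute-from", earliest)
    return ("local", "retry", failed)


graph = {"plan": set(), "search": {"plan"}, "code": {"plan", "search"}, "test": {"code"}}
verdicts = {"plan": True, "search": False, "code": True, "test": False}
print(attribute_failure(graph, "test", verdicts, retries={}))
# prints ('upstream', 're-execute-from', 'search')
```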

2605.25228 2026-05-26 cs.LG

A Blended Likelihood Approach for Achieving Fairness Using Naive Bayes

John Arthur Junior, Abdul Lateef Yussif, Maame G. Asante-Mensah, Charles R. Haruna, Sandro Amofa, Elliot Attipoe

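AI summary: Proposes a fairness-aware extension of Naive Bayes (BMNB) that balances fairness and accuracy through blended likelihood estimation and adaptive-threshold post-processing, achieving near-fair metrics on several datasets.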

Abstract

Concerns about algorithmic bias and fairness have increased as artificial intelligence has been incorporated into high-stakes decision-making. Traditional Naive Bayes classifiers, while efficient and interpretable, lack fairness-awareness mechanisms and perpetuate historical biases in sensitive domains such as hiring, credit scoring, and criminal justice. This study develops a fairness-aware extension of the Naive Bayes classifier that mitigates bias while maintaining computational efficiency. We propose the Bias Mitigating Naive Bayes (BMNB) classifier, integrating in-processing and post-processing interventions. The in-processing stage employs a blended likelihood approach combining group-specific and pooled likelihood estimates through a tunable blending parameter alpha to balance fairness and accuracy. The post-processing stage applies output calibration with adaptive thresholding to fine-tune group-specific decision boundaries. Experimental results indicate that BMNB attains Disparate Impact (DI) values of 1.000, 1.171, and 0.997 and Equal Opportunity Difference (EOD) values of -0.217, -0.226, and -0.053 on the Adult, ProPublica, and Framingham datasets, respectively, while maintaining computational efficiency. Ablation studies confirm that the combination of blended likelihood and adaptive thresholding yields superior performance compared to either technique in isolation.
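The in-processing step can be sketched for categorical features. The class-conditional likelihood is a convex blend, alpha * P(x | y, g) + (1 - alpha) * P(x | y), of a group-specific estimate and a pooled estimate. With alpha = 1 it reduces to per-group Naive Bayes; with alpha = 0 it ignores group membership. Laplace smoothing, the row-dict data layout and the function names are assumptions for illustration. The disparate-impact helper uses the usual ratio of positive-prediction rates (unprivileged over privileged).

```python
def likelihood(rows, feature, value, label, n_values, group=None):
    """Laplace-smoothed P(feature = value | y = label [, g = group]).

    rows are dicts holding each feature plus the keys "y" (label) and "g" (group);
    n_values is the number of distinct values the feature can take.
    """
    subset = [r for r in rows if r["y"] == label and (group is None or r["g"] == group)]
    hits = sum(1 for r in subset if r[feature] == value)
    return (hits + 1) / (len(subset) + n_values)


def blended_likelihood(rows, feature, value, label, group, n_values, alpha):
    """alpha * group-specific likelihood + (1 - alpha) * pooled likelihood."""
    specific = likelihood(rows, feature, value, label, n_values, group)
    pooled = likelihood(rows, feature, value, label, n_values)
    return alpha * specific + (1 - alpha) * pooled


def disparate_impact(preds, groups, unprivileged, privileged):
    """P(yhat = 1 | unprivileged) / P(yhat = 1 | privileged)."""
    def rate(g):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        return sum(members) / len(members)
    return rate(unprivileged) / rate(privileged)


rows = [
    {"edu": "hs", "y": 1, "g": "a"},
    {"edu": "ba", "y": 1, "g": "a"},
    {"edu": "ba", "y": 1, "g": "b"},
    {"edu": "ba", "y": 1, "g": "b"},
]
print(blended_likelihood(rows, "edu", "ba", 1, "b", n_values=2, alpha=0.5))
print(disparate_impact([1, 0, 1, 1], ["u", "u", "p", "p"], "u", "p"))  # prints 0.5
```

Sweeping alpha between 0 and 1 trades off group-specific fit against parity. The paper's adaptive group thresholds would then act on the resulting posteriors.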

2605.25226 2026-05-26 cs.CL

From Automation to Collaboration: Human-in-the-Loop Methods for Safe and Trustworthy NLP

Most. Sharmin Sultana Samu, MD. Tanvir Ahmed Seum, Md. Rakibul Islam

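AI summary: Surveys human-in-the-loop methods in which human oversight supports auditing, robustness evaluation, data construction, and model steering to make NLP safer and more trustworthy. It identifies gaps in scalable probing, sustainable robustness benchmarks, low-resource settings, and governance of private systems.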

Comments Preprint, manuscript under review

Abstract

Large language models are widely deployed in high-stakes NLP tasks, yet risks such as bias, hallucination, adversarial vulnerability and unreliable generalization remain. Probe-based auditing reveals inconsistencies in model behavior. Adversarial text generation uncovers robustness gaps, especially in lower-resourced languages with limited benchmarks. Enterprise text-to-SQL settings expose the difficulty of validating outputs over private and large-scale databases. Human supervision is essential for probe validation, adversarial verification and domain-specific annotation, but it is costly and hard to scale. This survey examines recent human-in-the-loop methods that shift NLP from automation toward collaboration for safety and trustworthiness. We review how human expertise supports auditing, robustness evaluation, data construction and model steering. Our findings highlight gaps in scalable probing, sustainable robustness benchmarks, low-resource settings and governance of private systems. We outline practical research directions for adaptive auditing, collaborative evaluation and accountable deployment.

2605.25220 2026-05-26 cs.CV cs.GR cs.RO

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

Aviral Chharia, Fernando De la Torre

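AI summary: Proposes MVCHead, which learns 3D Gaussian head models directly from randomly sampled 2D images. Hierarchical State Space blocks and an SE(3) multi-view critic enforce multi-view consistency without multi-view data or 3D supervision.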

Comments CVPR 2026; Project Website: https://humansensinglab.github.io/MVCHead/

Journal ref: CVPR, Denver, CO, USA, 2026, pp. 40163-40174

Abstract

High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS) that aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models. Project Page and code: https://humansensinglab.github.io/MVCHead/

2605.25216 2026-05-26 cs.RO

InvariantCloud: A Globally Invariant, Uniquely Indexed Point Cloud Framework for Robust 6-DoF Tactile Pose Tracking

Pengfei Ye, Yuxiang Ma, Yi Zhou, Wei Chen, Wenzhen Dong, Molong Duan

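AI summary: Proposes InvariantCloud, which exploits the global invariance of the surface marker constellation on a visuotactile sensor. It estimates 6-DoF object pose through one-shot, globally invariant point cloud registration, suppressing accumulated drift and accurately estimating yaw rotation. The framework shows high accuracy and robustness in long-sequence manipulation tasks.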
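Uniquely indexed markers make correspondences known, and that is what allows registration in one shot rather than by iterative matching. The sketch below is a planar (2-D) simplification: it recovers yaw and translation in closed form from indexed marker sets. The 2-D reduction and the function name are assumptions; the paper's method is full 6-DoF.

```python
import math


def estimate_yaw_translation(ref, cur):
    """Closed-form 2-D rigid registration of uniquely indexed marker points.

    ref[i] and cur[i] are the same marker (known correspondence). Returns
    (yaw, tx, ty) minimizing sum ||R(yaw) ref_i + t - cur_i||^2.
    """
    n = len(ref)
    rx = sum(p[0] for p in ref) / n
    ry = sum(p[1] for p in ref) / n
    cx = sum(p[0] for p in cur) / n
    cy = sum(p[1] for p in cur) / n
    s_dot = s_cross = 0.0
    for (ax, ay), (bx, by) in zip(ref, cur):
        ax, ay, bx, by = ax - rx, ay - ry, bx - cx, by - cy  # center both sets
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    yaw = math.atan2(s_cross, s_dot)
    c, s = math.cos(yaw), math.sin(yaw)
    # Translation maps the rotated reference centroid onto the current centroid.
    return yaw, cx - (c * rx - s * ry), cy - (s * rx + c * ry)


ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
cur = [(math.cos(0.3) * x - math.sin(0.3) * y + 0.5,
        math.sin(0.3) * x + math.cos(0.3) * y - 1.0) for x, y in ref]
print(estimate_yaw_translation(ref, cur))  # approximately (0.3, 0.5, -1.0)
```

Because each frame is registered against the fixed reference constellation rather than the previous frame, errors do not accumulate from step to step.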

2605.25212 2026-05-26 cs.LG cs.SY eess.SY

Personalized Federated Learning by Energy-Efficient UAV Communications

Shiqian Guo, Jianqing Liu, Beatriz Lorenzo

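AI summary: Addresses data heterogeneity and energy consumption in UAV-assisted federated learning. It separates a globally shared backbone from local personalization heads and uses a gradient-norm-based scheduling strategy, improving learning accuracy while reducing energy consumption.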

Abstract

Federated learning (FL) is an effective paradigm for enhancing the learning capability of edge devices while preserving data privacy. In geographically dispersed FL systems, such as sensor networks in remote areas, unmanned aerial vehicles (UAVs) can flexibly establish high-quality communication links to support parameter exchange. However, device heterogeneity and the limited battery capacity of UAVs pose significant challenges. Specifically, data heterogeneity slows convergence, while scheduling all devices for global collaboration incurs excessive communication and energy costs. To overcome these challenges, we adopt a strict separation between a globally shared backbone and permanently local personalization heads, thereby mitigating the impact of data heterogeneity. Furthermore, we propose a gradient-based scheduling strategy that jointly considers energy efficiency and learning performance. In each communication round, the backbone is updated only by the top-$\alpha$ devices ranked by gradient $\ell_{2}$-norm, ensuring that optimization focuses on the most informative updates. Simulation results demonstrate that the proposed scheme achieves higher learning accuracy than state-of-the-art approaches while significantly reducing UAV energy consumption.
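The scheduling rule works as follows. Each device reports the l2-norm of its backbone gradient, and the server keeps the k devices with the largest norms. Only those gradients are averaged into the shared backbone, while personalization heads never leave the devices. Reading top-α as a device count `k`, plain gradient averaging and the learning rate are assumptions. The paper's joint energy-efficiency criterion is not modeled.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(v * v for v in vec))


def select_top_k(grads, k):
    """Indices of the k devices whose backbone gradients have the largest l2-norm."""
    ranked = sorted(range(len(grads)), key=lambda i: l2_norm(grads[i]), reverse=True)
    return sorted(ranked[:k])


def backbone_round(backbone, grads, k, lr=0.1):
    """One communication round for the shared backbone.

    Only the selected devices' gradients are averaged; personalization heads
    are not part of this update and stay on their devices.
    """
    chosen = select_top_k(grads, k)
    avg = [sum(grads[i][j] for i in chosen) / len(chosen) for j in range(len(backbone))]
    return [w - lr * g for w, g in zip(backbone, avg)], chosen


grads = [[0.1, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]  # norms 0.1, 5, 1, 2
new_backbone, chosen = backbone_round([0.0, 0.0], grads, k=2)
print(chosen)  # prints [1, 3]
```

Only the selected devices transmit their uplink in a given round, which is where the UAV energy saving would come from in this sketch.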

2605.25211 2026-05-26 cs.LG

Evolving Causal Regulatory Networks (ECR-Net)

Govind Vallabhasseri Binish, Abdhul Ahadh, Rano Roy Kavanal, Arya Ukunde

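AI summary: Proposes ECR-Net, a bio-inspired framework for adaptive causal mechanism discovery. An evolutionary search algorithm models causal graph structure dynamically, targeting out-of-distribution generalization in non-stationary environments.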

Comments 9 pages, 6 figures. Presents ECR-Net, an evolutionary framework for adaptive causal structure discovery under non-stationarity, with empirical evaluation against NOTEARS, PCMCI+, and related baselines

Abstract

Modern machine learning models excel at pattern recognition but remain brittle, often failing to generalize out of distribution (OOD) because they capture spurious correlations rather than the underlying causal data-generating process. Current causal discovery methods, while powerful, typically assume a static graph structure, rendering them unable to model systems that adapt or undergo structural changes across different environments. We introduce ECR-Net (Evolving Causal Regulatory Networks), a novel, bio-inspired framework for adaptive causal mechanism discovery. Our approach models the data-generating process not as a static graph, but as a dynamic system analogous to a Gene Regulatory Network (GRN), composed of localized, recursive functions where variables can activate and inhibit one another. To discover the latent structure of this network, we employ an evolutionary search algorithm that evolves a population of candidate regulatory graphs, optimizing for a fitness function that measures how well the simulated system dynamics reconstruct the observed data. The key innovation of ECR-Net is its ability to model structural adaptation: it explicitly ingests shifts in the data's statistical properties as signals of an environmental shock. In response, the evolutionary search identifies parsimonious modifications to the causal graph topology, such as link inhibitions or activations, that explain the new data regime. We posit that ECR-Net represents a new class of adaptive Structural Causal Models capable of discovering how and why a system's fundamental rules change, offering a path toward robust generalization in complex, non-stationary systems.
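The search loop can be shown in a toy version, built from four illustrative pieces that are assumptions rather than ECR-Net's actual design:

- Candidate graphs are signed adjacency matrices (+1 activation, -1 inhibition, 0 no link).
- The simulated dynamics are x_{t+1} = tanh(W x_t).
- Fitness is the squared reconstruction error of an observed trajectory.
- A (1+1) evolution strategy mutates one link at a time and keeps mutants that do not increase the error.

```python
import math
import random


def simulate(W, x0, steps):
    """Roll out x_{t+1} = tanh(W x_t) from x0 for the given number of steps."""
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        traj.append([math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W])
    return traj


def fitness(W, observed):
    """Squared error between simulated and observed trajectories (lower is better)."""
    sim = simulate(W, observed[0], len(observed) - 1)
    return sum((s - o) ** 2 for st, ot in zip(sim, observed) for s, o in zip(st, ot))


def mutate(W, rng):
    """Copy W and set one randomly chosen link to a different value in {-1, 0, +1}."""
    W = [row[:] for row in W]
    i, j = rng.randrange(len(W)), rng.randrange(len(W))
    W[i][j] = rng.choice([v for v in (-1, 0, 1) if v != W[i][j]])
    return W


def evolve(observed, n, generations=500, seed=0):
    """(1+1) evolution strategy over signed regulatory graphs, starting from no links."""
    rng = random.Random(seed)
    best = [[0] * n for _ in range(n)]
    best_fit = fitness(best, observed)
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = fitness(child, observed)
        if child_fit <= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


W_true = [[0, -1], [1, 0]]  # x1 inhibits x0, x0 activates x1
observed = simulate(W_true, [1.0, 0.5], 6)
print(evolve(observed, n=2, generations=200))
```

To model an environmental shock in this sketch, one would restart the search from the current best graph on post-shift data. Small fitness-improving edits then play the role of the parsimonious topology changes described above.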

2605.25210 2026-05-26 cs.LG cs.AI stat.ML

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

Ziheng Cheng, Yixiao Huang, Hanlin Zhu, Haoran Geng, Somayeh Sojoudi, Jitendra Malik, Pieter Abbeel, Xin Guo

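AI summary: Larger model capacity raises the statistical cost of multi-objective learning for diffusion models. To address this, the paper proposes a semi-supervised two-stage training method that distills specialists into a generalist via pseudo-samples generated from unlabeled data. It proves that the required number of paired samples depends only on the complexity of the specialist models.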

Abstract

Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion policies. This naturally leads to a multi-objective learning (MOL) problem. A key challenge is that achieving good Pareto trade-offs can require a generalist model class with substantially larger capacity than what suffices for solving any individual task, thereby increasing statistical cost since sample complexity typically scales with the model complexity. To reconcile this, we develop a principled MOL framework for diffusion models with limited data: a semi-supervised regime where paired (labeled) samples are scarce, but (unlabeled) condition data are abundant. We propose a two-stage training procedure that first fits lightweight specialist models from limited paired data, and then distills them into a generalist model by generating pseudo-samples. We establish generalization bounds showing that the required number of paired samples only depends on the complexity of the specialist model classes. We further extend the theory to diffusion policies for sequential decision making to account for distribution shift in on-policy rollouts. Extensive experiments on robotic control and image restoration tasks are conducted to verify our theoretical results.
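The two-stage data flow can be illustrated with a deliberately simple stand-in:

1. Per-task 1-D linear regressors play the role of the lightweight specialists, each fit on a handful of paired samples.
2. A single generalist with shared and task-specific features is fit by least squares on pseudo-labels that the specialists produce for abundant unlabeled conditions.

Linear least squares replaces diffusion models here purely to show the data flow. Nothing below reflects the paper's model classes or generalization bounds.

```python
def fit_line(xs, ys):
    """Closed-form 1-D least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def features(c, task):
    """Generalist features: shared intercept and slope plus task-specific offsets."""
    return [1.0, c, float(task), task * c]


def distill(specialists, unlabeled):
    """Stage 2: least-squares fit of the generalist on specialist pseudo-labels.

    specialists: {task: (slope, intercept)}; unlabeled: list of (condition, task).
    """
    X = [features(c, t) for c, t in unlabeled]
    y = [specialists[t][0] * c + specialists[t][1] for c, t in unlabeled]
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve_linear(XtX, Xty)


# Stage 1: three paired samples per task (task 0: y = 2c + 1, task 1: y = -c + 3).
specialists = {0: fit_line([0, 1, 2], [1, 3, 5]), 1: fit_line([0, 1, 2], [3, 2, 1])}
unlabeled = [(c / 10, t) for c in range(20) for t in (0, 1)]
print(distill(specialists, unlabeled))  # approximately [1, 2, 2, -3]
```

The paired data only ever touches the small specialist class. The richer generalist sees pseudo-labels alone, which mirrors the claim that the paired-sample requirement depends on the specialists' complexity.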

2605.25208 2026-05-26 cs.CL

They Are Not the Same: Direct Causes Are Not Grounded Emotion Explanations

Zhuangzhuang Pan, Yan Xia, Chee Seng Chan

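AI summary: An analysis on the IEMO-MECP dataset finds that the binary classification proxy in Emotion-Cause Pair Extraction (ECPE) reliably extracts only direct triggers and does not provide evidence-grounded emotion explanations. Emotional context (emo-context) is ignored at the binary boundary, and under shortcut pressure models favor convenient attributions over genuine explanations.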

Comments 25 pages, 11 figures, 24 tables. Preprint

Abstract

Emotion-Cause Pair Extraction (ECPE) was introduced to explain why an emotion occurs, but this goal is now often reduced to binary pair/non-pair prediction. This proxy is useful for direct-cause extraction, yet it is easy to over-read as evidence-grounded emotion explanation. We show that this interpretation is only partially valid. In IEMO-MECP, 90.9% of original positives remain emo-cause and 95.0% of original negatives remain non-pair, confirming that the binary ECPE task is largely preserved. The problem is that direct triggers alone do not constitute a grounded explanation. Emo-context, an utterance that helps interpret a target emotion without directly causing it, appears on both sides of the original boundary and is enriched near binary uncertainty, showing that the binary boundary has no stable place for such discourse evidence. Across evaluated ECPE models, direct triggers are recovered more reliably than contextual support. Under shortcut pressure, this imbalance becomes consequential. Binary-trained models assign higher pair scores to nearby lexically similar non-pair candidates than to evidence-supported but structurally harder emo-cause and emo-context pairs. Thus, pair scores can reward convenient attributions over grounded explanations. High binary ECPE performance indicates that a model can identify direct triggers; it does not indicate that the model has explained the emotion. Code is publicly available at https://github.com/panzhzh/ECPExsame.

2605.25204 2026-05-26 cs.CL

Clarification Is Not Enough: Post-Clarification Answering Remains the Bottleneck in Multi-Turn QA

Jinyan Su, Jennifer Healey

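AI summary: Decomposes multi-turn QA into two components: a clarification policy and post-clarification answering. Experiments on the PACIFIC benchmark show that supervised fine-tuning quickly improves the clarification policy, but final answer accuracy remains substantially lower. Understanding and correctly interpreting the user's response is therefore the key bottleneck.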

Abstract

Pluralistic alignment requires systems to adapt to diverse user values, communication styles, and contextual assumptions. We believe that a foundational prerequisite for such alignment is the ability to elicit preferences accurately from people when their intent is under-specified or ambiguous. We study the problem of preference elicitation in multi-turn question answering by decomposing it into two components: a clarification policy, which decides whether to ask a clarifying question or answer directly, and post-clarification answering, which produces the correct final answer once the missing information is provided. Using the PACIFIC benchmark, we show that supervised fine-tuning rapidly improves the clarification policy; however, final answer accuracy remains substantially lower even when the model takes the correct action. This gap indicates that understanding and correctly interpreting the user's response is the critical bottleneck in multi-turn question-answering systems.
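The decomposition suggests scoring the two components separately:

- Clarification-policy accuracy: did the model correctly choose between asking and answering?
- Post-clarification answer accuracy, conditioned on the policy having chosen correctly.

The `Turn` record format below is hypothetical. It only shows how the gap between the two numbers would be measured.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    should_clarify: bool  # gold label: was the question under-specified?
    did_clarify: bool     # model action
    answer_correct: bool  # final answer correct (after clarification, if one was asked)


def decomposed_scores(turns):
    """Return (policy accuracy, answer accuracy given a correct action, end-to-end accuracy)."""
    right_action = [t for t in turns if t.did_clarify == t.should_clarify]
    policy_acc = len(right_action) / len(turns)
    cond_acc = (sum(t.answer_correct for t in right_action) / len(right_action)
                if right_action else 0.0)
    end_to_end = sum(t.answer_correct and t.did_clarify == t.should_clarify
                     for t in turns) / len(turns)
    return policy_acc, cond_acc, end_to_end


turns = [
    Turn(True, True, False),   # asked correctly, still answered wrong
    Turn(True, True, True),
    Turn(False, False, True),
    Turn(True, False, False),  # should have asked but did not
]
print(decomposed_scores(turns))  # policy 0.75, conditional answer about 0.667, end-to-end 0.5
```

A high first number with a low second number is the pattern the abstract reports: the model asks at the right time but fails to use the reply.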

2605.25203 2026-05-26 cs.LG cs.AI cs.LO

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

Gorgi Pavlov

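AI summary: Applies the influence-adaptive Walsh geometry from a companion theory paper to extreme low-bit weight-only quantization. It combines WHT rotation and column scaling with a reconstruction-error quantizer, reducing wikitext-2 perplexity by 15-58% relative to vanilla auto-round at W2A16 across several models.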

Comments 14 pages, no figures. Companion application paper to arXiv:2605.01637 (theory). Code and pinned eval stack: https://github.com/gogipav14/spectral-llm

Abstract

We apply the influence-adaptive Walsh geometry of a companion theory paper (arXiv:2605.01637) to extreme low-bit weight-only LLM quantization. The recipe is one math-invariant transformation: WHT-rotate each linear layer's weight matrix and rescale its columns by per-coordinate Walsh-basis activation energy before handing off to a reconstruction-error quantizer (Intel auto-round). This biases per-group integer rounding toward high-spectral-energy channels. On four pretrained decoder-only models from 135M to 1.5B parameters, BBT-spectral reduces wikitext-2 perplexity by 15-58% relative to vanilla auto-round at W2A16; we also report a TinyLlama-1.1B auxiliary data point. Three extensions transfer the recipe to families it failed on: a per-head PCA matrix-Gamma replacement of q_norm/k_norm for Qwen3 attention (PPL 136.76 -> 88.99 on Qwen3-0.6B); an SO(2) per-pair rotation that commutes with RoPE (PPL 36.93 -> 21.84 on Qwen2.5-1.5B); and an MoE-aware input-side absorption fix identified by architectural fuzzing of Laguna-style fused-expert layouts. A W2-vs-W4 ablation gives a deliberate negative control: the redistribution payoff falls within the +/-0.5 PPL noise floor at W4, consistent with the Schur-convexity intuition that the cost of unconcentrated influence vanishes as the noise budget shrinks. All quantized weights export to OpenVINO IR and run on Intel NPU + Arc dGPU + CPU with PPL invariant to device within +/-0.1. We do not claim a formal Boolean-to-real-valued transfer of the theory paper's majorization argument: the WHT activation energy used here is not the Boolean influence of the theory paper, the link is intuitive, and the contribution is engineering value rather than a transferred theorem. Head-to-head benchmarks against SpinQuant, QuaRot, QuIP-sharp, AQLM, OmniQuant, and ButterflyQuant at matched calibration are the main future-work item.
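The function-preserving part of the recipe can be checked in a few lines. Take an orthonormal Walsh-Hadamard matrix H (Sylvester ordering, symmetric, so H H = I) and positive column scales s. Then W x = (W H diag(s)) (diag(1/s) H x), so the weights can be rotated and rescaled before quantization without changing the layer output in exact arithmetic. Setting s_j to the RMS of coordinate j of the rotated calibration activations is an assumption standing in for the paper's per-coordinate Walsh-basis activation energy. The quantizer itself (Intel auto-round) is not shown.

```python
import math


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (Sylvester order); length must be a power of 2."""
    v = list(vec)
    n = len(v)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return [x / math.sqrt(n) for x in v]


def rotate_and_scale(W, calib):
    """Return (W2, s) with W2 = W H diag(s).

    s_j is the RMS of coordinate j of the rotated calibration activations
    (an assumed energy measure); zero-energy coordinates fall back to 1.0.
    """
    rotated = [fwht(x) for x in calib]
    s = [math.sqrt(sum(r[j] ** 2 for r in rotated) / len(rotated)) or 1.0
         for j in range(len(calib[0]))]
    # Row i of W H equals fwht(row i of W) because H is symmetric.
    W2 = [[w * sj for w, sj in zip(fwht(row), s)] for row in W]
    return W2, s


def apply_transformed(W2, s, x):
    """Compute W2 (diag(1/s) H x), which equals W x up to rounding."""
    xr = [xi / si for xi, si in zip(fwht(x), s)]
    return [sum(w * xi for w, xi in zip(row, xr)) for row in W2]


W = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.0, 3.0, 1.0]]
calib = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, -1.0]]
x = [0.3, -1.2, 2.0, 0.7]
W2, s = rotate_and_scale(W, calib)
print(apply_transformed(W2, s, x))
print([sum(w * xi for w, xi in zip(row, x)) for row in W])  # same values
```

Only W2 would be handed to the quantizer; H and 1/s are folded into the activation path. High-energy coordinates get proportionally larger weights before group-wise rounding, which is one reading of "biasing rounding toward high-spectral-energy channels".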

2605.25198 2026-05-26 cs.LG cs.AI

Hide to Guide: Learning via Semantic Masking

Ruitao Liu, Qinghao Hu, Alex Hu, Yecheng Wu, Shang Yang, Luke J. Huang, Zhuoyang Zhang, Han Cai, Song Han

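AI summary: Proposes Semantic Masked Expert Policy Optimization (SMEPO), which masks reward-relevant semantic spans in expert traces to turn hard problems into a fill-in-the-blank process. This improves the exploration efficiency of reinforcement learning on reasoning-intensive tasks.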

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a powerful paradigm for improving language models on reasoning-intensive tasks, but its effectiveness is often limited by exploration. For example, models often fail on hard problems, leaving little useful reward signal. External expert traces offer a natural source of guidance, yet they may also expose reward-relevant content along the critical path to the verifier target, such as final answers, intermediate values, executable implementations, or answer-related entities. This content can create an unintended reward hacking channel, allowing the policy to obtain reward by copying the trace rather than learning the underlying reasoning or agentic behavior. Existing guided-RL methods reduce this risk by using partial trajectories, but they mainly control how much expert information is shown heuristically rather than which parts should be hidden. To this end, we propose Semantic Masked Expert Policy Optimization (SMEPO), a fine-grained semantic masking strategy for expert-guided RLVR. Instead of truncating traces coarsely or revealing them unchanged, SMEPO masks reward-relevant semantic spans along the critical path while preserving the expert's decomposition, plan, and procedural structure. This turns hard problems from reasoning from scratch into a fill-in-the-blank process: the policy can follow the expert's problem-solving route, but must still reconstruct the missing values, code, or entities by itself. SMEPO is simple to apply and requires no changes to the reward function or RL objective. Across diverse domains, including math, code, and agentic search, SMEPO improves accuracy by up to 3.2 points over GRPO and reduces training time by up to 4.2x. The code is available at https://github.com/mit-han-lab/SMEPO.
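The masking step can be sketched as span replacement over an expert trace. Given the reward-relevant spans on the critical path (final answer, intermediate values, and so on), each occurrence is replaced by a placeholder while the surrounding plan text is kept. Identifying those spans is the paper's contribution and is not reproduced; here the spans are passed in by hand. The `[MASK]` placeholder and the word-boundary rule are assumptions.

```python
import re


def mask_spans(trace, spans, placeholder="[MASK]"):
    """Replace every standalone occurrence of each reward-relevant span with a placeholder.

    Longer spans are masked first so a short span (e.g. "12") cannot split a
    longer one (e.g. "120"); the lookarounds stop "12" matching inside "312".
    """
    for span in sorted(set(spans), key=len, reverse=True):
        trace = re.sub(rf"(?<!\w){re.escape(span)}(?!\w)", placeholder, trace)
    return trace


trace = "Let x = 12. Then 10 * 12 = 120, so the answer is 120."
print(mask_spans(trace, ["12", "120"]))
# prints: Let x = [MASK]. Then 10 * [MASK] = [MASK], so the answer is [MASK].
```

The plan ("let x be..., then multiply...") survives, so the policy can follow the expert's route but must recompute every masked value itself.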

2605.25194 2026-05-26 cs.LG

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

Dongpeng Zhang, Ke Ma, Yangbangyan Jiang, Gaozheng Pei, Longtao Huang, Qianqian Xu, Qingming Huang

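AI summary: Targets visual prompt injection attacks on multimodal large language models with Gradient Token Masking (GTM). GTM localizes critical image tokens via gradient analysis and neutralizes them by masking, reducing attack success rates to near zero with minimal computational overhead.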

Abstract

Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a principled understanding of the underlying mechanisms and struggle to balance efficiency and defense utility. In this work, we show that successful adversarial attacks do not rely on the entire image uniformly but instead depend on a small subset of critical image tokens. Based on this insight, we propose Gradient Token Masking (GTM), which localizes these tokens via gradient analysis and neutralizes them through masking. We find that attribution based on the first generated token's output probability fails when attacks preserve the predicted token. To overcome this, GTM utilizes the Hidden-State Gradient Norm score for generation-influence attribution under adversarial inputs. We prove that its ranking is consistent with that of the full adversarial loss gradient, providing a theoretical guarantee for accurate localization. Our method requires only a single forward-backward pass to identify and zero out a small number of high-scoring tokens, effectively disrupting the adversarial attack path. Extensive experiments on prompt injection and multimodal jailbreak attacks demonstrate that our approach reduces attack success rates (ASR) to near zero while preserving model utility with negligible computational overhead.
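The localize-then-neutralize step can be sketched given per-token gradients of a generation-influence objective with respect to the image-token hidden states. In a real model these come from the single forward-backward pass. The step then reduces to ranking tokens by gradient norm and zeroing the top k. Here the gradients are precomputed inputs, and both k and the plain l2 norm are assumptions; the paper's Hidden-State Gradient Norm score may be defined differently.

```python
import math


def gradient_norm_scores(grads):
    """l2 norm of each image token's hidden-state gradient."""
    return [math.sqrt(sum(g * g for g in vec)) for vec in grads]


def mask_top_k(tokens, grads, k):
    """Zero out the k image-token embeddings with the largest gradient norms.

    Returns the masked token list and the sorted indices that were zeroed.
    """
    scores = gradient_norm_scores(grads)
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    masked = [[0.0] * len(t) if i in top else list(t) for i, t in enumerate(tokens)]
    return masked, sorted(top)


tokens = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # toy image-token embeddings
grads = [[0.1, 0.0], [5.0, 0.0], [0.0, 0.2]]   # toy hidden-state gradients
print(mask_top_k(tokens, grads, k=1))          # token 1 is zeroed
```

Because only k tokens are zeroed, the rest of the image remains available to the model, which is how utility can be preserved.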
