arXivDaily: daily arXiv academic digest, updated Monday through Friday
2507.09266 2026-05-29 cs.CV

SAGE: Segment-Aware Gloss-Free Encoding for Token-Efficient Sign Language Translation

JianHe Low, Ozge Mercanoglu Sincan, Richard Bowden

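AI summary: Proposes a segment-aware visual tokenization framework that uses sign segmentation to convert continuous video into discrete visual tokens and combines token-level contrastive alignment with dual-level supervision, outperforming existing methods on the PHOENIX14T benchmark while cutting sequence length by up to 50%.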

Comments Accepted in International Conference on Computer Vision (ICCV) Workshops. Code released at https://github.com/JianHe0628/SAGE

Abstract

Gloss-free Sign Language Translation (SLT) has advanced rapidly, achieving strong performance without relying on gloss annotations. However, these gains have often come with increased model complexity and high computational demands, raising concerns about scalability, especially as large-scale sign language datasets become more common. We propose a segment-aware visual tokenization framework that leverages sign segmentation to convert continuous video into discrete, sign-informed visual tokens. This reduces input sequence length by up to 50% compared to prior methods, resulting in up to 2.67x lower memory usage and better scalability on larger datasets. To bridge the visual and linguistic modalities, we introduce a token-to-token contrastive alignment objective, along with a dual-level supervision that aligns both language embeddings and intermediate hidden states. This improves fine-grained cross-modal alignment without relying on gloss-level supervision. Our approach notably exceeds the performance of state-of-the-art methods on the PHOENIX14T benchmark, while significantly reducing sequence length. Further experiments also demonstrate improved performance over prior work under comparable sequence lengths, validating the potential of our tokenization and alignment strategies.

2507.00037 2026-05-29 cs.LG cs.AI

Model Fusion via Retrofitting

Phoomraphee Luenam, Andreas Spanopoulos, Amit Sant, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

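AI summary: Proposes a neuron-centric fusion algorithm that groups the parent models' intermediate neurons into target representations and trains the fused model's sub-networks to approximate them, using neuron attribution scores to align salient features. It applies to any architecture that can be modularized as a directed acyclic graph and shows its largest gains in zero-shot and non-IID settings.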

Comments 5 figures, 15 tables, 23 pages

Abstract

Model fusion seeks to combine independently trained neural networks into a single model without retraining, but is complicated by representational divergence arising from permutation invariance, random initialization, and heterogeneous training data. Existing methods struggle particularly in zero-shot settings under non-IID data distributions, and are often limited to specific architectures or pairwise fusion. We introduce a neuron-centric family of fusion algorithms that frames fusion as a principled representation-matching problem: intermediate neurons across parent models are grouped into target representations, which the fused model's corresponding sub-networks are then trained to approximate. Unlike prior work, our approach incorporates neuron attribution scores to bias alignment toward salient features, and can be applied to any architecture modularizable as a DAG of levels, as validated empirically on VGGs, ResNets, and ViTs. Experiments across standard benchmarks show consistent improvements over existing fusion methods, with the largest gains in zero-shot and non-IID scenarios. Code is available at https://github.com/AndrewSpano/model-fusion-via-retrofitting.

2506.12815 2026-05-29 cs.LG

TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Xiaochun Cao, Shouling Ji, Jiaheng Zhang, Jincai Huang, Li Shen

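AI summary: Proposes TrojanTO, the first action-level backdoor attack against trajectory optimization models. It uses alternating training to strengthen the link between triggers and target actions, and relies on trajectory filtering and batch poisoning for stealth, implanting backdoors effectively under a low attack budget.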

Comments 23 pages, 6 figures

Journal ref
International Conference on Learning Representations (ICLR), 2026
Abstract

Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerabilities against backdoor attacks are poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation and are largely ineffective against TO models due to their inherent sequence modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to enhance the connection between triggers and target actions for attack effectiveness. To improve attack stealth, it utilizes precise poisoning via trajectory filtering for normal performance and batch poisoning for trigger consistency. Extensive evaluations demonstrate that TrojanTO effectively implants backdoor attacks across diverse tasks and attack objectives with a low attack budget (0.3% of trajectories). Furthermore, TrojanTO exhibits broad applicability to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.

2505.13745 2026-05-29 cs.LG stat.ML

Synthetic Non-stationary Data Streams for Recognition of the Unknown

Joanna Komorniczak

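AI summary: Proposes a synthetic data stream generation strategy that includes both concept drift and the emergence of new classes, and evaluates unsupervised drift detectors on the open set recognition task.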

Abstract

The problem of data non-stationarity is commonly addressed in data stream processing. In a dynamic environment, methods should continuously be ready to analyze time-varying data; hence, they should enable incremental training and respond to concept drifts. An equally important variability typical for non-stationary data stream environments is the emergence of new, previously unknown classes. Often, methods focus on one of these two phenomena (detection of concept drifts or detection of novel classes), while both difficulties can be observed in data streams. Additionally, concerning previously unknown observations, the topic of an open set of classes has become particularly important in recent years, where the goal of methods is to efficiently classify within known classes and recognize objects outside the model competence. This article presents a strategy for synthetic data stream generation in which both concept drifts and the emergence of new classes representing unknown objects occur. The presented research shows how unsupervised drift detectors address the task of detecting novelty and concept drifts and demonstrates how the generated data streams can be utilized in the open set recognition task.

2505.02604 2026-05-29 cs.LG

Connecting Independently Trained Modes via Layer-Wise Connectivity

Yongding Tian, Zaid Al-Ars, Maksim Kitsak, Peter Hofstee

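AI summary: Proposes a new empirical algorithm that uses layer-wise connectivity to construct continuous low-loss paths between independently trained neural network models, achieving more consistent mode connectivity across a range of modern architectures.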

Comments 28 pages, 22 figures, accepted in ICML 2026: https://openreview.net/forum?id=4VOTzpH9MO

Abstract

Empirical studies have shown that continuous low-loss paths can be constructed between independently trained neural network models. This phenomenon, known as mode connectivity, refers to the existence of such paths between distinct modes, i.e., well-trained solutions in parameter space. However, existing empirical methods do not reliably connect independently trained modes and have been evaluated mainly on a narrow set of architectures (e.g., basic CNNs, VGG, and ResNet), leaving their effectiveness on newer models unclear. In this work, we propose a new empirical algorithm for connecting independently trained modes that generalizes beyond traditional architectures and supports a broader range of networks, including MobileNet, ShuffleNet, EfficientNet, RegNet, Deep Layer Aggregation (DLA), and Compact Convolutional Transformers (CCT). In addition to broader applicability, the proposed method yields more consistent connectivity paths across independently trained mode pairs and supports connecting modes obtained with different training hyperparameters.
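
To make "low-loss path" concrete, the sketch below measures the loss barrier along the straight line between two parameter vectors. This is only the naive baseline check that mode connectivity work is usually compared against, not the layer-wise algorithm from the paper; `interpolate`, `loss_barrier`, and the toy loss are hypothetical names introduced here for illustration.

```python
def interpolate(theta_a, theta_b, t):
    """Point at fraction t on the straight line from theta_a to theta_b."""
    return [(1 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss_fn, theta_a, theta_b, steps=11):
    """Highest loss on the linear path minus the worse of the two endpoint losses.

    A barrier near zero means the two solutions are linearly connected;
    a large barrier means the straight path leaves the low-loss region.
    """
    losses = [
        loss_fn(interpolate(theta_a, theta_b, i / (steps - 1)))
        for i in range(steps)
    ]
    return max(losses) - max(losses[0], losses[-1])


def double_well(theta):
    """Toy one-parameter loss with two minima, at -1 and +1 (an assumption)."""
    return (theta[0] ** 2 - 1) ** 2
```

With the toy double-well loss, the two minima are separated by a barrier of height 1 at the midpoint, which is the situation that path-construction methods try to avoid by bending the path.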

2505.02069 2026-05-29 cs.LG stat.ML

Neural Logistic Bandits

Seoungbin Bae, Dabeen Lee

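AI summary: For the neural logistic bandit problem, uses a novel Bernstein-type inequality for self-normalized vector-valued martingales to derive two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, whose regret upper bounds depend on the effective dimension and improve on existing results.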

Abstract

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $κ$, where $1/κ$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $κ$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{κT})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/κ})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.

2504.06022 2026-05-29 cs.CV

CamC2V: Context-aware Controllable Video Generation

Luis Denninger, Sina Mokhtarzadeh Azar, Juergen Gall

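AI summary: Proposes CamC2V, a model that integrates multiple image conditions with 3D constraints and camera control for coherent, context-aware video generation, improving FVD by 24.09% on the RealEstate10K dataset.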

Comments Published at 3DV 2026

Abstract

Recently, image-to-video (I2V) diffusion models have demonstrated impressive scene understanding and generative quality, incorporating image conditions to guide generation. However, these models primarily animate static images without extending beyond their provided context. Introducing additional constraints, such as camera trajectories, can enhance diversity but often degrades visual quality, limiting the applicability of these models to tasks requiring faithful scene representation. We propose CamC2V, a context-to-video (C2V) model that integrates multiple image conditions as context with 3D constraints alongside camera control to enrich both global semantics and fine-grained visual details. This enables more coherent and context-aware video generation. Moreover, we motivate the necessity of temporal awareness for an effective context representation. Our comprehensive study on the RealEstate10K dataset demonstrates a 24.09% (FVD) improvement in visual quality and camera controllability. Our code is publicly available at: https://github.com/LDenninger/CamC2V.

2502.21004 2026-05-29 cs.CV

Soften the Mask: Adaptive Temporal Soft Mask for Efficient Dynamic Facial Expression Recognition

Meng-zhu Li, Quanxing Zha, Hongjun Wu

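AI summary: Proposes AdaTosk, a network that combines self-supervised reconstruction with supervised classification and uses adaptive temporal soft masks (class-agnostic and class-semantic) to emphasize key expression moments and reduce semantic redundancy, lowering computational cost while keeping competitive performance.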

Comments 6 pages, 3 figures

Abstract

Dynamic Facial Expression Recognition (DFER) facilitates the understanding of psychological intentions through non-verbal communication. Existing methods struggle to manage irrelevant information, such as background noise and redundant semantics, which impacts both efficiency and effectiveness. In this work, we propose a novel supervised temporal soft masked autoencoder network for DFER, namely AdaTosk, which integrates a parallel supervised classification branch with the self-supervised reconstruction branch. The self-supervised reconstruction branch applies a random binary hard mask to generate diverse training samples, encouraging meaningful feature representations in visible tokens. Meanwhile, the classification branch employs an adaptive temporal soft mask to flexibly mask visible tokens based on their temporal significance. Its two key components, the class-agnostic and class-semantic soft masks, serve respectively to enhance critical expression moments and to reduce semantic redundancy over time. Extensive experiments conducted on widely used benchmarks demonstrate that our AdaTosk remarkably reduces computational costs compared with current state-of-the-art methods while still maintaining competitive performance.

2502.20954 2026-05-29 cs.LG

Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition

Jindong Li, Tim Hamann, Jens Barth, Peter Kämpf, Dario Zanca, Björn Eskofier

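AI summary: Proposes a model combining a CNN encoder with a BiLSTM decoder for writer-independent handwriting recognition on IMU data, reaching character error rates of 7.37% on the OnHW dataset and 9.44% on the authors' own dataset and showing robustness to unseen handwriting styles.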

Comments Accepted at iWOAR 2025. Published in Springer LNCS, 2026. Code available at https://github.com/jindongli24/REWI

Journal ref
Sensor-Based Activity Recognition and Artificial Intelligence (iWOAR 2025), Lecture Notes in Computer Science, pp. 261-286, Springer, Cham, 2026
Abstract

Handwriting recognition (HWR) using inertial measurement unit (IMU) data remains challenging due to variations in writing styles and the limited availability of datasets. Previous approaches often struggle with handwriting from unseen writers, making writer-independent (WI) recognition a crucial yet difficult problem. This paper presents a model designed to improve WI HWR on IMU data, using a CNN encoder and BiLSTM-based decoder. Our approach demonstrates strong robustness to unseen handwriting styles, outperforming existing methods on the WI splits of both the public OnHW dataset and our word-based dataset, achieving character error rates (CERs) of 7.37% and 9.44%, and word error rates (WERs) of 15.12% and 32.17%, respectively. Robustness evaluation shows that our model maintains superior performance across different age groups, with knowledge learned from one group generalizing better to another compared to other approaches. Evaluation on our sentence-based dataset further demonstrates the potential for recognizing full sentences. Through comprehensive ablation studies, we show that our design choices achieve a strong balance between performance and efficiency. These findings support the development of more adaptable and scalable HWR systems for real-world applications.
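
For readers unfamiliar with the metrics quoted above, the sketch below shows the usual way character and word error rates are computed: total edit distance divided by total reference length. It assumes the standard Levenshtein definition; the abstract does not state the exact evaluation script, and `edit_distance` and `error_rate` are illustrative names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="char"):
    """CER (unit="char") or WER (unit="word") over a list of text pairs."""
    total_edits = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = list(ref) if unit == "char" else ref.split()
        hyp_units = list(hyp) if unit == "char" else hyp.split()
        total_edits += edit_distance(ref_units, hyp_units)
        total_length += len(ref_units)
    if total_length == 0:
        raise ValueError("references must contain at least one unit")
    return total_edits / total_length
```

A single wrong character in a two-word reference gives a small CER but a WER of 50%, which is why the reported WERs are consistently higher than the CERs.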

2502.20838 2026-05-29 cs.SD cs.AI cs.LG eess.AS

Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data

Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Takeshi Ashizawa, Kazuhiro Nakadai

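AI summary: Proposes DSMIL-LocNet, a weakly supervised multiple instance learning framework that classifies and temporally localizes whale calls using only recording-level labels, outperforming fully supervised baselines on long recordings.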

Comments Accepted in European Signal Processing Conference (EUSIPCO) 2026

Abstract

Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls requires two separate annotation efforts: binary presence labels for classification and precise temporal boundaries for localization. A binary label for a multi-minute recording can be assigned in seconds, but timestamping every call within it requires hours of expert effort. Providing both is infeasible at operational scale. We present DSMIL-LocNet, a weakly supervised multiple instance learning (MIL) framework that performs both classification and temporal localization using only recording-level presence/absence labels. Our dual-stream architecture integrates spectral and temporal features to process recordings of 2 to 30 minutes without the temporal compression that degrades existing CNN methods on long inputs. On the AcousticTrends BlueFinLibrary, DSMIL-LocNet achieves F1 scores of 0.88 to 0.91 on recordings of 300 to 1800 s, where fully supervised CNN baselines degrade to 0.19 to 0.64. It also provides temporal localization that these baselines cannot produce without frame-level annotation. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
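
The multiple instance learning idea can be sketched in a few lines: a recording is a bag of short windows, only the bag has a label, and per-window scores give localization for free. The sketch assumes max pooling as the bag aggregator and a fixed threshold, which are simplifications chosen here; the paper's dual-stream architecture is not reproduced, and `bag_score` and `localize` are hypothetical names.

```python
def bag_score(instance_scores):
    """Recording-level presence score: the most confident window (max pooling)."""
    return max(instance_scores)


def localize(instance_scores, window_s, threshold=0.5):
    """Merge consecutive above-threshold windows into (start, end) times in seconds."""
    segments = []
    start = None
    for i, score in enumerate(instance_scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start * window_s, i * window_s))
            start = None
    if start is not None:
        # The last call runs to the end of the recording.
        segments.append((start * window_s, len(instance_scores) * window_s))
    return segments
```

Training only needs `bag_score` to match the recording-level label, while `localize` reads call boundaries off the same per-window scores at inference time.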

2502.10330 2026-05-29 cs.LG

Diffusion-based learning framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement

Shutong Ding, Yimiao Zhou, Ke Hu, Xi Yao, Junchi Yan, Xiaoying Tang, Ye Shi

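AI summary: Proposes DiOpt, a framework that learns a mapping from noise to the constraint region in two stages, a supervised warm start followed by bootstrapped training, to address the distributional misalignment of diffusion models in constrained nonconvex optimization and achieve high constraint satisfaction and optimality.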

Comments accepted by ICML2026

Abstract

Recent advances in diffusion models show promising potential to accelerate nonconvex problem solving by leveraging their multimodality. However, most existing diffusion-based optimization approaches rely on supervised learning and lack a mechanism to enforce constraint satisfaction, which is required in real-world applications. To this end, we investigate and theoretically analyze the inherent problem of supervised diffusion solvers and identify the distributional misalignment problem, i.e., the generated solution distribution often exhibits low probability mass on the feasible region. To resolve this issue, we propose DiOpt, a new diffusion-based learning framework for constrained nonconvex optimization, which effectively learns the mapping from noise to the constraint region. Specifically, this framework operates in two distinct phases: an initial warm-start phase, implemented via supervised learning, followed by a bootstrapping training phase. This dual-phase architecture is designed to iteratively refine solutions, thereby improving the objective function with high constraint satisfaction. Finally, we also employ a solution selection technique in inference for better optimality. Notably, DiOpt is the first successful integration of a diffusion solver in constrained nonconvex optimization. Evaluations on diverse nonconvex tasks demonstrate the superiority of DiOpt in both optimality and constraint satisfaction. Our official page is released at https://dingsht.tech/diopt-webpage.

2502.10205 2026-05-29 cs.LG

Looking around you: external information enhances representations for event sequences

Petr Sokerin, Maria Kovaleva, Ekaterina Boyarina, Pavel Tikhomirov, Denis Vorobiyov, Alexey Zaytsev

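AI summary: To address event-sequence representation learning that ignores the context of co-occurring sequences, proposes augmenting a given user's representation by aggregating the representations of multiple users; a learnable attention mechanism improves metrics across several datasets.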

Abstract

Representation learning produces models in different domains, such as store purchases, client transactions, and human behavior in general. However, such models for event sequences usually process each sequence in isolation, ignoring context from those that co-occur in time. This limitation is particularly problematic in domains with fast-evolving conditions, like finance and e-commerce, or when certain sequences lack recent events. We develop a method that aggregates information from multiple user representations, augmenting a specific user's representation in a setting with multiple co-occurring event sequences, achieving better quality than processing each sequence independently. Our study considers diverse aggregation approaches, ranging from simple pooling techniques to learnable attention aggregation, which can highlight more complex information flow among other users. The proposed methods operate on top of an existing encoder and support its efficient fine-tuning. Across nine diverse event sequence datasets (finance, e-commerce, entertainment, etc.) and downstream tasks, learnable attention improves metric scores, both with and without fine-tuning, while mean pooling yields a smaller but still significant gain.
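
The two ends of the aggregation spectrum mentioned above can be sketched as follows. The attention variant here uses plain dot-product scores with no trained parameters, which is a simplification made for illustration (the abstract describes a learnable attention); concatenating the context to the user vector is also an assumption, and all function names are hypothetical.

```python
import math


def mean_pool(others):
    """Average the representations of the other users, dimension by dimension."""
    dim = len(others[0])
    return [sum(vec[d] for vec in others) / len(others) for d in range(dim)]


def attention_pool(user, others):
    """Softmax-weighted average of the others, scored by dot product with the user."""
    scores = [sum(u * o for u, o in zip(user, vec)) for vec in others]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * vec[d] for w, vec in zip(weights, others))
        for d in range(len(user))
    ]


def augment(user, others, mode="mean"):
    """Concatenate a user's own representation with the aggregated context."""
    if mode == "mean":
        context = mean_pool(others)
    else:
        context = attention_pool(user, others)
    return list(user) + context
```

Because the augmented vector is built from encoder outputs only, this kind of aggregation can sit on top of a frozen encoder, consistent with the abstract's remark that the methods operate on top of an existing encoder.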

2502.07623 2026-05-29 cs.CL

Lexical categories of stem-forming roots in Mapudüngun verb forms

Andrés Chandía

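AI summary: Verifies and revises the lexical category classification of verbal roots in a morphological analysis system for the Mapuche language, in order to improve the computational analyser and clarify uncertainties about lexical categories in the language.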

Comments 36 pages, 2 large tables, 2 sample tables

Abstract

After developing a computational system for morphological analysis of the Mapuche language, and evaluating it with texts from various authors and styles, it became necessary to verify the linguistic assumptions of the source used as the basis for implementing this tool. In the present work, the primary focus is on the lexical category classification of Mapudüngun roots recognised as verbal in the source utilised for the development of the morphological analysis system. The results of this lexical category revision directly benefit the computational analyser, as they are implemented as soon as they are verified. Additionally, it is hoped that these results will help clarify some uncertainties about lexical categories in the Mapuche language. This work addresses a preliminary task to identify the valency of true verbal roots, the results of which will be presented in a subsequent work that complements this article.

2502.03805 2026-05-29 cs.CL

CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective

Yuan Feng, Junlin Lv, Haoyu Guo, Yukun Cao, S Kevin Zhou, Xike Xie

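AI summary: By analyzing attention output perturbation, proposes a perturbation-constrained algorithm for selecting KV cache entries that significantly reduces compression loss.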

Comments ICML 2026

Abstract

Large language models have revolutionized natural language processing but face significant challenges of high storage and runtime costs, due to the transformer architecture's reliance on self-attention, particularly the large KV cache for long-sequence inference. Recent efforts to reduce KV cache size by pruning less critical entries based on attention weights remain empirical and lack formal grounding. This paper presents a formal study on identifying critical KV cache entries by analyzing attention output perturbation. Our analysis reveals that, beyond attention weights, the value states within KV entries and pretrained parameter matrices are also crucial. Based on this, we propose a perturbation-constrained selection algorithm that optimizes the worst-case output perturbation to identify critical entries. We demonstrate that our algorithm is a universal, plug-and-play enhancement that incurs negligible computational overhead. When integrated with three state-of-the-art cache eviction methods on three distinct LLMs, our algorithm significantly reduces the compression loss by more than half on average across 29 datasets from the Ruler and LongBench benchmarks. Further perturbation analysis, at both the head and layer levels, confirms the principles underlying our effectiveness. This work offers a new, formally grounded perspective on cache eviction, opening promising avenues for future research. The code is publicly available at https://github.com/FFY0/DefensiveKV.
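
The observation that value states matter alongside attention weights can be illustrated with a toy ranking rule: score each cached entry by its attention weight times the norm of its value vector and keep the top entries. This heuristic is an assumption made here for illustration; it is not the perturbation-constrained selection algorithm from the paper, and it ignores the parameter matrices the abstract also mentions. `select_entries` is a hypothetical name.

```python
import math


def select_entries(attn_weights, values, budget):
    """Return the indices of the `budget` entries with the largest weight * ||value||.

    An entry with a modest attention weight but a large value vector can move
    the attention output more than a heavily attended entry with a tiny value.
    """
    scores = [
        weight * math.sqrt(sum(x * x for x in value))
        for weight, value in zip(attn_weights, values)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])
```

In the test case below, ranking by attention weight alone would keep entries 0 and 1, while the value-aware score keeps entries 1 and 2 because entry 0 carries an almost zero value vector.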

2502.01360 2026-05-29 cs.LG math.AT q-bio.NC

A Quotient Homology Theory of Representation in Neural Networks

Kosio Beshkov

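AI summary: Uses the piecewise linear nature of ReLU neural networks to define an equivalence relation on the input dataset and construct a quotient space, and proves that under a convexity condition the homology groups of neural representations are isomorphic to quotient homology groups, so that Betti numbers can be computed without an external metric.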

Journal ref
Transactions on Machine Learning Research, 05/2026, https://openreview.net/forum?id=RluspxztzS
Abstract

Previous research has proven that the set of maps implemented by neural networks with a ReLU activation function is identical to the set of piecewise linear continuous maps. Furthermore, such networks induce a hyperplane arrangement splitting the input domain of the network into convex polyhedra $G_J$ over which a network $Φ$ operates in an affine manner. In this work, we leverage these properties to define an equivalence relation $\sim_Φ$ on top of an input dataset, which defines a quotient space that can be split into two sets related to the local rank of $Φ_J$ and the intersections $\cap \text{Im}Φ_{J_i}$. We refer to the latter as the overlap decomposition $\mathcal{O}_Φ$ and prove that if the intersections between each polyhedron and an input manifold are convex, the homology groups of neural representations are isomorphic to quotient homology groups $H_k(Φ(\mathcal{M})) \simeq H_k(\mathcal{M}/\mathcal{O}_Φ)$. This lets us intrinsically calculate the Betti numbers of neural representations without the choice of an external metric. We develop methods to numerically compute the overlap decomposition through linear programming and a union-find algorithm. Using this framework, we perform several experiments on toy datasets showing that, compared to standard persistent homology, our overlap homology-based computation of Betti numbers tracks purely topological rather than geometric features. Finally, we study the evolution of the overlap decomposition during training on several classification problems and discuss some shortcomings of our method.
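
The union-find step mentioned above is the standard way to turn pairwise identifications into equivalence classes. The sketch below shows that generic step only: the list of identified pairs is assumed to be given (in the paper it would come from the linear programming stage, which is not reproduced), and `UnionFind` and `equivalence_classes` are illustrative names.

```python
class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, size):
        self.parent = list(range(size))

    def find(self, x):
        while self.parent[x] != x:
            # Point x at its grandparent to flatten the tree as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def equivalence_classes(size, identified_pairs):
    """Group indices 0..size-1 into classes generated by the given pairs."""
    forest = UnionFind(size)
    for a, b in identified_pairs:
        forest.union(a, b)
    groups = {}
    for index in range(size):
        groups.setdefault(forest.find(index), []).append(index)
    return sorted(groups.values())
```

Each returned class corresponds to one point of the quotient, so identifying 0 with 1 and 1 with 2 places all three indices in a single class by transitivity.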

2412.00452 2026-05-29 cs.LG cs.CV

Learning Locally, Revising Globally: Global Reviser for Federated Learning with Noisy Labels

Yuxin Tian, Mouxing Yang, Yuhao Zhou, Jian Wang, Qing Ye, Tongliang Liu, Gang Niu, Jiancheng Lv

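AI summary: For federated learning in which label noise and data heterogeneity coexist, proposes the Federated Global Reviser (FedGR), which exploits the global model's slow memorization of noisy labels and uses three modules to jointly rectify noisy labels and regularize local training, outperforming eight baselines on three benchmarks.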

Comments ICML 2026 Camera Ready

Abstract

Conventional federated learning (FL) heavily depends on high-quality labels, which are often impractical in the real world, leading to the federated label-noise (F-LN) problem. Worse still, the F-LN problem is exacerbated by the heterogeneity of FL, as clients experience different label-noise types, ratios, and data distributions. In this study, we first observe an intriguing phenomenon that the global model of FL exhibits a slow memorization of noisy labels, suggesting its ability to maintain reliable predictions and robust representations in FL. Motivated by this, we propose a novel method termed Federated Global Reviser (FedGR), a straightforward yet effective method comprising three modules that collaboratively rectify noisy labels and regularize local training. By exploiting this inherent property, FedGR improves the label-noise robustness of FL in a self-contained manner. Extensive experiments on three widely used F-LN benchmarks demonstrate the superior performance of FedGR, consistently outperforming eight state-of-the-art baselines even under severe label noise and data heterogeneity. Code: https://github.com/cs-yuxintian/FedGR-ICML26

2410.23222 2026-05-29 cs.LG cs.AI stat.ML

Dataset-Driven Channel Masks in Transformers for Multivariate Time Series

Seunghan Lee, Taeyoung Park, Kibok Lee

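AI summary: Introduces the concept of partial channel dependence (PCD), improves channel-dependency modeling in Transformers through dataset-specific channel masks (CMs), and validates the approach across a variety of tasks and datasets.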

Comments ICASSP 2026. Preliminary version: NeurIPS Workshop on Time Series in the Age of Large Models 2024 (Oral presentation)

详情
AI中文摘要

最近基础模型的进展已成功扩展到时间序列(TS)领域,这得益于大规模TS数据集的出现。然而,先前的努力主要集中于捕获通道依赖(CD),这对于建模多变量时间序列至关重要,并且基于注意力的方法已被广泛用于此目的。尽管如此,这些方法主要关注修改架构,往往忽略了数据集特定特征的重要性。在这项工作中,我们引入了部分通道依赖(PCD)的概念,通过利用数据集特定信息来增强基于Transformer的模型中的CD建模,从而细化模型捕获的CD。为了实现PCD,我们提出了通道掩码(CMs),通过逐元素乘法将其集成到Transformer的注意力矩阵中。CMs由两个组件组成:1)捕获通道之间关系的相似性矩阵,以及2)数据集特定且可学习的领域参数,用于细化相似性矩阵。我们在多种任务和数据集上使用不同的骨干网络验证了PCD的有效性。代码可在此存储库获取:https://github.com/YonseiML/pcd。

英文摘要

Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of large-scale TS datasets. However, previous efforts have primarily [...] Capturing channel dependency (CD) is essential for modeling multivariate time series, and attention-based methods have been widely employed for this purpose. Nonetheless, these methods primarily focus on modifying the architecture, often neglecting the importance of dataset-specific characteristics. In this work, we introduce the concept of partial channel dependence (PCD) to enhance CD modeling in Transformer-based models by leveraging dataset-specific information to refine the CD captured by the model. To achieve PCD, we propose channel masks (CMs), which are integrated into the attention matrices of Transformers via element-wise multiplication. CMs consist of two components: 1) a similarity matrix that captures relationships between the channels, and 2) dataset-specific and learnable domain parameters that refine the similarity matrix. We validate the effectiveness of PCD across diverse tasks and datasets with various backbones. Code is available at this repository: https://github.com/YonseiML/pcd.
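
The channel-mask idea can be illustrated with a small, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the similarity matrix is taken to be the absolute Pearson correlation between channels, the two domain parameters are called `alpha` and `beta`, and they refine the similarity through a sigmoid. Only the element-wise product between the mask and the channel attention matrix comes from the abstract.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length channel series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def channel_mask(series, alpha, beta):
    """Assumed mask form: mask[i][j] = sigmoid(alpha * |corr(i, j)| + beta).

    alpha and beta stand in for the learnable, dataset-specific domain
    parameters; here they are plain floats, not trained values.
    """
    c = len(series)
    return [
        [1.0 / (1.0 + math.exp(-(alpha * abs(pearson(series[i], series[j])) + beta)))
         for j in range(c)]
        for i in range(c)
    ]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def apply_mask(scores, mask):
    """Row-wise softmax over channel scores, then element-wise product with the mask."""
    attn = [softmax(row) for row in scores]
    return [[a * m for a, m in zip(arow, mrow)] for arow, mrow in zip(attn, mask)]

# Toy data: channel 1 tracks channel 0 closely, channel 2 does not.
series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 5.9, 8.2, 9.9],
    [3.0, 1.0, 4.0, 1.0, 5.0],
]
scores = [[0.0, 0.0, 0.0] for _ in range(3)]  # uniform attention logits
mask = channel_mask(series, alpha=4.0, beta=-2.0)
masked = apply_mask(scores, mask)
```

With uniform logits, the masked attention from channel 0 to the strongly correlated channel 1 ends up larger than to the weakly related channel 2, which is the kind of dataset-driven refinement the abstract describes.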

2409.06439 2026-05-29 cs.LG stat.CO stat.ML

Extending Explainable Ensemble Trees (E2Tree) to regression contexts

将可解释集成树(E2Tree)扩展到回归场景

Massimo Aria, Agostino Gnasso, Carmela Iorio, Marjolein Fokkema

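AI summary: Extends the Explainable Ensemble Trees method from classification to regression by introducing a new dissimilarity measure, and demonstrates its explanatory power on real-world datasets.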

详情
Journal ref
Applied Stochastic Models in Business and Industry, Vol. 42, No. 1, e70064 (2026)
AI中文摘要

集成方法如随机森林通过聚合多个弱学习器提供了高精度的预测,改变了监督学习的格局。然而,尽管它们有效,这些方法往往缺乏透明度,阻碍了用户理解随机森林模型如何得出预测。可解释集成树(E2Tree)是一种解释随机森林的新方法,提供了响应变量与预测变量之间关系的图形表示。E2Tree的一个显著特点是它不仅考虑预测变量对响应的影响,还通过计算和使用不相似度度量来考虑预测变量之间的关联。E2Tree方法最初是为分类任务提出的。在本文中,我们将该方法扩展到回归场景。为了展示所提算法的解释能力,我们在真实数据集上进行了演示。

英文摘要

Ensemble methods such as random forests (RFs) have transformed the landscape of supervised learning, offering highly accurate prediction through the aggregation of multiple weak learners. However, despite their effectiveness, these methods often lack transparency, impeding users' comprehension of how RF models arrive at their predictions. Explainable ensemble trees (E2Tree) is a novel methodology for explaining random forests that provides a graphical representation of the relationship between response variables and predictors. A striking characteristic of E2Tree is that it accounts not only for the effects of predictor variables on the response but also for associations between the predictor variables, through the computation and use of dissimilarity measures. The E2Tree methodology was initially proposed for use in classification tasks. In this paper, we extend the methodology to encompass regression contexts. To demonstrate the explanatory power of the proposed algorithm, we illustrate its use on real-world datasets.

2409.01159 2026-05-29 cs.RO

Remote telepresence over large distances via robot avatars: case studies

通过机器人化身进行远距离远程呈现:案例研究

Mohamed Elobaid, Stefano Dafarra, Ehsan Ranjbari, Giulio Romualdi, Tomohiro Chaki, Tomohiro Kawakami, Takahide Yoshiike, Daniele Pucci

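AI summary: Discusses how a recently proposed avatar system architecture can be adapted to robots of different morphologies (wheeled and legged, with various hand types and kinematic structures) to achieve intercontinental telepresence under bandwidth restrictions.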

详情
AI中文摘要

本文讨论了必要的考虑因素和调整,使得最近提出的化身系统架构能够与不同的机器人化身形态(包括轮式和腿式机器人,具有各种类型的手部和运动学结构)配合使用,以在通信带宽限制下实现远程(洲际)远程呈现。所报告的案例研究涉及使用位置和力矩控制模式的机器人,独立于其软件中间件。

英文摘要

This paper discusses the necessary considerations and adjustments that allow a recently proposed avatar system architecture to be used with different robotic avatar morphologies (both wheeled and legged robots with various types of hands and kinematic structures) for the purpose of enabling remote (intercontinental) telepresence under communication bandwidth restrictions. The case studies reported involve robots using both position and torque control modes, independently of their software middleware.

2409.01144 2026-05-29 cs.RO

Adaptive Non-linear Centroidal MPC with Stability Guarantees for Robust Locomotion of Legged Robots

具有稳定性保证的自适应非线性质心MPC用于腿式机器人鲁棒运动

Mohamed Elobaid, Giulio Turrisi, Lorenzo Rapetti, Giulio Romualdi, Stefano Dafarra, Tomohiro Kawakami, Tomohiro Chaki, Takahide Yoshiike, Claudio Semini, Daniele Pucci

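AI summary: Reformulates the centroidal MPC controller using adaptive control and Lyapunov functions, providing closed-loop stability and robustness guarantees for legged robots under unknown payloads and constant disturbances.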

详情
AI中文摘要

基于简化质心动力学的非线性模型预测运动控制器如今在腿式机器人中无处不在。这些方案即使假设了机器人动力学的固有简化,也被证明能够赋予机器人对微小推力的步态调整能力,此外,在参数不确定(如未知负载)的情况下,它们能够提供一些实用的、尽管有限的鲁棒性。在这项工作中,我们通过重新表述质心MPC控制器,为其闭环稳定性提供了严格的证明。这是通过一种受自适应控制机制启发的系统化程序以及来自控制李雅普诺夫函数的思想实现的。此外,我们的重新表述为一类未测量的恒定扰动提供了鲁棒性。为了展示我们方法的通用性,我们在新一代人形机器人——56.7千克的ergoCub,以及商用21千克四足机器人Aliengo上验证了我们的公式。

英文摘要

Nonlinear model predictive locomotion controllers based on the reduced centroidal dynamics are nowadays ubiquitous in legged robots. These schemes, even though they assume an inherent simplification of the robot's dynamics, have been shown to endow robots with a step-adjustment capability in reaction to small pushes and, in the case of uncertain parameters such as unknown payloads, to provide some practical, albeit limited, robustness. In this work, we provide rigorous certificates of their closed-loop stability via a reformulation of the centroidal MPC controller. This is achieved through a systematic procedure inspired by the machinery of adaptive control, together with ideas from control Lyapunov functions. In addition, our reformulation provides robustness to a class of unmeasured constant disturbances. To demonstrate the generality of our approach, we validate our formulation on a new-generation humanoid robot, the 56.7 kg ergoCub, as well as on a commercially available 21 kg quadruped robot, Aliengo.

2406.10238 2026-05-29 cs.CL cs.LG cs.SI

Early Detection of Misinformation for Infodemic Management: A Domain Adaptation Approach

信息疫情管理中虚假信息的早期检测:一种领域自适应方法

Minjia Mao, Xiaohang Zhao, Xiao Fang

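AI summary: To address the lack of labeled data at the early stage of an infodemic, proposes a domain adaptation method for misinformation detection that handles both covariate shift and concept shift, and outperforms existing methods on real-world datasets.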

详情
AI中文摘要

信息疫情是指在疾病爆发期间传播的大量真实信息和虚假信息。在信息疫情早期检测虚假信息是减少其对公共健康危害的关键。信息疫情早期的特点是存在大量关于某种疾病的未标注信息。因此,传统的虚假信息检测方法不适合此任务,因为它们依赖信息疫情领域的标注信息来训练模型。为解决这一局限,最先进的方法利用其他领域的标注信息来学习模型,以检测信息疫情领域的虚假信息。这些方法的有效性取决于它们缓解信息疫情领域与利用标注信息的领域之间的协变量偏移(即特征分布差异)和概念偏移(即标注模式差异)的能力。然而,这些方法侧重于缓解协变量偏移而忽略了概念偏移,导致其在该任务上效果不佳。为此,我们从理论上证明了同时处理协变量偏移和概念偏移的必要性,以及如何分别实现它们。基于理论分析,我们开发了一种新颖的虚假信息检测方法,同时解决了协变量偏移和概念偏移。使用真实数据集,我们进行了广泛的实证评估,证明我们的方法在性能上优于最先进的虚假信息检测方法以及可适用于该任务的常见领域自适应方法。

英文摘要

An infodemic refers to an enormous amount of true information and misinformation disseminated during a disease outbreak. Detecting misinformation at the early stage of an infodemic is key to reducing its harm to public health. An early-stage infodemic is characterized by a large volume of unlabeled information concerning a disease. As a result, conventional misinformation detection methods are not suitable for this task because they rely on labeled information in the infodemic domain to train their models. To address this limitation, state-of-the-art methods learn their models using labeled information in other domains to detect misinformation in the infodemic domain. The efficacy of these methods depends on their ability to mitigate both covariate shift (i.e., differences in feature distributions) and concept shift (i.e., differences in labeling patterns) between the infodemic domain and the domains from which they leverage labeled information. However, these methods focus on mitigating covariate shift but overlook concept shift, rendering them less effective for the task. In response, we theoretically show the necessity of tackling both covariate and concept shifts as well as how to operationalize each of them. Building on the theoretical analysis, we develop a novel misinformation detection method that addresses both covariate and concept shifts. Using real-world datasets, we conduct extensive empirical evaluations to demonstrate the superior performance of our method over state-of-the-art misinformation detection methods as well as prevalent domain adaptation methods that can be tailored to solve the misinformation detection task.
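
The two shifts named in the abstract can be written down in standard domain adaptation notation. This is a common textbook formalization offered as a reading aid, not necessarily the definitions used in the paper. The symbols are assumptions: $S$ is a labeled source domain, $T$ is the infodemic (target) domain, $x$ is the feature representation of a piece of information, and $y$ is its true/misinformation label.

```latex
\begin{align*}
P(x, y) &= P(x)\, P(y \mid x)
  && \text{(factorization of the joint distribution)} \\
\text{covariate shift:} \quad P_S(x) &\neq P_T(x)
  && \text{(feature distributions differ)} \\
\text{concept shift:} \quad P_S(y \mid x) &\neq P_T(y \mid x)
  && \text{(labeling patterns differ)}
\end{align*}
```

Under this reading, aligning only $P(x)$ across domains leaves any mismatch in $P(y \mid x)$ untouched, which is consistent with the abstract's argument that both shifts need to be handled.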

2405.13003 2026-05-29 cs.CL cs.AI cs.IR

A Survey on Recent Advances in Conversational Data Generation

对话数据生成最新进展综述

Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi

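AI summary: Systematically surveys multi-turn conversational data generation across three types of dialogue systems (open-domain, task-oriented, and information-seeking), proposes a general framework covering seed data creation, utterance generation, and quality filtering, and discusses evaluation metrics and future directions.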

详情
AI中文摘要

近年来对话系统的进步显著增强了各领域的人机交互。然而,由于专业对话数据的稀缺,训练这些系统面临挑战。传统上,对话数据集通过众包创建,但该方法成本高、规模有限且劳动密集。作为解决方案,合成对话数据的开发应运而生,利用技术增强现有数据集或将文本资源转换为对话格式,提供了一种更高效且可扩展的数据集创建方法。在本综述中,我们系统全面地回顾了多轮对话数据生成,重点关注三类对话系统:开放域、任务导向和信息检索。我们根据种子数据创建、话语生成和质量过滤方法等关键组件对现有研究进行分类,并引入了一个概述对话数据生成系统主要原则的通用框架。此外,我们考察了评估合成对话数据的指标和方法,探讨了当前领域的挑战,并探索了未来研究的潜在方向。我们的目标是通过概述最先进的方法并强调该领域进一步研究的机会,加速研究人员和从业者的进展。

英文摘要

Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets were created through crowdsourcing, but this method has proven costly, limited in scale, and labor-intensive. As a solution, the development of synthetic dialogue data has emerged, utilizing techniques to augment existing datasets or convert textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open domain, task-oriented, and information-seeking. We categorize the existing research based on key components like seed data creation, utterance generation, and quality filtering methods, and introduce a general framework that outlines the main principles of conversation data generation systems. Additionally, we examine the evaluation metrics and methods for assessing synthetic conversational data, address current challenges in the field, and explore potential directions for future research. Our goal is to accelerate progress for researchers and practitioners by presenting an overview of state-of-the-art methods and highlighting opportunities to further research in this area.

2403.09441 2026-05-29 cs.LG

An Empirical Study of the Influence of Adversarial Fine-Tuning on Compressed Neural Networks

对抗性微调对压缩神经网络影响的实证研究

Hallgrimur Thorsteinsson, Valdemar J Henriksen, Daniel I R Cruz, Raghavendra Selvan, Tong Chen

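AI summary: An experimental study of adversarial fine-tuning of compressed models, finding that it markedly improves robustness while balancing computational efficiency against robustness.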

Comments 23 pages, 4 figures, 9 tables. Accepted to The 15th Scandinavian Conference on Artificial Intelligence (SCAI)

详情
AI中文摘要

随着深度学习模型日益融入日常生活,通过使其抵御对抗性攻击来确保安全性变得至关重要。研究发现,通过引入微小、有针对性的扰动来干扰输入数据,深度学习模型容易受到对抗性攻击。对抗性训练作为一种缓解策略,可以产生更鲁棒的模型。然而,这种对抗鲁棒性伴随着训练过程中设计对抗性攻击所需的额外计算成本。因此,对抗鲁棒性和计算效率这两个目标似乎相互冲突。在这项工作中,我们探讨了神经网络压缩对对抗鲁棒性的影响。我们特别研究了微调对压缩模型的影响,并展示了标准微调与对抗性微调之间的权衡。我们的结果表明,对压缩模型进行对抗性微调可以大幅提升其鲁棒性性能。我们在多个基准数据集上进行了实验,表明压缩模型的对抗性微调可以达到与对抗性训练模型相当的鲁棒性性能,同时提高计算效率。源代码可在此处获取:https://github.com/saintslab/Adver-Fine。

英文摘要

As deep learning (DL) models are increasingly integrated into our everyday lives, ensuring their safety by making them robust against adversarial attacks has become critical. DL models have been found to be susceptible to adversarial attacks that introduce small, targeted perturbations into the input data. Adversarial training has been presented as a mitigation strategy that can result in more robust models. This adversarial robustness comes with additional computational costs required to design adversarial attacks during training. The two objectives, adversarial robustness and computational efficiency, then appear to be in conflict with each other. In this work, we explore the effects of neural network compression on adversarial robustness. We specifically explore the effects of fine-tuning on compressed models, and present the trade-off between standard fine-tuning and adversarial fine-tuning. Our results show that adversarial fine-tuning of compressed models can yield large improvements in their robustness. We present experiments on several benchmark datasets showing that adversarial fine-tuning of compressed models can achieve robustness comparable to that of adversarially trained models, while also improving computational efficiency. Source code is available here: https://github.com/saintslab/Adver-Fine.

2401.08197 2026-05-29 cs.LG cs.IT eess.SP math.IT

Matrix Completion with Hypergraphs: Sharp Thresholds and Efficient Algorithms

超图矩阵补全:尖锐阈值与高效算法

Zhongtian Ma, Qiaosheng Zhang, Zhen Wang

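AI summary: Studies the problem of completing a rating matrix from sub-sampled matrix entries together with observed social graphs and hypergraphs, proves that a sharp threshold on the sample probability exists, and develops an efficient algorithm that succeeds with high probability whenever the sample probability exceeds the threshold. Hypergraphs effectively reduce the required sample probability.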

Comments Accepted to LOG24

详情
AI中文摘要

本文考虑基于子采样矩阵条目以及观测到的社交图和超图来补全评分矩阵的问题。我们证明,对于精确补全评分矩阵的任务,存在一个关于采样概率的尖锐阈值——当采样概率高于阈值时任务可实现,否则不可能——展示了相变现象。该阈值可以表示为超图“质量”的函数,从而能够量化利用超图所带来的采样概率减少量。这也凸显了超图在矩阵补全问题中的有用性。在发现尖锐阈值的过程中,我们开发了一种计算高效的矩阵补全算法,该算法有效利用了观测到的图和超图。理论分析表明,只要采样概率超过上述阈值,我们的算法就以高概率成功,这一理论结果通过合成实验得到进一步验证。此外,我们在真实社交网络数据集(包含图和超图)上的实验表明,我们的算法优于其他最先进的矩阵补全算法。

英文摘要

This paper considers the problem of completing a rating matrix based on sub-sampled matrix entries as well as observed social graphs and hypergraphs. We show that there exists a sharp threshold on the sample probability for the task of exactly completing the rating matrix: the task is achievable when the sample probability is above the threshold and impossible otherwise, demonstrating a phase transition phenomenon. The threshold can be expressed as a function of the "quality" of hypergraphs, enabling us to quantify the reduction in sample probability due to the exploitation of hypergraphs. This also highlights the usefulness of hypergraphs in the matrix completion problem. En route to discovering the sharp threshold, we develop a computationally efficient matrix completion algorithm that effectively exploits the observed graphs and hypergraphs. Theoretical analyses show that our algorithm succeeds with high probability as long as the sample probability exceeds the aforementioned threshold, and this theoretical result is further validated by synthetic experiments. Moreover, our experiments on a real social network dataset (with both graphs and hypergraphs) show that our algorithm outperforms other state-of-the-art matrix completion algorithms.
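
To make "sample probability" concrete, the sketch below simulates one plausible observation model. It assumes that every entry of the rating matrix is revealed independently with the same probability `p`, which the abstract does not state explicitly. The function name `subsample`, the toy ratings, and the use of `None` for unobserved entries are illustrative choices only.

```python
import random

def subsample(matrix, p, seed=0):
    """Reveal each entry independently with probability p; hide the rest as None.

    Assumed Bernoulli(p) observation model, used here only for illustration.
    """
    rng = random.Random(seed)
    return [[v if rng.random() < p else None for v in row] for row in matrix]

# Toy rating matrix: rows are users, columns are items.
ratings = [
    [1, -1, 1, 1],
    [1, -1, 1, 1],
    [-1, 1, -1, -1],
]
observed = subsample(ratings, p=0.5)
n_entries = sum(len(row) for row in ratings)
observed_fraction = sum(v is not None for row in observed for v in row) / n_entries
```

A sharp threshold, in these terms, is a value of `p` above which exact recovery of `ratings` from `observed` (plus the graph and hypergraph side information) is possible with high probability and below which it is not.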

2305.10917 2026-05-29 cs.RO

Online Non-linear Centroidal MPC for Humanoid Robots Payload Carrying with Contact-Stable Force Parametrization

面向人形机器人负重任务的在线非线性质心模型预测控制与接触稳定力参数化

Mohamed Elobaid, Giulio Romualdi, Gabriele Nava, Lorenzo Rapetti, Hosameldin Awadalla Omer Mohamed, Daniele Pucci

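AI summary: For humanoid robots walking while carrying a payload, proposes combining online nonlinear centroidal model predictive control with a contact-stable force parametrization to track given footstep trajectories.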

详情
AI中文摘要

本文考虑了一个问题:允许受到持续干扰(以负重任务形式)的人形机器人遵循给定的规划脚步。为解决此问题,我们结合了在线非线性质心模型预测控制器(MPC)与接触稳定力参数化。MPC的成本函数增加了处理干扰和正则化参数的项。通过仿真和人形机器人iCub上的实验验证了所提出控制器的性能。最后,简要研究了使用参数化对控制器计算时间的影响。

英文摘要

In this paper, we consider the problem of allowing a humanoid robot that is subject to a persistent disturbance, in the form of a payload-carrying task, to follow given planned footsteps. To solve this problem, we combine an online nonlinear centroidal model predictive controller (MPC) with a contact-stable force parametrization. The cost function of the MPC is augmented with terms handling the disturbance and regularizing the parameter. The performance of the resulting controller is validated both in simulations and on the humanoid robot iCub. Finally, the effect of using the parametrization on the computational time of the controller is briefly studied.

2205.04297 2026-05-29 cs.RO cs.AI

Learning A Simulation-based Visual Policy for Real-world Peg In Unseen Holes

基于学习的视觉策略用于真实世界中未见过孔洞的插拔

Liang Xie, Hongxiang Yu, Kechun Xu, Tong Yang, Minhang Wang, Haojian Lu, Rong Xiong, Yue Wang

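AI summary: Proposes a learning-based visual peg-in-hole method that decouples the perception and policy modules, trains on several shapes in simulation, and adapts to arbitrary unseen shapes in the real world at a small sim-to-real transfer cost.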

详情
AI中文摘要

本文提出一种基于学习的视觉插拔方法,能够在仿真中训练多种形状,并在真实世界中以最小的仿真到现实迁移成本适应任意未见形状。核心思想是将感知-运动策略的泛化解耦为快速适应的感知模块和仿真通用策略模块的设计。框架包括分割网络(SN)、虚拟传感器网络(VSN)和控制器网络(CN)。具体地,VSN被训练用于从分割图像中测量未见形状的位姿。然后,给定与形状无关的位姿测量,CN被训练以实现通用插拔。最后,当应用于真实未见孔洞时,我们只需微调仿真VSN+CN所需的分割网络。为进一步最小化迁移成本,我们提出在一分钟人工教学后自动收集和标注分割网络的数据。展示了在眼在外/眼在手配置下的仿真和真实世界结果。采用所提策略的电动汽车充电系统在2-3秒内实现了10/10的成功率,仅使用数百个自动标注样本进行分割网络迁移。

英文摘要

This paper proposes a learning-based visual peg-in-hole method that enables training with several shapes in simulation and adapting to arbitrary unseen shapes in the real world with minimal sim-to-real cost. The core idea is to decouple the generalization of the sensory-motor policy into the design of a fast-adaptable perception module and a simulated generic policy module. The framework consists of a segmentation network (SN), a virtual sensor network (VSN), and a controller network (CN). Concretely, the VSN is trained to measure the pose of the unseen shape from a segmented image. After that, given the shape-agnostic pose measurement, the CN is trained to achieve generic peg-in-hole. Finally, when applying the method to real unseen holes, we only have to fine-tune the SN required by the simulated VSN+CN. To further minimize the transfer cost, we propose to automatically collect and annotate the data for the SN after one minute of human teaching. Simulated and real-world results are presented under both eye-to-hand and eye-in-hand configurations. An electric vehicle charging system running the proposed policy achieves a 10/10 success rate in 2-3 s, using only hundreds of auto-labeled samples for the SN transfer.

2605.29582 2026-05-29 cs.LG cs.CL

PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning

PEARL: 使用教学对齐强化学习训练苏格拉底式导师

Qikai Chang, Zhenrong Zhang, Linbo Chen, Pengfei Hu, Jianshu Zhang, Youhui Guo, Jun Du

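AI summary: Proposes the PEARL framework, which trains Socratic tutoring agents through a controllable student simulator, a generative reward model, and stable multi-objective reinforcement learning, achieving the best performance among open-source models on multiple benchmarks while remaining competitive with proprietary models.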

Comments 16 pages, 7 figures

详情
AI中文摘要

大型语言模型(LLM)在教育辅导方面展现出潜力,但有效的辅导不仅仅是解决问题:它必须提供渐进的苏格拉底式引导,并在多轮交互中平衡多个教学目标。然而,由于学生模拟的保真度有限且可控性弱、教学奖励建模不明确以及多目标优化不稳定,训练这样的导师仍然具有挑战性。为克服这些限制,我们提出了PEARL,一个教学对齐的强化学习框架,用于训练苏格拉底式教学代理,包含三个关键组件。首先,我们引入了一个可控的学生模拟器,将潜在认知状态与响应生成解耦,以模拟多样的能力和误解。其次,我们开发了一个生成式奖励模型,联合评估教学质量和目标正确性以进行策略优化。最后,我们提出了一种稳定的多目标强化学习方案,在每个维度内离散化奖励并跨维度聚合归一化优势,防止高方差目标主导更新。在多个基准上的实验表明,尽管仅使用30B策略模型,PEARL在开源模型中取得了最佳性能,并与领先的专有LLM保持竞争力。

英文摘要

Large Language Models (LLMs) have shown promise as educational tutors, yet effective tutoring requires more than solving problems: it must provide progressive Socratic guidance and balance multiple pedagogical objectives across multi-turn interactions. However, training such tutors remains challenging due to limited-fidelity and weakly controllable student simulation, under-specified pedagogical reward modeling, and unstable multi-objective optimization. To overcome these limitations, we propose PEARL, a pedagogically aligned reinforcement learning framework for training Socratic tutoring agents, consisting of three key components. First, we introduce a controllable student simulator that decouples latent cognitive states from response generation to model diverse abilities and misconceptions. Second, we develop a generative reward model that jointly evaluates pedagogical quality and objective correctness for policy optimization. Finally, we propose a stable multi-objective RL scheme that discretizes rewards within each dimension and aggregates normalized advantages across dimensions, preventing high-variance objectives from dominating updates. Experiments on multiple benchmarks show that PEARL achieves the best performance among open-source models and remains competitive with leading proprietary LLMs, despite using only a 30B policy model.
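
The multi-objective scheme described above (discretize rewards within each dimension, normalize advantages per dimension, then aggregate) can be sketched as follows. The details are assumptions made for illustration and are not taken from the paper: rewards are taken to lie in [0, 1], discretization uses evenly spaced levels, advantages are normalized over a group of rollouts for the same prompt, and aggregation is a plain sum. The dimension names and reward values are toy numbers.

```python
import statistics

def discretize(reward, num_bins=5):
    """Snap a reward in [0, 1] to one of num_bins evenly spaced levels (assumed scheme)."""
    r = min(max(reward, 0.0), 1.0)
    return round(r * (num_bins - 1)) / (num_bins - 1)

def normalized_advantages(values, eps=1e-8):
    """Subtract the group mean and divide by the group standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]

def aggregate_advantages(rewards_by_dim, num_bins=5):
    """rewards_by_dim maps a dimension name to per-rollout rewards.

    Each dimension is discretized and normalized on its own, so a dimension
    with a large raw variance cannot dominate the summed advantage.
    """
    per_dim = [
        normalized_advantages([discretize(r, num_bins) for r in rewards])
        for rewards in rewards_by_dim.values()
    ]
    return [sum(column) for column in zip(*per_dim)]

# Four rollouts for one tutoring prompt, scored on two hypothetical dimensions.
rewards = {
    "pedagogy": [0.9, 0.4, 0.1, 0.6],
    "correctness": [1.0, 1.0, 0.0, 0.0],
}
advantages = aggregate_advantages(rewards)
```

Because each dimension is standardized before the sum, both contribute on a comparable scale, which is the intuition behind preventing high-variance objectives from dominating updates.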

2605.29580 2026-05-29 cs.LG stat.ML

On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference

基于LoRA的贝叶斯推理中低损失谷的构造与启示

Daniel Dold, Emanuel Sommer, Julius Kobialka, Oliver Dürr, David Rügamer

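AI summary: Proposes LoRA-Curve, which connects independent optima in the LoRA space through a segmented Bézier curve parameterization to form continuous low-loss valleys. Combined with flat-minima perturbations and Jensen-Shannon divergence regularization, it increases the mutual information of the predictive distribution without sacrificing performance, yielding functional diversity.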

详情
AI中文摘要

虽然低秩适应(LoRA)等参数高效微调方法已成为大型语言模型的标准方法,但对认知不确定性的原则性估计仍然具有挑战性。最近在LoRA机制下的结果表明,深度集成等离散多模态方法相比单模态方法几乎没有优势。这与深度学习中的更广泛观察相矛盾,在深度学习中,集成独立最优解通常能改善泛化,而通过连续低损失谷连接这些模态能进一步增强贝叶斯模型平均(BMA)。LoRA空间中是否存在这种结构,以及它是否能产生局部或离散方法所遗漏的功能多样性,尚未被研究。我们引入了LoRA-Curve,一种在LoRA空间中的分段贝塞尔曲线参数化,包含两种变体:一种自由配置,联合优化所有控制点;另一种锚定配置,连接独立微调的LoRA最优解。我们证明了损失沿曲线的路径连续性和Lipschitz正则性,并通过Qwen2.5 7B在推理和分类基准上的实验表明,线性插值会遇到损失障碍,而我们的锚定多段曲线通过连续低损失谷连接独立最优解。结合平坦极小扰动和詹森-香农散度正则化,LoRA-Curve在不牺牲性能的情况下,可测量地提高了预测分布的互信息,并将连续参数空间遍历与功能多样性联系起来。

英文摘要

While parameter-efficient fine-tuning methods like low-rank adaptation (LoRA) are standard for large language models, principled estimation of epistemic uncertainty remains challenging. Recent results in the LoRA regime suggest that discrete multi-mode approaches such as deep ensembles offer little benefit over single-mode methods. This contradicts broader observations in deep learning, where ensembling independent optima typically improves generalization, and linking these modes through continuous low-loss valleys further enhances Bayesian model averaging (BMA). Whether such structure exists in the LoRA space and whether it yields functional diversity missed by local or discrete methods has not been studied. We introduce LoRA-Curve, a segmented Bézier curve parameterization in the LoRA space, with two variants: a free configuration that jointly optimizes all control points, and an anchored configuration that connects independently fine-tuned LoRA optima. We prove pathwise continuity and Lipschitz regularity of the loss along the curve and empirically show, across reasoning and classification benchmarks with Qwen2.5 7B, that linear interpolation encounters loss barriers, while our anchored multi-segment curves connect independent optima through continuous low-loss valleys. Combined with flat-minima perturbations and a Jensen-Shannon divergence regularizer, LoRA-Curve yields measurably higher mutual information of the predictive distribution without sacrificing performance, and links continuous parameter-space traversal to functional diversity.
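
A segmented Bézier path through parameter space can be sketched in a few lines. This is a generic illustration of the curve family, not the paper's implementation: parameter vectors are plain lists standing in for flattened LoRA weights, each segment is evaluated with de Casteljau's algorithm, and the anchors and control points are made-up two-dimensional values. In the anchored setting the segment endpoints would be fixed fine-tuned optima and only the interior control points would be trained.

```python
def lerp(a, b, t):
    """Linear interpolation between two parameter vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def bezier(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] with de Casteljau's algorithm."""
    pts = [list(p) for p in control_points]
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

def segmented_bezier(segments, t):
    """Evaluate a path made of consecutive Bezier segments at global t in [0, 1].

    segments is a list of control-point lists; consecutive segments are
    expected to share an endpoint so the path is continuous.
    """
    n = len(segments)
    idx = min(int(t * n), n - 1)
    local_t = t * n - idx
    return bezier(segments[idx], local_t)

# Three hypothetical anchors joined by two quadratic segments.
theta_a, theta_b, theta_c = [0.0, 0.0], [2.0, 0.0], [4.0, 0.0]
control_1, control_2 = [1.0, 1.0], [3.0, -1.0]  # interior control points
segments = [[theta_a, control_1, theta_b], [theta_b, control_2, theta_c]]

midpoint_on_curve = segmented_bezier(segments, 0.25)  # bends away from the chord
midpoint_linear = lerp(theta_a, theta_b, 0.5)         # stays on the straight line
```

The curved path passes through every anchor but is free to leave the straight line between them, which is what lets it route around a loss barrier that linear interpolation would cross.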

2605.29579 2026-05-29 cs.CV

ReactBench: A Cause-Driven Benchmark for Multimodal Hallucination via Systematic Evaluation

ReactBench:通过系统评估的多模态幻觉因果驱动基准

Shizhe Zhou, Bohan Jia, Kai Wu, Yan Shen, Tongyun Li, Yuyang Wu, Shaohui Lin

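AI summary: Proposes the ReactBench benchmark, which uses adversarial images and hallucination-inducing queries to systematically evaluate cause-specific hallucinations of multimodal large language models on relational erasure, counterfactual attribute, alteration tracing, and dense counting tasks.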

详情
AI中文摘要

尽管多模态大语言模型(MLLMs)在视觉-语言理解方面取得了快速进展,但它们仍然容易产生多模态幻觉,即生成与视觉输入不一致的响应。现有基准主要侧重于检测幻觉结果,而非评估这些失败的潜在原因。此外,许多基准依赖于简单的场景和有限的评估格式,不再能挑战最先进的模型。为了解决这些局限性,我们引入了ReactBench,一个因果驱动的幻觉基准,具有多个任务和考试式评估格式。通过生成对抗性图像和诱导幻觉的查询,ReactBench引入了四个目标任务:关系擦除、反事实属性、变化追踪和密集计数。这些任务系统地暴露了共现偏差、语言先验、跨图像比较感知缺陷和细粒度感知瓶颈。除了基于标准准确率的评估外,我们利用思维链推理来识别每个任务中幻觉的细粒度子原因。大量评估表明,当前的MLLMs仍然容易受到特定因果幻觉触发因素的影响,这证明了ReactBench作为诊断和提高多模态模型鲁棒性的系统化和可解释测试平台的价值。项目页面见https://reactbench.github.io/。

英文摘要

While multimodal large language models (MLLMs) have achieved rapid progress in vision-language understanding, they remain prone to multimodal hallucinations, producing responses that are inconsistent with the visual input. Existing benchmarks predominantly focus on detecting hallucination outcomes rather than evaluating the underlying causes of these failures. Moreover, many benchmarks rely on simplistic scenarios and limited evaluation formats that no longer challenge state-of-the-art models. To address these limitations, we introduce ReactBench, a cause-driven hallucination benchmark featuring multiple tasks and an exam-style evaluation format. By generating adversarial images and hallucination-inducing queries, ReactBench introduces four targeted tasks: Relational Erasure, Counterfactual Attribute, Alteration Tracing, and Dense Counting. These tasks systematically expose co-occurrence bias, language priors, cross-image comparative perception deficiencies, and fine-grained perceptual bottlenecks. Beyond standard accuracy-based evaluation, we leverage Chain-of-Thought reasoning to identify fine-grained sub-causes of hallucination within each task. Extensive evaluations reveal that current MLLMs remain notably vulnerable to cause-specific hallucination triggers, demonstrating the value of ReactBench as a systematic and interpretable testbed for diagnosing and improving multimodal model robustness. The project page is available at https://reactbench.github.io/.

2605.29578 2026-05-29 cs.AI

GPS-Enhanced Tourist Mobility Modeling with Seasonal Spatial Priors and LLM-Based Activity Chain Generation

基于季节性空间先验和LLM活动链生成的GPS增强游客移动性建模

Yifan Liu, Yanling Sang, Xishun Liao, Morgan Sun, Bo Yang, Zhiyuan Zhang, Chris Stanford, Haoxuan Ma, Jiaqi Ma

AI总结 提出一种四阶段仿真框架,结合GPS和调查数据推导的月份条件空间先验、游客人口统计信息、距离可行区域序列分配以及基于LLM的活动链生成,以解决游客移动性建模中非例行、吸引驱动且对旅行目的、季节和成员组成高度敏感的问题。

详情
AI中文摘要

游客移动性对城市交通规划提出了独特挑战。与居民通勤不同,游客旅行大多是非例行的、由景点驱动的,并且对旅行目的、旅行季节和旅行成员组成高度敏感。现有方法要么测量聚合的游客空间模式而不生成个人行程,要么合成移动性而不考虑游客特定结构,如旅行持续时间条件、月份变化的景点需求以及家庭共同旅行规则。为了解决这些挑战,我们提出了一个四阶段仿真框架,结合了从GPS和调查数据推导的月份条件空间先验、基于游客人口统计的旅行范围预测、距离可行的区域序列分配,以及在家庭和空间约束下基于LLM的活动链生成。GPS数据仅以隐私保护的聚合形式用作月份条件空间先验,不保留或暴露任何个人轨迹。在东京旅游上的实验表明,基于GPS的游客群体提取恢复了与调查参考一致的空间访问特征,我们的框架生成了人口统计对齐的合成行程,其区域级访问份额与调查分布和停留点导出的月度访问模式紧密对齐。结果证明了该框架作为地理基础、人口统计感知的游客移动性建模方法的有效性。

英文摘要

Tourist mobility poses a distinct challenge for urban transportation planning. Unlike resident commuting, tourist travel is largely non-routine, attraction-driven, and highly sensitive to trip purpose, travel season, and trip member composition. Existing approaches either measure aggregate tourist spatial patterns without generating individual schedules, or synthesize mobility without tourist-specific structure such as trip duration conditioning, month-varying attraction demand, and household co-travel rules. To address these challenges, we propose a four-stage simulation framework combining month-conditioned spatial priors derived from GPS and survey data, trip extent prediction from tourist demographics, distance-feasible ward sequence assignment, and LLM-based activity chain generation under household and spatial constraints. GPS data are used only in privacy-preserving aggregated form as month-conditioned spatial priors, with no individual traces retained or exposed. Experiments on tourism in Tokyo demonstrate that the GPS-based tourist cohort extraction recovers spatial visitation signatures consistent with survey references, and our framework produces demographically aligned synthetic schedules whose ward-level visitation shares align closely with both survey distributions and staypoint-derived monthly visitation patterns. The results demonstrate the framework's effectiveness as a geographically grounded, demographically aware approach to tourist mobility modeling.
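
One simple way to picture a month-conditioned spatial prior is as a per-month distribution over wards computed from already-aggregated visit counts. The sketch below assumes exactly that and nothing more: the input format, the function name, the ward labels `ward_a` and `ward_b`, and all counts are hypothetical, and the paper's actual construction from GPS and survey data may differ.

```python
from collections import defaultdict

def month_conditioned_priors(aggregated_counts):
    """Turn (month, ward, visit_count) rows into per-month ward visit shares.

    Only aggregated counts are consumed; no individual traces are involved.
    """
    totals = defaultdict(float)
    by_month = defaultdict(lambda: defaultdict(float))
    for month, ward, count in aggregated_counts:
        by_month[month][ward] += count
        totals[month] += count
    return {
        month: {ward: count / totals[month] for ward, count in wards.items()}
        for month, wards in by_month.items()
    }

# Hypothetical aggregated counts for two wards in April (4) and August (8).
counts = [
    (4, "ward_a", 300), (4, "ward_b", 100),
    (8, "ward_a", 50), (8, "ward_b", 150),
]
priors = month_conditioned_priors(counts)
```

A downstream step could then sample a ward for a synthetic tourist from `priors[month]`, subject to whatever distance feasibility rule the sequence assignment stage applies.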