arXivDaily: daily arXiv digest, updated Monday through Friday
2503.23300 2026-06-05 cs.CV cs.RO

Learning Predictive Visuomotor Coordination

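Wenqi Jia, Bolin Lai, Miao Liu, Danfei Xu, James M. Rehg

Affiliations: University of Illinois Urbana-Champaign; Georgia Tech; Meta AI

AI summary: This paper proposes a forecasting-based task for modeling visuomotor coordination, predicting head pose, gaze direction, and upper-body motion from egocentric visual and kinematic observations, and shows the importance of multimodal integration for understanding visuomotor coordination.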

Comments CVPR 2026 Findings

Abstract

Understanding and predicting human visuomotor coordination is crucial for applications in robotics, human-computer interaction, and assistive technologies. This work introduces a forecasting-based task for visuomotor modeling, where the goal is to predict head pose, gaze, and upper-body motion from egocentric visual and kinematic observations. We propose a Visuomotor Coordination Representation (VCR) that learns structured temporal dependencies across these multimodal signals. We extend a diffusion-based motion modeling framework that integrates egocentric vision and kinematic sequences, enabling temporally coherent and accurate visuomotor predictions. Our approach is evaluated on the large-scale EgoExo4D dataset, demonstrating strong generalization across diverse real-world activities. Our results highlight the importance of multimodal integration in understanding visuomotor coordination, contributing to research in visuomotor learning and human behavior modeling. Project Page: https://vjwq.github.io/VCR/.

2411.18343 2026-06-05 cs.LG cs.AI

Comprehensive and Reliable Feature Attribution for Diverse Modalities and Models via Frequency-Domain Insights

Zechen Liu, Feiyang Zhang, Wei Song, Xiang Li, Wei Wei

Affiliations: School of Computational Science, Wuhan University; Brain Research Center, Wuhan University; College of Information Science and Technology (School of Cyber Science and Technology), Shihezi University; Xinjiang Production and Construction Corps Key Laboratory of Computing Intelligence and Network Information Security Open Fund

AI summary: This paper proposes FreqX, a new interpretability method that combines signal processing and information theory to address challenges in personalized federated learning such as non-IID data, heterogeneous devices, lack of fairness, and unclear contribution, using frequency-domain analysis to improve the efficiency and accuracy of explanations.

Comments 16pages, 9 figures

Abstract

Personalized federated learning (PFL) allows clients to cooperatively train a personalized model without disclosing their private datasets. However, PFL suffers from non-IID data, heterogeneous devices, lack of fairness, and unclear contribution, challenges that urgently call for interpretability of deep learning models. These challenges place new demands on interpretability: low cost, privacy, and detailed information. No current interpretability method satisfies them. In this paper, we propose a novel interpretability method, FreqX, by introducing signal processing and information theory. Our experiments show that the explanation results of FreqX contain both attribution information and concept information. FreqX runs at least 10 times faster than the baselines that contain concept information.

2503.14295 2026-06-05 cs.CV cs.AI

PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation

Baiqin Wang, Xiangyu Zhu, Fan Shen, Hao Xu, Zhen Lei

Affiliations: MAIS, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Psyche AI.INC; HKUST; CAIR, HKISI, Chinese Academy of Sciences; SCSE, FIE, M.U.S.T

AI summary: To address insufficient control over facial animation in audio-driven talking face generation, this paper proposes the PC-Talk framework, which improves lip-audio alignment and emotion control to increase the diversity and user-friendliness of generated videos.

Comments 10 Pages, 6 figures. Accepted in CVPR2026

Abstract

Recent advancements in audio-driven talking face generation have made great progress in lip synchronization. However, current methods often lack sufficient control over facial animation such as speaking style and emotional expression, resulting in uniform outputs. In this paper, we focus on improving two key factors: lip-audio alignment and emotion control, to enhance the diversity and user-friendliness of talking videos. Lip-audio alignment control focuses on elements like speaking style and the scale of lip movements, whereas emotion control is centered on generating realistic emotional expressions, allowing for modifications in multiple attributes such as intensity. To achieve precise control of facial animation, we propose a novel framework, PC-Talk, which enables lip-audio alignment and emotion control through implicit keypoint deformations. First, our lip-audio alignment control module facilitates precise editing of speaking styles at the word level and adjusts lip movement scales to simulate varying vocal loudness levels, maintaining lip synchronization with the audio. Second, our emotion control module generates vivid emotional facial features with pure emotional deformation. This module also enables the fine modification of intensity and the combination of multiple emotions across different facial regions. Our method demonstrates outstanding control capabilities and achieves state-of-the-art performance on both HDTF and MEAD datasets in extensive experiments.

2503.11910 2026-06-05 cs.LG cs.AI math.AT math.SG

RTD-Lite: Scalable Topological Analysis for Comparing Weighted Graphs in Learning Tasks

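Eduard Tulchinskii, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, Serguei Barannikov

Affiliations: Skoltech, AI Foundation and Algorithm Lab; Skoltech, AIRI; Skoltech, CNRS

AI summary: This paper proposes RTD-Lite, an algorithm that uses minimal spanning trees in auxiliary graphs to compare the topological features of weighted graphs in O(n²) time. It suits tasks such as dimensionality reduction and neural network training, and experiments show that it outperforms existing methods at identifying topological differences and reducing computation time.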

Comments Accepted for AISTATS 2025

Abstract

Topological methods for comparing weighted graphs are valuable in various learning tasks but often suffer from computational inefficiency on large datasets. We introduce RTD-Lite, a scalable algorithm that efficiently compares topological features, specifically connectivity or cluster structures at arbitrary scales, of two weighted graphs with one-to-one correspondence between vertices. Using minimal spanning trees in auxiliary graphs, RTD-Lite captures topological discrepancies with $O(n^2)$ time and memory complexity. This efficiency enables its application in tasks like dimensionality reduction and neural network training. Experiments on synthetic and real-world datasets demonstrate that RTD-Lite effectively identifies topological differences while significantly reducing computation time compared to existing methods. Moreover, integrating RTD-Lite into neural network training as a loss function component enhances the preservation of topological structures in learned representations. Our code is publicly available at https://github.com/ArGintum/RTD-Lite
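
To make the role of minimal spanning trees concrete, here is a minimal Python sketch. It is not the RTD-Lite implementation: the auxiliary graph used here (the elementwise minimum of the two weight matrices) and the function `mst_discrepancy` are assumptions chosen for illustration, and the repository linked above is the authoritative source. The sketch only shows that MST edge weights record the scales at which clusters merge, and that they can be computed in O(n²) time on a dense graph.

```python
import math

def mst_edge_weights(w):
    """Prim's algorithm on a dense symmetric weight matrix, O(n^2) time.

    Returns the sorted weights of the n-1 spanning-tree edges, i.e. the
    thresholds at which connected components merge.
    """
    n = len(w)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    best[0] = 0.0
    weights = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:
            weights.append(best[u])
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v] = w[u][v]
    return sorted(weights)

def mst_discrepancy(w1, w2):
    """Hypothetical score: how much cheaper the MST of the elementwise-minimum
    graph is than the MST of each input graph (assumed, not the paper's formula)."""
    n = len(w1)
    w_min = [[min(w1[i][j], w2[i][j]) for j in range(n)] for i in range(n)]
    total = lambda w: sum(mst_edge_weights(w))
    return (total(w1) - total(w_min)) + (total(w2) - total(w_min))

# Two graphs on the same three vertices with different cluster structure.
w1 = [[0, 1, 4], [1, 0, 4], [4, 4, 0]]  # vertices 0 and 1 are close
w2 = [[0, 4, 4], [4, 0, 1], [4, 1, 0]]  # vertices 1 and 2 are close
print(mst_discrepancy(w1, w1), mst_discrepancy(w1, w2))
```

Under these assumptions the score is zero when the two graphs coincide and grows when their cluster structures disagree, because lowering edge weights can only lower the MST total.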

2409.13607 2026-06-05 cs.RO

RECON: Reducing Causal Confusion with Human-Placed Markers

Robert Ramirez Sanchez, Heramb Nemlekar, Shahabedin Sagheb, Cara M. Nunez, Dylan P. Losey

Affiliations: Collaborative Robotics Lab (Collab), Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061; Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853

AI summary: This work proposes the RECON framework, in which humans proactively mark key parts of a task to reduce causal confusion in robot learning. The beacon data is used to train a task-relevant state embedding, improving learning efficiency.

Comments 7 pages, 5 figures

Abstract

Imitation learning enables robots to learn new tasks from human examples. One fundamental limitation while learning from humans is causal confusion. Causal confusion occurs when the robot's observations include both task-relevant and extraneous information: for instance, a robot's camera might see not only the intended goal, but also clutter and changes in lighting within its environment. Because the robot does not know which aspects of its observations are important a priori, it often misinterprets the human's examples and fails to learn the desired task. To address this issue, we highlight that -- while the robot learner may not know what to focus on -- the human teacher does. In this paper we propose that the human proactively marks key parts of their task with small, lightweight beacons. Under our framework (RECON) the human attaches these beacons to task-relevant objects before providing demonstrations: as the human shows examples of the task, beacons track the position of marked objects. We then harness this offline beacon data to train a task-relevant state embedding. Specifically, we embed the robot's observations to a latent state that is correlated with the measured beacon readings: in practice, this causes the robot to autonomously filter out extraneous observations and make decisions based on features learned from the beacon data. Our simulations and a real robot experiment suggest that this framework for human-placed beacons mitigates causal confusion. Indeed, we find that using RECON significantly reduces the number of demonstrations needed to convey the task, lowering the overall time required for human teaching. See videos here: https://youtu.be/oy85xJvtLSU

2502.20914 2026-06-05 cs.LG cs.AI cs.CL

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

Maxime Méloux, Silviu Maniu, François Portet, Maxime Peyrard

Affiliations: Université Grenoble Alpes, CNRS, Grenoble INP, LIG

AI summary: This paper asks whether, under the mechanistic interpretability (MI) framework, a given behavior has a unique explanation. It analyzes the identifiability of MI explanations by drawing on statistical identifiability, and presents two main strategies together with experimental results.

Journal ref
The Thirteenth International Conference on Learning Representations (ICLR 2025)
Abstract

As AI systems are used in high-stakes applications, ensuring interpretability is crucial. Mechanistic Interpretability (MI) aims to reverse-engineer neural networks by extracting human-understandable algorithms to explain their behavior. This work examines a key question: for a given behavior, and under MI's criteria, does a unique explanation exist? Drawing on identifiability in statistics, where parameters are uniquely inferred under specific assumptions, we explore the identifiability of MI explanations. We identify two main MI strategies: (1) "where-then-what," which isolates a circuit replicating model behavior before interpreting it, and (2) "what-then-where," which starts with candidate algorithms and searches for neural activation subspaces implementing them, using causal alignment. We test both strategies on Boolean functions and small multi-layer perceptrons, fully enumerating candidate explanations. Our experiments reveal systematic non-identifiability: multiple circuits can replicate behavior, a circuit can have multiple interpretations, several algorithms can align with the network, and one algorithm can align with different subspaces. Is uniqueness necessary? A pragmatic approach may require only predictive and manipulability standards. If uniqueness is essential for understanding, stricter criteria may be needed. We also reference the inner interpretability framework, which validates explanations through multiple criteria. This work contributes to defining explanation standards in AI.

2502.14145 2026-06-05 cs.CL eess.AS

LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems

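Hao Zhang, Weiwei Li, Rilin Chen, Vinay Kothapally, Meng Yu, Dong Yu

Affiliations: Tencent AI Lab

AI summary: This paper proposes an LLM-based semantic voice activity detection module for efficiently managing turn-taking in full-duplex spoken dialogue systems. A lightweight LLM distinguishes intentional from unintentional barge-ins, and input speech is processed in short intervals to make real-time decisions while reducing computational overhead.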

Abstract

Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD predicts four control tokens to regulate turn-switching and turn-keeping, distinguishing between intentional and unintentional barge-ins while detecting query completion for handling user pauses and hesitations. By processing input speech in short intervals, the semantic VAD enables real-time decision-making, while the core dialogue engine (CDE) is only activated for response generation, reducing computational overhead. This design allows independent DM optimization without retraining the CDE, balancing interaction accuracy and inference efficiency for scalable, next-generation full-duplex SDS.

2502.06434 2026-06-05 cs.CV cs.LG

Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression

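Lingao Xiao, Songhua Liu, Yang He, Xinchao Wang

Affiliations: University of Science and Technology of China

AI summary: This paper proposes a unified dataset compression benchmark to examine the convergence of dataset pruning and distillation, finds that soft-label distillation underperforms pruning on small datasets, and proposes a hard-label dataset compression method whose PCA framework improves image quality and storage efficiency.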

Comments Accepted by ICML 2026

Abstract

Dataset pruning (DP) and dataset distillation (DD) fundamentally differ in their outputs: DP selects original image subsets, while DD generates synthetic images. Recently, DD's increasing reliance on original images suggests a convergence of the two directions. To investigate this convergence trend, we propose a unified dataset compression (DC) benchmark. This benchmark reveals an interesting trade-off for soft-label-DD: while soft labels provide valuable information, they can make the distillation process less essential, as distilled images may not always outperform random subsets. In addition, the benchmark reveals that in current stages, dataset pruning outperforms dataset distillation at small dataset sizes. Given these observations, we explore hard-label-DC as a complementary approach that emphasizes image quality while offering substantial storage efficiency. Our PCA (Prune, Combine, and Augment) is the first framework that does not rely on soft labels but instead focuses on image quality. (1) "P" means selecting easy samples based on dataset pruning metrics, (2) "C" indicates combining these samples effectively, and (3) "A" is to apply constrained image augmentation during training. Our code is available at https://github.com/ArmandXiao/Unifying-Dataset-Pruning-and-Distillation

2502.02487 2026-06-05 cs.CV

Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives

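Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Tatiana Tommasi, Giuseppe Averta

Affiliations: Department of Control and Computer Engineering

AI summary: This paper proposes Hier-EgoPack, which extends EgoPack to reasoning across multiple temporal granularities by introducing a hierarchical architecture and GNN layers, effectively addressing video understanding across a variety of downstream tasks.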

Comments Project webpage at https://sapeirone.github.io/hier-egopack

Abstract

Our comprehension of video streams depicting human activities is naturally multifaceted: in just a few moments, we can grasp what is happening, identify the relevance and interactions of objects in the scene, and forecast what will happen soon, everything all at once. To endow autonomous systems with such a holistic perception, learning how to correlate concepts, abstract knowledge across diverse tasks, and leverage task synergies when learning novel skills is essential. A significant step in this direction is EgoPack, a unified framework for understanding human activities across diverse tasks with minimal overhead. EgoPack promotes information sharing and collaboration among downstream tasks, essential for efficiently learning new skills. In this paper, we introduce Hier-EgoPack, which advances EgoPack by also enabling reasoning across diverse temporal granularities, expanding its applicability to a broader range of downstream tasks. To achieve this, we propose a novel hierarchical architecture for temporal reasoning equipped with a GNN layer specifically designed to tackle the challenges of multi-granularity reasoning effectively. We evaluate our approach on multiple Ego4D benchmarks involving both clip-level and frame-level reasoning, demonstrating how our hierarchical unified architecture effectively solves these diverse tasks simultaneously.

2410.13056 2026-06-05 cs.CL cs.AI

Channel-Wise Mixed-Precision Quantization for Large Language Models

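Zihan Chen, Bike Xie, Jundong Li, Cong Shen

Affiliations: Department of Electrical and Computer Engineering, University of Virginia; Kneron Inc.

AI summary: This paper proposes channel-wise mixed-precision quantization (CMPQ), which optimizes LLM quantization by assigning different precision levels according to activation distributions, supporting arbitrary average bit-widths in the low-bit regime and improving performance with a modest increase in memory usage.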

Abstract

Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks, but their deployment on edge devices remains challenging due to the substantial memory requirements imposed by their large parameter sizes. Weight-only quantization presents a promising solution to reduce the memory footprint of LLMs. However, existing approaches primarily focus on integer-bit quantization, limiting their adaptability to fractional-bit quantization tasks and preventing the full utilization of available storage space on devices. In this paper, we introduce Channel-Wise Mixed-Precision Quantization (CMPQ), a novel mixed-precision quantization method that allocates quantization precision in a channel-wise pattern based on activation distributions. By assigning different precision levels to different weight channels, CMPQ supports arbitrary average bit-widths in the low-bit regime (e.g., between 2 and 4 bits). CMPQ employs a non-uniform quantization strategy and incorporates two outlier extraction techniques that collaboratively preserve the critical information, thereby minimizing the quantization loss. Experiments on nine different LLMs demonstrate that CMPQ not only enhances performance in integer-bit quantization tasks but also achieves significant performance gains with a modest increase in memory usage by performing in a mixed-precision way. CMPQ represents an adaptive and effective approach to LLM quantization, offering substantial benefits across diverse device capabilities.
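
The idea of a fractional average bit-width can be illustrated with a small Python sketch. It is not the CMPQ algorithm: the abstract does not give the allocation rule, so the function `allocate_bits`, the two-level choice of 2 and 4 bits, and the per-channel importance scores are all hypothetical. It only shows the arithmetic by which giving more bits to a subset of channels yields a non-integer average.

```python
def allocate_bits(scores, target_avg, low=2, high=4):
    """Give `high` bits to the highest-scoring channels and `low` bits to the
    rest so that the mean bit-width is approximately `target_avg`.

    `scores` is an assumed per-channel importance measure, for example one
    derived from activation magnitudes.
    """
    n = len(scores)
    # Fraction f of high-precision channels solves low + f*(high - low) = target_avg.
    n_high = round(n * (target_avg - low) / (high - low))
    ranked = sorted(range(n), key=lambda c: scores[c], reverse=True)
    bits = [low] * n
    for c in ranked[:n_high]:
        bits[c] = high
    return bits

scores = [0.1, 5.0, 0.3, 0.2, 4.0, 0.1, 0.2, 0.3]
bits = allocate_bits(scores, target_avg=2.5)
print(bits, sum(bits) / len(bits))
```

With eight channels and a 2.5-bit target, two channels receive 4 bits and six receive 2 bits. A real method would also need the quantizer itself and the outlier handling that the abstract mentions.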

2407.10486 2026-06-05 cs.AI cs.CL

IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

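Jie Cao, Dian Jiao, Yang Dai, Rolan Yan, Wenqiao Zhang, Siliang Tang

Affiliations: Zhejiang University; Tencent, Wechat

AI summary: For query-focused summarization, this paper identifies two core capabilities, efficient fine-grained query-LLM alignment and long-document summarization, and realizes them with the Query-aware HyperExpert and Query-focused Infini-attention modules. Experiments confirm the effectiveness and generalizability of the approach.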

Abstract

Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. Large language models (LLMs) show an impressive capability for textual understanding through large-scale pretraining, which implies great potential for extractive snippet generation. In this paper, we systematically investigate two indispensable characteristics that LLM-based QFS models should have: efficient fine-grained query-LLM alignment and lengthy document summarization. Correspondingly, we propose two modules, Query-aware HyperExpert and Query-focused Infini-attention, to provide these characteristics. These innovations pave the way for broader application and accessibility in the field of QFS technology. Extensive experiments conducted on existing QFS benchmarks indicate the effectiveness and generalizability of the proposed approach.

2412.07583 2026-06-05 cs.CV cs.AI

Mobile Video Diffusion

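Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian

Affiliations: Qualcomm AI Research

AI summary: This paper presents MobileVD, a mobile-optimized video diffusion model that substantially lowers memory and computational cost by reducing frame resolution, introducing multi-scale temporal representations, and applying two new pruning schemes, enabling efficient video generation on mobile devices.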

Abstract

Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from a spatio-temporal UNet from Stable Video Diffusion (SVD), we reduce memory and computational cost by reducing the frame resolution, incorporating multi-scale temporal representations, and introducing two novel pruning schema to reduce the number of channels and temporal blocks. Furthermore, we employ adversarial finetuning to reduce the denoising to a single step. Our model, coined as MobileVD, is 523x more efficient (1817.2 vs. 4.34 TFLOPs) with a slight quality drop (FVD 149 vs. 171), generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi-14 Pro. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-diffusion/

2406.08966 2026-06-05 cs.LG cs.AI

Separation Power of Equivariant Neural Networks

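Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

Affiliations: University of Trento; Fondazione Bruno Kessler; University of Oxford; University of Venice

AI summary: This paper studies the separation power of equivariant neural networks and how architecture and hyperparameters affect it, finding that non-polynomial activations are equivalent in expressivity, that depth stops improving separation power beyond a threshold, and that the block decomposition of hidden representations affects separation power.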

Comments Published as a conference paper at ICLR 2025

Journal ref
International Conference on Learning Representations (ICLR), 2025
Abstract

The separation power of a machine learning model refers to its ability to distinguish between different inputs and is often used as a proxy for its expressivity. Indeed, knowing the separation power of a family of models is a necessary condition to obtain fine-grained universality results. In this paper, we analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. We first present a complete characterization of inputs indistinguishable by models derived from a given architecture. From these results, we derive how separability is influenced by hyperparameters and architectural choices such as activation functions, depth, hidden layer width, and representation types. Notably, all non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power. Depth improves separation power up to a threshold, after which further increases have no effect. Adding invariant features to hidden representations does not impact separation power. Finally, block decomposition of hidden representations affects separability, with minimal components forming a hierarchy in separation power that provides a straightforward method for comparing the separation power of models.
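
A toy Python example may help show what "separating two inputs" means. It is an illustration under stated assumptions, not the paper's characterization: the model is a hypothetical one-layer permutation-invariant network on scalar elements, and the identity function stands in for a polynomial activation of degree 1.

```python
def relu(t):
    return max(t, 0.0)

def identity(t):
    return t

def sum_pool(xs, w, b, act):
    """Permutation-invariant toy model: sum over elements of act(w * x + b)."""
    return sum(act(w * x + b) for x in xs)

a = [1.0, 3.0]
c = [2.0, 2.0]  # same size and same sum as `a`, but a different multiset

# With the identity activation the output is w * sum(xs) + len(xs) * b,
# so no choice of (w, b) tells `a` and `c` apart.
linear_gap = max(
    abs(sum_pool(a, w, b, identity) - sum_pool(c, w, b, identity))
    for w in (-2.0, 0.5, 1.0)
    for b in (-2.5, 0.0, 1.5)
)

# With ReLU, a single choice of parameters already separates them.
relu_gap = sum_pool(a, 1.0, -2.5, relu) - sum_pool(c, 1.0, -2.5, relu)
print(linear_gap, relu_gap)
```

In this toy family the degree-1 activation identifies any two inputs with equal size and sum, while ReLU distinguishes this pair. That is the sense in which the choice of activation changes which inputs a model family can tell apart.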

2406.12620 2026-06-05 cs.CL

What Makes Two Language Models Think Alike?

Louis Jalouzot, Christophe Pallier, Emmanuel Chemla, Yair Lakretz

Affiliations: UNICOG CNRS; INSERM; CEA; Paris-Saclay University; LSCP; EHESS; ENS; PSL University

AI summary: This paper asks whether architectural and training differences influence how language models represent and process language, proposes a new method for quantifying similarities and differences between models, and finds that model similarity is driven mainly by release date and model family.

Comments 25 pages, 13 figures

Abstract

Do architectural and training differences influence the way models represent and process language? Traditional similarity metrics tell us whether two models share a similar representational geometry, but they cannot explain why. Here, we propose a new, simple approach to address this question. This approach maps neural activity in each model layer onto a set of interpretable linguistic features and quantifies how much each of them drives similarities and differences between models. We use this approach to compare 43 language models across 10 families, including decoder Transformers, State-Space Models, and Recurrent Neural Networks. We find that model-level similarity is driven most strongly by release date, a proxy for general LLM development, and model family, suggesting that linguistic signatures are not primarily shaped by scale or architecture class. Overall, our approach provides a way to link theoretically motivated symbolic descriptions to neural representations and can readily be extended to other domains such as speech and vision, and to other neural systems such as biological brains.

2311.07565 2026-06-05 cs.LG stat.ML

Exploration via linearly perturbed loss minimisation

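David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

Affiliations: University of Alberta

AI summary: This paper proposes EVILL, an exploration method based on linearly perturbed losses that solves for the minimiser of a linearly perturbed regularised negative log-likelihood. It explains why random reward perturbations yield effective bandit algorithms and shows how data-dependent perturbations let EVILL match Thompson-sampling-style parameter-perturbation methods in theory and in practice.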

Comments Updated with erratum note: Appendix I contains a gap in the proof; all main-paper claims remain valid via the corrected argument of Perneczky, Abeille & Janz (2026, arXiv:2606.00431)

Abstract

We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood function. We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards. In doing so, we provide a simple and clean explanation of when and why random reward perturbations give rise to good bandit algorithms. We propose data-dependent perturbations not present in previous PHE-type methods that allow EVILL to match the performance of Thompson-sampling-style parameter-perturbation methods, both in theory and in practice. Moreover, we show an example outside generalised linear bandits where PHE leads to inconsistent estimates, and thus linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.
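
The stated link between a linear loss perturbation and training on perturbed rewards can be checked by hand in the simplest case. The Python sketch below assumes a scalar parameter, squared loss with ridge regularisation, and fixed perturbations `zs`; the function names are hypothetical, and the perturbation distribution used in the paper is not given in the abstract. Setting the derivative of 0.5·Σ(x·t − y)² + 0.5·λ·t² + w·t to zero gives t = (Σxy − w)/(Σx² + λ), and choosing w = −Σxz reproduces the estimate obtained from rewards y + z.

```python
def ridge_minimiser(xs, ys, lam, w=0.0):
    """Minimiser over scalar t of
    0.5 * sum((x*t - y)**2) + 0.5 * lam * t**2 + w * t."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy - w) / (sxx + lam)

def perturbed_reward_estimate(xs, ys, zs, lam):
    """Fit on rewards y + z (the perturbed-history view)."""
    return ridge_minimiser(xs, [y + z for y, z in zip(ys, zs)], lam)

def perturbed_loss_estimate(xs, ys, zs, lam):
    """Fit on the original rewards, adding the linear term w * t to the loss,
    with w = -sum(x * z) (the linear-perturbation view)."""
    w = -sum(x * z for x, z in zip(xs, zs))
    return ridge_minimiser(xs, ys, lam, w)

xs = [1.0, 2.0, -1.0]   # features of the pulled arms
ys = [0.5, 1.5, -0.2]   # observed rewards
zs = [0.3, -0.1, 0.2]   # reward perturbations (fixed here for reproducibility)
t_reward = perturbed_reward_estimate(xs, ys, zs, lam=1.0)
t_loss = perturbed_loss_estimate(xs, ys, zs, lam=1.0)
print(t_reward, t_loss)
```

Both views give the same estimate in this linear-Gaussian special case. The abstract's point is that the loss-perturbation view continues to behave well outside generalised linear bandits, where perturbing rewards can give inconsistent estimates.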

2308.10897 2026-06-05 cs.CV

Can Language Models Learn to Listen?

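Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar

Affiliations: University of California, Berkeley

AI summary: This paper proposes a framework for generating appropriate listener facial responses from a speaker's words. Feeding quantized facial motion elements as additional language tokens into a transformer-based large language model improves the quality of listener responses.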

Comments ICCV 2023; Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/

Abstract

We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose treating the quantized atomic motion elements as additional language token inputs to a transformer-based large language model. Initializing our transformer with the weights of a language model pre-trained only on text results in significantly higher quality listener responses than training a transformer from scratch. We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study. In our evaluation, we analyze the model's ability to utilize temporal and semantic aspects of spoken text. Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/
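
Treating quantized motion as extra language tokens can be sketched in a few lines of Python. This is a generic nearest-codebook lookup, the standard quantization step of a VQ-VAE, and not the authors' code: the codebook values, the two-dimensional motion features, and the vocabulary size of 50000 are made-up placeholders, and a trained encoder would normally produce the vectors being quantized.

```python
def quantize(frames, codebook):
    """Map each motion feature vector to the index of its nearest codebook entry
    (squared Euclidean distance)."""
    def nearest(f):
        return min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(f, codebook[k])),
        )
    return [nearest(f) for f in frames]

def to_language_tokens(codes, text_vocab_size):
    """Shift codebook indices past the text vocabulary so motion codes and word
    tokens share one id space without colliding."""
    return [text_vocab_size + c for c in codes]

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]   # placeholder learned codes
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8]]    # placeholder motion features
codes = quantize(frames, codebook)
tokens = to_language_tokens(codes, text_vocab_size=50000)
print(codes, tokens)
```

Once motion is expressed as token ids in this way, an autoregressive language model can be asked to predict them with the same next-token objective it uses for text, which is the setting the abstract describes.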

2306.09712 2026-06-05 cs.LG cs.AI cs.CL

Semi-Offline Reinforcement Learning for Optimized Text Generation

Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

Affiliations: not listed

AI summary: The paper proposes semi-offline reinforcement learning, which balances exploration capability against training cost, and identifies the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound.

Comments In Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

Abstract

In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, and offline methods efficiently obtain reward signals by sacrificing exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transits from offline to online settings, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods.

2110.06847 2026-06-05 cs.CL cs.CY cs.SI physics.soc-ph

Ousiometrics: The essence of meaning aligns with a power-danger-structure framework instead of valence-arousal-dominance

P. S. Dodds, T. Alshaabi, M. I. Fudolig, J. W. Zimmerman, J. Lovato, S. Beaulieu, J. R. Minot, M. V. Arnold, A. J. Reagan, C. M. Danforth

Affiliations: Computational Story Lab, Vermont Advanced Computing Center, University of Vermont, Burlington, VT 05405, United States; Vermont Complex Systems Institute, MassMutual Center of Excellence for Complex Systems and Data Science, University of Vermont, Burlington, VT 05405, United States; Department of Computer Science, University of Vermont, Burlington, VT 05405, United States; Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, United States; Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA 20147, United States; Advanced Bioimaging Center, University of California Berkeley, Berkeley, CA 94720, United States; School of Computer and Mathematical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; Computational Ethics Lab, University of Vermont, Burlington, VT 05405, United States

AI summary: The paper proposes GPADS, a new framework for describing the essence of meaning. Analysis of English-language corpora indicates that essential meaning is best described by a power-danger-structure framework, and the authors build a prototype ousiometer.

Comments 115 pages (30 page main manuscript, 85 page appendix), 82 figures (9 main, 73 appendix), 3 tables (2 main, 1 appendix)

Journal ref
Science Advances, 12(9): eadr4039, 2026
Abstract

From work emerging through the middle of the 20th century, the essence of meaning has become widely accepted as being described by the three orthogonal dimensions of valence, arousal, and dominance (VAD). These essential dimensions have become the cornerstone of sentiment analysis across many fields. By re-examining first types and then tokens for the English language, and through the use of automatically annotated histograms ('ousiograms'), we find here that: the essence of meaning conveyed by words is instead best described by a goodness-power-aggression-danger-structure circumplex framework (GPADS); that large-scale English language corpora reveal a systematic bias toward safe, low-danger words; and that the power-danger-structure (PDS) framework is the minimal framework that represents essential meaning. We find remarkable congruences between the GPADS framework and other spaces including mental states and fictional archetypes, and we construct and demonstrate a prototype ousiometer.

2606.06492 2026-06-05 cs.SE cs.AI cs.CL

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

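Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie

Affiliations: University of Waterloo

AI summary: Proposes Code2LoRA, a hypernetwork framework that injects repository knowledge by generating repository-specific LoRA adapters with no inference-time token overhead. It supports static and evolving scenarios and, on the static track of RepoPeftBench, matches the per-repository LoRA upper bound.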

Abstract

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific LoRA adapters, effectively injecting repository knowledge with zero inference-time token overhead. Code2LoRA supports two usage scenarios: Code2LoRA-Static converts a single repository snapshot into an adapter, suitable for comprehension of stable codebases; while Code2LoRA-Evo maintains an adapter backed by a GRU hidden state updated per code diff, suitable for active development of evolving codebases. To evaluate Code2LoRA against parameter-efficient fine-tuning baselines, we build RepoPeftBench, a benchmark of 604 Python repositories with two tracks: a static track with 40K training and 12K test assertion-completion tasks, and an evolution track with 215K commit-derived training and 87K commit-derived test tasks. On the static track, Code2LoRA-Static achieves 63.8% cross-repo and 66.2% in-repo exact match, matching the per-repository LoRA upper bound; on the evolution track, Code2LoRA-Evo achieves 60.3% cross-repo exact match (+5.2 pp over a single shared LoRA). Code2LoRA's code can be found at https://anonymous.4open.science/r/code2lora-6857; the model checkpoints and RepoPeftBench datasets can be found at https://huggingface.co/code2lora.
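The idea of a hypernetwork that emits LoRA factors can be sketched in a few lines. The version below is a minimal illustration, assuming a purely linear hypernetwork and tiny hand-picked matrices; the abstract does not describe the actual architecture, so `hypernetwork`, `proj`, and every shape here are hypothetical. Only the update rule W + scale * (B @ A) is the standard LoRA formulation.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def reshape(flat, rows, cols):
    """Turn a flat list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def hypernetwork(repo_embedding, proj, d_out, d_in, rank):
    """Assumed linear hypernetwork: map a repository embedding to LoRA factors B, A."""
    flat = matmul([repo_embedding], proj)[0]
    b = reshape(flat[:d_out * rank], d_out, rank)
    a = reshape(flat[d_out * rank:], rank, d_in)
    return b, a

def adapted_weight(w, b, a, scale=1.0):
    """Standard LoRA update: W + scale * (B @ A)."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]

# Hypothetical toy setup: 2-d embedding, 2x2 base weight, rank-1 adapter.
repo_embedding = [1.0, 0.0]
proj = [[1, 2, 3, 4], [0, 0, 0, 0]]     # shape: embedding dim x (d_out*rank + rank*d_in)
base = [[1, 0], [0, 1]]
B, A = hypernetwork(repo_embedding, proj, d_out=2, d_in=2, rank=1)
adapted = adapted_weight(base, B, A)
```

Because the adapter is folded into the weights, the prompt carries no extra repository context, which is the "zero inference-time token overhead" property the abstract highlights.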

2606.06480 2026-06-05 cs.GT cs.LG

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

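Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

Affiliations: IEEE

AI summary: For multi-agent simultaneous-move games, proposes DNQ, a framework that trains agents with solver-in-the-loop equilibrium supervision, and compares the scalability of the pairwise and exact equilibrium formulations.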

Abstract

Many real-world competitive systems require multiple decision-makers to act simultaneously under shared constraints, limited information, and repeated interaction, as in auctions, resource allocation, and security competition. We study multi-turn simultaneous bidding as a controlled testbed for such problems and propose DNQ, a solver-in-the-loop equilibrium supervision framework for training bidding agents. DNQ alternates between trajectory collection, critic-based payoff estimation, equilibrium computation, and policy imitation. At each visited state, a shared critic predicts either pairwise payoff matrices or an exact N-player payoff tensor, an external solver computes equilibrium strategies, and the agents are trained by minimizing the KL divergence between their masked policies and the solver-derived equilibrium targets. We focus on a scalable pairwise formulation that greatly reduces equilibrium-solving cost and training time compared with the exact formulation, while the shared critic amortizes payoff learning across agents and states. Experiments compare the pairwise and exact variants using critic loss, policy entropy, bidding resource usage, and training cost, showing that the pairwise method scales to larger numbers of agents, whereas the exact method becomes computationally impractical as the joint game grows. These results illustrate the trade-off between strategic fidelity and scalability in repeated competitive environments.
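The policy-imitation step can be illustrated with a masked softmax and a KL term. This is a minimal sketch, not the authors' implementation: the direction KL(target || policy), the mask convention, and the example numbers are assumptions, and the equilibrium target is supplied by hand rather than by a solver.

```python
import math

def masked_softmax(logits, mask):
    """Softmax over legal actions only; illegal actions get probability 0."""
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(target, policy, eps=1e-12):
    """KL(target || policy), skipping actions where the target has zero mass."""
    return sum(p * math.log(p / max(q, eps)) for p, q in zip(target, policy) if p > 0)

# Hypothetical example: three bids, the middle one illegal in this state.
logits = [1.0, 2.0, 3.0]
mask = [True, False, True]
target = [0.5, 0.0, 0.5]            # stand-in for a solver-derived equilibrium strategy
policy = masked_softmax(logits, mask)
loss = kl_divergence(target, policy)
```

Masking before normalisation keeps illegal bids at exactly zero probability, so the loss compares the policy and the target over legal actions only.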

2606.06460 2026-06-05 cs.CR cs.AI

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

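Thamilvendhan Munirathinam

Affiliations: University of California, Berkeley

AI summary: Proposes a lightweight in-band deny signal (the Recuse Signal) and measures experimentally whether LLM agents voluntarily comply. The signal reliably induces recusal, although the most capable model proceeds when given an explicit operator-authorization framing.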

Comments 8 pages, 1 figure. Code, specification, and experiment harness: https://github.com/mthamil107/Recuse

Abstract

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client). We propose a third mode: a lightweight, published in-band deny signal -- the Recuse Signal -- that a server emits over a protocol's existing channels (an SSH banner, a PostgreSQL NOTICE) asking a connecting automated agent to voluntarily withdraw. This is a cooperative governance control, the robots.txt analogue for live access; it is explicitly not a security boundary. Its value is entirely empirical and, to our knowledge, unmeasured: do compliant LLM agents actually honor such a signal? We define the signal as an open mini-standard, implement two zero- or low-footprint adapters (an SSH banner/PAM hook and a PostgreSQL wire-protocol proxy), deploy them on a live production host, and run a controlled experiment in which fresh agents are given a benign operations task and observed for recusal. In a pilot (SSH; OpenAI GPT-4o and GPT-4o-mini; and Claude Code as a deployed agent), the signal cleanly induces recusal -- 100% recusal when present versus 100% task completion in a no-signal control -- and, revealingly, behaves as a cooperative rather than absolute signal: an explicit operator-authorization framing flips the most capable model to proceed, while other agents continue to defer to the on-host policy. We release the standard, adapters, and experiment harness for reproduction.

2606.06454 2026-06-05 cs.SE cs.CL

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

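Mehmet Iscan

Affiliations: PythaLab, Yıldız Technical University, Istanbul, Turkey

AI summary: Using a two-tier ablation (a length-matched placebo, a labels-only scaffold, and execution-based tests), the study finds that the correctness gains from a Popperian prompt skill come mainly from scaffold structure rather than from its content. On the large model a ceiling effect prevents detection, and on the small model a labels-only scaffold achieves a similar effect.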

Comments 34 pages, 5 figures, 8 tables

Abstract

Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt 'skills' that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist, and such skills are reported to improve generated code. But these gains are almost always read off an LLM-as-a-judge, an instrument with documented positional, self-preference, and stylistic biases. We ask: if it appears to help, is the gain from the skill's Popperian content, or from the structure any scaffold imposes? We pre-register a two-tier ablation with three controls: a length-matched placebo, a labels-only scaffold that keeps the Popperian headers but strips the procedure, and an execution oracle (HumanEval+ unit tests), plus a vocabulary-halo sentinel and a same-model self-judge audit. On a frontier model (Claude Sonnet 4.6, N=163) all conditions sit near the benchmark ceiling and do not separate, so the pre-registered +5-point improvement is not supported (a ceiling-limited non-detection). On a small model (Qwen2.5-Coder-0.5B, N=164) structured arms lift best-of-eight correctness by 20-22 points, but the full skill shows no separable benefit over a labels-only scaffold (aggregate F@8=L@8 vs V@8=34.8%), and the placebo trails by only 2.4 points. A 0.5B self-judge applying the Popperian rubric does not beat random selection and concentrates 60% of its picks on one index. In the two settings tested, the skill's Popperian procedural content adds no separable execution-correctness benefit beyond a labels-only scaffold, so the gains track scaffold structure. We contribute a calibrated negative result and a reusable disambiguation protocol; the finding bounds an engineering claim about one prompt-skill family and is not an evaluation of Popperian methodology in general.
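A best-of-eight correctness score under an execution oracle can be computed as below. The sketch assumes that a problem counts as solved when at least one of its eight samples passes the unit tests; that reading of the metric, the function name, and the toy results are assumptions rather than details taken from the paper.

```python
def best_of_k(results, k=8):
    """Share of problems where at least one of the first k samples passes."""
    solved = sum(1 for samples in results if any(samples[:k]))
    return solved / len(results)

# Hypothetical unit-test outcomes: one row per problem, one flag per sample.
outcomes = [
    [False] * 8,
    [False] * 7 + [True],
    [True] * 8,
    [False] * 8,
]
score = best_of_k(outcomes)
```

Scoring by execution rather than by a model judge is what lets the study separate real correctness gains from the judge biases the abstract mentions.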

2606.06444 2026-06-05 eess.AS cs.CL cs.SD

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

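Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati, Mrudula Athi, Anton Ratnarajah, Amit Chhetri, James Glass

Affiliations: MIT CSAIL; Amazon

AI summary: Proposes USAD 2.0, a universal audio encoder that uses domain-aware distillation to combine knowledge from self-supervised and supervised foundation models, extends coverage to music, scales to one billion parameters via depth scaling, and achieves strong or state-of-the-art performance in probing and LLM-based evaluations.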

Comments Accepted to Interspeech 2026

Abstract

Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEAR remain limited in coverage and evaluation. Recent studies also suggest supervised encoders align better with audio LLMs. We present USAD 2.0, a universal encoder integrating knowledge from both SSL and supervised foundation models. USAD 2.0 introduces domain-aware distillation to address teacher mismatch, extends coverage to the music domain, and adds second-stage supervised distillation for downstream use. We further scale the model to one billion parameters via depth scaling. Experiments show USAD 2.0 achieves strong or state-of-the-art performance across probing and LLM-based evaluations.

2606.06391 2026-06-05 stat.ML cs.LG

Conformal Risk Sharing: Certified Cost Allocation with Participation Guarantees

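Ieva Kazlauskaite

Affiliations: not listed

AI summary: Proposes Conformal Risk Sharing, which pairs an interpretable sharing policy with split conformal calibration to allocate the financial impact of rare events from finite data without distributional assumptions, giving every participant an obligation cap and verifying that no participant is made materially worse off.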

Abstract

Sharing the financial impact of rare adverse events across a group can soften extreme individual burdens, but any participant made worse off by the arrangement has reason to leave. A credible mechanism must therefore provide each agent with a trustworthy cap on their future obligation and should be deployed only if the aggregate harm across participants is bounded. We formalise this as the Certified Allocation Problem: from finite data and without distributional assumptions, find a redistribution rule, produce obligation caps for every participant, and verify that no participant is made materially worse off. We propose Conformal Risk Sharing, which solves this problem by pairing an interpretable sharing policy with split conformal calibration. The sharing intensity is tuned on training data, while held-out calibration data produces distribution-free per-agent guarantees (valid under exchangeability). Experiments on synthetic and real-world data, including precipitation and energy-cooperative data, confirm that the framework can substantially reduce extreme obligations for high-risk agents while controlling harm to others.
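The split conformal step behind a per-agent cap reduces to an order statistic of held-out scores. The sketch below shows the standard split conformal quantile; treating an agent's held-out obligations as the calibration scores, and the names and numbers used, are illustrative assumptions rather than the paper's exact procedure.

```python
import math

def conformal_cap(calibration_scores, alpha):
    """Split conformal bound: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf              # too few calibration points for this alpha
    return sorted(calibration_scores)[rank - 1]

# Hypothetical held-out obligations for one agent (arbitrary currency units).
calibration = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.4]
cap = conformal_cap(calibration, alpha=0.25)
```

If the calibration scores and the future obligation are exchangeable, the future obligation falls at or below `cap` with probability at least 1 - alpha, which is the distribution-free guarantee the abstract refers to.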

2606.06373 2026-06-05 eess.SP cs.AI

LatentWave: JEPA Pretraining for Wireless Foundation Models

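Ahmed Mohamed, Ahmed Aboulfotouh, Hatem Abou-Zeid

Affiliations: University of California, San Diego

AI summary: Proposes LatentWave, which uses a Joint-Embedding Predictive Architecture (JEPA) to predict masked regions in latent space and learn transferable wireless signal representations, and evaluates it on four downstream tasks against a masked-modeling baseline.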

Abstract

Wireless foundation models have emerged as a promising alternative to building separate models for each wireless task. However, existing approaches rely on masked input reconstruction, which can bias representations toward low-level signal details. In this paper, we propose LatentWave, a wireless foundation model pretrained using a Joint-Embedding Predictive Architecture (JEPA) on diverse wireless spectrograms and channel state information (CSI). By predicting masked regions in latent space, LatentWave learns representations that are more transferable out of the box across diverse downstream tasks. The proposed architecture employs per-channel patch embeddings with stochastic channel sampling during pretraining, allowing it to process variable antenna counts and improving usability across heterogeneous wireless configurations. We evaluate LatentWave on four downstream tasks: RF signal classification, 5G NR positioning, beam prediction, and LoS/NLoS classification, comparing against a masked-modeling baseline (WavesFM) pretrained on the same data. Additionally, we show that the masking geometry introduces a task-dependent inductive bias: frequency masking strongly favors channel-related tasks such as positioning and beam prediction, while region masking better preserves discriminability for signal classification.

2606.06351 2026-06-05 stat.ML cs.LG

Function-Space Priors for Bayesian Neural ODEs with Application to Vessel Trajectory Prediction

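Jaeyeong Lee, Wonmo Koo, Heeyoung Kim

Affiliations: Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology (KAIST)

AI summary: To address irregular sampling, missing reports, and complex dynamics in vessel trajectory prediction, proposes a regularizer that imposes a Gaussian-process-kernel prior on the vector field and combines it with probabilistic multiple shooting for uncertainty quantification over long sequences.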

Abstract

Vessel trajectory prediction from Automatic Identification System (AIS) data is essential for maritime situational awareness, yet it remains challenging due to irregular sampling, missing reports, and complex dynamics. Beyond accurate point forecasts, maritime applications also demand well-calibrated uncertainty estimates for reliable decision-making. Bayesian Neural Ordinary Differential Equations (ODEs) offer a principled framework for continuous-time trajectory modeling with uncertainty quantification by placing a prior over the neural vector field parameters. However, the commonly used isotropic Gaussian weight prior fails to encode informative structural properties of vessel dynamics, such as smoothness and locality. Existing function-space Bayesian neural network methods address this limitation for static mappings, but do not transfer directly to Neural ODEs, where the primary quantity of interest is the trajectory rather than the vector field itself. In principle, one could place a Gaussian process (GP) prior directly over ODE solutions, but this requires propagating distributions through a nonlinear ODE solver, which is analytically intractable. To address this challenge, we adopt a practical approach that imposes a GP-kernel-based prior directly on the vector field evaluated at a finite set of measurement points. Specifically, we augment the standard weight-space variational objective with a kernel-based regularizer that penalizes deviations of the vector field from the structure implied by a GP prior. To handle long and irregular AIS trajectories, we further combine this function-space regularization with probabilistic multiple shooting, which decouples inference across temporal segments while maintaining global consistency.

2606.06347 2026-06-05 eess.SY cs.LG cs.SY

Attack Detection using Time Series Foundation Models

Sribalaji C. Anand, Anh Tung Nguyen, George J. Pappas

Affiliations: University of Pennsylvania; KTH Royal Institute of Technology; Uppsala University

AI summary: For cyber-physical systems with no knowledge of the plant model, proposes a zero-shot attack detector built on the TimesFM time-series foundation model and validates it on the IEEE 14-bus power system.

Comments Under review

Abstract

This paper addresses the problem of attack detection in cyber-physical systems without any knowledge of the plant model or its structure. A remotely located plant transmits sensor measurements to an operator over a network that is assumed to be under attack. We consider two classes of attacks: model-free replay attacks and model-based stealthy attacks. For the latter, we derive closed-form expressions for the optimal stealthy attack policy against a $\chi^2$ detector, for both linear and nonlinear systems. We then propose a model-structure-free detector based on TimesFM, a time-series foundation model developed by Google Research, which serves as a surrogate residual generator operating in a zero-shot fashion. We show empirically that the TimesFM-based detector achieves a comparable or superior attack detection performance. The efficacy of the proposed approach is demonstrated numerically on the IEEE 14-bus power system. We also demonstrate that TimesFM predictions can serve as a substitute for corrupted measurements, a practical mitigation technique when classical redundancy assumptions fail.
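A residual-based chi-squared test of the kind the abstract refers to can be sketched as follows. The sketch assumes independent residual components (a diagonal covariance) and takes the forecast as a plain list, so no forecasting library is called; the function names and numbers are illustrative. The threshold 5.991 is the 95% quantile of a chi-squared distribution with two degrees of freedom.

```python
def chi2_statistic(residual, variances):
    """Sum of squared residuals, each scaled by its variance (diagonal covariance)."""
    return sum(r * r / v for r, v in zip(residual, variances))

def raises_alarm(measurement, prediction, variances, threshold):
    """Flag an attack when the normalised residual energy exceeds the threshold."""
    residual = [m - p for m, p in zip(measurement, prediction)]
    return chi2_statistic(residual, variances) > threshold

# Hypothetical two-sensor example; `prediction` stands in for a forecaster's output.
prediction = [1.0, 2.0]
variances = [0.04, 0.04]
threshold = 5.991                       # 95% quantile of chi-squared with 2 dof
nominal = raises_alarm([1.1, 1.9], prediction, variances, threshold)
attacked = raises_alarm([2.0, 2.0], prediction, variances, threshold)
```

Any forecaster can supply `prediction`, so a zero-shot time-series model can act as the residual generator without a plant model, which is the substitution the abstract describes.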

2606.06342 2026-06-05 stat.ML cs.LG

Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis

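Yan Wang, Tianyang Hu

Affiliations: School of Data Science, The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes Symmetric Representation Topology Divergence (SRTD) and Normalized Topological Similarity (NTS), which address the asymmetry and the unboundedness of existing topological divergences respectively, enabling fine-grained structural diagnosis and standardized cross-scenario evaluation.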

Comments Accepted by TMLR

Abstract

Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To address these challenges, we develop a unified topological toolkit serving two complementary needs: fine-grained structural diagnosis and robust, standardized evaluation. First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. This allows for precise localization of structural discrepancies and serves as an effective optimization objective without the overhead of dual directional computations. Second, to enable reliable benchmarking across heterogeneous settings, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences. Experiments across synthetic and real-world deep learning settings demonstrate that our toolkit captures functional shifts in CNNs missed by geometric measures and robustly maps LLM genealogy even under distance saturation, offering a rigorous, topology-aware perspective that complements measures like CKA.

2606.06316 2026-06-05 quant-ph cs.AI cs.DS

Quantum enhanced rare event discovery and sampling

Naixu Guo, Po-Wei Huang, Qisheng Wang, Jayne Thompson, Patrick Rebentrost, Mile Gu, Chengran Yang

Affiliations: Centre for Quantum Technologies, National University of Singapore; Mathematical Institute, University of Oxford; School of Computer Science, Shanghai Jiao Tong University; School of Informatics, University of Edinburgh; College of Computing and Data Science, Nanyang Technological University; Nanyang Quantum Hub, School of Physical and Mathematical Sciences, Nanyang Technological University

AI summary: For discovering and sampling events of extremely low probability, proposes a quantum algorithm that does not need to know in advance which events are rare. It achieves optimal quantum scaling with the rarity threshold, with a quadratic speedup for heavy-tailed systems and a robust polynomial speedup for stationary stochastic processes.

Comments 36 pages (8+28)

Abstract

Financial crashes, cascading failures in infrastructure, and critical errors in AI systems are frequently triggered by events that occur with extremely small probability. Efficiently discovering and sampling events with probability below a threshold is therefore of critical interest. Yet this task is highly non-trivial using existing classical or quantum methods. Being rare, such events require an immense sampling overhead to collect sufficient data samples. Moreover, because the rare events are not known in advance, they cannot be flagged for amplification using standard techniques. Here, we introduce a quantum algorithm for rare-event discovery and sampling without first learning which events are rare. The algorithm achieves the optimal quantum scaling with the rarity threshold. We further demonstrate that this can achieve a quadratic speedup for heavy-tailed systems whose tail has nonvanishing total mass, and translates into a robust polynomial speedup for stationary stochastic processes, with the exponent determined by its entropy-rate structure.

2606.06314 2026-06-05 math.NA cs.LG cs.NA stat.ML

DAS-PINNs for high-dimensional partial differential equations: extending deep adaptive sampling to spacetime domains

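Anshima Singh, David J. Silvester

Affiliations: Department of Mathematics, University of Manchester

AI summary: Proposes a normalizing-flow-based deep adaptive sampling framework that treats space and time as a unified domain, uses the residual distribution to identify high-residual regions automatically and generate collocation points there, and targets high-dimensional time-dependent PDEs with localized, dynamically evolving features.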

Abstract

Time-dependent high-dimensional partial differential equations (PDEs) with spatially localised and dynamically evolving solutions pose a fundamental challenge for physics-informed neural networks (PINNs), as uniform collocation sampling becomes increasingly ineffective in high-dimensional spatiotemporal domains. In this work, a deep adaptive sampling framework for PINNs is extended to the time-dependent setting by treating space and time as a unified domain without any explicit time marching. A normalising flow neural network model effectively learns the distribution induced by the PDE residual and generates new collocation points concentrated in regions where the solution is most difficult to learn. Unlike conventional adaptive strategies that require explicit time stepping or moving meshes, high-residual regions are automatically identified and tracked across both space and time, driven purely by the PDE residual distribution. The effectiveness of the proposed strategy is assessed on a range of benchmark problems, from sharp and moving features in two spatial dimensions to localised structures in up to eight spatial dimensions.
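The effect of residual-driven sampling can be illustrated without a normalising flow. The sketch below swaps in a simpler technique, importance resampling from a uniform candidate pool with weights proportional to the squared residual, and treats (x, t) as one spacetime point; the toy residual, pool size, and function names are assumptions made for illustration only.

```python
import random

def resample_collocation(candidates, residual_fn, n_new, rng):
    """Draw new points with probability proportional to the squared PDE residual."""
    weights = [residual_fn(x, t) ** 2 for x, t in candidates]
    return rng.choices(candidates, weights=weights, k=n_new)

def toy_residual(x, t):
    """Hypothetical residual: nonzero only near a front moving along x = t."""
    return 1.0 if abs(x - t) < 0.05 else 0.0

rng = random.Random(0)
candidates = [(rng.random(), rng.random()) for _ in range(5000)]
new_points = resample_collocation(candidates, toy_residual, n_new=200, rng=rng)
```

Every resampled point lies on the moving front even though no time stepping is used, which mirrors the abstract's claim that high-residual regions are tracked across space and time jointly. A learned flow replaces the finite candidate pool in the method described, which matters in high dimensions where uniform candidates rarely land in the right region.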