arXivDaily: arXiv daily academic digest, updated Monday through Friday
2301.06308 2026-06-02 cs.LG cs.AI

Stability Analysis of Sharpness-Aware Minimization

Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee

Affiliations: Chung-Ang University, South Korea; Korea Institute for Advanced Study, South Korea; Ulsan National Institute of Science; Nanyang Technological University (NTU), Singapore; Seoul National University, South Korea

AI summary: Studies the convergence instability of SAM near saddle points, uses dynamical systems theory to prove that saddle points can become attractors, and finds that momentum and batch size can mitigate the problem.

Comments: Accepted to ICML 2026

Abstract

Sharpness-aware minimization (SAM) is a training method that seeks to find flat minima in deep learning, resulting in state-of-the-art performance across various domains. Instead of minimizing the loss of the current weights, SAM minimizes the worst-case loss in its neighborhood in the parameter space. In this paper, we investigate the convergence instability of SAM near a saddle point. Using the qualitative theory of dynamical systems, we explain how SAM becomes stuck in the saddle point and theoretically prove that the saddle point can become an attractor under SAM dynamics. Additionally, we show that this convergence instability can also occur in stochastic dynamical systems by establishing the diffusion of SAM. We prove that SAM diffusion is worse than that of vanilla gradient descent in terms of saddle point escape. Finally, we demonstrate that often overlooked training tricks, momentum and batch-size, might be important to mitigate the convergence instability and achieve high generalization performance. Our theoretical and empirical results are thoroughly verified through experiments on several well-known optimization problems and benchmark tasks.
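
The update described in the abstract (minimizing the worst-case loss in a neighborhood of the current weights) can be sketched with the commonly used first-order form of SAM. This is a minimal illustration, not the authors' experimental code: the toy saddle `f(x, y) = 0.5 * (x**2 - y**2)`, the learning rate `lr`, and the radius `rho` are assumptions chosen for the example.

```python
import numpy as np

def loss_grad(w):
    """Gradient of the toy saddle f(x, y) = 0.5 * (x**2 - y**2) (assumed example)."""
    x, y = w
    return np.array([x, -y])

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM update: move to the approximate worst-case neighbor, then descend."""
    g = grad_fn(w)
    # First-order worst-case perturbation of radius rho along the gradient direction.
    perturb = rho * g / (np.linalg.norm(g) + eps)
    # Descend using the gradient evaluated at the perturbed weights.
    return w - lr * grad_fn(w + perturb)

w = np.array([1.0, 0.0])
w_next = sam_step(w, loss_grad)
print(w_next)
```

Starting exactly on the axis `y = 0`, the perturbation has no component along the escape direction, so the iterate only moves toward the saddle at the origin. Whether and when the saddle acts as an attractor in general is the subject of the paper's analysis and is not established by this sketch.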

2208.00967 2026-06-02 cs.CV

Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

Affiliations: School of Information Science and Technology, University of Science and Technology of China; Key Laboratory of Electromagnetic Space Information, Chinese Academy of Science; School of Data Science, University of Science and Technology of China; SenseTime Research; Qing Yuan Research Institute, Shanghai Jiao Tong University

AI summary: To address the poor generalization of graph-based models in visible-infrared person re-identification, proposes a counterfactual intervention feature transfer method that reduces the modality imbalance through homogeneous and heterogeneous feature transfer and uses counterfactual relation intervention to make the graph topology more reliable.

Comments: Accepted by ECCV 2022

Abstract

Graph-based models have recently achieved great success in person re-identification: they first compute the graph topology structure (affinities) among different people and then pass information across them to obtain stronger features. However, we find that existing graph-based methods generalize poorly in the visible-infrared person re-identification task (VI-ReID) because of two issues: 1) The train-test modality balance gap, which is a property of the VI-ReID task. The amounts of data from the two modalities are balanced in the training stage but extremely unbalanced in inference, causing the low generalization of graph-based VI-ReID methods. 2) The sub-optimal topology structure caused by end-to-end learning of the graph module. Our analysis shows that well-trained input features weaken the learning of graph topology, so that it does not generalize well enough during inference. In this paper, we propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems. Specifically, a Homogeneous and Heterogeneous Feature Transfer (H2FT) is designed to reduce the train-test modality balance gap through two independent types of well-designed graph modules and an unbalanced scenario simulation. In addition, a Counterfactual Relation Intervention (CRI) is proposed that uses counterfactual intervention and causal effect tools to highlight the role of the topology structure throughout training, which makes the graph topology structure more reliable. Extensive experiments on standard VI-ReID benchmarks demonstrate that CIFT outperforms the state-of-the-art methods under various settings.

2208.12389 2026-06-02 cs.LG cs.AI

Static Seeding and Clustering of LSTM Embeddings to Learn from Loosely Time-Decoupled Events

Christian Manasseh, Razvan Veliche, Jared Bennett, Hamilton Clouse

Affiliations: Air Force Research Lab (AFRL) Autonomy Capability Team 3 (ACT3)

AI summary: Proposes seeding LSTMs with static data to generate embeddings and clustering them to improve forecasts for loosely time-decoupled time series, improving 10-day moving average accuracy in county-level COVID-19 case prediction.

Abstract

Humans learn from the occurrence of events in a different place and time to predict similar trajectories of events. We define Loosely Decoupled Timeseries (LDT) phenomena as two or more events that could happen in different places and across different timelines but share similarities in the nature of the event and the properties of the location. In this work we improve on the use of Recurrent Neural Networks (RNN), in particular Long Short-Term Memory (LSTM) networks, to enable AI solutions that generate better timeseries predictions for LDT. We use similarity measures between timeseries based on the trends and introduce embeddings representing those trends. The embeddings represent properties of the event which, coupled with the LSTM structure, can be clustered to identify similar temporally unaligned events. In this paper, we explore methods of seeding a multivariate LSTM from time-invariant data related to the geophysical and demographic phenomena being modeled by the LSTM. We apply these methods to the timeseries data derived from the COVID-19 detected infection and death cases. We use publicly available socio-economic data to seed the LSTM models, creating embeddings, to determine whether such seeding improves case predictions. The embeddings produced by these LSTMs are clustered to identify best-matching candidates for forecasting an evolving timeseries. Applying this method, we show an improvement in 10-day moving average predictions of disease propagation at the US County level.

2203.03768 2026-06-02 cs.CV

CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability

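Siddharth Singh Savner, Vivek Kanhangad

Affiliations: Department of Electrical Engineering, Indian Institute of Technology Indore, India

AI summary: Proposes a weakly supervised crowd counting method based on a pyramid vision transformer, which models global context to reach performance comparable to existing methods and shows remarkable generalizability.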

Journal ref
Journal of Visual Communication and Image Representation, vol. 94, article 103853, 2023

Abstract

Convolutional neural networks (CNNs) have dominated the field of computer vision for nearly a decade due to their strong ability to learn local features. However, due to their limited receptive field, CNNs fail to model the global context. On the other hand, the transformer, an attention-based architecture, can model the global context easily. Despite this, there are limited studies that investigate the effectiveness of transformers in crowd counting. In addition, the majority of the existing crowd counting methods are based on the regression of density maps, which requires point-level annotation of each person present in the scene. This annotation task is laborious and also error-prone. This has led to increased focus on weakly-supervised crowd counting methods, which require only count-level annotations. In this paper, we propose a weakly-supervised method for crowd counting using a pyramid vision transformer. We have conducted extensive evaluations to validate the effectiveness of the proposed method. Our method is comparable to the state-of-the-art on the benchmark crowd datasets. More importantly, it shows remarkable generalizability.

2606.02507 2026-06-02 cond-mat.mtrl-sci cs.ET cs.LG physics.app-ph physics.comp-ph

Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design

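Anand Babu, Rogério Almeida Gouvêa, Gian-Marco Rignanese

Affiliations: Institute of Condensed Matter and Nanosciences, Université Catholique de Louvain; WEL Research Institute

AI summary: Reviews recent advances in generative crystal structure modeling, multimodal learning, and closed-loop design pipelines for inverse materials design, focusing on how feasibility constraints and physical priors are enforced, on multimodal fusion strategies, and on inverse-design strategies (such as conditional generation with latent optimization, Bayesian optimization, reinforcement learning, and active learning), and identifies common failure modes and evaluation practices based on staged reporting.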

Abstract

Inverse materials design is shifting materials discovery from forward prediction to targeted proposal of candidates that satisfy objectives under physical constraints. Here, we review recent advances in generative crystal structure modeling, multimodal learning, and closed-loop design pipelines for crystalline solids. We survey how modern generators learn chemical-structural priors from large databases to enable controllable sampling of periodic structures, and compare leading model classes including variational autoencoders, normalizing flows, autoregressive formulations, and diffusion models. Particular attention is given to how feasibility constraints and physical priors are enforced across the workflow, through representation choices, training objectives, sampling-time guidance, and post-generation screening and relaxation. We also discuss how multimodal learning fuses diverse materials modalities, including crystal structures, thermodynamic and electronic information, microscopy, spectroscopy, processing context, and scientific text, to construct a more universal, transferable representation of chemical space. In addition, diverse inverse-design strategies are examined, particularly those that integrate conditional generation with latent optimization, Bayesian optimization, reinforcement learning, and active learning. Finally, we highlight recurring failure modes, such as surrogate exploitation, diversity collapse, distribution shift, and the stability-synthesizability gap, and outline discovery-grade evaluation practices based on staged reporting of validity, novelty, uniqueness, stability, and cost.

2606.02494 2026-06-02 cs.SE cs.AI

Monitoring Agentic Systems Before They're Reliable

Marisa Ferrara Boston, Glen Hanson, Effi Georgala, JD Hudgens, Heather Frase

Affiliations: Reins AI USA; Veraitech USA

AI summary: For agentic systems in production whose failures are dominated by structural defects, proposes a variance-based monitoring and triage methodology spanning three dimensions and three scopes, and validates it on a synthetic testbed.

Comments: 9 pages, 2 figures, 3 tables. Accepted to the Workshop on Agentic Software Engineering (AgenticSE), co-located with ACM CAIS 2026 (non-archival)

Abstract

Agentic systems entering production typically operate as partially integrated assemblies where structural defects, not task-level errors, dominate the failure landscape. At this maturity level, task-level error detection may be infeasible: structural failure modes mask the signal that task-level monitors are designed to detect. We present a monitoring and triage methodology that decomposes agentic system evaluation into three dimensions (quality, suitability, efficiency) at three monitoring scopes (within-run, cross-run, structural), using variance as a characterization signal. Findings are routed through severity classification adapted from FMEA, concentrating human attention on the subset that warrants investigation. We evaluate on a synthetic testbed of 220 runs across 120 document bundles with controlled error injection. Three results emerge. Monitor scope determines failure type: within-run monitors surface deterministic stage defects (CV = 0.02), cross-run monitors surface stochastic integration consequences (CV = 1.25, 24% at L2), and a structural monitor identifies an integration gap with perfect consistency (CV = 0.00). Injected task-level errors are indistinguishable from clean baselines, confirming structural defects mask task-level signal. Deterministic triage routes 97% of findings to automated tracking, leaving the 2% reflecting variable behavior for human investigation. We propose, on Stage 1 evidence, a maturity-staging model in which monitoring transitions from structural characterization to error detection to reliability tracking as integration defects resolve. The taxonomy, CV-based scope characterization, and severity model transfer architecturally to document-driven, multi-stage agentic workflows in regulated industries; specific calibrations are domain-specific. Deploy monitoring early: the first thing it finds is the most important thing to fix.
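
The CV values quoted in the abstract are coefficients of variation, that is, standard deviation divided by mean. The sketch below shows the computation on made-up finding counts; the use of the population standard deviation and the sample data are assumptions, since the abstract does not say how the authors compute CV.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / abs(mean)

# Hypothetical per-run finding counts for two monitors (illustrative data only).
within_run_counts = [50, 51, 49, 50]   # nearly identical across runs
cross_run_counts = [0, 9, 1, 0]        # varies strongly from run to run

print(coefficient_of_variation(within_run_counts))
print(coefficient_of_variation(cross_run_counts))
```

A CV near zero marks a finding that recurs almost identically in every run, while a CV above one marks behavior that varies from run to run, which is the distinction the abstract draws between deterministic and stochastic findings.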

2606.02483 2026-06-02 cs.CR cs.AI cs.CL

Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools

Bardia Mohammadi, Lars Klein, Akhil Arora, Laurent Bindschaedler

Affiliations: Max Planck Institute for Software Systems; EPFL; Aarhus University

AI summary: To address speculative pre-issued calls by tool-augmented language agents leaking user intent, proposes Speculative Tool Privacy Contracts, which protect privacy at issue time rather than after commitment.

Abstract

Tool-augmented language agents speculatively issue likely future tool calls to hide latency, but those calls leak inferred user intent to external services before the agent commits to the branch. Every external observer that received the call retains the disclosure after the agent abandons the branch. Timing is the issue, not authorization: no commit-time cleanup, read-only restriction, or access-control allow-list unsends what an observer already holds. We call these invocations ghost tool calls and propose Speculative Tool Privacy Contracts, a runtime abstraction that treats observation before commitment as a first-class effect, distinct from state mutation. We implement the contracts in a prototype runtime and evaluate twelve policies across three corpora. Speculative dispatch increases what an observer can infer about user intent; post-hoc filters, read-only restrictions, and access-control allow-lists leave that inference intact; only issue-time policies that change or suppress the speculative call's argument or destination projection before dispatch reduce it.

2606.02448 2026-06-02 eess.SP cs.SD

Diffusion-Based Heart Sound Generation: Evaluation with Physiological Signal Metrics, Classifiers, and Expert Listening

Xinqi Bao, Jia Bi, Xin Chen, Ernest Nlandu Kamavuako, Saikat Chatterjee

Affiliations: Department of Information Science & Engineering, KTH Royal Institute of Technology; Rutherford Appleton Laboratory; Peng Cheng Laboratory; Department of Engineering, King’s College London

AI summary: Proposes a class-conditional diffusion model in the log-mel domain for generating phonocardiograms, assesses synthetic fidelity with physiological metrics, downstream classification accuracy, and expert listening, and analyzes remaining challenges such as retaining abnormal acoustic cues and reducing reconstruction artefacts.

Abstract

Publicly available phonocardiogram (PCG) datasets remain limited in size and pathological diversity, constraining both auscultation training and the generalisation of automated heart-sound classifiers. A class-conditional diffusion model for PCG generation is developed in the log-mel domain and synthetic fidelity is assessed using complementary (i) physiology-inspired plausibility metrics, (ii) downstream label-consistency evaluation, and (iii) expert listening. Experiments use the PhysioNet/Computing in Cardiology Challenge 2016 dataset (3240 recordings) with recording-level splits. After preprocessing and quality control, 16,749 non-overlapping 4 s clips are mapped to a normalised 1 × 128 × 128 log-mel representation to train a conditional 2D U-Net denoiser with classifier-free guidance. Signal-level plausibility is quantified on reconstructed waveforms using three lightweight metrics: an envelope-autocorrelation rhythm score, an amplitude-based explosion score, and the dominant cycle lag. Synthetic clips preserve similar dominant cycle durations but exhibit reduced envelope periodicity and increased transient burstiness relative to real clips. For downstream evaluation, a ResNet-50 classifier achieves 92.24% accuracy on the held-out real test set and 82.8% accuracy on class-balanced synthetic batches, indicating that generated signals retain discriminative structure relevant to normal/abnormal classification. In a pilot expert listening study (60 clips, two clinicians), most synthetic clips are judged as heart-sound-like, while abnormality sensitivity is low for both real and synthetic 4 s excerpts. Overall, the results provide a practical baseline for diffusion-based PCG generation while highlighting remaining challenges in retaining abnormal acoustic cues and reducing reconstruction-induced artefacts.
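
Classifier-free guidance, mentioned in the abstract, combines a class-conditional and an unconditional noise prediction at each sampling step. The sketch below uses one common parameterization; the function name, the guidance scale, and the random arrays standing in for U-Net outputs are assumptions, and the abstract does not state which form or scale the authors use.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the class-conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
# Stand-ins for denoiser outputs on one 1 x 128 x 128 log-mel clip.
eps_cond = rng.standard_normal((1, 128, 128))
eps_uncond = rng.standard_normal((1, 128, 128))

eps_guided = guided_noise(eps_cond, eps_uncond, guidance_scale=3.0)
print(eps_guided.shape)
```

With a scale of 1 the result is the conditional prediction, with a scale of 0 it is the unconditional one, and larger scales push samples harder toward the requested class (normal or abnormal).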

2606.02433 2026-06-02 cs.IR cs.AI cs.CL cs.LG cs.MA

ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning

Zhensheng Wang, Xiaole Liu, Wenmian Yang, Kun Zhou, Yiquan Zhang, Weijia Jia

Affiliations: School of Artificial Intelligence, Beijing Normal University; Institute of Artificial Intelligence and Future Networks, Beijing Normal University; Faculty of Arts and Sciences, Beijing Normal University; Beijing Normal-Hong Kong Baptist University

AI summary: Introduces a future forecasting and reasoning task for open-domain tabular question answering and builds the first dataset covering time-series forecasting and forecast-based reasoning; the LLM agent-based TimeFore framework (Retriever, Forecaster, Analyzer) addresses the challenges of historical data retrieval, forecasting limitations, and response standardization.

Comments: This paper has been accepted by Findings of ACL 2026

Abstract

The rapid development of LLMs has significantly advanced tabular question answering, but most systems cannot perform future-oriented numerical prediction. To address this gap, we introduce a novel task, Open-Domain Tabular Question Answering for Future Data Forecasting and Reasoning, and propose the first dataset to cover time-series forecasting and forecast-based reasoning scenarios using real estate data. This task poses challenges in retrieving precise historical data, overcoming the forecasting limitations of LLMs, and standardizing responses for diverse queries. To solve the above challenges, we propose TimeFore, an LLM agent-based framework that decomposes the problem into three collaborative roles: a Retriever autonomously generates SQL to fetch data, a Forecaster invokes external time-series models for higher accuracy, and an Analyzer synthesizes the results to construct a precise and consistent final answer. Extensive experiments demonstrate the effectiveness of our TimeFore.

2606.02430 2026-06-02 cs.DC cs.AI

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

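Yafan Huang, Sheng Di, Guanpeng Li

Affiliations: University of Iowa; Argonne National Laboratory; University of Florida

AI summary: Using the proposed LLMFI fault-injection framework, systematically studies how soft errors propagate in large language model inference, reveals critical vulnerability patterns, and proposes four low-overhead, software-level directions for improving reliability.

Comments: Accepted at ICS'26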

Abstract

Large language models (LLMs) are increasingly integrated into high-performance computing (HPC) workflows, accelerating scientific discovery through diverse perspectives such as code generation and domain-specific decision-making. Yet, how soft errors propagate and affect LLM inference remains largely unexplored. To bridge this gap, we present a comprehensive study on error propagation in LLM inference, enabled by our proposed LLMFI, a configurable and deterministic fault-injection framework. Using LLMFI, we systematically inject faults across three open-weighted LLMs and thirteen representative tasks, covering reasoning, multilingual, mathematical, and coding domains. In addition, we conduct fine-grained case studies that reveal critical vulnerability patterns. Overall, our study yields 17 takeaways that advance the understanding of error propagation in LLM inference and introduces four low-overhead directions to improve reliability through software-only modification, offering practical guidance for future error detection and mitigation.
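
A soft error is often modeled as a single bit flip in a stored value. The sketch below flips one bit of a float32 number to show why the position of the flipped bit matters; it is a generic illustration and does not reproduce the LLMFI interface, which the abstract does not describe. The helper name `flip_bit` is hypothetical.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit (0 = least significant mantissa bit, 31 = sign bit)
    of the IEEE 754 float32 encoding of `value`."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

weight = 1.0
for bit in (0, 23, 30, 31):
    print(bit, flip_bit(weight, bit))
```

For the value 1.0, flipping the lowest mantissa bit changes it by about one part in ten million, flipping the sign bit negates it, and flipping the highest exponent bit turns it into infinity, so the same fault model can be harmless or catastrophic depending on where it lands.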

2606.02427 2026-06-02 math.NA cs.LG cs.NA

Spectral Audit of In-Context Operator Networks

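Zhiwei Gao, Liu Yang, George Em Karniadakis

Affiliations: Division of Applied Mathematics, Brown University; Department of Mathematics, National University of Singapore

AI summary: Proposes a Jacobian-based spectral audit that analyzes local spectral properties in in-context operator learning (frequency gains, phase structure, cross-mode coupling) to assess whether a model has learned the local dynamical mechanisms of the PDE operator rather than only its outputs.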

Abstract

Existing evaluations of neural operators and in-context operator learning rely primarily on prediction error, but accurate output prediction does not guarantee the correct local dynamical structure. A model may match solutions while exhibiting incorrect sensitivities, distorted frequency response, spurious mode coupling, or unstable tangent behavior. We introduce a Jacobian-based spectral audit for in-context operator learning. For a fixed prompt, we differentiate the network output with respect to the query function and view the resulting Jacobian as a learned tangent operator. Projecting it onto Fourier modes, we obtain a local spectral characterization of the inferred operator, including frequency-dependent gains, phase structure, and cross-mode coupling. The audit complements standard prediction metrics by testing whether the model reproduces local mechanisms of the underlying PDE operator rather than only outputs. Across benchmarks, the audit reveals distinct operator-level phenomena, including phase transport, viscosity-dependent damping, nonlinear mode coupling, and reaction-diffusion stability structure. It also detects failures partially hidden by prediction-error metrics, including high-frequency degradation, incorrect phase recovery, and prompt-operator inconsistencies. Corrupted or internally inconsistent prompts lead to degraded tangent-operator structure even when pointwise predictions remain partially accurate. Our results suggest that prediction accuracy and local operator fidelity are distinct properties of learned neural operators. Our framework also provides a diagnostic for stability, sensitivity, and operator consistency.
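
The audit described above (differentiate the output with respect to the query, then project the Jacobian onto Fourier modes) can be sketched on a stand-in operator. Here a periodic three-point moving average plays the role of the learned network, and the Jacobian is taken by finite differences rather than automatic differentiation; both choices, and all function names, are assumptions made to keep the example self-contained.

```python
import numpy as np

def smooth(v):
    """Stand-in for a learned operator: periodic 3-point moving average."""
    return (np.roll(v, 1) + v + np.roll(v, -1)) / 3.0

def jacobian_fd(op, v, h=1e-4):
    """Central finite-difference Jacobian of op at v, one column per input entry."""
    n = v.size
    jac = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        jac[:, j] = (op(v + e) - op(v - e)) / (2.0 * h)
    return jac

def fourier_projection(jac):
    """Express the Jacobian in the discrete Fourier basis: F J F^{-1}."""
    n = jac.shape[0]
    dft = np.fft.fft(np.eye(n))            # DFT matrix F
    return dft @ jac @ np.conj(dft) / n    # F^{-1} = conj(F) / n

n = 16
query = np.sin(2 * np.pi * np.arange(n) / n)
j_hat = fourier_projection(jacobian_fd(smooth, query))

gains = np.abs(np.diag(j_hat))                             # per-mode gain
coupling = np.abs(j_hat - np.diag(np.diag(j_hat))).max()   # cross-mode coupling
print(gains.round(3), coupling)
```

For this translation-invariant operator the projected Jacobian is diagonal, with gain (1 + 2 cos(2πk/n))/3 on mode k and no cross-mode coupling. Applying `np.angle` to the diagonal would give the phase response, and off-diagonal mass in `j_hat` is what would signal mode coupling in a nonlinear or inconsistent model.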

2606.02418 2026-06-02 quant-ph cs.AI

Evolutionary Discovery of Bivariate Bicycle Codes with LLM-Guided Search

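Juan Cruz-Benito, Andrew W. Cross, David Kremer, Ismael Faro

Affiliations: IBM Research; IBM T. J. Watson Research Center

AI summary: Proposes an LLM-guided evolutionary workflow that mutates Python programs generating bivariate-bicycle codes and perturbed variants; over about 1,650 iterations it screens about 2×10^5 candidate codes and finds 465 distinct candidates, including non-CSS perturbed codes and CSS codes, showing that LLM-guided program evolution is practical for structured quantum-code discovery.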

Abstract

Quantum LDPC code discovery requires searching large algebraic design spaces while reliably certifying the parameters and equivalence classes of any candidates found. We introduce an LLM-guided evolutionary workflow in which language models mutate Python programs that generate bivariate-bicycle and perturbed bivariate-bicycle code ansätze. Across five campaigns, the system performed approximately 1,650 evolutionary iterations, screened about $2 \times 10^5$ candidate codes, and required roughly 140 hours of computation and roughly US\$400 in LLM inference cost. Candidate codes are evaluated through a staged validation pipeline combining $\mathrm{GF}(2)$ rank computation, distance estimation and certification, mixed-integer linear programming, BLISS Tanner-graph deduplication, decomposability analysis, and local-Clifford equivalence checks. At block length $n \leq 360$, the workflow identifies 465 distinct candidate codes: 97 CSS bivariate-bicycle codes and 368 non-CSS perturbed variants. The CSS search recovers known high-performing codes and finds new finite-length representatives, including an indecomposable [[288,16,12]] code and higher-weight codes with up to $k = 50$ at distance $d = 8$. The non-CSS search produces perturbed codes matching the gross-code figure of merit at [[144,12,12]], along with additional high-distance candidates reported as certified values or upper bounds according to MILP status. Overall, these results show that LLM-guided program evolution can serve as a practical tool for structured quantum-code discovery when paired with independent evaluation.
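
The first stage of the validation pipeline, GF(2) rank computation, is enough to obtain the number of logical qubits of a CSS code as k = n − rank(H_X) − rank(H_Z). The sketch below builds a bivariate-bicycle code from cyclic shift matrices and computes k. The polynomials are the ones usually quoted in the prior literature for the [[144,12,12]] gross code and are an assumption here, since the abstract does not list them; the code distance is not computed.

```python
import numpy as np

def gf2_rank(matrix):
    """Rank over GF(2) by Gaussian elimination on a 0/1 matrix."""
    m = np.array(matrix, dtype=np.uint8) % 2
    rows, cols = m.shape
    rank = 0
    for col in range(cols):
        pivots = np.nonzero(m[rank:, col])[0]
        if pivots.size == 0:
            continue
        pivot = rank + pivots[0]
        m[[rank, pivot]] = m[[pivot, rank]]      # move the pivot row up
        others = np.nonzero(m[:, col])[0]
        others = others[others != rank]
        m[others] ^= m[rank]                     # clear the column elsewhere
        rank += 1
        if rank == rows:
            break
    return rank

def monomial(l, m, i, j):
    """Matrix of x^i y^j with x = S_l (x) I_m and y = I_l (x) S_m."""
    sx = np.roll(np.eye(l, dtype=int), i, axis=1)
    sy = np.roll(np.eye(m, dtype=int), j, axis=1)
    return np.kron(sx, sy)

def bb_checks(l, m, a_terms, b_terms):
    """Check matrices H_X = [A | B] and H_Z = [B^T | A^T]."""
    a = sum(monomial(l, m, i, j) for i, j in a_terms) % 2
    b = sum(monomial(l, m, i, j) for i, j in b_terms) % 2
    return np.hstack([a, b]), np.hstack([b.T, a.T])

l, m = 12, 6
a_terms = [(3, 0), (0, 1), (0, 2)]   # A = x^3 + y + y^2 (assumed from prior work)
b_terms = [(0, 3), (1, 0), (2, 0)]   # B = y^3 + x + x^2 (assumed from prior work)
hx, hz = bb_checks(l, m, a_terms, b_terms)
n = hx.shape[1]
k = n - gf2_rank(hx) - gf2_rank(hz)
print(n, k)
```

Because A and B commute, H_X H_Z^T = AB + BA vanishes mod 2, so the CSS condition holds by construction. Certifying the distance is the expensive part and is what the later pipeline stages (distance estimation and MILP) address.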

2606.02345 2026-06-02 stat.ML cs.LG

Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization

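Louise Davy, Stephan Clémençon, Charlotte Laclau

Affiliations: IDS, LTCI Télécom Paris Palaiseau, France

AI summary: Uses survey sampling techniques that sample pairs directly rather than individual observations, achieving estimation or optimization performance comparable to full pairwise evaluation while retaining only a small fraction of the pair information, and offers a theoretically grounded trade-off between accuracy and computational cost.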

Abstract

Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using all pairs, by leveraging survey sampling techniques. A central finding, supported by both theory and experiments, is that such sampling plans must target pairs directly rather than individual observations. In particular, for pairwise losses between high-dimensional vectors such as embeddings in vision or graph learning, assigning higher inclusion probabilities to informative pairs using suitable auxiliary information yields performance close to full pairwise evaluation, providing a principled and theoretically grounded trade-off between accuracy and computational cost.
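
Sampling pairs with known inclusion probabilities leads naturally to a Horvitz-Thompson style estimator, in which each sampled pair's loss is divided by its inclusion probability. The sketch below uses independent (Poisson) sampling of pairs; the estimator choice, the function names, and the constant inclusion probability are assumptions, since the abstract does not specify the sampling plans studied.

```python
import itertools
import numpy as np

def full_pairwise_mean(points, loss):
    """Average loss over all unordered pairs (quadratic cost)."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    return sum(loss(points[i], points[j]) for i, j in pairs) / len(pairs)

def ht_pairwise_mean(points, loss, incl_prob, rng):
    """Horvitz-Thompson estimate of the pairwise mean under Poisson
    sampling: each pair (i, j) is kept with probability incl_prob(i, j)."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    total = 0.0
    for i, j in pairs:
        p = incl_prob(i, j)
        if rng.random() < p:
            total += loss(points[i], points[j]) / p   # reweight by 1 / p
    return total / len(pairs)

rng = np.random.default_rng(0)
points = rng.standard_normal((200, 8))
sq_dist = lambda u, v: float(np.sum((u - v) ** 2))

exact = full_pairwise_mean(points, sq_dist)
estimate = ht_pairwise_mean(points, sq_dist, lambda i, j: 0.05, rng)
print(exact, estimate)
```

The estimate is unbiased for any strictly positive inclusion probabilities, and the loss is evaluated on only about 5% of the pairs here. This sketch still enumerates every pair to draw the sample, so it saves loss evaluations rather than the enumeration itself. Assigning larger probabilities to informative pairs, as the abstract suggests, would replace the constant `0.05` with a function of auxiliary information.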

2606.02302 2026-06-02 cs.CR cs.AI

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Hao Cheng, Changtao Miao, Tianle Song, Yin Wu, He Liu, Erjia Xiao, Junchi Chen, Xiaoyu Shi, Yichi Wang, Jing Yang, Taowen Wang, Jinhao Duan, Mengshu Sun, Peiyan Dong, Xuan Shen, Yang Cao, Renjing Xu, Kaidi Xu, Jindong Gu, Bo Zhang, Jize Zhang, Chenhao Lin, Philip Torr, Chao Shen

Affiliations: The Hong Kong University of Science and Technology; Ant Digital Technologies, Ant Group; Xi’an Jiaotong University; The Hong Kong University of Science and Technology (Guangzhou); University of Oxford; City University of Hong Kong; Institute of Science Tokyo; Zhejiang University; Massachusetts Institute of Technology; University of North Carolina at Chapel Hill; Beijing University of Technology

AI summary: Proposes the SeClaw framework, which combines specification-driven security task synthesis with execution-based security evaluation to enable scalable, reproducible assessment of the security risks of autonomous LLM agents in stateful environments.

Abstract

Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather than the execution processes that lead to unsafe behavior. We introduce SeClaw, a framework that combines specification-driven security task synthesis with execution-based security evaluation for autonomous agents. Spec-driven security task synthesis enables scalable and controllable construction of security tasks from structured risk specifications, while SeClaw docker provides a standardized testbed for evaluating agent behavior under diverse safety-risk scenarios. The benchmark covers risks arising from resources, user tasks, environments, and intrinsic agent behaviors, and supports trajectory-aware assessment of unsafe actions beyond final responses. By bridging systematic task synthesis and reproducible security evaluation, SeClaw provides a practical foundation for measuring, diagnosing, and comparing security failures in autonomous LLM agents. The code is available at https://github.com/seclaw-eval/seclaw-eval.

2606.02301 2026-06-02 cs.HC cs.AI cs.CV

Quantitative Movement Testing: Measuring Patient Movements from a Single Smartphone Video

定量运动测试:从单部智能手机视频测量患者运动

Pranav Mahajan, Amanda Wall, Eleonora Maria Camerone, Julie Stebbins, Eoin Kelleher, Shuangyi Tong, Annina Schmid, Katja Wiech, Anushka Irani, Ben Seymour

发表机构 * Nuffield Department of Clinical Neurosciences, University of Oxford(牛津大学纳菲尔德临床神经科学系) Max Planck Institute of Biological Cybernetics(马克斯·普朗克生物控制论研究所) Oxford Gait Laboratory, University of Oxford(牛津大学步态实验室) Harvard Medical School(哈佛医学院) Massachusetts General Hospital(麻省总医院) Institute of Biomedical Engineering, University of Oxford(牛津大学生物医学工程研究所) Mayo Clinic(梅奥诊所)

AI总结 提出基于计算机视觉的定量运动测试(QMT)方法,利用深度学习3D姿态估计从单目智能手机视频提取运动生物标志物,在实验室验证中与光学运动捕捉高度一致(r>0.85),并在纤维肌痛和慢性坐骨神经痛患者中展示了可靠性和纵向监测能力。

详情
AI中文摘要

慢性疼痛通过降低功能能力而损害生活质量,但在现实环境中客观测量这种功能影响仍然具有挑战性。虽然光学运动捕捉为评估运动质量改变提供了高精度,但成本高昂且局限于实验室环境。我们旨在开发并验证定量运动测试(QMT),这是一个从标准单目智能手机视频中提取3D运动生物标志物的计算机视觉流程,平衡临床可及性与生物力学精度。我们利用基于深度学习的3D姿态估计,在健康对照组(N=13)中针对金标准光学运动捕捉验证了QMT流程。经过留一法受试者校准以纠正系统偏差后,我们在两个前瞻性临床队列中部署QMT以评估现实世界效用:一项纤维肌痛患者的干预前后试验,以及一项慢性坐骨神经痛患者和健康对照的30天纵向家庭监测研究。在实验室验证中,QMT提取的临床运动指标与光学运动捕捉高度一致,显示出强相关性(r>0.85)和低平均绝对误差。QMT在纤维肌痛患者中显示出高重测信度(r>0.86),并成功追踪了慢性坐骨神经痛患者的日常运动波动。虽然现实家庭环境引入了比实验室环境更高的测量方差,但QMT完全基于远程记录发现了健康对照组和坐骨神经痛患者之间的组级差异。单目3D姿态估计为传统评估提供了一种可扩展的替代方案。QMT为临床试验中跟踪疾病进展和治疗反应提供了客观、可及的生物标志物,但需要进一步研究以优化家庭环境中的可靠性。

英文摘要

Chronic pain diminishes quality of life by decreasing functional ability, yet objectively measuring this functional impact remains challenging in real-world settings. While optical motion capture provides high precision for assessing altered movement quality, it is costly and restricted to laboratory environments. We aimed to develop and validate Quantitative Movement Testing (QMT), a computer vision pipeline extracting 3D kinematic biomarkers from standard monocular smartphone video, balancing clinical accessibility with biomechanical accuracy. We validated the QMT pipeline, utilising deep learning-based 3D pose-estimation, against gold-standard optical motion capture in healthy controls (N=13). Following leave-one-subject-out calibration to correct systematic bias, we deployed QMT in two prospective clinical cohorts to assess real-world utility: a pre- and post-intervention trial for fibromyalgia patients, and a 30-day longitudinal at-home monitoring study of chronic sciatica patients and healthy controls. In laboratory validation, QMT extracted clinical kinematic metrics with high agreement to optical motion capture, yielding strong correlations (r > 0.85) and low mean absolute errors. QMT demonstrated high test-retest reliability (r > 0.86) in fibromyalgia patients and successfully tracked day-to-day movement fluctuations in chronic sciatica. While real-world home settings introduced higher measurement variance than lab settings, QMT found group-level differences between healthy controls and sciatica patients based entirely on remote recordings. Monocular 3D pose estimation offers a scalable alternative to traditional assessments. QMT provides an objective, accessible biomarker for tracking disease progression and treatment response in clinical trials, though further research is needed to optimise reliability in home environments.
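The leave-one-subject-out calibration mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which correction model is used, so a per-metric linear correction (slope and intercept) is assumed here, and the names `fit_line` and `loso_calibrate` are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept mapping xs onto ys."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loso_calibrate(data):
    """data maps subject id -> list of (video_value, mocap_value) pairs.

    For each subject, the linear correction is fitted on all OTHER subjects
    and then applied to the held-out subject's video values, so no subject
    contributes to their own calibration (assumed linear bias model).
    """
    calibrated = {}
    for held_out in data:
        train = [p for s, pts in data.items() if s != held_out for p in pts]
        slope, intercept = fit_line([v for v, _ in train], [m for _, m in train])
        calibrated[held_out] = [slope * v + intercept for v, _ in data[held_out]]
    return calibrated
```

The calibrated values can then be compared with the motion-capture values of the held-out subject, for example via correlation or mean absolute error.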

2606.02278 2026-06-02 eess.SY cs.LG cs.SY

Physics-Guided Recurrent State-Space Neural Networks for Multi-Step Prediction

物理引导的循环状态空间神经网络用于多步预测

Ruiyuan Li, Ajay Seth, Manon Kok

发表机构 * Delft Center for Systems and Control, TU Delft, the Netherlands(代尔夫特系统与控制中心,代尔夫特理工大学,荷兰) Department of Biomechanical Engineering, TU Delft, the Netherlands(生物力学工程系,代尔夫特理工大学,荷兰)

AI总结 提出PG-RSSNN,一种结合物理知识和循环结构的状态空间神经网络,通过缓解梯度消失和数值发散风险,在有限数据和部分物理模型下提升多步预测性能。

Comments 6 pages, 3 figures. Accepted at IFAC World Congress 2026

详情
AI中文摘要

状态空间模型传统上基于物理知识,但由于模型不准确,这些物理模型的多步预测可能较差。黑盒深度学习作为替代方案显示出潜力,但这些方法依赖于大量数据集的可用性,且潜在可用的物理知识被忽略。我们提出PG-RSSNN,一种物理引导的循环状态空间神经网络,它结合循环结构以在多步预测中使用非饱和激活函数。它缓解了梯度消失,并消除了现有结构中因反馈状态估计而导致的训练数值发散风险。在多个具有不同物理模型不完善性的系统上(从带高斯噪声的线性状态空间模型到机械臂和级联水箱系统)的实验结果表明,与黑盒神经网络和纯物理模型相比,所提出的PG-RSSNN即使在训练数据有限且物理模型仅部分已知的情况下,也能保持稳定的训练行为,并改善多步预测。

英文摘要

State-space models are traditionally based on physical knowledge, but multi-step predictions from these physical models can be poor due to model inaccuracy. Black-box deep learning has shown promise as an alternative. However, these methods rely on the availability of large datasets and potentially available physical knowledge is neglected. We propose the PG-RSSNN, a physics-guided recurrent state-space neural network that incorporates recurrent structures to enable the use of non-saturating activation functions in multi-step prediction. It mitigates the vanishing gradients and eliminates the risk of numerical divergence in training seen in existing structures that feed back state estimates. Results across multiple systems with various physical model imperfections, from linear state-space models with Gaussian noise to a robotic arm and a cascaded water tank system, show that the proposed PG-RSSNN maintains stable training behavior, and improves multi-step predictions, as compared with black-box neural networks and physics-only models, even with limited training data and when physical models are only partially known.

2606.02228 2026-06-02 stat.ML cs.CV cs.LG

Bayesian meta-learning for modeling Alzheimer's disease progression

贝叶斯元学习用于阿尔茨海默病进展建模

Clara Hoffmann, Nadja Klein

发表机构 * Scientific Computing Center, Karlsruhe Institute of Technology, Germany(卡尔斯鲁厄理工学院科学计算中心,德国) Alzheimer’s Disease Neuroimaging Initiative(阿尔茨海默病神经影像计划)

AI总结 提出贝叶斯元学习方法,利用个体历史MRI体积和疾病轨迹预测疾病评分分布,无需重新训练即可动态预测,并减少长期预测的过度自信。

详情
AI中文摘要

预测阿尔茨海默病患者将经历轻度还是重度疾病进展对于个性化治疗至关重要。通常,临床医生试图预测离散疾病评分的分布,条件是个体当前的MRI体积及其历史疾病轨迹。经典的统计回归模型和单任务神经网络不适合此目的,因为拟合单独模型不可行(每个个体通常只有少量观测),而忽略个体间相关性会导致泛化能力差。相比之下,元学习提供了一种自然的方法来动态预测分布,无需重新训练,并能建模结果与协变量之间的非线性关系。受此启发,我们提出了一种贝叶斯元学习器,它在多个个体上训练,但根据每个个体的历史数据定制预测的疾病评分分布。我们的模型无需重新训练即可预测未见过的个体,与历史观测数量呈线性扩展,并且在预测长期疾病评分时,与确定性对应模型相比,保证更少的过度自信。在阿尔茨海默病神经影像学倡议(ADNI)数据库的真实世界数据上,我们的模型在性能上与单任务模型和确定性元学习器相当,同时在预测长期疾病进展时显著提高了性能。

英文摘要

Predicting whether an individual with Alzheimer's disease will experience mild or severe disease progression is essential for personalized treatment. Typically, practitioners seek to predict the distribution of a discrete disease score, conditional on an individual's current MRI volume and their historical disease trajectory. Classical statistical regression models and single-task neural networks are not well-suited for this purpose because fitting separate models is infeasible (since each individual typically has few observations), while ignoring individual-level correlation leads to poor generalization. Meta-learning, in contrast, provides a natural avenue to dynamically predict distributions without retraining and model nonlinear relationships between the outcome and covariates. Motivated by this, we propose a Bayesian meta-learner that is trained on multiple individuals but tailors the predictive disease score distribution to each individual's historical data. Our model predicts on unseen individuals without retraining, scales linearly with the number of historical observations, and is guaranteed to be less overconfident when predicting long-term disease scores compared to its deterministic counterpart. On real-world data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, our model achieves performance competitive with both single-task models and deterministic meta-learners, while substantially improving performance when predicting long-term disease progression.

2606.02184 2026-06-02 cs.DL cs.LG

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

幽灵搭档:相关的大语言模型姓名先验及其对网络和学术出版的困扰

Michał Brzozowski, Neo Christopher Chung

发表机构 * Samsung AI Center(三星人工智能中心) University of Warsaw(华沙大学)

AI总结 研究发现大语言模型生成虚构专家姓名时会产生相关性强的角色组合,这些组合具有模型家族特异性,并在Zenodo等平台造成大量幽灵作者记录,影响学术出版。

详情
AI中文摘要

这些名字并不存在。Elena Vasquez 和 Marcus Chen 作为火山专家、宇航员、惊悚小说主角、播客主持人和学术合著者,出现在数百个独立生成的AI生成文档中,却从未存在过。我们表明,大语言模型在生成虚构专家时不仅仅默认使用高概率的单个名字:它们会产生相关的角色组合、配对和三人组,其共现频率远超偶然,并且在独立生成中保持一致。这些先验是模型家族特定的(Claude:Elena Vasquez + Marcus Chen + Amara Okafor;Gemini:Aris Thorne + Lena Petrova;GPT:Elara Voss 无固定搭档)、版本特定的,并且在模型发布边界处被主动抑制,在它们生成的内容中留下可定时的行为指纹。我们记录了一个大规模的下游后果。在Zenodo(一个由CERN运营的、生成真实DataCite DOI的存储库)上,我们识别出1,655条幽灵作者记录,声称不存在的期刊并带有捏造的出版日期:服务器端的DataCite时间戳证明了故意的回溯日期,其中991条记录在一个月内注册;这些记录携带在DataCite中注册的真实DOI,因此任何摄取DOI元数据的学术聚合器都可以获取它们。幽灵名字还出现在ResearchGate上,形成由来自多个模型家族的合作者组成的合成研究小组;这些记录上的出版日期为模型部署窗口提供了可靠的时间代理。

英文摘要

These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records claiming nonexistent journals with fabricated publication dates: server-side DataCite timestamps prove deliberate backdating, and 991 records were registered in a single month; these carry real DOIs registered in DataCite, making them harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
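The claim that name pairs co-occur "far above chance" can be made concrete with a lift statistic: the observed joint frequency divided by the frequency expected if the two names appeared independently. The sketch below is an assumption about how such a rate might be computed, not the authors' procedure; `pair_lift` is a hypothetical helper and the input is toy data.

```python
from collections import Counter
from itertools import combinations

def pair_lift(documents):
    """documents: list of sets, each holding the names found in one document.

    Returns {(a, b): lift} with lift = P(a and b) / (P(a) * P(b)).
    A lift near 1 is what independence predicts; values far above 1
    indicate that the pair appears together more often than chance.
    """
    n = len(documents)
    single = Counter(name for doc in documents for name in doc)
    pairs = Counter(
        pair for doc in documents for pair in combinations(sorted(doc), 2)
    )
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pairs.items()
    }

# Toy corpus with placeholder names: "A" and "B" always appear together.
toy = [{"A", "B"}, {"A", "B"}, {"C"}, {"D"}]
lift = pair_lift(toy)  # lift[("A", "B")] is 2.0
```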

2606.02156 2026-06-02 eess.IV cs.AI cs.CV cs.IR cs.LG

Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel

基于术前肠道血供映射预测结直肠吻合口漏风险

Zahra Tabatabaei, Jon Sporring, Mark Bremholm Ellebæk, Alaa El-Hussuna

发表机构 * Computer Science Department, Københavns Universitet (KU)(哥本哈根大学计算机科学系) University of Southern Denmark(南丹麦大学) Odense University Hospital(欧登塞大学医院) OpenSourceResearch Collaboration(开源研究协作)

AI总结 提出一种基于术前CT影像的AI驱动系统,通过分析血管和组织特征量化吻合口漏风险,并结合内容检索支持临床决策。

详情
AI中文摘要

吻合口漏仍然是结直肠癌手术后最严重的并发症之一,显著影响患者预后、康复轨迹和医疗成本。尽管影像技术有所进步,目前的术前评估仍依赖临床评估,这一过程主观、易出错且高度依赖个人经验。迄今为止,尚无经过验证的基于CT的方法能够在术前预测吻合口漏风险。本方案论文概述了一个全面的框架,用于开发和验证一个AI驱动的系统,该系统利用对比增强前后的CT影像进行术前风险评估。研究描述了数据收集、伦理处理、符合GDPR的患者数据预处理、图像预处理以及旨在生成临床可解释输出的深度学习架构探索等阶段。该工作流程的两个主要成果是:1) 风险评估模块,通过分析CT扫描中的血管和组织特征量化漏液可能性;2) 基于内容的医学图像检索(CBMIR)模块,识别并显示相似历史病例以支持循证手术决策。该方案论文需要医院和大学之间的密切合作;本方案表明,此类系统在现有医疗基础设施内技术上可行且临床可实施。通过遵循所提出的方法论阶段和监管原则,其他机构可以复制此工作流程以开发类似的决策支持工具。最终,这一跨学科框架旨在加强手术规划、减少漏液发生率,并推动向可解释、数据驱动的精准手术的更广泛范式转变。

英文摘要

Anastomotic leak remains one of the most serious complications following colorectal cancer surgery, substantially affecting patient outcomes, recovery trajectories, and healthcare costs. Despite advances in imaging technology, current preoperative assessment relies only on clinical assessment, a process that is subjective, error-prone, and highly dependent on individual expertise. To date, no validated CT-based method exists to predict anastomotic leak risk prior to surgery. This protocol paper outlines a comprehensive framework for developing and validating an AI-driven system for preoperative risk assessment using pre- and post-contrast CT imaging. The study describes the stages of data collection, ethical handling, and preprocessing of patient data in accordance with GDPR, image preprocessing, and the exploration of deep learning architectures designed to generate clinically interpretable outputs. Two integrated tools constitute the main deliverables of this workflow: 1) a risk assessment module, which quantifies the likelihood of leakage by analyzing vascular and tissue features in CT scans, and 2) a Content-Based Medical Image Retrieval (CBMIR) module, which identifies and displays similar historical cases to support evidence-based surgical decision making. The protocol paper requires close collaboration between hospitals and universities; this protocol demonstrates that such a system is technically feasible and clinically implementable within existing healthcare infrastructures. By following the proposed methodological stages and regulatory principles, other institutions can reproduce this workflow to develop analogous decision-support tools. Ultimately, this interdisciplinary framework aims to enhance surgical planning, reduce leak incidence, and contribute to a broader paradigm shift toward explainable, data-driven precision surgery.

2606.02127 2026-06-02 eess.AS cs.SD

Localizing broadband noise sources using the Loève spectrum and a 2.5D approach

使用Loève谱和2.5D方法定位宽带噪声源

Christian H. Kasess, Wolfgang Kreuzer, Holger Waubke

发表机构 * OeAW(奥地利科学院)

AI总结 针对移动宽带随机声源定位问题,提出一种基于2.5D设置和Loève谱的逆定位方法,推导了移动源功率谱密度与静态接收器Loève谱的关系,并通过多窗估计实现源定位。

Comments 31 pages, 13 figures

详情
AI中文摘要

使用麦克风阵列定位移动声源通常基于修改信号以补偿多普勒效应。在时域中,这种补偿是逐样本进行的。在频域中,需要使用短时间片段,其中假设多普勒效应近似恒定,并对每个片段进行离散傅里叶变换。相比之下,作者开发了一种针对均匀移动单频源的逆2.5D定位方法,该方法在谱域中工作,并允许使用更长的窗口。这是通过修改2.5D正向模型以直接计算运动在静态观察者位置的影响来实现的。该方法既不需要修改测量信号,也不需要在所使用的窗口内要求测量准平稳。不幸的是,这种方法不直接适用于宽带随机源,在本文中,我们将研究均匀移动随机源在静态观察者处观测时其统计特性如何变化。使用2.5D设置,推导了移动源功率谱密度与静态接收器处互谱密度推广形式——Loève谱之间的关系。基于速度高达100 m/s的模拟数据,本文提供了一种基于多窗估计Loève谱的方法的概念验证,用于定位移动宽带随机源。目前,该方法要求源信号平稳,并且谱密度在感兴趣频率附近的一定范围内平坦。此外,目前不考虑源之间的相关性。

英文摘要

The localization of moving sound sources using a microphone array is typically based on modifying the signal to compensate for the Doppler effect. In the time domain, this compensation is done on a sample-by-sample basis. In the frequency domain, short time segments need to be used in which the Doppler effect is assumed to be approximately constant, and a discrete Fourier transform is applied to each segment. In contrast, the authors developed an inverse 2.5D localization method for uniformly moving single-frequency sources that works in the spectral domain and allows for the use of longer windows. This was achieved by modifying the 2.5D forward model to directly compute the effect of the motion at the static observer position. The method requires neither modification of the measured signal nor quasi-stationarity of the measurements within the window used. Unfortunately, this approach is not directly suitable for broadband stochastic sources, and the present work investigates how the statistical properties of a uniformly moving stochastic source change when observed at a static observer. Using a 2.5D setting, the relation between the power spectral density of the moving source and the Loève spectrum, which is a generalization of the cross-spectral density at the static receivers, was derived. Based on simulated data with speeds up to 100 m/s, the work presented here provides a proof of concept for a method based on multi-taper estimates of the Loève spectrum to localize moving broadband stochastic sources. Currently, the method requires a stationary source signal and a spectral density that is flat within a certain range around the frequency of interest. Also, correlations between sources are currently not considered.
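For orientation, the Loève spectrum and a multi-taper estimate of it can be written as below. The notation is an assumption made for this note (spectral increment $dZ$, tapers $v^{(k)}$, $K$ tapers, $N$ samples) and is not taken from the paper, which may use a different normalization or a cross-receiver form.

```latex
% Spectral representation of a zero-mean process and its Loève spectrum
x(t) = \int e^{\mathrm{i} 2\pi f t}\, dZ(f), \qquad
\gamma(f_1, f_2)\, df_1\, df_2 = \mathbb{E}\!\left[ dZ(f_1)\, \overline{dZ(f_2)} \right]

% Tapered Fourier transforms (eigencoefficients) and the multi-taper estimate
Y_k(f) = \sum_{n=0}^{N-1} v^{(k)}_n\, x_n\, e^{-\mathrm{i} 2\pi f n}, \qquad
\hat{\gamma}(f_1, f_2) = \frac{1}{K} \sum_{k=0}^{K-1} Y_k(f_1)\, \overline{Y_k(f_2)}
```

For a stationary process the Loève spectrum is concentrated on the diagonal $f_1 = f_2$, where it reduces to the ordinary power spectral density; off-diagonal mass is what a Doppler-shifted moving source introduces at a static receiver.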

2606.02115 2026-06-02 stat.ML cs.LG

Error Bounds for a Diffusion Model-Based Drift Estimator

基于扩散模型的漂移估计器的误差界

Ioar Casado-Telletxea, Omar Rivasplata

发表机构 * Basque Center for Applied Mathematics (BCAM)(巴斯克应用数学中心) Centre for AI Fundamentals & Department of Computer Science(人工智能基础研究中心及计算机科学系) University of Manchester, UK(英国曼彻斯特大学)

AI总结 针对随机微分方程中已知扩散参数时的漂移估计问题,利用扩散模型理论推导了时间平均均方误差的显式风险界,将风险分解为离散化、得分近似、噪声初始化和采样方差四项。

Comments Preprint

详情
AI中文摘要

随机微分方程中的参数估计是一个经典的统计问题,在许多科学领域具有重要意义。Tapia Costa等人(2026)的最新工作引入了一种新技术,当扩散参数已知时,利用多条轨迹的离散样本估计漂移。他们的方法将漂移估计视为去噪问题,并利用(条件)得分匹配扩散模型的工具。尽管他们的实验在不同漂移类别中显示出有希望的结果,但其估计器的理论保证问题仍未解决。在本笔记中,我们通过利用扩散模型理论的技术来填补这一空白。更具体地说,我们为该漂移估计器的时间平均均方误差推导了一个显式的风险界。我们的界将风险分解为(i)Euler-Maruyama离散化,(ii)得分/去噪器近似,(iii)噪声初始化,以及(iv)采样方差,揭示了估计器中不同超参数和误差源之间的权衡。

英文摘要

Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and leverages tools from (conditional) score-matching diffusion models. Although their experiments showed promising results across different drift classes, the question of theoretical guarantees for their estimator was left unanswered. In this note, we address this gap by exploiting techniques from diffusion model theory. More concretely, we derive an explicit risk bound for the time-averaged mean-squared error of said drift estimator. Our bound decomposes the risk into the (i) Euler-Maruyama discretization, (ii) score/denoiser approximation, (iii) noise initialization, and (iv) sampling variance, revealing the trade-offs between the different hyperparameters and sources of error in the estimator.
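The link between drift estimation and denoising can be seen from the Euler-Maruyama step itself. The notation below (drift $b$, known diffusion coefficient $\sigma$, step size $\Delta$) is assumed for illustration and may differ from the paper's.

```latex
% SDE with unknown drift b and known diffusion coefficient \sigma
dX_t = b(X_t)\, dt + \sigma\, dW_t

% Euler-Maruyama discretization with step size \Delta
X_{t+\Delta} = X_t + b(X_t)\, \Delta + \sigma \sqrt{\Delta}\, \xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I)

% Rearranged: scaled increments are noisy observations of the drift
\frac{X_{t+\Delta} - X_t}{\Delta} = b(X_t) + \frac{\sigma}{\sqrt{\Delta}}\, \xi_t
```

The last line shows why a smaller step reduces discretization error but inflates the noise on each increment, which is one of the trade-offs a risk bound of this kind has to balance.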

2606.02101 2026-06-02 stat.ML cs.LG stat.AP

It does what it says on the tin: safe synthetic data from coarsened margins

名副其实:来自粗化边际的安全合成数据

Gillian M Raab

发表机构 * University of Edinburgh(爱丁堡大学) Scottish Centre for Administrative Data Research(苏格兰行政数据研究中心)

AI总结 提出一种通过粗化边际并应用迭代比例拟合算法生成合成数据的方法,确保透明性和无披露风险。

详情
AI中文摘要

本文提出了一种创建合成数据的方法,与当前可用的其他方法相比,该方法对用户有两个重要优势。首先是透明性;与其他方法不同,接收合成数据的人将知道原始数据中哪些变量之间的关系将在合成数据中大致保持。其次是保证合成数据来源于已被判定无披露风险的信息。这是通过首先定义和计算将在合成数据中保持变量关系的边际来实现的。然后,每个边际将根据数据保管者定义的标准进行统计披露控制,例如顶部编码和底部编码、小类别的组合和/或修改小计数。建议通过将表格中的所有计数粗化为披露限制的倍数来进一步调整策展边际。这些调整后的边际用于通过迭代比例拟合算法生成合成数据。使用1901年苏格兰人口普查的数据说明了创建此类合成数据的实际步骤。

英文摘要

This paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins where relationships between variables will be maintained in the SD. Each margin will then be subject to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modifying small counts. Further adjustment of the curated margins is advised by coarsening all counts in the table to multiples of the disclosure limit. These adjusted margins are used to create SD by the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.
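The two computational steps named above, coarsening counts to multiples of the disclosure limit and fitting a table by Iterative Proportional Fitting, can be sketched as follows. This is a simplified illustration with two one-way margins and a uniform seed table; the rounding rule, the function names `coarsen` and `ipf`, and the restriction to a two-way table are assumptions, not details from the paper.

```python
def coarsen(counts, limit):
    """Round each integer count to the nearest multiple of `limit`
    (halves round up). The rounding rule is an assumption."""
    return [limit * ((c + limit // 2) // limit) for c in counts]

def ipf(seed, row_targets, col_targets, iters=100, tol=1e-10):
    """Iterative Proportional Fitting on a two-way table.

    Alternately rescales rows and columns of `seed` until its margins match
    the targets. The two target margins must share the same grand total,
    which has to be checked after coarsening.
    """
    table = [row[:] for row in seed]
    for _ in range(iters):
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns are exact after the column pass; stop once rows agree too.
        if all(abs(sum(r) - t) < tol for r, t in zip(table, row_targets)):
            break
    return table

rows = coarsen([11, 19], 5)   # [10, 20]
cols = coarsen([16, 14], 5)   # [15, 15]
fitted = ipf([[1.0, 1.0], [1.0, 1.0]], rows, cols)
```

Synthetic records would then be drawn from the fitted table; that sampling step is not shown here.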

2606.02092 2026-06-02 eess.IV cs.AI cs.CV

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

LALE:用于土地覆盖估计的轻量级Transformer架构

Ümit Mert Çağlar, Alptekin Temizel

发表机构 * Middle East Technical University(中东技术大学)

AI总结 提出LALE架构,通过分辨率分支编码器(轻量级ConvMixer处理高分辨率局部特征,Transformer处理低分辨率全局上下文)和全MLP多尺度解码器,在遥感图像分割中实现高效性能与计算成本的平衡。

详情
AI中文摘要

遥感图像的语义分割需要模型在严格的计算预算下同时捕捉全局上下文和局部细节。先前的工作通常针对这些轴之一进行优化:注意力用于全局上下文,卷积用于局部细节,或紧凑性用于效率。虽然混合方法旨在同时捕捉两者,但它们需要架构更改和带有计算开销的编码器骨干,限制了效率和性能。我们提出了LALE(用于土地覆盖估计的轻量级Transformer架构),一种端到端的遥感图像分割架构,它通过分辨率分支编码器:轻量级ConvMixer阶段处理高分辨率局部特征,而Transformer阶段处理低分辨率全局上下文,将自注意力的二次成本限制在深层、下采样的特征图上。全MLP多尺度解码器,以及贯穿始终的RMSNorm和StarReLU,进一步减少了计算量和参数数量。在大型ARAS400k遥感分割基准上,LALE相对于CNN、Transformer和混合基线建立了强大的效率-性能权衡。我们最小的变体(仅1.6M参数)在F1分数上达到最佳基线(UPerNet)的2.6分以内,同时使用4.5倍更少的参数、7倍更少的存储、17倍更少的GMACs,并提供1.8倍更高的吞吐量。

英文摘要

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.
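A quick calculation shows why confining self-attention to downsampled feature maps matters. The input size and the strides below are hypothetical values chosen for illustration; the abstract does not state LALE's actual stage resolutions.

```python
def attention_pairs(height, width, stride):
    """Number of token pairs that full self-attention compares on a feature
    map downsampled by `stride` along each spatial dimension."""
    tokens = (height // stride) * (width // stride)
    return tokens * tokens

# Hypothetical 512x512 input: attention at stride 4 versus stride 16.
early = attention_pairs(512, 512, 4)    # 16384 tokens -> 268435456 pairs
deep = attention_pairs(512, 512, 16)    # 1024 tokens  -> 1048576 pairs
ratio = early // deep                   # 256
```

Because the cost grows with the square of the token count, moving attention from stride 4 to stride 16 cuts the number of pairwise comparisons by a factor of 256 in this example, while the convolutional stages handle the high-resolution maps at linear cost.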

2606.02047 2026-06-02 stat.ML cs.LG math.ST stat.ME stat.TH

Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation

凸距离算子传输:一种凸且保持几何的公式

Junhyoung Chung, Euijong Song, Won Hwa Kim, Gunwoong Park

发表机构 * KAIST(韩国科学技术院)

AI总结 提出凸距离算子传输(CDOT),通过算子正则化联合保持特征对应与内在几何结构,实现异质分布对齐,并证明其伪度量性质及与Gromov-Wasserstein的关系。

Comments This paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026

详情
AI中文摘要

我们引入了凸距离算子传输(CDOT),这是第一个凸最优传输框架,通过联合保持特征对应和内在几何结构来对齐异质域中的分布。具体来说,CDOT采用基于算子的正则化,通过引入距离算子和条件期望算子来对齐聚合的距离结构。因此,所提出的正则化提高了对局部几何变化的鲁棒性。我们进一步证明了得到的CDOT差异是带属性的紧度量测度空间上的有效伪度量。此外,我们通过一个新的色散间隙概念刻画了CDOT与Gromov-Wasserstein(GW)之间的关系,正式阐明了GW非凸性相对于CDOT凸性的几何来源。在有限样本情况下,我们推导了一个非渐近风险界,分解为优化误差和统计误差,并在全局收敛的Frank-Wolfe算法下建立了风险一致性。在合成点云、脑连接组和图分类基准上的实验表明,该方法优于现有方法,在实践中表现稳定可靠。

英文摘要

We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves the robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov-Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank-Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate better performance over existing methods, with stable and reliable behavior in practice.

2606.01987 2026-06-02 cs.DM cs.LG

Graph Edit Distance Formulation for the Vehicle Routing Problem: Theory and Analysis

车辆路径问题的图编辑距离公式:理论与分析

Adel Dabah

发表机构 * Forschungszentrum Jülich(于利希研究中心)

AI总结 本文提出将车辆路径问题重新表述为图编辑距离最大化问题,通过边删除成本模型实现总路线成本最小化,并利用该公式进行结构分析和基准测试。

详情
AI中文摘要

我们证明车辆路径问题(VRP)可以重新表述为图编辑距离(GED)最大化问题。在简单的边删除成本模型下,最小化总路线成本等价于从完整实例图中删除的边的总权重最大化。该公式在边级别对VRP进行建模,其中解由选定的边而非路线序列定义,从而能够进行经典公式中难以实现的结构分析:解质量的每条边归因、最优性差距的分解、解稀疏性的刻画以及贪婪构造难以到达的边的识别。理论上,我们建立了一个合并-分解定理,表明Clarke-Wright节省等于每次合并的GED增量,以及一个近似转移定理,将GED近似比转化为VRP成本界限。利用这一重新表述,我们分析了90个已知最优解的CVRP基准实例。我们发现最优路由图仅使用5.5%的可用边,约3.0%的最优边在重复重启下始终未被Clarke-Wright启发式找到,并且成本差距分解为遗漏的最优边和替代的非最优边,两者总权重相当。边加性目标为未来的图神经网络边预测方法提供了自然的每条边监督信号,暗示了与图神经网络方法的潜在联系,这留待后续工作。

英文摘要

We show that the Vehicle Routing Problem (VRP) can be reformulated as a Graph Edit Distance (GED) maximization problem. Under a simple edge-deletion cost model, minimizing total route cost is equivalent to maximizing the total weight of edges deleted from the complete instance graph. This formulation models VRP at the edge level, where solutions are defined by selected edges rather than route sequences, enabling structural analyses that are difficult in classical formulations: per-edge attribution of solution quality, decomposition of the optimality gap, characterization of solution sparsity, and identification of edges that are hard to reach by greedy construction. Theoretically, we establish a merge-decomposition theorem showing that Clarke-Wright savings equal per-merge GED increments, and an approximation-transfer theorem that turns GED approximation ratios into VRP cost bounds. Using this reformulation, we analyze 90 CVRP benchmark instances with known optimal solutions. We find that optimal routing graphs use only 5.5% of available edges, that approximately 3.0% of optimal edges are consistently not found by Clarke-Wright heuristics under repeated restarts, and that the cost gap decomposes into missed optimal edges and substituted non-optimal edges of comparable total weight. The edge-additive objective provides a natural per-edge supervision signal for graph neural network approaches to edge prediction, a connection that we leave for follow-up work.
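The equivalence stated above rests on a simple identity: the total weight of the complete graph is fixed, so the weight of the edges kept by a solution and the weight of the edges deleted always sum to the same constant. The sketch below checks this on a toy instance and also computes a Clarke-Wright saving. The helper names and the four-point instance are made up for illustration, and routes are assumed to visit at least two customers so that no depot edge is used twice.

```python
import math
from itertools import combinations

def complete_graph_weights(points):
    """Euclidean weight of every edge (i, j), i < j, of the complete graph."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }

def route_edges(routes):
    """Edges used by routes that each start and end at depot 0."""
    edges = set()
    for route in routes:
        stops = [0, *route, 0]
        for a, b in zip(stops, stops[1:]):
            edges.add((min(a, b), max(a, b)))
    return edges

def kept_and_deleted(weights, routes):
    """Route cost (kept edge weight) and total weight of deleted edges."""
    kept = sum(weights[e] for e in route_edges(routes))
    return kept, sum(weights.values()) - kept

def saving(weights, i, j):
    """Clarke-Wright saving of serving i and j on one route."""
    return weights[(0, i)] + weights[(0, j)] - weights[(min(i, j), max(i, j))]

points = [(0, 0), (3, 0), (3, 4), (0, 4)]    # depot is index 0
w = complete_graph_weights(points)           # total weight 24.0
good = kept_and_deleted(w, [[1, 2, 3]])      # (14.0, 10.0)
bad = kept_and_deleted(w, [[1, 3, 2]])       # (16.0, 8.0)
```

The cheaper tour deletes more weight (10.0 versus 8.0), so ranking solutions by route cost and by deleted weight gives opposite orders of the same ranking.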

2605.03384 2026-06-02 cs.CR cs.SD

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

DECKER: 跨键盘提取与识别的域不变嵌入

Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal, Arun Balaji Buduru

发表机构 * IIIT-Delhi(德里英德拉普拉斯塔信息技术学院) Guru Gobind Singh Indraprastha University(古鲁·戈宾德·辛格·因德拉普拉斯塔大学)

AI总结 针对键盘声学侧信道攻击的跨键盘、跨用户和噪声环境泛化问题,提出包含四阶段域不变击键推理框架DECKER,并构建了多维度数据集HEAR,实验表明该方法在跨键盘和跨用户场景下显著提升击键识别性能。

Comments Accepted to AsiaCCS'26

详情
AI中文摘要

键盘上的声学侧信道攻击(ASCA)构成了重大的安全风险,因为击键可以从打字声音中推断出来,从而泄露敏感信息。先前的ASCA研究受限于小规模数据集,在用户、键盘和环境方面的多样性不足,限制了跨设备、麦克风和噪声条件的分析。我们引入了HEAR数据集,旨在沿着三个轴研究ASCA:键盘泛化、噪声适应和用户偏差。HEAR包含来自53名参与者使用37种笔记本电脑键盘的录音,在三种现实场景中收集:(1)外部麦克风捕获,(2)无网络噪声的设备麦克风捕获,以及(3)基于VoIP的流式捕获。这使得能够在用户、键盘和环境之间进行受控评估。在HEAR上,我们建立了一个ASCA基准,涵盖了单模态和多模态设置中来自原始音频和频谱图的传统特征和预训练表示。我们提出了DECKER,一个域不变的击键推理框架,包含四个阶段:(1)键盘签名归一化以减少设备着色,(2)域对抗解耦以抑制键盘身份,(3)有监督的跨键盘对比对齐以强制键一致性,以及(4)声学风格随机化以合成未见过的键盘响应。我们进一步探索了使用基于LLM的后处理层进行句子级推理,通过语言上下文优化击键序列。在HEAR上的结果表明,DECKER在跨键盘和跨用户设置中显著提高了击键识别性能,并通过语言模型校正进一步获得提升。这些发现强调,ASCA在多样化的用户、设备和噪声环境中仍然有效,凸显了其实际安全风险。

英文摘要

Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing sensitive information. Prior ASCA studies are limited by small-scale datasets with restricted diversity in users, keyboards, and environments, constraining analysis across devices, microphones, and noise conditions. We introduce HEAR, a dataset designed to study ASCA along three axes: keyboard generalization, noise adaptation, and user bias. HEAR contains recordings from 53 participants using 37 laptop keyboards, collected in three realistic settings: (1) external microphone capture, (2) device microphone capture without network noise, and (3) VoIP-based streaming capture. This enables controlled evaluation across users, keyboards, and environments. On HEAR, we establish an ASCA benchmark spanning conventional features and pre-trained representations from raw audio and spectrograms in unimodal and multimodal settings. We propose DECKER, a domain-invariant keystroke inference framework with four stages: (1) Keyboard Signature Normalization to reduce device coloration, (2) domain-adversarial disentanglement to suppress keyboard identity, (3) supervised cross-keyboard contrastive alignment to enforce key consistency, and (4) Acoustic Style Randomization to synthesize unseen keyboard responses. We further explore sentence-level inference using an LLM-based post-processing layer to refine keystroke sequences via linguistic context. Results on HEAR show DECKER improves keystroke identification over strong baselines, particularly in cross-keyboard and cross-user settings, with further gains from language-model rectification. These findings highlight that ASCA remains effective across diverse users, devices, and noisy environments, underscoring its practical security risk.

2606.01948 2026-06-02 cs.IR cs.AI

Rank-Constrained Deep Matrix Completion for Group Recommendation

面向群组推荐的秩约束深度矩阵补全

Mubaraka Sani Ibrahim, Lehel Csató, Isah Charles Saidu

发表机构 * Department of Computer Science, African University of Science and Technology(非洲科学与技术大学计算机科学系) Faculty of Mathematics and Computer Science, Babes-Bolyai University(巴比什-波雅依大学数学与计算机科学学院) Department of Computer Science, Baze University(贝泽大学计算机科学系)

AI总结 提出Group RC-DMC框架,通过Set-Transformer聚合器整合群组级表示学习,结合低秩结构和注意力非线性建模,实现个体与群组级别的准确预测。

详情
AI中文摘要

群体活动的日益普及增加了根据用户个体偏好向用户群组提供推荐的方法需求。许多现有的群组推荐系统依赖于聚合个体用户偏好,但通常难以处理现实场景中常见的高维且高度稀疏的评分数据。我们提出了群组秩约束深度矩阵补全(Group RC-DMC),这是一个新颖的框架,通过Set-Transformer聚合器整合群组级表示学习,扩展了RC-DMC,联合利用了低秩结构和基于注意力的非线性建模。与大多数现有群组推荐系统不同,Group RC-DMC在一个统一框架中融合了显式低秩正则化、线性编码器-解码器架构和基于注意力的非线性群组建模,在个体和群组级别都产生准确的预测。Group RC-DMC通过低秩矩阵补全解决数据稀疏性,仅从观测评分计算每个用户的潜在表示,并基于周期性奇异值阈值化使用核范数近端步骤对潜在空间施加秩约束。解码器被参数化为低秩分解,从而实现高效推理。在MovieLens和Goodbooks数据集上的实验结果表明,Group RC-DMC实现了优越的重建精度(以更低的群组RMSE衡量),同时在计算效率上保持竞争力,并且在群组级别的性能(精确率、召回率和F1分数)上与加权前分解(WBF)和加权后分解(AF)基线相当。结果突显了模型恢复用户-物品交互的底层低秩结构的能力,并为小、中、大用户群组提供稳健的群组推荐。

英文摘要

The growing popularity of group activities has increased the need for methods that provide recommendations to groups of users given their individual preferences. Many existing group recommender systems rely on aggregating individual user preferences, but they often struggle with high-dimensional and highly sparse rating data commonly found in real-world scenarios. We propose Group Rank-Constrained Deep Matrix Completion (Group RC-DMC), a novel framework that extends RC-DMC by integrating group-level representation learning via a Set-Transformer aggregator, jointly leveraging low-rank structure and attention-based nonlinear modeling. Unlike most existing group recommender systems, Group RC-DMC unifies explicit low-rank regularization, linear encoder-decoder architectures, and attention-based nonlinear group modeling within a single framework, yielding accurate predictions at both the individual and group levels. Group RC-DMC addresses data sparsity through low-rank matrix completion, computing per-user latent representations from observed ratings only, and enforcing a rank constraint on the latent space using a nuclear-norm proximal step based on periodic singular value thresholding. The decoder is parametrized as a low-rank factorization, enabling efficient inference. Experimental results on the MovieLens and Goodbooks datasets demonstrate that Group RC-DMC achieves superior reconstruction accuracy, measured by lower group RMSE, while remaining computationally efficient and competitive in group-level performance in terms of precision, recall, and F1 score compared with weighted-before-factorization (WBF) and after-factorization (AF) baselines. The results highlight the model's ability to recover the underlying low-rank structure of user-item interactions and provide robust group recommendations across small, medium, and large user groups.
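The nuclear-norm proximal step mentioned above has a standard closed form, singular value thresholding. The symbols below ($Z$ for the latent matrix, $\tau$ for the threshold) are chosen for this note; how often the step is applied and how $\tau$ is set in Group RC-DMC is not stated in the abstract.

```latex
% Proximal operator of the nuclear norm at Z, with threshold \tau > 0
\operatorname{prox}_{\tau \|\cdot\|_*}(Z)
  = \arg\min_{X} \; \tfrac{1}{2}\, \| X - Z \|_F^2 + \tau\, \| X \|_*

% Closed form via the SVD  Z = U \operatorname{diag}(\sigma_1, \dots, \sigma_r)\, V^\top
\operatorname{prox}_{\tau \|\cdot\|_*}(Z)
  = U \operatorname{diag}\!\big( \max(\sigma_i - \tau,\, 0) \big)\, V^\top
```

Every singular value is shrunk by $\tau$ and those below $\tau$ are set to zero, which is how the step lowers the rank of the latent representation.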

2606.01910 2026-06-02 cs.GR cs.CV

Single-Line Drawing Generation via Semantics-Driven Optimization

基于语义驱动的单线图生成

Tanguy Magne, Alexandre Binninger, Ruben Wiersma, Olga Sorkine-Hornung

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 提出一种基于语义驱动的方法,通过文本提示或输入图像自动生成矢量格式的单线图,利用分数蒸馏采样优化均匀有理B样条曲线参数,并引入额外损失项控制艺术风格,生成结果优于现有方法且支持下游制造。

Comments 18 pages, published in Computer Graphics Forum 2026

Abstract

Line drawings are a highly expressive art form that requires the artist to abstract and distill the essence of their subject. We present the first semantics-driven method for automatically generating single-line drawings in vector format, guided either by a text prompt describing the concept or an input image depicting it. Our approach leverages score distillation sampling to optimize the parameters of a uniform rational B-spline (URBS) curve, ensuring that the drawing consists of a single continuous stroke by design. This representation provides fine-grained control over the level of detail, while additional loss terms allow us to steer the final artistic style. We demonstrate that our method outperforms state-of-the-art text-to-image models and optimization pipelines for this task, producing results that are both more aesthetically pleasing and more faithful to the style of continuous line drawing artists. Furthermore, because our method generates a vectorized curve, it directly supports downstream fabrication processes such as embroidery, laser engraving and wire bending. Our code and results are available at https://github.com/tanguymagne/SLDgen.
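
To make the curve representation concrete, here is a minimal sketch of evaluating a uniform rational B-spline with NumPy. It only illustrates the representation named in the abstract: the function names, the cubic degree, and the control points and weights are assumptions, and the score distillation sampling loop that would optimize these parameters is not shown.

```python
import numpy as np

def bspline_basis(i, p, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = (t - knots[i]) / (knots[i + p] - knots[i])
    right = (knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1])
    return (left * bspline_basis(i, p - 1, t, knots)
            + right * bspline_basis(i + 1, p - 1, t, knots))

def urbs_curve(control_points, weights, degree=3, samples=200):
    """Sample a rational B-spline curve defined on a uniform knot vector."""
    P = np.asarray(control_points, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(P)
    knots = np.arange(n + degree + 1, dtype=float)  # uniform knots 0, 1, 2, ...
    # Parameter range on which the basis functions sum to one.
    ts = np.linspace(knots[degree], knots[n], samples, endpoint=False)
    curve = np.empty((samples, P.shape[1]))
    for k, t in enumerate(ts):
        N = np.array([bspline_basis(i, degree, t, knots) for i in range(n)])
        curve[k] = (w * N) @ P / np.dot(w, N)  # weighted (rational) combination
    return curve

# Hypothetical control points and positive weights for a single continuous stroke.
P = np.array([[0, 0], [1, 2], [3, 3], [4, 0], [6, 1], [7, 3]], dtype=float)
w = np.array([1.0, 2.0, 1.0, 0.5, 1.0, 1.0])
stroke = urbs_curve(P, w)
print(stroke.shape)
```

Because every sample is a positive-weighted combination of the control points, the whole drawing is one connected curve by construction, which matches the "single continuous stroke by design" property described above.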

2606.01905 2026-06-02 eess.AS cs.SD

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning

Ding Ma, Jinyi Mi, Fengji Li, Lester Phillip Violeta, Jiajun He, Wenchin Huang, Kazuhiro Kobayashi, Tomoki Toda

Affiliations Graduate School of Informatics, Nagoya University; School of Biological Science and Medical Engineering, Beihang University; TARVO, Inc.; Information Technology Center, Nagoya University

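AI summary Proposes a learning framework that fuses speech and text representations to improve the mapping from electrolaryngeal speech to normal speech and the reconstruction quality of a sequence-to-sequence voice conversion model; experiments show that it outperforms methods relying on speech representations alone.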

Comments 15 pages, 7 figures. Accepted to IEEE TBME

Journal ref IEEE Transactions on Biomedical Engineering, Early Access, 2026

Abstract

Objective: laryngectomees depend on an electromechanical device to generate electrolaryngeal (EL) speech. Compared with normal speech, EL speech suffers from severe distortion, limited phonetic variation, unnatural prosody, and temporal shifts, degrading naturalness and intelligibility. Although sequence-to-sequence (seq2seq) voice conversion (VC) based EL-speech-to-normal-speech conversion (EL2SP) is promising, substantial mismatches between EL and normal speech inevitably cause cumulative mapping errors that limit performance. To address this, we describe a novel representation learning framework integrating speech and text representations to improve mapping and reconstruction quality within a seq2seq VC model. Methods: our methodology comprises two main stages: 1) representation integration and learning, and 2) reconstruction training. A network capable of incorporating auxiliary text information is first constructed with pretrained modules to learn speech-text-based integrated representations. Then, an autoencoder-style reconstruction strategy finalizes the EL2SP model so that it inherits these representations without increasing model complexity. We introduce three fusion strategies (middle-, input-, and hybrid-level fusion) that progressively enhance learning. Moreover, besides standard seq2seq VC objectives, an additional reconstruction loss on the integrated representation is introduced to refine representation transfer. Results: experiments on different EL2SP datasets consistently demonstrate that our methods, combined with data augmentations, outperform baselines relying solely on speech representations. Furthermore, progressive improvements with system design depth validate the effectiveness of our methods. Significance: the proposed methods provide an extensible and practical methodology for EL speech enhancement and assistive communication technologies.

2606.01899 2026-06-02 eess.SP cs.AI

RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models

Guangjin Pan, Hui Chen, Hei Victor Cheng, Henk Wymeersch

Affiliations Department of Electrical Engineering, Chalmers University of Technology; Department of Electrical and Computer Engineering, Aarhus University

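AI summary Proposes RA-LWLM, a framework that achieves training-free cross-scene wireless localization by externalizing scene-specific information into a fingerprint database; it predicts the user position with a frozen wireless foundation model encoder, a retrieval module, and a transformer-based in-context learning module.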

Comments 13 pages, 9 figures. This work has been submitted to the IEEE for possible publication

Abstract

Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
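
The retrieve-then-combine data flow described in this abstract can be sketched in a few lines of NumPy. This is a toy illustration under strong assumptions, not the RA-LWLM model: the embeddings are random stand-ins for the frozen encoder's output, each "expert" is a similarity-weighted average over its k nearest references instead of a transformer, and the selector logits are fixed rather than learned. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def retrieve(query, db_embeddings, k):
    """Return indices and cosine similarities of the k most similar database entries."""
    q = query / np.linalg.norm(query)
    d = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    sims = d @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

def localize(query, db_embeddings, db_positions,
             context_sizes=(1, 4, 8), selector_logits=None):
    """Softly combine experts that each see a different number of retrieved references."""
    idx, sims = retrieve(query, db_embeddings, max(context_sizes))
    expert_outputs = []
    for k in context_sizes:
        # Stand-in expert: similarity-weighted mean of the k nearest reference positions.
        weights = softmax(sims[:k])
        expert_outputs.append(weights @ db_positions[idx[:k]])
    expert_outputs = np.stack(expert_outputs)
    if selector_logits is None:
        selector_logits = np.zeros(len(context_sizes))  # assumed: uniform gate
    gate = softmax(np.asarray(selector_logits, dtype=float))
    return gate @ expert_outputs

rng = np.random.default_rng(0)
db_embeddings = rng.normal(size=(100, 16))         # hypothetical per-scene fingerprints
db_positions = rng.uniform(0, 50, size=(100, 2))   # hypothetical reference positions (m)
query = db_embeddings[5] + 0.05 * rng.normal(size=16)
estimate = localize(query, db_embeddings, db_positions)
print(estimate)
```

Moving to a new scene in this sketch only means swapping `db_embeddings` and `db_positions`; no parameter is refit, which mirrors the training-free adaptation claimed in the abstract.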