arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.02184 2026-06-02 cs.DL cs.LG

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

Michał Brzozowski, Neo Christopher Chung

Affiliations: Samsung AI Center; University of Warsaw

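AI summary: The study finds that large language models, when generating names for fictional experts, produce strongly correlated character ensembles. These ensembles are specific to each model family and have left large numbers of ghost-author records on platforms such as Zenodo, affecting academic publishing.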

Abstract

These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records claiming nonexistent journals with fabricated publication dates: server-side DataCite timestamps prove deliberate backdating, and 991 records were registered in a single month; these carry real DOIs registered in DataCite, making them harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
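
The claim that name pairs co-occur far beyond chance can be made concrete with a lift statistic: the observed joint frequency of two names divided by the frequency expected if they appeared independently. The following is a minimal sketch, not the authors' measurement procedure; the function name `cooccurrence_lift` and the toy document sets with placeholder names `A` to `D` are assumptions made purely for illustration.

```python
def cooccurrence_lift(docs, a, b):
    """Return P(a and b) / (P(a) * P(b)) over a list of name sets.

    A lift near 1 is what independence predicts; values far above 1
    mean the two names appear together more often than chance.
    """
    n = len(docs)
    if n == 0:
        raise ValueError("docs must not be empty")
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both names must occur at least once")
    return p_ab / (p_a * p_b)


# Toy data (hypothetical): each set holds the names found in one document.
toy_docs = [{"A", "B"}, {"A", "B"}, {"C"}, {"D"}]
lift = cooccurrence_lift(toy_docs, "A", "B")
```

In this toy corpus the pair `A`, `B` always appears together, so the lift exceeds 1, whereas a pair that never shares a document has a lift of 0.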

2606.02156 2026-06-02 eess.IV cs.AI cs.CV cs.IR cs.LG

Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel

Zahra Tabatabaei, Jon Sporring, Mark Bremholm Ellebæk, Alaa El-Hussuna

Affiliations: Computer Science Department, Københavns Universitet (KU); University of Southern Denmark; Odense University Hospital; OpenSourceResearch Collaboration

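AI summary: Proposes an AI-driven system based on preoperative CT imaging that quantifies anastomotic leak risk by analyzing vascular and tissue features, combined with content-based retrieval to support clinical decisions.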

Abstract

Anastomotic leak remains one of the most serious complications following colorectal cancer surgery, substantially affecting patient outcomes, recovery trajectories, and healthcare costs. Despite advances in imaging technology, current preoperative assessment relies only on clinical assessment, a process that is subjective, error-prone, and highly dependent on individual expertise. To date, no validated CT-based method exists to predict anastomotic leak risk prior to surgery. This protocol paper outlines a comprehensive framework for developing and validating an AI-driven system for preoperative risk assessment using pre- and post-contrast CT imaging. The study describes the stages of data collection, ethical handling, and preprocessing of patient data in accordance with GDPR, image preprocessing, and the exploration of deep learning architectures designed to generate clinically interpretable outputs. Two integrated tools constitute the main deliverables of this workflow: 1) a risk assessment module, which quantifies the likelihood of leakage by analyzing vascular and tissue features in CT scans, and 2) a Content-Based Medical Image Retrieval (CBMIR) module, which identifies and displays similar historical cases to support evidence-based surgical decision making. The protocol paper requires close collaboration between hospitals and universities; this protocol demonstrates that such a system is technically feasible and clinically implementable within existing healthcare infrastructures. By following the proposed methodological stages and regulatory principles, other institutions can reproduce this workflow to develop analogous decision-support tools. Ultimately, this interdisciplinary framework aims to enhance surgical planning, reduce leak incidence, and contribute to a broader paradigm shift toward explainable, data-driven precision surgery.

2606.02127 2026-06-02 eess.AS cs.SD

Localizing broadband noise sources using the Loève spectrum and a 2.5D approach

Christian H. Kasess, Wolfgang Kreuzer, Holger Waubke

Affiliations: OeAW (Austrian Academy of Sciences)

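AI summary: For the problem of localizing moving broadband stochastic sound sources, proposes an inverse localization method based on a 2.5D setting and the Loève spectrum, derives the relation between the power spectral density of the moving source and the Loève spectrum at static receivers, and localizes sources using multi-taper estimates.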

Comments 31 pages, 13 figures

Abstract

The localization of moving sound sources using a microphone array is typically based on modifying the signal to compensate for the Doppler effect. In the time domain, this compensation is done on a sample-by-sample basis. In the frequency domain, short time segments need to be used in which the Doppler effect is assumed to be approximately constant, and a discrete Fourier transform is computed on each segment. In contrast, the authors developed an inverse 2.5D localization method for uniformly moving single-frequency sources that works in the spectral domain and allows for the use of longer windows. This was achieved by modifying the 2.5D forward model to directly compute the effect of the motion at the static observer position. The method requires neither modification of the measured signal nor quasi-stationarity of the measurements within the window used. Unfortunately, this approach is not directly suitable for broad-band stochastic sources, and the present work investigates how the statistical properties of a uniformly moving stochastic source change when observed by a static observer. Using a 2.5D setting, the relation between the power spectral density of the moving source and the Loève spectrum, which is a generalization of the cross-spectral density at the static receivers, was derived. Based on simulated data with speeds up to 100 m/s, the work presented here provides a proof of concept for a method based on multi-taper estimates of the Loève spectrum to localize moving broad-band stochastic sources. Currently, the method requires a stationary source signal and a spectral density that is flat within a certain range around the frequency of interest. Also, correlations between sources are currently not considered.

2606.02115 2026-06-02 stat.ML cs.LG

Error Bounds for a Diffusion Model-Based Drift Estimator

Ioar Casado-Telletxea, Omar Rivasplata

Affiliations: Basque Center for Applied Mathematics (BCAM); Centre for AI Fundamentals & Department of Computer Science; University of Manchester, UK

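AI summary: For drift estimation in stochastic differential equations with a known diffusion parameter, uses diffusion model theory to derive an explicit risk bound on the time-averaged mean-squared error, decomposing the risk into four terms: discretization, score approximation, noise initialization, and sampling variance.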

Comments Preprint

Abstract

Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and leverages tools from (conditional) score-matching diffusion models. Although their experiments showed promising results across different drift classes, the question of theoretical guarantees for their estimator was left unanswered. In this note, we address this gap by exploiting techniques from diffusion model theory. More concretely, we derive an explicit risk bound for the time-averaged mean-squared error of said drift estimator. Our bound decomposes the risk into the (i) Euler-Maruyama discretization, (ii) score/denoiser approximation, (iii) noise initialization, and (iv) sampling variance, revealing the trade-offs between the different hyperparameters and sources of error in the estimator.
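
To make the setting concrete (discrete samples from multiple trajectories, known diffusion parameter, Euler-Maruyama discretization), here is a minimal sketch for an Ornstein-Uhlenbeck process. It is an illustration under stated assumptions only: the drift family `-theta * x`, the function names, and all parameter values are hypothetical, and the least-squares estimator is a classical stand-in, not the diffusion-model-based estimator analyzed in the note.

```python
import numpy as np


def simulate_ou(theta, sigma, x0, dt, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dX = -theta * X dt + sigma dW."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
        x[:, k + 1] = x[:, k] - theta * x[:, k] * dt + sigma * dw
    return x


def least_squares_theta(paths, dt):
    """Baseline estimate of theta from discrete samples of many paths.

    Regresses the increments X_{k+1} - X_k on -X_k * dt. This is a
    classical stand-in, NOT the estimator studied in the note.
    """
    x_now = paths[:, :-1]
    dx = np.diff(paths, axis=1)
    return -np.sum(x_now * dx) / (dt * np.sum(x_now ** 2))


# Hypothetical settings chosen only for the demonstration.
paths = simulate_ou(theta=1.0, sigma=0.5, x0=1.0, dt=0.01,
                    n_steps=500, n_paths=200)
theta_hat = least_squares_theta(paths, dt=0.01)
```

Even in this simple baseline, the step size `dt` and the number of paths play the roles of the discretization and sampling-variance terms that the bound separates.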

2606.02101 2026-06-02 stat.ML cs.LG stat.AP

It does what it says on the tin: safe synthetic data from coarsened margins

Gillian M Raab

Affiliations: University of Edinburgh; Scottish Centre for Administrative Data Research

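AI summary: Proposes a method for generating synthetic data by coarsening margins and applying the iterative proportional fitting algorithm, ensuring transparency and freedom from disclosure risk.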

Abstract

This paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins where relationships between variables will be maintained in the SD. Each margin will then be subject to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modifying small counts. Further adjustment of the curated margins is advised by coarsening all counts in the table to multiples of the disclosure limit. These adjusted margins are used to create SD by the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.
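
The two mechanical steps named in the abstract, coarsening counts to multiples of the disclosure limit and fitting a table to the adjusted margins with IPF, can be sketched as follows. This is a simplified illustration, not the paper's procedure: it handles only two one-way margins of a two-way table, assumes strictly positive margins with equal grand totals, and the function names, the limit of 5, and the example counts are all hypothetical.

```python
import numpy as np


def coarsen(counts, limit):
    """Round every count to the nearest multiple of the disclosure limit."""
    return limit * np.round(np.asarray(counts, dtype=float) / limit)


def ipf_2d(row_margin, col_margin, n_iter=50, tol=1e-9):
    """Fit a two-way table to the given one-way margins by IPF."""
    row_margin = np.asarray(row_margin, dtype=float)
    col_margin = np.asarray(col_margin, dtype=float)
    if not np.isclose(row_margin.sum(), col_margin.sum()):
        raise ValueError("margins must share the same grand total")
    table = np.ones((row_margin.size, col_margin.size))  # uniform seed
    for _ in range(n_iter):
        table *= (row_margin / table.sum(axis=1))[:, None]  # match rows
        table *= (col_margin / table.sum(axis=0))[None, :]  # match columns
        if np.allclose(table.sum(axis=1), row_margin, atol=tol):
            break
    return table


# Hypothetical raw margins, coarsened to multiples of 5 before any fitting.
rows = coarsen([12, 23, 38], limit=5)
cols = coarsen([18, 22, 33], limit=5)
fitted = ipf_2d(rows, cols)
```

Because only the coarsened margins enter `ipf_2d`, the fitted table depends on nothing beyond information that has already passed disclosure control, which is the guarantee the abstract describes.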

2606.02092 2026-06-02 eess.IV cs.AI cs.CV

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

Ümit Mert Çağlar, Alptekin Temizel

Affiliations: Middle East Technical University

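AI summary: Proposes the LALE architecture, which balances performance and computational cost in remote sensing image segmentation through a resolution-bifurcated encoder (lightweight ConvMixer stages for high-resolution local features, transformer stages for low-resolution global context) and an all-MLP multi-scale decoder.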

Abstract

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.
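
For readers unfamiliar with the two lightweight components named above, the sketch below gives their standard definitions in NumPy. It shows the operations only and is not LALE's implementation: in a trained network the gain, scale, and bias are learnable, and the default values used here are placeholders.

```python
import numpy as np


def rms_norm(x, gain=1.0, eps=1e-6):
    """Scale the last axis of x to unit root-mean-square, then apply a gain."""
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return gain * x / rms


def star_relu(x, scale=1.0, bias=0.0):
    """StarReLU: scale * relu(x)**2 + bias, with scalar scale and bias."""
    return scale * np.square(np.maximum(x, 0.0)) + bias


features = np.array([3.0, 4.0])
normed = rms_norm(features)                    # unit RMS along the last axis
activated = star_relu(np.array([-1.0, 2.0]))   # negatives map to the bias
```

RMSNorm skips the mean subtraction of LayerNorm, and StarReLU adds only two scalars per layer, which is consistent with the abstract's emphasis on reducing compute and parameter count.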

2606.02047 2026-06-02 stat.ML cs.LG math.ST stat.ME stat.TH

Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation

Junhyoung Chung, Euijong Song, Won Hwa Kim, Gunwoong Park

Affiliations: KAIST

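AI summary: Proposes Convex Distance Operator Transport (CDOT), which aligns heterogeneous distributions by jointly preserving feature correspondence and intrinsic geometric structure through operator-based regularization, and proves its pseudometric property and its relationship to Gromov-Wasserstein.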

Comments This paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026

Abstract

We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves the robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov-Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank-Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate better performance over existing methods, with stable and reliable behavior in practice.

2606.01987 2026-06-02 cs.DM cs.LG

Graph Edit Distance Formulation for the Vehicle Routing Problem: Theory and Analysis

Adel Dabah

Affiliations: Forschungszentrum Jülich

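AI summary: Reformulates the vehicle routing problem as a graph edit distance maximization problem, in which an edge-deletion cost model achieves minimization of total route cost, and uses the formulation for structural analysis and benchmarking.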

Abstract

We show that the Vehicle Routing Problem (VRP) can be reformulated as a Graph Edit Distance (GED) maximization problem. Under a simple edge-deletion cost model, minimizing total route cost is equivalent to maximizing the total weight of edges deleted from the complete instance graph. This formulation models VRP at the edge level, where solutions are defined by selected edges rather than route sequences, enabling structural analyses that are difficult in classical formulations: per-edge attribution of solution quality, decomposition of the optimality gap, characterization of solution sparsity, and identification of edges that are hard to reach by greedy construction. Theoretically, we establish a merge-decomposition theorem showing that Clarke-Wright savings equal per-merge GED increments, and an approximation-transfer theorem that turns GED approximation ratios into VRP cost bounds. Using this reformulation, we analyze 90 CVRP benchmark instances with known optimal solutions. We find that optimal routing graphs use only 5.5% of available edges, that approximately 3.0% of optimal edges are consistently not found by Clarke-Wright heuristics under repeated restarts, and that the cost gap decomposes into missed optimal edges and substituted non-optimal edges of comparable total weight. The edge-additive objective provides a natural per-edge supervision signal for future graph neural network approaches to edge prediction, suggesting a potential connection to graph neural network approaches that we leave for follow-up work.
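
The equivalence between minimizing route cost and maximizing deleted edge weight, and the link to Clarke-Wright savings, can be checked numerically on a small instance. The sketch below is an illustration under assumptions and not the paper's formal cost model: the five-node coordinates are hypothetical, edges are undirected with Euclidean weights, and every route is assumed to visit at least two customers so that no edge is traversed twice.

```python
import itertools
import math

# Hypothetical toy instance: node 0 is the depot, nodes 1-4 are customers.
points = {0: (0, 0), 1: (0, 3), 2: (4, 3), 3: (4, 0), 4: (2, -2)}


def w(i, j):
    """Euclidean edge weight."""
    return math.dist(points[i], points[j])


def route_edges(routes):
    """Undirected edges used when every route starts and ends at the depot."""
    edges = set()
    for route in routes:
        path = [0, *route, 0]
        edges.update(frozenset(e) for e in zip(path, path[1:]))
    return edges


def cost(routes):
    """Total route cost (assumes each route has at least two customers)."""
    return sum(w(*e) for e in route_edges(routes))


def deleted_weight(routes):
    """Total weight of complete-graph edges NOT used by the solution."""
    all_edges = {frozenset(e) for e in itertools.combinations(points, 2)}
    return sum(w(*e) for e in all_edges - route_edges(routes))


separate = [[1, 2], [3, 4]]
merged = [[1, 2, 3, 4]]  # join the route ending at 2 with the one starting at 3
saving = w(0, 2) + w(0, 3) - w(2, 3)  # Clarke-Wright saving s(2, 3)
```

Since the complete graph's total weight is fixed, every unit of cost removed by the merge reappears as deleted weight, so the saving equals both the cost decrease and the deleted-weight increase.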

2605.03384 2026-06-02 cs.CR cs.SD

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal, Arun Balaji Buduru

Affiliations: IIIT-Delhi; Guru Gobind Singh Indraprastha University

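AI summary: To address generalization of keyboard acoustic side-channel attacks across keyboards, users, and noisy environments, proposes DECKER, a four-stage domain-invariant keystroke inference framework, and builds the multi-dimensional HEAR dataset; experiments show that the method significantly improves keystroke recognition in cross-keyboard and cross-user settings.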

Comments Accepted to AsiaCCS'26

Abstract

Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing sensitive information. Prior ASCA studies are limited by small-scale datasets with restricted diversity in users, keyboards, and environments, constraining analysis across devices, microphones, and noise conditions. We introduce HEAR, a dataset designed to study ASCA along three axes: keyboard generalization, noise adaptation, and user bias. HEAR contains recordings from 53 participants using 37 laptop keyboards, collected in three realistic settings: (1) external microphone capture, (2) device microphone capture without network noise, and (3) VoIP-based streaming capture. This enables controlled evaluation across users, keyboards, and environments. On HEAR, we establish an ASCA benchmark spanning conventional features and pre-trained representations from raw audio and spectrograms in unimodal and multimodal settings. We propose DECKER, a domain-invariant keystroke inference framework with four stages: (1) Keyboard Signature Normalization to reduce device coloration, (2) domain-adversarial disentanglement to suppress keyboard identity, (3) supervised cross-keyboard contrastive alignment to enforce key consistency, and (4) Acoustic Style Randomization to synthesize unseen keyboard responses. We further explore sentence-level inference using an LLM-based post-processing layer to refine keystroke sequences via linguistic context. Results on HEAR show DECKER improves keystroke identification over strong baselines, particularly in cross-keyboard and cross-user settings, with further gains from language-model rectification. These findings highlight that ASCA remains effective across diverse users, devices, and noisy environments, underscoring its practical security risk.

2606.01948 2026-06-02 cs.IR cs.AI

Rank-Constrained Deep Matrix Completion for Group Recommendation

Mubaraka Sani Ibrahim, Lehel Csató, Isah Charles Saidu

Affiliations: Department of Computer Science, African University of Science and Technology; Faculty of Mathematics and Computer Science, Babes-Bolyai University; Department of Computer Science, Baze University

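AI summary: Proposes the Group RC-DMC framework, which integrates group-level representation learning via a Set-Transformer aggregator and combines low-rank structure with attention-based nonlinear modeling to deliver accurate predictions at both the individual and group levels.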

Abstract

The growing popularity of group activities has increased the need for methods that provide recommendations to groups of users given their individual preferences. Many existing group recommender systems rely on aggregating individual user preferences, but they often struggle with high-dimensional and highly sparse rating data commonly found in real-world scenarios. We propose Group Rank-Constrained Deep Matrix Completion (Group RC-DMC), a novel framework that extends RC-DMC by integrating group-level representation learning via a Set-Transformer aggregator, jointly leveraging low-rank structure and attention-based nonlinear modeling. Unlike most existing group recommender systems, Group RC-DMC unifies explicit low-rank regularization, linear encoder-decoder architectures, and attention-based nonlinear group modeling within a single framework, yielding accurate predictions at both the individual and group levels. Group RC-DMC addresses data sparsity through low-rank matrix completion, computing per-user latent representations from observed ratings only, and enforcing a rank constraint on the latent space using a nuclear-norm proximal step based on periodic singular value thresholding. The decoder is parametrized as a low-rank factorization, enabling efficient inference. Experimental results on the MovieLens and Goodbooks datasets demonstrate that Group RC-DMC achieves superior reconstruction accuracy, measured by lower group RMSE, while remaining computationally efficient and competitive in group-level performance in terms of precision, recall, and F1 score compared with weighted-before-factorization (WBF) and after-factorization (AF) baselines. The results highlight the model's ability to recover the underlying low-rank structure of user-item interactions and provide robust group recommendations across small, medium, and large user groups.
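
The rank constraint described above rests on singular value thresholding, which is the proximal operator of the nuclear norm. The following minimal NumPy sketch shows that single step. The function name and the example matrix are hypothetical, and how often the step is applied during training (the "periodic" schedule) is left out because the abstract does not specify it.

```python
import numpy as np


def singular_value_threshold(z, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold singular values."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # singular values below tau become zero
    return (u * s_shrunk) @ vt


# Hypothetical latent matrix with singular values 3 and 1.
z = np.diag([3.0, 1.0])
z_low_rank = singular_value_threshold(z, tau=2.0)
```

Any singular value below `tau` is zeroed, so the step lowers the rank of the latent matrix while shrinking the remaining directions by the same amount.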

2606.01905 2026-06-02 eess.AS cs.SD

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning

Ding Ma, Jinyi Mi, Fengji Li, Lester Phillip Violeta, Jiajun He, Wenchin Huang, Kazuhiro Kobayashi, Tomoki Toda

Affiliations: Graduate School of Informatics, Nagoya University; School of Biological Science and Medical Engineering, Beihang University; TARVO, Inc.; Information Technology Center, Nagoya University

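AI summary: Proposes a learning framework that fuses speech and text representations to improve the mapping and reconstruction quality of electrolaryngeal-to-normal speech conversion in a sequence-to-sequence voice conversion model; experiments show it outperforms methods that rely on speech representations alone.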

Comments 15 pages, 7 figures. Accepted to IEEE TBME

Journal ref: IEEE Transactions on Biomedical Engineering, Early Access, 2026
Abstract

Objective: Laryngectomees depend on an electromechanical device to generate electrolaryngeal (EL) speech. Compared with normal speech, EL speech suffers from severe distortion, limited phonetic variation, unnatural prosody, and temporal shifts, degrading naturalness and intelligibility. Although sequence-to-sequence (seq2seq) voice conversion (VC) based EL-speech-to-normal-speech conversion (EL2SP) is promising, substantial mismatches between EL and normal speech inevitably cause cumulative mapping errors that limit performance. To address this, we describe a novel representation learning framework integrating speech and text representations to improve mapping and reconstruction quality within a seq2seq VC model. Methods: Our methodology comprises two main stages: 1) representation integration and learning, and 2) reconstruction training. A network capable of incorporating auxiliary text information is first constructed with pretrained modules to learn speech-text-based integrated representations. Then, an autoencoder-style reconstruction strategy finalizes the EL2SP model to inherit these representations without increasing model complexity. We introduce three fusion strategies (middle-, input-, and hybrid-level fusion) that progressively enhance learning. Moreover, besides standard seq2seq VC objectives, an additional reconstruction loss on the integrated representation is introduced to refine representation transfer. Results: Experiments on different EL2SP datasets consistently demonstrate that our methods, combined with data augmentation, outperform baselines relying solely on speech representations. Furthermore, progressive improvements with system design depth validate the effectiveness of our methods. Significance: The proposed methods provide an extensible and practical methodology for EL speech enhancement and assistive communication technologies.

2606.01899 2026-06-02 eess.SP cs.AI

RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models

Guangjin Pan, Hui Chen, Hei Victor Cheng, Henk Wymeersch

Affiliations: Department of Electrical Engineering, Chalmers University of Technology; Department of Electrical and Computer Engineering, Aarhus University

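AI summary: Proposes the RA-LWLM framework, which achieves training-free cross-scene wireless localization by externalizing scene-specific information into a fingerprint database, and predicts user position using a frozen wireless foundation model encoder, a retrieval module, and a transformer-based in-context learning module.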

Comments 13 pages, 9 figures. This work has been submitted to the IEEE for possible publication

Abstract

Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
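
The retrieval step described above, selecting references from a per-scene fingerprint database by similarity in the representation space, can be sketched as a cosine top-k search. Everything here is a hedged illustration: the embeddings are hypothetical two-dimensional vectors rather than outputs of a wireless foundation model, and the similarity-weighted average in `weighted_position` is only a simple stand-in for the paper's transformer-based in-context learning module.

```python
import numpy as np


def retrieve_top_k(query, db_embeddings, k):
    """Indices and cosine similarities of the k database entries nearest to query."""
    q = query / np.linalg.norm(query)
    d = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    sims = d @ q
    idx = np.argsort(-sims)[:k]  # highest similarity first
    return idx, sims[idx]


def weighted_position(idx, sims, db_positions):
    """Softmax-weighted mean of retrieved positions (stand-in for the ICL module)."""
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    return weights @ db_positions[idx]


# Hypothetical per-scene fingerprint database: embeddings with known positions.
db_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
db_positions = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]])
query = np.array([1.0, 0.1])

idx, sims = retrieve_top_k(query, db_embeddings, k=2)
estimate = weighted_position(idx, sims, db_positions)
```

Adapting to a new scene in this sketch means swapping `db_embeddings` and `db_positions` while the functions stay unchanged, which mirrors the idea of keeping scene-specific information in the database rather than in model weights.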

2606.01891 2026-06-02 cs.GR cs.LG

MidSurfNet: Learnable Face Pairing and Interference Implicit Fields for Generalized Mid-surface Abstraction

Li Ye, Xinhang Zhou, Xingyu Yang, Ruofeng Tong, Hailong Li, Peng Du, Min Tang

Affiliations: College of Computer Science and Technology, Zhejiang University; Shenzhen Poisson Software Co., Ltd.

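AI summary: Proposes the MidSurfNet framework, which uses a learnable face pairing module and an interference implicit field to handle mid-surface abstraction in complex thin-walled CAD scenarios such as multiple wall thicknesses, self-matching, and non-center offsets, reaching 87.32% face pairing accuracy.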

Comments 20 pages, 12 figures, 5 tables

Abstract

Mid-surface abstraction is essential for finite element analysis of thin-walled CAD models. Existing face pairing-based methods rely on handcrafted geometric heuristics, yet real-world industrial models frequently exhibit multi-wall-thickness regions, self-matching face configurations, and demand for non-center offset surfaces: scenarios where rule-based approaches consistently fail. We present MidSurfNet, a learning-augmented framework that addresses these limitations through two novel components: (1) a neural face pairing module that learns to predict face pair confidence from geometric and topological features, handling complex pairing scenarios beyond rule-based methods; and (2) an interference implicit field that represents mid-surfaces as the interference of two signed distance functions, enabling generalized offset control for flexible positioning in downstream CAE/FEA-oriented workflows. We construct a large-scale mid-surface dataset containing over 1,500 manually annotated CAD models. Experiments demonstrate that MidSurfNet achieves 87.32% face pairing accuracy and successfully handles multi-wall-thickness (61.90% completion) and self-matching (52.94% completion) scenarios that confound all existing methods. Furthermore, MidSurfNet provides a learning-based approach to generalized mid-surface abstraction with arbitrary offset control for CAE-oriented applications.

2606.01862 2026-06-02 cs.MA cs.AI cs.NI

RadioMaster: Multi-Agent System for Autonomous Radio Signal Generation

RadioMaster: 自主无线电信号生成的多智能体系统

Jiazhen Lei, Tianze Cao, Yuxin Sha, Sihan Wang, Bingbing Wang, Fengyuan Zhu, Zeming Yang, Xiaohua Tian

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出RadioMaster,一个全自主的多智能体框架,通过RadioWiki、RadioAgent和RadioEmulator三大支柱,将用户意图转化为真实无线信号,解决现有模型因领域知识和硬件约束敏感性不足而无法生成无线电信号的问题。

详情
AI中文摘要

将用户意图转化为物理无线电信号是无线原型设计中关键但繁琐的最后一步,因为它需要复杂的物理层细节知识,并带来巨大的实现挑战。大型语言模型(LLM)和多智能体系统已经彻底改变了传统的软件工程,提出了一个引人深思的问题:它们能否解决这些艰巨的困难?然而,我们的研究表明,当前模型在应用于无线电信号生成时存在显著局限性,无法完成此任务。这种性能下降主要源于严重的领域无知和对物理硬件约束的根本不敏感。为弥补这一差距,我们引入了RadioMaster,一个完全自主的多智能体框架,旨在将用户输入无缝转化为真实的无线发射。RadioMaster基于三个协同支柱运行:用于领域特定知识检索的RadioWiki、用于协作I/Q样本生成和硬件配置的RadioAgent,以及用于闭环物理层验证的RadioEmulator。此外,我们构建了RadioBench,这是首个专门针对无线电信号生成领域的全面基准测试。广泛的真实世界评估表明,RadioMaster在配置可行性和信号保真度方面显著优于最先进的基线方法。

英文摘要

Translating user intents into physical radio signals represents the critical yet notoriously tedious final step in wireless prototyping, as it requires intricate knowledge of physical layer details and presents immense implementation challenges. Large Language Models (LLMs) and multi-agent systems have revolutionized conventional software engineering, raising the compelling question of whether they can resolve these formidable difficulties. However, our investigations reveal that current models experience significant limitations and fail to accomplish this task when applied to radio signal generation. This performance degradation primarily stems from severe domain ignorance and a fundamental insensitivity to physical hardware constraints. To bridge this gap, we introduce RadioMaster, a fully autonomous multi-agent framework designed to seamlessly translate user input into real-world wireless emissions. RadioMaster operates on three synergistic pillars: RadioWiki for domain-specific knowledge retrieval, RadioAgent for collaborative I/Q sample generation alongside hardware configuration, and RadioEmulator for closed-loop physical layer verification. Furthermore, we construct RadioBench, the first comprehensive benchmark tailored specifically for the radio signal generation domain. Extensive real-world evaluations demonstrate that RadioMaster significantly outperforms state-of-the-art (SOTA) baselines regarding configuration viability and signal fidelity.

2606.01839 2026-06-02 cs.DC cs.AR cs.LG

Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving

观察而非预测:面向智能体服务的对话级解耦调度

Jianru Ding, Ryien Hosseini, Pouya Mahdi Gholami, Mingyuan Xiang, Henry Hoffmann

发表机构 * Anonymous Authors(匿名作者)

AI总结 提出将调度单元从单轮提升至整个对话,利用对话中首轮计算密集与后续内存密集的两阶段可观察特性,实现无需预测的解耦调度,显著降低延迟并提升能效。

详情
AI中文摘要

基于LLM的智能体通过多轮依赖推理和工具调用来解决用户任务,产生的工作负载在任务到达时总成本未知。现有的多轮系统以轮次为调度单元,逐轮决定是否将预填充与解码解耦。该决策依赖于该轮的解码长度、工具行为和KV增长,这些量在调度器必须行动时不可观察,迫使系统进行预测。我们表明这种对预测的依赖是由调度单元而非工作负载强加的。将调度单元从轮次提升到对话,将轮次级的不规则性转化为稳定的两阶段结构:1) 计算密集的首轮预填充,随后是2) 长尾内存密集阶段。因此,以对话为调度单元,放置问题简化为读取首轮输入长度和每解码器KV占用率,两者均可直接观察。我们在ConServe中实例化这一原则,它将首轮预填充路由到高吞吐预填充器,精确传输KV缓存一次,并将对话固定到单个解码器处理其整个尾部,无需学习解码侧成本模型。与每轮预测基线相比,ConServe将p95首次有效令牌时间(对话首个用户可见输出的延迟)降低51.08%,能效提升7.51%,同时保持最后一轮的TBT和SLO;将两阶段映射到异构GPU层级可进一步增加22.75%的能效。

英文摘要

LLM-based agents resolve a user task through many turns of dependent inference and tool calls, producing a workload whose total cost is unknown when the task arrives. Existing multi-turn systems keep the turn as the scheduling unit and decide, turn by turn, whether to disaggregate prefill from decode. That decision rests on the turn's decode length, tool behavior, and KV growth, quantities that are not observable when the scheduler must act, forcing the system to predict them. We show this dependence on prediction is imposed by the scheduling unit, not the workload. Raising the scheduling unit from the turn to the conversation converts turn-level irregularity into a stable, two-phase structure: 1) a compute-bound turn-1 prefill followed by 2) a long, memory-bound tail. Thus, with the conversation as the scheduling unit, placement reduces to reading the first-turn input length and per-decoder KV occupancy, both directly observable. We instantiate this principle in ConServe, which routes the first-turn prefill to a high-throughput prefiller, transfers the KV cache exactly once, and pins the conversation to a single decoder for its entire tail, with no learned model of decode-side cost. Against a per-turn prediction baseline, ConServe reduces p95 time-to-first-effective-token (the latency of a conversation's first user-visible output) by 51.08% and improves energy efficiency by 7.51% while preserving last-turn TBT and SLOs; mapping the two phases onto heterogeneous GPU tiers adds a further 22.75% in energy efficiency.

2606.01828 2026-06-02 cs.MA cs.AI

Dynamic Trust-Aware Sparse Communication Topology for LLM-Based Multi-Agent Consensus

面向基于LLM的多智能体共识的动态信任感知稀疏通信拓扑

Wanshuang Gou, Zihan Liu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出DySCo动态稀疏共识机制,通过信任感知的边选择降低通信开销并保持共识质量。

Comments 11 pages, 3 figures, 5 tables

详情
AI中文摘要

大型语言模型驱动的多智能体系统通过多轮讨论、角色专业化和交叉验证增强了复杂推理任务的可靠性。然而,现有的多智能体辩论和协作框架通常采用全连接通信,导致消息数量、令牌成本和端到端延迟随智能体数量近似二次增长;尽管固定稀疏拓扑减少了开销,但它们无法适应不同任务实例或中间推理状态,容易保留低价值交互或丢失关键的纠错信息。针对这一问题,本文提出了DySCo(动态稀疏共识),一种动态信任感知的稀疏共识机制。在每一轮推理中,DySCo基于智能体可靠性、答案分歧和任务相关性估计通信边的价值,并在预算约束下选择少量高价值边进行消息交换;然后通过动态信任权重聚合不同智能体的答案,并在共识稳定后提前终止讨论。该机制用按需通信替代通用广播,从而在保留关键交叉验证信息的同时降低通信开销。我们进一步给出了通信复杂度和共识稳定性的分析,并在数学推理、逻辑推理和事实问答任务上评估了DySCo的性能。

英文摘要

Large language model-driven multi-agent systems enhance the reliability of complex reasoning tasks through multi-round deliberation, role specialization, and cross-validation. However, existing multi-agent debate and collaboration frameworks typically adopt fully connected communication, causing the number of messages, token costs, and end-to-end latency to grow approximately quadratically with the number of agents; although fixed sparse topologies reduce overhead, they cannot adapt communication relationships to different task instances or intermediate reasoning states, making them prone either to preserving low-value interactions or to losing critical error-correction information. To address this problem, this paper proposes DySCo (Dynamic Sparse Consensus), a dynamic trust-aware sparse consensus mechanism. In each round of reasoning, DySCo estimates the value of communication edges based on agent reliability, answer divergence, and task relevance, and selects a small number of high-value edges for message exchange under budget constraints; it then aggregates the answers of different agents through dynamic trust weights and terminates the discussion early once consensus stabilizes. This mechanism replaces universal broadcasting with on-demand communication, thereby reducing communication overhead while preserving essential cross-validation information. We further present analyses of communication complexity and consensus stability, and evaluate the performance of DySCo on mathematical reasoning, logical reasoning, and factual question-answering tasks.
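The per-round loop described above (score edges, keep a few under a budget, then aggregate answers by trust) can be sketched as follows. This is a minimal illustration, not the DySCo implementation: the abstract does not say how reliability, divergence, and relevance are combined, so the edge values here are assumed to be precomputed, and the names `select_edges` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict


def select_edges(edge_values, budget):
    """Keep the `budget` highest-value directed edges.

    edge_values maps (sender, receiver) -> estimated value of that message.
    How the value is estimated is an assumption left outside this sketch.
    """
    ranked = sorted(edge_values.items(), key=lambda kv: kv[1], reverse=True)
    return [edge for edge, _ in ranked[:budget]]


def weighted_vote(answers, trust):
    """Aggregate agent answers, weighting each agent by its current trust."""
    totals = defaultdict(float)
    for agent, answer in answers.items():
        totals[answer] += trust[agent]
    # The answer with the largest accumulated trust wins.
    return max(totals, key=totals.get)


if __name__ == "__main__":
    values = {("a", "b"): 0.9, ("b", "c"): 0.2, ("a", "c"): 0.5}
    print(select_edges(values, budget=2))
    print(weighted_vote({"a": "4", "b": "5", "c": "5"},
                        {"a": 0.9, "b": 0.3, "c": 0.4}))
```

With n agents, full broadcast sends n(n-1) messages per round, whereas a fixed budget caps the count regardless of n. That is the source of the overhead reduction the abstract refers to.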

2606.01816 2026-06-02 q-bio.BM cs.LG

Site4Drug: Predicting Drug-Binding Target Sites with an AI Agent

Site4Drug: 利用AI智能体预测药物结合靶点

Taehan Kim, Sarrah Rose Mikhail Leung, Bharat Mekala, Jeongbin Park

发表机构 * University of California, San Diego(加州大学圣地亚哥分校)

AI总结 提出Site4Drug,一种模态感知的靶点发现智能体,通过整合拓扑、亲水性、翻译后修饰等证据,输出带约束、风险标记和决策日志的可靶向区域排名列表,并自动推荐结合模态。

Comments Accepted to the ICML 2026 Workshop on Generative and Agentic AI for Biology (GenBio)

详情
AI中文摘要

选择在蛋白质上的干预位置(即选择可靶向位点)通常比选择结合物更模糊且更容易失败,尤其是对于膜蛋白,其可及性、拓扑和翻译后修饰(PTMs)限制了可作用区域。我们提出Site4Drug,一种模态感知的位点发现智能体,输出带有显式约束、证据摘要、风险标记和可追溯决策日志的可靶向区域排名列表。Site4Drug无需用户预先指定药物模态,而是利用与位点发现相同的证据(包括拓扑、亲水性、PTM倾向、二硫键、结构域背景和序列)推荐结合模态(例如抗体/肽类 vs 小分子)。重要的是,这些证据一致地应用于所有模态,包括小分子口袋发现,以避免选择化学上可行但生物学上被遮蔽的位点。

英文摘要

Selecting where to intervene on a protein (i.e., choosing a targetable site) is often a more ambiguous and failure-prone bottleneck than selecting what binds, especially for membrane proteins where accessibility, topology, and post-translational modifications (PTMs) constrain actionable regions. We present Site4Drug, a modality-aware site-finding agent that outputs a ranked list of targetable regions with explicit constraints, evidence summaries, risk flags, and a traceable decision log. Rather than requiring users to specify the drug modality upfront, Site4Drug can recommend a binding modality (e.g., antibody/peptide-like vs small-molecule) from the same evidence used for site discovery, including topology, hydropathy, PTM propensity, disulfides, domain context, and sequence. Importantly, this evidence is applied consistently across modalities, including small-molecule pocket discovery, to avoid selecting chemically plausible but biologically occluded sites.

2606.01783 2026-06-02 cs.IR cs.AI

Breaking the Information Silo: Semantic Personas for Cross-Domain Recommendation

打破信息孤岛:面向跨域推荐的语义人物画像

Jonathan Mayo, Moshe Unger, Konstantin Bauman

发表机构 * Technology and Information Management Department, Coller School of Management, Tel Aviv University(技术与信息管理系,科勒管理学院,特拉维夫大学) Management Information Systems Department, Fox School of Business, Temple University(管理信息系统系,福克斯商学院,天普大学)

AI总结 提出SPHERE方法,利用大语言模型生成语义人物画像,实现无共享用户或物品的跨域推荐,并通过双塔架构和动态融合门增强推荐性能。

详情
AI中文摘要

数字平台日益成为孤立的信息孤岛,限制了它们跨域构建全面用户表征的能力。跨域推荐系统试图通过将知识从源域迁移到目标域来克服这一限制,但大多数现有方法依赖于共享用户、共享物品或结构相似的交互图。这些假设在独立平台上往往不切实际。我们提出SPHERE(面向异构跨域推荐的语义人物画像),一种设计构件,能够在严格不相交的域之间实现推荐知识迁移,无需共享用户或物品。SPHERE不通过身份或图结构对齐域,而是使用大语言模型诱导共享行为词汇,为用户生成结构化语义人物画像,并检索行为相似的源域社区,形成社区源人物画像。该语义信号通过双塔架构和动态融合门与协同信号集成,使SPHERE能够增强标准推荐骨干。在Amazon Books、Goodreads和Steam上的实证评估表明,在全排名评估下,SPHERE在NCF、SVD++和LightGCN基线上取得了一致的改进。结果表明,跨域迁移效果不仅由域之间的语义接近度决定,还关键取决于目标域的结构密度和原生预测强度。该研究通过将跨域个性化重新定义为基于行为的语义对齐,为信息系统研究做出贡献,提供了一种在保持可解释性和模块化的同时克服信息孤岛的实用机制。

英文摘要

Digital platforms increasingly operate as isolated information silos, limiting their ability to construct comprehensive user representations across domains. Cross-domain recommender systems seek to overcome this limitation by transferring knowledge from a source domain to a target domain, yet most existing approaches depend on shared users, shared items, or structurally similar interaction graphs. These assumptions are often unrealistic across independent platforms. We propose SPHERE (Semantic Personas for Heterogeneous cross-domain Recommendation), a design artifact that enables recommendation knowledge transfer across strictly disjoint domains with no shared users or items. Rather than aligning domains through identity or graph structure, SPHERE uses large language models to induce a shared behavioral vocabulary, generate structured semantic personas for users, and retrieve behaviorally similar source-domain communities that form a Community Source Persona. This semantic signal is integrated with collaborative signals through a dual-tower architecture and dynamic fusion gate, allowing SPHERE to augment standard recommender backbones. Empirical evaluation across Amazon Books, Goodreads, and Steam demonstrates consistent improvements over NCF, SVD++, and LightGCN baselines under full-ranking evaluation. The results show that cross-domain transfer effectiveness is not determined solely by semantic proximity between domains; rather, it depends critically on the structural density and native predictive strength of the target domain. The study contributes to information systems research by reframing cross-domain personalization as behavior-based semantic alignment, offering a practical mechanism for overcoming information silos while preserving interpretability and modularity.

2606.01764 2026-06-02 math.OC cs.GT cs.LG

Accelerating Min-Max Optimization via Power-Law Stepsizes

通过幂律步长加速极小极大优化

Yue Wu, Weiqiang Zheng, Yang Cai, Haipeng Luo

发表机构 * University of Southern California(南加州大学) Yale University(耶鲁大学)

AI总结 本文提出确定性动态步长调度,将外梯度方法的最后迭代收敛率从Θ(T^{-1/2})加速到O(T^{-2/3+ε}),并通过分离外推和更新步长进一步达到近最优的O(T^{-1+ε})。

Comments 56 pages

详情
AI中文摘要

我们重新审视了无约束双仿射极小极大优化的外梯度(EG)方法的收敛保证。已知固定步长的EG实现了$\Theta(T^{-1/2})$的最后迭代收敛率,这比通过引入锚定等额外机制可达到的最优$\mathcal{O}(T^{-1})$率要慢。受最近进展(动态步长本身可以显著加速梯度下降)的启发,我们询问动态步长是否也能类似地加速EG的最后迭代收敛。我们在此方向上给出了第一个正面结果。具体地,我们提供了一个确定性动态步长调度,将EG的收敛率加速到$\mathcal{O}(T^{-2/3+\varepsilon})$,对于任意$\varepsilon > 0$。我们还证明,当EG的外推和更新步使用相同步长时,该率是紧的。然后我们表明,允许外推和更新步使用不同步长进一步将收敛率提高到近最优的$\mathcal{O}(T^{-1+\varepsilon})$。我们的分析将步长调度简化为一个优化问题,其解导致遵循幂律分布(的离散化)的步长调度。我们提出的步长调度和分析可扩展到其他方法,如乐观梯度(OG),并表明对一般极小极大优化问题的更广泛适用性。

英文摘要

We revisit the convergence guarantees of the Extragradient (EG) method for unconstrained biaffine min-max optimization. It is known that EG with a fixed stepsize achieves a $\Theta(T^{-1/2})$ last-iterate convergence rate, which is slower than the optimal $\mathcal{O}(T^{-1})$ rate attainable by incorporating additional mechanisms such as anchoring. Motivated by recent advances showing that dynamic stepsizes alone can significantly accelerate gradient descent, we ask whether dynamic stepsizes can similarly accelerate the last-iterate convergence of EG. We present the first positive result in this direction. Specifically, we provide a deterministic dynamic stepsize schedule that accelerates the convergence rate of EG to $\mathcal{O}(T^{-2/3+\varepsilon})$ for any $\varepsilon > 0$. We also show that this rate is tight when the extrapolation and update steps of EG use the same stepsize. We then show that allowing different stepsizes for the extrapolation and update steps further improves the convergence rate to the near-optimal $\mathcal{O}(T^{-1+\varepsilon})$. Our analysis reduces stepsize scheduling to an optimization problem, whose solution leads to a stepsize schedule that follows (a discretization of) a power-law distribution. Our proposed stepsize schedules and analysis extend to other methods, such as Optimistic Gradient (OG), and suggest broader applicability to general min-max optimization problems.
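For readers unfamiliar with the baseline, the fixed-stepsize Extragradient update can be sketched on a purely bilinear instance, min over x and max over y of x^T A y, which is a special case of the biaffine setting. The paper's power-law schedule is not specified in the abstract and is not reproduced here. The function names, the matrix `A`, and the stepsize are illustrative assumptions, and plain simultaneous gradient descent-ascent is included only for contrast.

```python
import numpy as np


def extragradient(A, x0, y0, eta, steps):
    """Fixed-stepsize EG for min_x max_y x^T A y."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(steps):
        # Extrapolation step: look ahead using the current gradients.
        x_half = x - eta * (A @ y)
        y_half = y + eta * (A.T @ x)
        # Update step: move from the current point using look-ahead gradients.
        x, y = x - eta * (A @ y_half), y + eta * (A.T @ x_half)
    return x, y


def simultaneous_gda(A, x0, y0, eta, steps):
    """Plain gradient descent-ascent, shown for contrast only."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(steps):
        x, y = x - eta * (A @ y), y + eta * (A.T @ x)
    return x, y


if __name__ == "__main__":
    A = np.eye(2)
    start = np.ones(2)
    for method in (extragradient, simultaneous_gda):
        x, y = method(A, start, start, eta=0.1, steps=2000)
        print(method.__name__, np.linalg.norm(np.concatenate([x, y])))
```

On this instance the unique saddle point is the origin. The EG iterates contract toward it, while the simultaneous GDA iterates spiral outward. The "different stepsizes" variant mentioned in the abstract would correspond to using separate values of `eta` in the extrapolation and update lines.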

2606.01691 2026-06-02 cs.CR cs.LG

IstGPT: LLM-based Anomaly Detection for Spatial-Temporal Graph in Industrial Systems

IstGPT:基于LLM的工业系统时空图异常检测

Yuchen Zhang, Ning Xi, Pengbin Feng, Shigang Liu, Jianfeng Ma, Yulong Shen, Yanan Sun, Xiaolin Zhou

发表机构 * School of Cyber Engineering, Xidian University(西安电子科技大学网络与信息安全学院) School of Science, Computing and Engineering Technologies, Swinburne University of Technology(斯威本科技大学科学、计算与工程技术学院) School of Computer Science and Technology, Xidian University(西安电子科技大学计算机科学与技术学院)

AI总结 提出IstGPT,首个结合大语言模型与图学习的工业异常检测工具,通过多模态知识提取传感器-执行器依赖图并利用改进的图神经网络实现实时异常检测,在9个数据集上取得最佳F1分数和eTaF1指标。

详情
AI中文摘要

工业互联网系统面临来自复杂工业控制系统(ICS)攻击的日益增长的威胁,导致严重的安全事件。然而,由于传感器和执行器之间的复杂依赖关系,现有工具在实时异常检测方面效果有限。为了解决这个问题,我们提出了IstGPT,这是首个基于大语言模型和图学习的工业异常检测工具,能够针对广泛的ICS攻击提供实时保护。IstGPT实现了对工业信息物理系统中时空依赖关系的细粒度精确建模。它首先利用工业多模态知识,包括操作数据、技术文档和系统图,通过多阶段提示工程提取传感器-执行器依赖图。然后,LLM-Optimation基于节点准确性、边缘一致性和逻辑连贯性迭代优化图。最后,IstGPT将改进的图神经网络与编码器-解码器架构相结合,通过重构误差检测异常。我们在9个数据集上评估了IstGPT与12个最先进基线模型的性能,包括2个公共数据集、6个模拟数据集和一个真实机器人手臂数据集。IstGPT在所有九个数据集上取得了最佳的F1分数和eTaF1(一种较新的时间感知指标)。我们进一步讨论了在真实工业场景中部署IstGPT的可行性。

英文摘要

Industrial Internet systems face increasing threats from sophisticated industrial control system (ICS) attacks, resulting in critical safety incidents. However, existing tools exhibit limited effectiveness in real-time anomaly detection due to the complex dependencies among sensors and actuators. To tackle this, we present IstGPT, the first industrial anomaly detection tool based on LLMs and graph learning to provide real-time protection against a wide range of ICS attacks. IstGPT achieves fine-grained and precise modeling of spatial-temporal dependencies in industrial cyber-physical systems. It first leverages industrial multi-modal knowledge, including operational data, technical documents, and system diagrams, to extract sensor-actuator dependency graphs via multi-stage prompt engineering. Then, LLM-Optimation iteratively refines the graph based on node accuracy, edge consistency, and logical coherence. Finally, IstGPT integrates improved graph neural networks with an encoder-decoder architecture to detect anomalies via reconstruction errors. We evaluate IstGPT against 12 state-of-the-art baselines on 9 datasets, including 2 public, 6 simulated, and a real-world robotic arm dataset. IstGPT achieves the best F1-scores and eTaF1 (a newer time-aware metric) across all nine datasets. We further discuss the feasibility of deploying IstGPT in real-world industrial scenarios.

2606.01680 2026-06-02 cs.DC cs.LG cs.NI

Don't Let a Few Network Failures Slow the Entire AllReduce

不要让少数网络故障拖慢整个 AllReduce

Peiqing Chen, Jiedong Jiang, Nengneng Yu, Yuefeng Wang, Sixian Xiong, Wei Wang, Zaoxing Liu

发表机构 * University of Maryland, College Park(马里兰大学帕克分校) Utrecht University(乌得勒支大学) Kyoto University(京都大学)

AI总结 针对网络故障导致 AllReduce 性能下降的问题,提出基于信息论下界的 OptCC 算法,通过四阶段流水线设计在带宽损失高达 50% 时仍接近无故障性能。

详情
AI中文摘要

网络故障是大规模 GPU 集群中最常见的硬件故障之一,也是训练任务中断的主要原因。现代集体通信库(如 NCCL)通过将流量重新路由到同一服务器上幸存的 NIC 来缓解网络故障,以降低节点间带宽换取不间断训练。然而,降级后的服务器仍处于标准环形算法的关键路径上,拖慢了整个集体通信。我们首次给出了非对称网络带宽下 AllReduce 完成时间的信息论下界,并表明当落后者保留至少一半原始带宽时,相对于无故障最优值的不可避免开销仅为 O(1/p)(p 为 GPU 数量)。然后,我们设计了 OptCC,一种接近该下界的四阶段流水线 AllReduce 算法。SimAI 上的实验证实,OptCC 缩小了现有容错方案留下的差距:在实际网络故障(带宽损失高达 50%)下,OptCC 的 AllReduce 完成时间在 NCCL 无故障环形性能的 2-6% 以内,而现有最优方案的开销高达 57%。

英文摘要

Network failures are among the most frequent hardware faults in large-scale GPU clusters and a leading cause of training-job interruptions. Modern collective communication libraries such as NCCL mitigate network failures by rerouting traffic through surviving NICs on the same server, trading reduced inter-node bandwidth for uninterrupted training. However, the degraded server remains on the critical path of the standard ring algorithm, slowing the entire collective. We present the first information-theoretic lower bound on AllReduce completion time under asymmetric network bandwidth and show that when the straggler retains at least half of its original bandwidth, the unavoidable overhead relative to the fault-free optimum is only O(1/p) for p GPUs. We then design OptCC, a four-stage pipelined AllReduce algorithm that approaches this lower bound. Experiments on SimAI confirm that OptCC closes the gap left by existing fault-tolerant schemes: under practical network failures with up to 50% bandwidth loss, OptCC completes AllReduce within 2-6% of NCCL's fault-free ring performance, whereas the state-of-the-art incurs up to 57% overhead.
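The reason a single degraded server slows the whole collective can be seen from the textbook bandwidth-only cost model of ring AllReduce: each of the 2(p-1) steps moves one chunk of size S/p over every link at once, so each step finishes only when the slowest link does. The sketch below encodes that model. It ignores latency and is not the OptCC algorithm; the function name and the example numbers are illustrative assumptions.

```python
def ring_allreduce_time(size_bytes, bandwidths):
    """Bandwidth-only completion time of ring AllReduce.

    bandwidths[i] is the send bandwidth (bytes/s) of node i. Every step is
    gated by the slowest node, so only min(bandwidths) matters in this model.
    """
    p = len(bandwidths)
    if p < 2:
        raise ValueError("ring AllReduce needs at least two nodes")
    chunk = size_bytes / p
    step_time = chunk / min(bandwidths)
    # (p - 1) reduce-scatter steps followed by (p - 1) all-gather steps.
    return 2 * (p - 1) * step_time


if __name__ == "__main__":
    healthy = [1e9] * 8
    degraded = [1e9] * 7 + [0.5e9]  # one server lost half of its bandwidth
    t_ok = ring_allreduce_time(8e9, healthy)
    t_bad = ring_allreduce_time(8e9, degraded)
    print(t_ok, t_bad, t_bad / t_ok)
```

Under this model, halving one node's bandwidth doubles the completion time even though the cluster lost only a small fraction of its aggregate bandwidth. The abstract's lower bound says the unavoidable overhead in that regime is far smaller, on the order of 1/p.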

2606.01670 2026-06-02 cs.IR cs.AI

Time-Aware Diffusion based on Preference Disentanglement for Generative Recommendation

基于偏好解耦的时间感知扩散用于生成式推荐

Bangguo Zhu, Peng Huo, Yuanbo Zhao, Zhicheng Du, Jun Yin, Senzhang Wang

发表机构 * Central South University(中南大学) National Super Computing Center(国家超算中心) Renmin University of China(中国人民大学) Hong Kong Polytechnic University(香港理工大学)

AI总结 针对现有扩散生成式推荐模型忽略用户偏好时间非平稳分布的问题,提出TDPM框架,通过将用户偏好解耦为长期周期偏好和短期点状偏好并融入扩散过程,在三个数据集上HR@20和NDCG@20平均提升29.21%和25.45%。

详情
AI中文摘要

最近,生成式推荐(GRs)通过用语义索引(SIDs)取代传统项目ID,成为一种变革性的推荐范式。由于扩散模型卓越的生成能力,一些开创性工作探索了以扩散架构为骨干开发GRs。然而,现有基于扩散的GRs的一个致命限制是扩散过程统一应用于历史交互中的所有项目。相比之下,用户偏好由多方面的时变因素塑造,因此在时间维度上呈现非平稳分布。为弥补这一差距,本研究提出一种新颖的GR框架,名为TDPM,通过在SID令牌上设计时间感知扩散。具体而言,TDPM将时变用户偏好的影响明确整合到扩散过程中。详细地,用户偏好被解耦为(i)长期一致的周期偏好和(ii)由近期焦点事件触发的点状偏好。在三个公开真实数据集上的大量实验表明,TDPM显著优于最先进的基线模型。TDPM在HR@20和NDCG@20上分别实现了平均高达29.21%和25.45%的提升。消融研究进一步强调了基于扩散的GRs中时间感知令牌扩散的必要性。

英文摘要

Recently, Generative Recommenders (GRs) have emerged as a transformative recommendation paradigm by replacing traditional item IDs with semantic indices (SIDs). Owing to the exceptional generative capabilities of diffusion models, a few pioneering works explore developing GRs with diffusion architectures as the backbone. However, a fatal limitation of existing diffusion-based GRs is that the diffusion process applies uniformly to all items within the historical interactions. In contrast, the user preference is shaped by multifaceted time-evolving factors and thus exhibits a non-stationary distribution in the temporal aspect. To bridge this gap, this study proposes a novel GR framework, named TDPM, by designing the time-aware diffusion on SID tokens. Specifically, TDPM explicitly integrates the impact of time-evolving user preferences into the diffusion process. In detail, the user preference is disentangled into (i) the period preference, which remains consistent over a long time-span, and (ii) the point preference, which is triggered by recent focal events. Extensive experiments on three public real-world datasets demonstrate the significant superiority of TDPM over the state-of-the-art baselines. TDPM achieves average improvements of up to 29.21% and 25.45% in terms of HR@20 and NDCG@20, respectively. The ablation study further underscores the necessity of time-aware token diffusion in diffusion-based GRs.

2606.01655 2026-06-02 math.OC cs.AI cs.LG stat.ML

MINTS: Minimalist Thompson Sampling

MINTS: 极简汤普森采样

Kaizheng Wang

发表机构 * Department of IEOR and Data Science Institute, Columbia University(哥伦比亚大学工业工程与运筹学系及数据科学研究所)

AI总结 针对贝叶斯方法在复杂结构约束下的局限性,提出一种仅对最优位置设置先验、通过轮廓似然消除冗余参数的极简贝叶斯框架,并实例化为MINTS算法,在均值约束多臂老虎机中实现近最优非渐近遗憾保证和精确几乎必然渐近遗憾刻画。

Comments 29 pages

详情
AI中文摘要

贝叶斯范式为不确定性下的序贯决策提供了原则性工具,但其对所有参数依赖概率模型的做法会阻碍复杂结构约束的纳入。我们提出一种极简贝叶斯框架,仅对最优位置设置先验,同时通过轮廓似然消除冗余参数。这产生了一个自然适应结构约束的广义后验。作为直接实例,我们开发了极简汤普森采样(MINTS)。对于具有均值约束的多臂老虎机,我们建立了近最优的非渐近遗憾保证和精确的几乎必然渐近遗憾刻画。特别地,MINTS在无结构设置中达到了经典的Lai-Robbins常数,并自动适应单峰结构,达到仅由最优臂的紧邻所确定的精确常数。

英文摘要

The Bayesian paradigm offers principled tools for sequential decision-making under uncertainty, but its reliance on a probabilistic model for all parameters can hinder the incorporation of complex structural constraints. We introduce a minimalist Bayesian framework that places a prior only on the location of the optimum, while eliminating nuisance parameters through profile likelihood. This yields a generalized posterior that naturally accommodates structural constraints. As a direct instantiation, we develop MINimalist Thompson Sampling (MINTS). For multi-armed bandits with mean constraints, we establish near-optimal non-asymptotic regret guarantees and sharp almost-sure asymptotic regret characterizations. In particular, MINTS attains the classical Lai--Robbins constant in the unstructured setting and automatically adapts to unimodal structure, achieving the sharp constant determined only by the immediate neighbors of the optimal arm.
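As a point of reference for the abstract above, classical Thompson sampling for Bernoulli bandits places a full prior on every arm's mean. This is the modeling burden that the minimalist framework is described as avoiding. The sketch below shows that classical baseline only; it is not MINTS, and the function name, the Beta(1, 1) priors, and the example arm means are assumptions made for illustration.

```python
import numpy as np


def thompson_sampling(means, horizon, seed=0):
    """Classical Beta-Bernoulli Thompson sampling; returns pull counts per arm."""
    rng = np.random.default_rng(seed)
    k = len(means)
    successes = np.ones(k)  # Beta(1, 1) prior on every arm's mean
    failures = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, play the best.
        sampled_means = rng.beta(successes, failures)
        arm = int(np.argmax(sampled_means))
        reward = int(rng.random() < means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], horizon=2000))
```

Note that the posterior here is over all k arm means. According to the abstract, MINTS instead keeps a distribution only over which arm is optimal and profiles out the remaining parameters.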

2606.01652 2026-06-02 eess.SP cs.CV

Physics-Aware Linearized ADMM and Its Unrolling

物理感知线性化ADMM及其展开

Satoshi Takabe, Shunta Arai, Tadashi Wadayama

发表机构 * Japan Science and Technology Agency (JST), CRONOS(日本科学技术振兴机构(JST)、CRONOS)

AI总结 针对基于PDE测量过程的逆问题,提出物理感知线性化ADMM算法,通过子问题线性化实现高效更新,并利用深度展开训练内部参数,在光纤通信压缩感知和噪声各向异性扩散图像恢复中验证有效性。

Comments 5 pages, 3 figures

详情
AI中文摘要

近年来,偏微分方程(PDE)已被用于直接建模信号处理中的测量过程,尽管其评估成本高昂。本文提出一种新颖的基于交替方向乘子法(ADMM)的算法,称为物理感知线性化ADMM(PA-LADMM),用于基于PDE测量过程的逆问题。关键思想是对包含PDE的子问题进行线性化,从而得到一种成本高效的更新规则,每次迭代仅需调用PDE求解器及其梯度评估。该算法在特定条件下具有理论收敛保证。此外,我们将其与深度展开相结合,展开PA-LADMM并使用监督数据训练其内部参数。两个不同的实验——光纤通信压缩感知和噪声各向异性扩散图像恢复——证明了所提算法的有效性。

英文摘要

Recently, partial differential equations (PDEs) have been used to directly model the measurement process in signal processing, although their evaluation is costly. In this paper, we propose a novel alternating direction method of multipliers (ADMM)-based algorithm called physics-aware linearized ADMM (PA-LADMM) for inverse problems from PDE-based measurement processes. The key idea is the linearization of the subproblem with PDEs, leading to a cost-efficient update rule that calls only a PDE solver and its gradient evaluation per iteration. The algorithm has a theoretical convergence guarantee under certain conditions. In addition, we combine it with deep unfolding to unroll the PA-LADMM and train its internal parameters using supervised data. Two distinct experiments, compressed sensing with optical fiber communication and image restoration from noisy anisotropic diffusion, demonstrated the effectiveness of the proposed algorithms.

2606.01645 2026-06-02 stat.ML cs.LG

Self-Regulating Annealing in Heavy-Tailed Diffusion Models

重尾扩散模型中的自调节退火

Keito Wakatsuki, Hideaki Shimazaki

AI总结 本文提出一种基于随机微分方程的重尾扩散模型采样器,通过状态依赖的扩散系数实现自调节退火机制,以改进重尾数据的生成保真度。

Comments 6 pages, 3 figures, IJCNN2026

详情
AI中文摘要

扩散模型已成为深度生成模型的主要框架。虽然标准高斯公式在理论上很方便,但其对重尾数据集的适用性仍不清楚。为了解决这个问题,重尾扩散模型(HTDM)通过用学生t分布替换高斯分布来扩展标准公式,从而提高了重尾数据集上的尾部保真度。尽管基于随机微分方程(SDE)的采样在HTDM中是可能的,但尚未得到充分探索。在本文中,我们提出了一种用于HTDM的基于SDE的采样器,该采样器明确地包含了状态依赖的扩散系数。这种状态依赖性通过自适应地调节有效噪声尺度,自然地诱导出自调节退火机制。我们从理论上探讨了这一机制,并通过实验验证了其在从重尾分布中重现样本的必要性。

英文摘要

Diffusion models have emerged as a leading framework for deep generative modeling. While the standard Gaussian formulation is theoretically convenient, its suitability for heavy-tailed datasets remains unclear. To address this, heavy-tailed diffusion models (HTDMs) extend the standard formulation by replacing the Gaussian distribution with a Student's t-distribution, thereby improving tail fidelity on heavy-tailed datasets. Although stochastic differential equation (SDE)-based sampling is possible in HTDMs, it has not been fully explored. In this paper, we propose an SDE-based sampler for HTDMs that explicitly incorporates a state-dependent diffusion coefficient. This state dependence naturally induces a self-regulating annealing mechanism by adaptively modulating the effective noise scale. We theoretically explore this mechanism and experimentally verify its necessity for reproducing samples from a heavy-tailed distribution.

2606.01628 2026-06-02 q-bio.BM cs.AI

Demystifying Multimodal Biomolecular Co-design With Intrinsic Geodesic Coupling

借助内在测地耦合揭秘多模态生物分子协同设计

Keyue Qiu, Xintong Wang, Zhilong Zhang, Hao Zhou, Wei-Ying Ma

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 针对生物分子协同设计中模态间时间耦合被忽视的问题,提出GeoCoupling框架优化异构模态的时间耦合,在基于结构的药物设计和无条件蛋白质设计中提升物理有效性和多样性。

Comments Accepted to ICML 2026

详情
AI中文摘要

蛋白质和小分子配体等生物分子在生物系统中发挥核心作用,这源于序列与三维结构之间的紧密相互作用。最近的生物分子协同设计生成模型旨在通过联合建模耦合模态来捕捉这种相互作用。然而,现有方法大多采用并行执行边际生成过程,隐式地强制固定同步耦合。我们认为,一个关键但被忽视的自由度在于这些边际过程在训练和生成过程中如何时间耦合,不恰当的耦合会引入高方差监督和不一致的中间状态,影响模态一致性。为了解决这个问题,我们引入了GeoCoupling,一个优化异构模态之间时间耦合的系统框架。在基于结构的药物设计和无条件蛋白质设计上的实证结果表明,学习到的耦合始终优于同步和随机耦合基线,产生了具有改进的物理有效性和多样性的生物分子。

英文摘要

Biomolecules such as proteins and small-molecule ligands play a central role in biological systems, arising from the tight interplay between sequence and three-dimensional structure. Recent generative models for biomolecular co-design aim to capture this interplay by jointly modeling coupled modalities. However, existing approaches largely adopt a parallel execution of marginal generative processes, implicitly enforcing fixed synchronous coupling. We argue that a critical but overlooked degree of freedom lies in how these marginal processes are temporally coupled during training and generation, where inappropriate coupling can introduce high-variance supervision and inconsistent intermediate states, affecting modality consistency. To address this, we introduce GeoCoupling, a systematic framework that optimizes for temporal couplings between heterogeneous modalities. Empirical results across structure-based drug design and unconditional protein design demonstrate the learned couplings consistently outperform synchronous and randomly coupled baselines, yielding biomolecules with improved physical validity and diversity.

2606.01596 2026-06-02 math.NA cs.LG cs.NA

Learning Chaotic Dynamics through Second-Order Geometric Supervision

通过二阶几何监督学习混沌动力学

Shinhoo Kang, Hai V. Nguyen, Tan Bui-Thanh

发表机构 * Department of Computer Science and Software Engineering, Korea University(高丽大学计算机科学与软件工程系) Department of Aerospace Engineering and Engineering Mechanics, The Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin(德克萨斯大学奥斯汀分校航空航天工程与工程力学系,奥登计算工程与科学研究所)

AI总结 提出模型约束随机雅可比匹配方法,以O(d^2)代价隐式施加二阶一致性,在混沌系统中恢复吸引子几何和不变统计量。

Comments 37 pages, 15 figures, 6 tables

详情
AI中文摘要

从数据中学习混沌动力系统需要的不仅仅是短期预测精度:学习模型必须保持吸引子几何及其不变统计量。轨迹(零阶)和雅可比(一阶)匹配监督向量场的值和切结构,但两者都不约束场如何偏离其切平面。因此,模型可以在监督状态下匹配值和切线,但弯曲方式与真实情况不同,在保持局部精度的同时,向虚假吸引子漂移并扭曲长时间统计量。我们证明,强制二阶一致性可以减轻这些失败,但在高维中形成完整的Hessian矩阵代价过高。我们提出模型约束随机雅可比匹配,该方法在随机扰动的输入处比较真实和学习的向量场的雅可比矩阵。泰勒展开表明,期望的随机雅可比损失分解为名义雅可比失配加上由噪声方差缩放的Hessian失配,从而以$\mathcal{O}(d^2)$代价隐式施加二阶一致性,而无需形成$\mathcal{O}(d^3)$的Hessian张量。仅使用雅可比评估,该方法可扩展到显式Hessian匹配无法实现的高维。数值实验证实二阶方法是稳健的。对于Lorenz~63,一阶方法在最小时间监督下产生灾难性的Lyapunov指数异常值,而二阶方法消除了这些异常值并恢复了正确的吸引子。对于耦合Lorenz~96,分布外强迫扫描区分了这些方法:所有方法在$F=16$之前一致,但超过$F=18$后,只有二阶方法保持了不变测度和Lyapunov谱。在两个系统上,随机雅可比匹配以低得多的成本实现了与显式Hessian匹配相当的性能。

英文摘要

Learning chaotic dynamical systems from data requires more than short-term predictive accuracy: the learned model must preserve the attractor geometry and its invariant statistics. Trajectory (zero-order) and Jacobian (first-order) matching supervise the values and tangent structure of the vector field, but neither constrains how the field bends away from its tangent plane. A model can thus match values and tangents at the supervised states yet curve differently from the truth, remaining locally accurate while drifting toward spurious attractors and distorting long-time statistics. We show that enforcing second-order consistency mitigates these failures, but forming the full Hessian is prohibitive in high dimensions. We propose model-constrained randomized Jacobian matching, which compares the Jacobians of the true and learned vector fields at randomly perturbed inputs. A Taylor expansion shows that the expected randomized Jacobian loss decomposes into the nominal Jacobian mismatch plus a Hessian mismatch scaled by the noise variance, implicitly enforcing second-order consistency at $\mathcal{O}(d^2)$ cost without forming the $\mathcal{O}(d^3)$ Hessian tensor. Using only Jacobian evaluations, the method scales to high dimensions where explicit Hessian matching does not. Numerical experiments confirm that second-order methods are robust. For Lorenz~63, first-order methods produce catastrophic Lyapunov-exponent outliers under minimal temporal supervision, which second-order methods eliminate while recovering the correct attractor. For coupled Lorenz~96, an out-of-distribution forcing sweep separates the methods: all agree up to $F=16$, but beyond $F=18$ only second-order methods preserve the invariant measure and Lyapunov spectrum. On both systems, randomized Jacobian matching performs comparably to explicit Hessian matching at much lower cost.
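The Taylor-expansion claim can be sketched as follows. The notation is an assumption made here, not taken from the paper: $\Delta J(x)$ denotes the difference between the Jacobians of the true and learned vector fields at $x$, $\partial_k$ is the partial derivative with respect to the $k$-th input coordinate, and the perturbation is $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_d)$. Assuming both fields are sufficiently smooth,

$$
\Delta J(x+\varepsilon) = \Delta J(x) + \sum_{k=1}^{d} \varepsilon_k\, \partial_k \Delta J(x) + \frac{1}{2} \sum_{k,l=1}^{d} \varepsilon_k \varepsilon_l\, \partial_k \partial_l \Delta J(x) + \mathcal{O}(\|\varepsilon\|^3).
$$

Taking the squared Frobenius norm and the expectation over $\varepsilon$, the odd-order terms vanish and $\mathbb{E}[\varepsilon_k \varepsilon_l] = \sigma^2 \delta_{kl}$, which gives

$$
\mathbb{E}_{\varepsilon} \left\| \Delta J(x+\varepsilon) \right\|_F^2 = \left\| \Delta J(x) \right\|_F^2 + \sigma^2 \sum_{k=1}^{d} \left\| \partial_k \Delta J(x) \right\|_F^2 + \sigma^2 \left\langle \Delta J(x), \sum_{k=1}^{d} \partial_k^2 \Delta J(x) \right\rangle_F + \mathcal{O}(\sigma^4).
$$

The first term is the nominal Jacobian mismatch. The second is the squared norm of the Hessian-tensor mismatch (all $d^3$ entries) scaled by the noise variance, obtained without ever forming that tensor. The third term is a cross term that also appears at order $\sigma^2$ in this generic expansion; the abstract does not say how the paper treats it, so it is kept explicit here rather than dropped.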

2606.01578 2026-06-02 eess.AS cs.SD

Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

DCASE 2026挑战任务2:面向机器状态监测的噪声感知无监督异常声音检测——描述与讨论

Tomoya Nishida, Noboru Harada, Daiki Takeuchi, Daisuke Niizumi, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, Yohei Kawaguchi

发表机构 * National Institute of Information and Communications Technology, Japan(日本信息与通信技术研究院)

AI总结 本文介绍DCASE 2026挑战任务2,通过利用近远双通道音频分离环境噪声与机器声音,提升无监督异常声音检测在噪声条件下的鲁棒性。

Comments this article draws heavily from arXiv:2506.10097

详情
AI中文摘要

本文概述了DCASE 2026挑战任务2,题为“面向机器状态监测的噪声感知无监督异常声音检测”。该任务旨在推进无监督设置下机器状态监测的噪声鲁棒异常声音检测,其中仅使用正常机器声音进行训练。在噪声条件下进行可靠检测对于实际部署至关重要,但以往的DCASE任务2设置提供的环境噪声信息有限,可能限制了高噪声情况下的UASD性能。为解决这一限制,DCASE 2026允许参与者利用同时在目标机器附近和远处采集的双通道音频样本。由于远处的麦克风预计包含相对更强的环境噪声和更弱的直接机器声音,它可能有助于从目标机器声音中区分环境噪声成分。在挑战提交截止日期后,将添加挑战结果和提交系统的分析。

英文摘要

This paper presents an overview of DCASE 2026 Challenge Task 2, titled "Noise-aware unsupervised anomalous sound detection (UASD) for machine condition monitoring." The task aims to advance noise-robust anomalous sound detection for machine condition monitoring under the unsupervised setting, where only normal machine sounds are available for training. Reliable detection under noisy conditions is crucial for practical deployment, but previous DCASE Task 2 settings provided limited information about environmental noise, potentially limiting UASD performance in highly noisy situations. To address this limitation, DCASE 2026 allows participants to exploit two-channel audio samples simultaneously captured at locations near and far from the target machine. Since the distant microphone is expected to contain relatively stronger environmental noise and weaker direct machine sounds, it may help distinguish environmental noise components from the target machine sounds. After the challenge submission deadline, challenge results and an analysis of the submitted systems will be added.

2606.01572 2026-06-02 eess.IV cs.CV

PINNOCHIO: Physics-Informed Neural Network for Coupled Hyperelastic Interface-Volume Simulation in Orthognathic Surgery

Jungwook Lee, Daeseung Kim, Kevin Gu, Zhangfeng Hu, Tianshu Kuang, Finn Hopeman, Michael A. K. Liebschner, Jaime Gateno, Pingkun Yan

Affiliations: Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute; Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute; Department of Neurosurgery, Baylor College of Medicine

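AI summary: Proposes the PINNOCHIO framework, which uses a hybrid sequential decomposition to decouple discontinuous bone-soft-tissue interface motion from continuous volumetric hyperelastic deformation, enabling stable training and a physics-enabled sim-to-real adaptation strategy. On a 40-patient cohort it outperforms existing baselines and addresses the accuracy-efficiency trade-off.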

Comments: This work has been submitted to MICCAI 2026

Abstract

Predicting patient-specific facial soft-tissue deformation is critical for iterative orthognathic surgery planning. However, current computational methods face a strict accuracy-efficiency trade-off: high-fidelity Finite Element Methods (FEM) are computationally prohibitive, whereas pure deep learning models often produce biomechanically inconsistent results. While Physics-Informed Neural Networks (PINNs) offer a promising avenue, learning the complex heterogeneous mechanics of bone-soft-tissue interactions with only partial clinical supervision (i.e., outer facial surfaces) remains highly unstable. To overcome these challenges, we present PINNOCHIO, a novel physics-informed framework for facial soft-tissue simulation. PINNOCHIO introduces a hybrid sequential decomposition that explicitly decouples discontinuous bone-soft-tissue interface movements from continuous volumetric hyperelastic deformation. This structural separation enables stable training and facilitates a physics-enabled sim-to-real adaptation strategy, ensuring internal biomechanical consistency without requiring volumetric ground truth. Evaluated on a 40-patient clinical cohort, PINNOCHIO outperforms existing baselines in both surface accuracy and physical validity. Furthermore, it achieves a substantial speedup over FEM, successfully resolving the accuracy-efficiency trade-off to provide a highly reliable and practical tool for interactive surgical planning.

2606.01542 2026-06-02 cs.DC cs.AI cs.CL cs.DB cs.IR

Self-Conditioned Positional HNSW for Overlap-Aware Retrieval in Chunked-Document RAG Systems: Method and Industrial Evidence-Quality Audit

Nataraj Agaram Sundar, Tejas Morabia

Affiliations: eBay Inc.

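AI summary: Proposes Self-Conditioned Positional HNSW (SCP-HNSW), which uses a low-dimensional positional code and a two-pass query procedure for overlap-aware retrieval that reduces duplicated evidence, and uses industrial audit data to motivate the approach.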

Comments: 11 pages, 5 figures, 4 tables

Abstract

Chunked-document retrieval is a common component of retrieval-augmented generation (RAG) systems. Documents are split into overlapping chunks, embedded, and indexed with approximate nearest-neighbor search such as hierarchical navigable small world graphs (HNSW). Overlap improves boundary coverage but induces a practical failure mode: top-k retrieval often returns near-adjacent chunks that repeat evidence and waste prompt budget. We propose Self-Conditioned Positional HNSW (SCP-HNSW), a lightweight modification that appends a low-dimensional positional code to chunk embeddings and uses a two-pass query procedure to estimate and apply a query-specific document-position prior. SCP-HNSW leaves HNSW graph construction and traversal unchanged while adding an auditable minimum-index-gap selector for final context construction. We also integrate industrial review artifacts for generated evidence quality: a 770-review text-evidence audit with 318 fully labeled reviews and a 70-case OCR audit with 350 ratings. The text audit shows that 574 of 770 projected reviews are rated 3/5, only 39 fall in the 1-2 range, and narrative reviewer detail appears much more often than structured issue flags. The OCR audit shows slice-level pass rates from 95% for clean chat screenshots to 45% for handwritten/blurry captures, with moderate to strong agreement. These results motivate overlap-aware, audit-friendly RAG retrieval and identify the remaining controlled retrieval ablations needed for causal performance claims.
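The abstract does not spell out how the minimum-index-gap selector works, so the following is only one plausible reading, given as a minimal sketch: walk the retrieved candidates in score order and skip any chunk whose index lies within `min_gap` of an already selected chunk from the same document. The `Candidate` type, the function name, and the greedy rule are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A retrieved chunk (hypothetical structure for this sketch)."""
    doc_id: str
    chunk_index: int
    score: float


def select_with_min_index_gap(
    candidates: List[Candidate], k: int, min_gap: int
) -> List[Candidate]:
    """Greedily keep up to k chunks, best score first, rejecting any chunk
    closer than min_gap index positions to a kept chunk of the same document."""
    selected: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        too_close = any(
            kept.doc_id == cand.doc_id
            and abs(kept.chunk_index - cand.chunk_index) < min_gap
            for kept in selected
        )
        if too_close:
            continue  # near-adjacent chunk would repeat evidence
        selected.append(cand)
        if len(selected) == k:
            break
    return selected


retrieved = [
    Candidate("a", 4, 0.90),
    Candidate("a", 5, 0.88),  # overlaps with chunk 4 of the same document
    Candidate("b", 5, 0.80),
    Candidate("a", 9, 0.70),
]
context = select_with_min_index_gap(retrieved, k=3, min_gap=2)
print([(c.doc_id, c.chunk_index) for c in context])
```

A rule of this kind is easy to audit because every rejected chunk can be traced to the specific kept chunk that excluded it; whether the paper's selector is greedy or uses a different tie-breaking rule is not stated in the abstract.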