arXivDaily: daily arXiv academic digest, updated Monday through Friday
2606.20235 2026-06-19 cs.IR cs.AI New submission

ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments

Tingyue Pan, Mingyue Cheng, Daoyu Wang, Yitong Zhou, Jie Ouyang, Qi Liu, Enhong Chen

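AI summary: Proposes the ScholarQuest benchmark, built from over 1,000 computer science topics and four research intents, with scalable answer construction and a shared retrieval backend, to evaluate the academic paper search ability of LLM agents in open literature environments.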

Abstract

Academic paper search is a core step in scientific research, and LLM-based search agents are emerging as a promising paradigm for iterative, intent-driven literature exploration. However, existing benchmarks are insufficient for systematically evaluating agentic academic search under realistic open literature environments. We propose ScholarQuest, a large-scale, taxonomy-guided benchmark for agentic academic paper search. ScholarQuest is constructed from over 1,000 computer science topics and four representative research intents, including method-oriented, setting-anchored, comparison-based, and scope-controlled queries. It further provides scalable answer construction and a shared retrieval backend, ScholarBase, for reproducible evaluation. Benchmarking results show that agentic methods outperform single-shot retrieval baselines, yet the best-performing agent achieves only 0.314 Recall@100 and 0.355 Recall@All, indicating substantial room for improvement. In addition, analyses of search efficiency, intent-level robustness, and failure cases further highlight the benchmark's ability to provide multi-dimensional evaluation signals for academic paper search agents.
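Recall@100 and Recall@All are the headline numbers in this abstract. A minimal sketch, assuming the standard definition (the share of ground-truth papers that appear among an agent's first K returned papers, with Recall@All scoring the agent's full returned list); ScholarQuest's exact deduplication and matching rules are not given in the abstract, and the paper IDs below are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=None):
    """Share of relevant papers that appear in the first k ranked results.

    With k=None the whole returned list is scored (a Recall@All-style reading).
    Duplicate IDs in ranked_ids are counted once.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must be non-empty")
    retrieved = ranked_ids if k is None else ranked_ids[:k]
    hits = relevant.intersection(retrieved)
    return len(hits) / len(relevant)


# Hypothetical query with four ground-truth papers.
ranked = ["p3", "p7", "p1", "p9"]
relevant = {"p1", "p2", "p3", "p4"}
print(recall_at_k(ranked, relevant, k=2))  # 0.25: only p3 is in the top 2
print(recall_at_k(ranked, relevant))       # 0.5: p1 and p3 are found overall
```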

2606.20233 2026-06-19 cs.CV New submission

Cinematic Compositing Using Character-Environment-Harmonized Video Generation Models

Tianyi Xiang, Mingming He, Li Ma, Jing Liao

Affiliations: City University of Hong Kong; Independent Researcher

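AI summary: Proposes an end-to-end video diffusion framework that models bidirectional physical and lighting interactions between characters and environments through tri-mask guidance and joint RGB-D denoising, achieving high-quality dynamic video compositing.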

Abstract

Cinematic compositing aims to integrate green-screen characters into novel environments while maintaining physical and photometric realism. Previous methods often fail to capture the complex bidirectional interactions between characters and their surroundings, which we characterize as Character-to-Environment (C2E) physical interaction and Environment-to-Character (E2C) lighting harmonization. To address this, we propose an end-to-end video diffusion framework that jointly models C2E and E2C interactions, specifically handling the challenges of interactive props. Our approach introduces a tri-mask-guided architecture with RGB-D joint denoising to ensure physically consistent interactions among the character, props, and environment. We further develop an efficient prior-driven data curation pipeline to construct high-quality relighting pairs without expensive rendering. Finally, a reference-conditioned mechanism enables controllable environment synthesis and precise prop replacement. Extensive experiments demonstrate that our framework significantly outperforms existing methods in cinematic-quality dynamic video compositing.

2606.20232 2026-06-19 cs.RO cs.GT New submission

Mobile Target Search with Imperfect Perception: A Partially Observable Stochastic Game Theoretical Approach

Hanzheng Zhang, Shu Liang, Shuyu Liu

Affiliations: Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University; Department of Control Science and Engineering, Tongji University

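AI summary: Addressing imperfect perception caused by sensor limitations, malicious jamming, or communication noise, adopts a partially observable stochastic game (POSG) framework to model the adversarial interaction between searchers and targets, proposes a detectability concept with sufficient criteria based on stochastic recurrence analysis, and develops a server-assisted distributed algorithm.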

Abstract

This paper investigates mobile target search under imperfect perception caused by sensor limitations, malicious jamming, or communication noise. Searchers and targets operate in a grid-shaped area with bounded mobility, leading to a dynamic interplay between search and evasion. To capture this adversarial interaction under imperfect perception, we adopt the partially observable stochastic game (POSG) approach, which generalizes partially observable Markov decision processes (POMDPs) by incorporating target intelligence. To handle false alarms and missed detections caused by perceptual uncertainties, we propose a novel detectability concept to determine whether a search strategy guarantees eventual detection, and provide sufficient detectability criteria based on stochastic recurrence analysis. We further develop a server-assisted distributed algorithm that utilizes the aggregative potential game structure for searchers and a KL-divergence-based reduction for target prediction. Numerical simulations validate the effectiveness of the proposed algorithm and support the detectability analysis.
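The detectability question in this abstract turns on how false alarms and missed detections corrupt what a searcher observes. As a generic illustration, not the paper's POSG solver, server-assisted algorithm, or KL-based reduction, the sketch below applies Bayes' rule to one searcher's belief over grid cells after a single scan. It assumes a hypothetical sensor whose detection probability `p_detect` and false-alarm probability `p_false_alarm` do not depend on the target's behaviour, and it omits the motion (prediction) step for an evading target.

```python
def update_belief(belief, scanned_cell, detected, p_detect=0.9, p_false_alarm=0.05):
    """Bayes update of a target-location belief after one imperfect scan.

    belief: dict mapping grid cell -> prior probability (values sum to 1).
    p_detect: P(report "detected" | target in scanned cell), i.e. 1 - miss rate.
    p_false_alarm: P(report "detected" | target elsewhere).
    Both sensor parameters are hypothetical placeholders.
    """
    posterior = {}
    for cell, prior in belief.items():
        if cell == scanned_cell:
            likelihood = p_detect if detected else 1.0 - p_detect
        else:
            likelihood = p_false_alarm if detected else 1.0 - p_false_alarm
        posterior[cell] = prior * likelihood
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}


cells = [(r, c) for r in range(2) for c in range(2)]
uniform = {cell: 1 / len(cells) for cell in cells}
# A negative scan lowers (0, 0) but cannot rule it out (detections can be missed);
# a positive scan raises it sharply but not to 1 (false alarms happen).
after_miss = update_belief(uniform, (0, 0), detected=False)
after_hit = update_belief(uniform, (0, 0), detected=True)
```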

2606.20230 2026-06-19 cs.SE New submission

SysML Modeling of Digital Twins for Renewable Energy Communities

Mohammad Samadi, Luís Miguel Pinho, Andrey Sadovykh, Gabriela Lucas

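AI summary: Addressing the heterogeneity challenges in engineering Digital Twins of Renewable Energy Communities, proposes a SysML-based MBSE workflow that models a device taxonomy and a community organizational view, and introduces the SAREF4ENER ontology to bridge semantic gaps.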

Comments Presented at the Workshop on Digital Twin Experiences and Model-Based Testing Methods, 12 June 2026, Västerås, Sweden, co-located with the 30th Ada-Europe International Conference on Reliable Software Technologies (AEiC 2026)

Abstract

Renewable Energy Communities (RECs) are emerging as a key organizational model for local and global sharing of renewable generation, storage, and flexible loads. Engineering Digital Twins of RECs is made difficult by the heterogeneity of devices, contracts, and runtime data involved. In this paper, we take a first step toward a Model-Based Systems Engineering (MBSE) workflow for REC Digital Twins. Starting from an industrially-validated REC domain model, we re-express a representative house subset in SysML using the open-source Modelio tool, yielding two Block Definition Diagrams: a device taxonomy and a community organizational view. We then discuss four semantic gaps that plain SysML leaves open and sketch how the SAREF4ENER ontology could be imported as a reference package to close them. Combining SysML with SAREF-based semantics for smart-energy Digital Twins remains largely unexplored, and we position this paper as a first step along that line.

2606.20227 2026-06-19 cs.AI cs.SE New submission

QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation

Xinyi Zheng, Ling Shi, Tianlong Yu, Yongxin Zhao, Lorenz Goette, Kailong Wang

Affiliations: Huazhong University of Science and Technology; Nanyang Technological University; Hubei University; East China Normal University; National University of Singapore

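AI summary: Proposes the QMFOL framework, which generates monadic first-order logic reasoning tasks from conjunction/disjunction patterns with controllable complexity, and builds QMFOLBench, a benchmark of 2,880 instances; evaluations show that rising logical complexity degrades performance and increases computational overhead.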

Abstract

Large Language Models (LLMs) have made significant progress in reasoning, particularly in deductive reasoning, which is crucial for high-stakes decision-making. As models improve, evaluation benchmarks should evolve to keep pace. However, existing benchmarks lack fine-grained control over logical complexity and struggle to balance semantic diversity with logical consistency. To address these issues, we propose QMFOL, an automated framework for generating monadic first-order logic reasoning tasks with quantifiable and controllable complexity. It constructs formal logical structures using conjunction and disjunction patterns, enabling precise control over reasoning depth, width, label types, and distractors. These structures are then translated into natural language via LLMs, with logical consistency ensured through round-trip verification using an external prover. Based on our framework, we build QMFOLBench, a benchmark comprising 2880 instances with 960 configurations across diverse logical and semantic dimensions. Evaluations on six large reasoning models (LRMs) and two LLMs show that performance degrades and computational overhead increases with rising logical complexity. Models perform better on True-labeled tasks than on False or Unknown ones, and exhibit sensitivity to semantic variation. Overall, QMFOL offers a scalable and reliable approach for constructing deductive reasoning benchmarks with controllable complexity, enabling more precise evaluation of reasoning capabilities in modern language models.
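To make the True / False / Unknown labels concrete, here is a minimal sketch (not QMFOL's generator or its prover-based verification) of evaluating monadic rules about a single entity by forward chaining. Literals are `(predicate, polarity)` pairs; a disjunctive body such as bird(x) ∨ mammal(x) → vertebrate(x) is split into one conjunctive rule per disjunct. Forward chaining is sound but incomplete (it never reasons by contraposition or by cases), which is one reason a pipeline like the one described relies on an external prover for ground-truth labels. The predicates and rules below are hypothetical.

```python
def forward_chain(facts, rules):
    """Close a set of literals under conjunctive rules.

    A literal is (predicate, polarity), e.g. ("cat", True) for cat(a) and
    ("purrs", False) for not purrs(a); every rule is about the same entity a.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(lit in known for lit in body):
                known.add(head)
                changed = True
    return known


def label(facts, rules, query):
    """Return "True", "False", or "Unknown" for a query literal."""
    known = forward_chain(facts, rules)
    predicate, polarity = query
    if (predicate, polarity) in known:
        return "True"
    if (predicate, not polarity) in known:
        return "False"
    return "Unknown"


facts = {("cat", True), ("hungry", True)}
rules = [
    ([("cat", True)], ("mammal", True)),                       # one reasoning step
    ([("mammal", True), ("hungry", True)], ("purrs", False)),  # conjunction
    ([("bird", True)], ("vertebrate", True)),                  # bird or mammal -> vertebrate,
    ([("mammal", True)], ("vertebrate", True)),                # split into two rules
    ([("reptile", True)], ("cold_blooded", True)),             # never fires: a distractor
]
```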

2606.20225 2026-06-19 cs.CL New submission

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

Abdul Rafay Syed

Affiliations: Universität des Saarlandes (Saarland University)

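AI summary: A difference-in-means direction achieves 99.6% separation of aligned and misaligned activations at the final layer, and causal steering with it reduces code spillover by 21-51 points; cross-architecture transfer is effective but lacks specificity, revealing a two-tier specificity structure.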

Comments 12 pages, 2 figures

Abstract

Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared across architectures. Across four instruction-tuned model families (Qwen2.5-1.5B, Gemma-2-2B, Llama-3.2-1B, Ministral-3-3B) finetuned identically, a difference-in-means direction achieves 99.6% separation of aligned and misaligned activations at each model's final layer. Causal steering by subtracting this direction reduces code spillover by 21-51 points, while a secure-code control confirms content specificity. Cross-architecture transfer via ridge regression maps yields large behavioral suppression (up to 46 points) but fails specificity controls as random and orthogonal directions perform comparably. We identify a two-tier specificity structure: within-model directions are causally specific and actionable; cross-model directions are causally real but non-specific. An asymmetric transfer topology emerges, with Gemma and Qwen acting as geometric donors and Llama as a receiver. These findings define the limits of linear cross-architecture correction and recommend within-model probing for auditing.
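The within-model pipeline in this abstract can be sketched in a few lines. This is a minimal illustration assuming activations are plain vectors taken from one layer: the direction is the normalised difference between the mean misaligned and mean aligned activations, separation is measured by thresholding the projection halfway between the two class means, and steering subtracts `alpha` times the unit direction. The toy 2-D data, the midpoint threshold, and the constant-subtraction form of steering (rather than, say, projecting the direction out) are assumptions, not details confirmed by the abstract.

```python
import math


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diff_in_means_direction(misaligned, aligned):
    """Unit vector from the mean aligned activation to the mean misaligned one."""
    d = [m - a for m, a in zip(mean(misaligned), mean(aligned))]
    norm = math.sqrt(dot(d, d))
    return [x / norm for x in d]


def separation_accuracy(misaligned, aligned, u):
    """Classify by projection onto u, thresholding halfway between class means."""
    t = (dot(mean(misaligned), u) + dot(mean(aligned), u)) / 2
    correct = sum(dot(h, u) > t for h in misaligned) + sum(dot(h, u) <= t for h in aligned)
    return correct / (len(misaligned) + len(aligned)), t


def steer(h, u, alpha=1.0):
    """Subtract alpha * u from an activation vector (hypothetical steering form)."""
    return [x - alpha * y for x, y in zip(h, u)]


# Toy 2-D "final-layer activations" (hypothetical).
aligned = [[0.0, 1.0], [0.2, 0.9], [-0.1, 1.1]]
misaligned = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.2]]
u = diff_in_means_direction(misaligned, aligned)
acc, threshold = separation_accuracy(misaligned, aligned, u)
steered = steer(misaligned[0], u, alpha=1.0)  # now projects onto the aligned side
```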

2606.20223 2026-06-19 cs.CV q-bio.QM New submission

DeepForestVisionV2: Ecology-Driven Taxonomy Expansion for Camera-Trap Monitoring in African Tropical Forests

Hugo Magaldi, Theau d'Audiffret, Etienne Francois Akomo-Okoue, Bala Amarasekaran, Naomi Anderson, Claire Auger, Noemie Cappelle, Daniel Cornelis, Raphael Cornette, Tobias Deschner, Gabriel Dubus, Davy Fonteyn, Rosa M. Garriga, Jennifer Hatlauf, Innocent Kasekendi, Raymond Katumba, Aram Kazandjian, Alfred Ngomanda, Stephan Ntie, Simone Pika, Xavier Rufray, Harold Rugonge, John Justice Tibesigwa, Peter van Lunteren, Hadrien Vanthomme, Joeri A. Zwerts, Sabrina Krief

Affiliations: UMR7206 Eco-Anthropologie, MNHN; One Forest Vision initiative; Sebitoli Chimpanzee Project; Centre National de la Recherche Scientifique et Technologique; Institut de Recherche en Ecologie Tropicale; Tacugama Chimpanzee Sanctuary; Biotope; CIRAD; Max Planck Institute for Evolutionary Anthropology; BOKU University; Agence Nationale des Parcs Nationaux du Gabon; Uganda Wildlife Authority; Addax Data Science; Utrecht University

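AI summary: Addressing the problem that ecological gradients in African tropical-forest camera-trap monitoring (vertical stratification, scene openness, anthropogenic interfaces) make the original 35-class taxonomy too coarse, proposes DeepForestVisionV2, expanded to 64 classes, which improves field utility while keeping the offline workflow.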

Comments Accepted at ICPR 2026 - Computer Vision for Biodiversity Monitoring and Conservation Workshop

Abstract

Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline workflow for both photographs and videos, and previous work showed that it outperformed other available baselines on a comparable benchmark. However, it was designed for closed-canopy, ground-level forest interiors and uses a 35-class prediction space that becomes too coarse when deployments encounter arboreal primates, birds, semi-aquatic taxa, or human-associated confounders such as livestock. We present DeepForestVisionV2, an ecology-driven expansion from 35 to 64 prediction classes (61 animal classes plus human, vehicle, and blank) designed to address three recurrent deployment gradients: vertical stratification, scene openness, and anthropogenic interfaces. DeepForestVisionV2 retains the same offline workflow and is trained on 1,535,010 photographs and 243,354 videos from multi-country African tropical-forest projects. Evaluation combines a cross-country cropped-photo validation set, used to assess robustness across sites and camera-trap settings, with three held-out Uganda video benchmarks spanning the targeted gradients. On the validation set, DeepForestVisionV2 reaches 0.86 accuracy, 0.82 macro-F1, and 0.81 balanced accuracy. On the deployment benchmarks, it preserves or improves baseline accuracy despite its harder classification task, while increasing the number of identified taxa from 22 to 29 in forest-interior videos and from 4 to 9 at riverbanks. In the park-edge use case, it raises accuracy from 0.62 to 0.86 and reduces false alarms from 11 to 0. These results show that DeepForestVisionV2 materially improves field utility while preserving robustness across sites, habitats, and camera-trap settings.
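The validation results quote accuracy, macro-F1, and balanced accuracy, which diverge when classes are imbalanced, as is typical for camera-trap data dominated by a few common taxa and blanks. A minimal sketch of the standard definitions (unweighted mean of per-class F1 over all labels seen; unweighted mean of per-class recall over the true classes), using hypothetical toy labels rather than the paper's data:

```python
from collections import Counter


def _counts(y_true, y_pred):
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    tp, fp, fn = _counts(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    tp, _, fn = _counts(y_true, y_pred)
    labels = set(y_true)
    return sum(tp[c] / (tp[c] + fn[c]) for c in labels) / len(labels)


y_true = ["chimp", "chimp", "chimp", "elephant", "blank"]
y_pred = ["chimp", "chimp", "blank", "elephant", "blank"]
# accuracy = 4/5 = 0.8
# macro-F1 = (0.8 + 1.0 + 2/3) / 3, about 0.822
# balanced accuracy = (2/3 + 1 + 1) / 3, about 0.889
```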

2606.20218 2026-06-19 cs.SD New submission

Zero-VC: Zero-Lookahead Streaming Voice Conversion via Speaker Anonymization

Yudong Li, Zihao Fang, Junwen Qiu, Ruihai Jing, Ruixiang Hang, Yingda Shen, Zhizheng Wu

Affiliations: The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute; Shenzhen Transsion Holdings Co., Ltd.

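AI summary: Addressing the challenge of disentangling timbre from linguistic content in streaming zero-shot voice conversion, proposes speaker anonymization as a perturbation mechanism that explicitly mitigates timbre leakage while retaining prosodic utility, enabling a strictly causal, zero-lookahead network.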

Comments Accepted to Interspeech 2026

Abstract

Streaming zero-shot voice conversion struggles to disentangle timbre from linguistic content without degrading utility or inflating latency. Current methods rely on information bottleneck (IB) or speaker perturbation. While IB filters out timbre, it discards prosody, forcing models to explicitly inject features like fundamental frequency. This often requires buffering future frames, creating algorithmic lookahead latency. On the other hand, existing perturbation methods largely overlook the crucial trade-off between timbre leakage and utility preservation. Recognizing this neglected trade-off, we find that the inherent objective of Speaker Anonymization (SA) aligns well with balancing these factors. Thus, we introduce SA as a novel perturbation mechanism to explicitly mitigate timbre leakage while retaining prosodic utility. Crucially, SA's robust representations significantly alleviate the generator's reliance on future context, enabling our strictly causal, zero-lookahead network. Audio samples are available at https://amphionteam.github.io/Zero-VC-demo/.
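"Zero-lookahead" means the output for frame t depends only on frames up to t, so nothing has to be buffered from the future. As a generic illustration (not the Zero-VC architecture), a causal 1-D convolution achieves this by padding only on the left; the streaming form below keeps a short history buffer and emits each output the moment its input frame arrives. The kernel values are hypothetical.

```python
from collections import deque


def causal_conv1d(x, kernel):
    """Offline reference: y[t] = sum_k kernel[k] * x[t - k], with x[t < 0] = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        y.append(acc)
    return y


class StreamingCausalConv:
    """Frame-by-frame version: each output is emitted as soon as its input arrives."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        # history[k] holds x[t - k]; pre-filled with zeros (silence before the stream).
        self.history = deque([0.0] * len(self.kernel), maxlen=len(self.kernel))

    def push(self, sample):
        self.history.appendleft(sample)  # the oldest frame falls off the right end
        return sum(w * h for w, h in zip(self.kernel, self.history))


kernel = [0.5, 0.25]  # hypothetical two-tap filter
frames = [1.0, 2.0, 3.0, 4.0]
stream = StreamingCausalConv(kernel)
online = [stream.push(f) for f in frames]
print(online)  # [0.5, 1.25, 2.0, 2.75]
```

Because the streaming outputs equal the offline ones and never read a later frame, the algorithmic latency is zero frames; any remaining delay comes only from computation.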

2606.20216 2026-06-19 cs.LG cs.AI New submission

Learner-based Concept Drift Detection: Analysis and Evaluation

Md Moman Ul Haque Khan, Samira Sadaoui

Affiliations: Department of Computer Science, University of Regina

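AI summary: Theoretically analyzes concept drift characteristics and evaluates numerous drift detection algorithms on synthetic and real-world datasets, aiming to improve understanding of how drift detectors behave and where they are applicable.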

Comments 2 authors, 29 pages

Abstract

Machine learning algorithms deployed in evolving streaming environments must handle non-stationary data distributions, commonly referred to as concept drift. The presence of concept drift poses a major challenge for many real-world applications because it can severely degrade their predictive performance, hindering their ability to support robust decision-making. Consequently, the timely and efficient detection of drift events is critical for sustaining high accuracy over time. This study theoretically examines the characteristics of concept drift and numerous drift detection algorithms across several categories. Furthermore, we evaluate their performance on both synthetic and real-world datasets exhibiting diverse streaming scenarios and drift characteristics, such as abrupt and gradual changes. This study aims to enhance understanding of the complex notion of concept drift, the behavior of drift detectors, and their applicability to diverse contexts.
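For readers new to learner-based (error-rate) detectors, here is a minimal sketch of DDM (Gama et al., 2004), a classic detector of this kind. The abstract does not list which detectors the study evaluates, so treat this as an illustrative example rather than the paper's method. DDM monitors the base learner's running error rate p and its standard deviation s, remembers where p + s was smallest, and signals a warning or drift when p + s rises 2 or 3 standard deviations above that minimum; the 30-sample warm-up and both thresholds follow commonly used defaults, and the error stream is synthetic.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a stream of 0/1 errors."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0            # running error rate of the base learner
        self.p_min = math.inf   # p where p + s was smallest so far
        self.s_min = math.inf   # s at that same point

    def update(self, error):
        """Feed 1 if the learner misclassified the latest example, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # in practice the learner would be retrained here
            return "drift"
        if self.p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic error stream: 10% error rate, then an abrupt jump to 50% at t = 200.
detector = DDM()
states = []
for t in range(400):
    error = (t % 10 == 0) if t < 200 else (t % 2 == 0)
    states.append(detector.update(int(error)))
first_drift = states.index("drift")
```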

2606.20215 2026-06-19 cs.CR New submission

GNSS Spoofing Threat for V2X communications

Adolfo P. Jimenez, Juan Arquero-Gallego, Mario P. Luna, Jose E. Naranjo, Felipe Jimenez Alonso

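AI summary: Presents a method for carrying out GNSS spoofing attacks against V2X communications using inexpensive software-defined radio (SDR), validates the attack on real devices, and shows that V2X communications are vulnerable to spoofing that is hard to detect.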

Comments 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Abstract

Global Navigation Satellite Systems (GNSS) constitute a core technology for delivering crucial positioning, navigation, and timing (PNT) services in the Vehicle-to-Everything (V2X) domain, where they are indispensable for generating Cooperative Awareness Messages (CAM) that uphold network reliability and vehicular safety. Yet, GNSS signals are acutely exposed to spoofing, an advanced attack in which an adversary transmits crafted signals that replicate legitimate satellite characteristics, misleading the receiver into computing a false position. This work presents a methodology for conducting physical spoofing with inexpensive Software Defined Radio (SDR), describing a coordinate generation pipeline that employs Haversine-based distance calculations, temporal discretization to emulate constant velocity, and linear interpolation to produce high-fidelity GPS baseband signals. The proposed attack is experimentally validated on real Commsignia OnBoard Unit (OBU) and RoadSide Unit (RSU) devices using a HackRF One across three scenarios that emulate synthetic trajectories at steady speeds of 90 km/h, 145 km/h, and 200 km/h. The most significant contribution of this paper is the demonstration that V2X communications are not secured, as they are susceptible to GNSS spoofing attacks, which cause service degradation without being detected.
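The coordinate-generation steps named in this abstract (Haversine distance, time discretization at constant speed, linear interpolation) can be sketched for a single straight segment. This is a minimal illustration assuming a spherical Earth of radius 6,371 km, a hypothetical 0.1 s output interval, and interpolation directly in latitude/longitude, which is adequate only for short segments. It produces the trajectory that a GPS baseband signal generator would consume, not the RF signal itself, and it is not the authors' pipeline.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical-Earth assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def constant_speed_track(start, end, speed_kmh, dt_s=0.1):
    """(time_s, lat, lon) samples every dt_s seconds along start -> end at constant speed.

    Positions are linearly interpolated in latitude/longitude, which is only
    accurate for short segments.
    """
    dist = haversine_m(*start, *end)
    if dist == 0:
        return [(0.0, *start)]
    step_m = speed_kmh / 3.6 * dt_s  # metres covered per time step
    steps = math.ceil(dist / step_m)
    track = []
    for i in range(steps + 1):
        f = min(1.0, i * step_m / dist)  # fraction of the segment covered
        lat = start[0] + f * (end[0] - start[0])
        lon = start[1] + f * (end[1] - start[1])
        track.append((i * dt_s, lat, lon))
    return track


# Hypothetical ~1.1 km northbound segment driven at 90 km/h (25 m/s, 2.5 m per 0.1 s).
track = constant_speed_track((40.0, -3.7), (40.01, -3.7), speed_kmh=90)
```

Higher speeds such as 145 or 200 km/h only change `step_m`; the sampling interval, and hence the update rate seen by the receiver, stays fixed.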

2606.20214 2026-06-19 cs.CR New submission

Accelerating Trust Convergence in IIoT: A ML Approach for Dynamic Network Conditions

Aymen Bouferroum, Valeria Loscri, Abderrahim Benslimane

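AI summary: Addressing slow trust convergence caused by fluctuating network quality in the Industrial IoT, proposes an ML-based trust convergence acceleration method that predicts convergence time and dynamically adjusts transition probabilities, reducing convergence time by up to 28.6% under challenging conditions and improving evaluation accuracy in scenarios with malicious nodes.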

Comments Symposium: Communication & Information Systems Security (CISS)

Journal ref IEEE Global Communications Conference (GLOBECOM) 2025, Dec 2025, Taipei, Taiwan. pp.4427-4432

Abstract

In Industrial Internet of Things (IIoT) environments, trust management plays a vital role in securing systems, especially when dealing with resource-constrained devices. Traditional trust models often overlook the impact of fluctuating network quality, leading to slower trust convergence and inaccurate assessments. In this paper, we propose a dynamic trust management solution, known as the Trust Convergence Acceleration (TCA) approach, which integrates Machine Learning (ML) to accelerate trust convergence under poor network conditions. Our model predicts the number of time units needed for trust convergence based on key network metrics and dynamically adapts transition probabilities in the trust model to enhance convergence speed. Using a simulation framework that incorporates realistic Wi-Fi channel conditions based on the IEEE 802.11 standard, we demonstrate the effectiveness of the TCA-based approach, achieving up to a 28.6% reduction in trust convergence time under challenging conditions. Furthermore, the proposed solution exhibits resilience in scenarios involving malicious nodes, improving trust evaluation accuracy. This work provides a scalable and adaptive trust framework for IIoT systems in dynamic industrial environments, ensuring robust performance under varying network conditions.

2606.20212 2026-06-19 cs.CL New submission

CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia

Josef Jon, Ondřej Bojar

Affiliations: Charles University, Faculty of Mathematics and Physics, Institute of Formal

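AI summary: Presents CzechDocs, a multiway parallel dataset of formatted documents covering Czech and minority languages, which supports the evaluation of format-preserving machine translation systems; the validation subset and evaluation toolkit are publicly released.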

Abstract

We present CzechDocs, a multiway parallel dataset of formatted documents (HTML, DOCX, and PDF) covering Czech and minority languages used in Czechia, primarily Ukrainian and English, with smaller portions of Vietnamese, Russian, and other languages. The dataset is designed to support the evaluation of machine translation systems that aim to preserve document formatting during translation. We provide a comparison of the most common approaches to format-preserving machine translation on a validation subset of the dataset. This validation split, together with the evaluation toolkit, is publicly released for further research. A held-out test split will be reserved for a future shared task focused on document-level translation with formatting preservation.
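One common baseline for format-preserving MT masks inline markup before translation and restores it afterwards. The abstract does not say which approaches were compared, so this is illustrative only: a minimal sketch for HTML, assuming a hypothetical `translate` callable that copies `[[n]]` placeholders through unchanged. Real MT systems may drop, duplicate, or reorder placeholders, which is the kind of failure a formatted-document dataset can expose.

```python
import re

TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")  # an opening or closing HTML tag
PLACEHOLDER_RE = re.compile(r"\[\[(\d+)\]\]")


def protect_tags(html):
    """Replace each tag with a numbered [[n]] placeholder; return text and tag list."""
    tags = []

    def to_placeholder(match):
        tags.append(match.group(0))
        return f"[[{len(tags) - 1}]]"

    return TAG_RE.sub(to_placeholder, html), tags


def restore_tags(text, tags):
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: tags[int(m.group(1))], text)


def translate_preserving_tags(html, translate):
    masked, tags = protect_tags(html)
    return restore_tags(translate(masked), tags)


def toy_translate(text):
    """Hypothetical stand-in for an MT system that copies placeholders through."""
    return text.replace("Hello", "Ahoj").replace("world", "světe")


print(translate_preserving_tags("<p>Hello <b>world</b></p>", toy_translate))
# <p>Ahoj <b>světe</b></p>
```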

2606.20210 2026-06-19 cs.AI New submission

Augmenting Game AI with Deep Reinforcement Learning

Alessandro Sestini, Joakim Bergdahl, Amir Baghi, Jean-Philippe Barrette-LaPierre, Florian Fuchs, Linus Gisslén

Affiliations: Electronic Arts (EA), Stockholm, Sweden

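AI summary: Proposes a framework for training game AI with deep reinforcement learning to make character behavior more believable, and discusses deployment challenges and future research directions.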

Comments Vision paper, published in Conference on Games 2026

Abstract

Immersion in video games depends not only on graphics, audio, and game mechanics, but also on the quality of in-game characters. Producing believable characters, or game AI, remains a significant challenge as behavioral complexity is hard to capture with hand-coded systems. Game AI is a source of immersion and engagement; however, the limitations stemming from the challenges of creating game AI often lead to frustration and the breaking of the illusion of realism within the game. The introduction of machine learning models opens the door to creating more believable, authentic, and relatable characters in games. The promise is that they either learn from interacting with the game, or from player data, to develop true human-like behavior. In this paper, we envision more applications of reinforcement learning for game AI in the future. For this to materialize, the current research limitations that prohibit broad deployment across game genres must be overcome. Therefore, we propose a framework for training reinforcement learning models with a set of requirements in mind that are suited towards game AI and game development. We present examples of games with reinforcement learning-augmented game AI and describe the practicalities of deploying player-facing machine learning agents in modern games. Furthermore, we identify bottlenecks and hard problems in these areas, which we believe offer promising research directions to accelerate the adoption of machine learning in game AI for the video game industry.

2606.20209 2026-06-19 cs.RO cs.AI New submission

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

Francesco Argenziano, Miguel Saavedra-Ruiz, Sacha Morin, Charlie Gauthier, Daniele Nardi, Liam Paull

Affiliations: Sapienza University of Rome; Université de Montréal; Mila - Quebec AI Institute

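AI summary: Proposes FlowMaps, which uses latent flow matching to learn multimodal spatio-temporal distributions over object locations and predict the future locations of dynamic objects, improving robot navigation in changing household environments.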

Abstract

Joint spatial and temporal understanding of 3D scenes is a crucial requirement for robots deployed in everyday household environments. Such agents must not only comprehend and navigate spatial layouts, but also reason about how these spaces evolve over time. In particular, humans interact with objects daily, causing them to change position throughout the environment and making it difficult for robots to reliably associate current observations with previously seen objects. However, these interactions are not random: human habits and routines induce spatio-temporally consistent patterns in object locations, which robotic agents can potentially learn and then exploit for downstream tasks such as navigation. To this end, we introduce FlowMaps, a latent flow matching model for estimating multimodal distributions over the future locations of dynamic objects in a continuous 3D space. By learning the implicit dependencies among objects and their temporal evolution, FlowMaps predicts likely changes in object locations conditioned on past human interactions, while supporting generalization across previously unseen environments that share similar object routines. To demonstrate the utility of this method, we deploy FlowMaps in a downstream dynamic Object Navigation task in both simulated and real-world environments. Across more than 600 episodes, FlowMaps outperforms state-of-the-art approaches, showing that modeling object dynamics through continuous, multimodal spatio-temporal distributions improves robotic search and navigation in changing household environments. Code and additional material are available at https://fra-tsuna.github.io/flowmaps/.

2606.20208 2026-06-19 cs.AI cs.DB cs.NE New submission

Beyond Accuracy: Measuring Logical Compliance of Predictive Models

Guillaume Olivier Delplanque, Pierre Genevès, Nabil Layaïda, Zephirin Faure

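AI summary: Proposes the Rule Violation Score (RVS), an evaluation metric independent of predictive accuracy that quantifies how well a predictive model respects logical rules, and shows experimentally that two models with similar accuracy can exhibit very different logical compliance.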

Abstract

Machine learning models are predominantly evaluated through predictive performance metrics such as ranking quality, prediction error, or classification accuracy. While these metrics effectively quantify how closely predictions match the ground truth, they do not assess whether model outputs respect predefined logical or domain-specific constraints. In high-stakes applications, including healthcare, finance, and autonomous systems, logical consistency can be as critical as predictive accuracy, yet no standard metric captures this dimension. We introduce the Rule Violation Score (RVS), a complementary evaluation metric that quantifies the extent to which a predictive model respects a given set of logical rules, independently of predictive accuracy. RVS treats hard rules (strict constraints) and soft rules (statistical regularities) differently, can be evaluated on any dataset and on any predictive model expressed over a relational vocabulary, and can be computed using SQL queries that are automatically generated for Horn rules. Beyond evaluating models, RVS can also evaluate the logical consistency of training datasets and help identify poorly defined rules. We evaluate RVS on three benchmarks covering knowledge graph link prediction and relational regression, including rule-based, embedding-based, and neuro-symbolic predictive models. Our results demonstrate that two models achieving comparable predictive accuracy can exhibit substantially different levels of logical compliance, revealing differences in model behavior that standard metrics fail to capture.
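The abstract says RVS is computed with SQL queries generated automatically for Horn rules, but it does not give the formula. The sketch below shows one plausible reading for a single hard rule: the share of body groundings in a model's predicted facts whose head is missing, computed with Python's built-in `sqlite3`. The table layout `triples(s, p, o)`, the function names, and the scoring formula are all assumptions, and the handling of soft rules is omitted.

```python
import sqlite3


def horn_rule_queries(body, head):
    """Build two COUNT queries for a Horn rule over a table triples(s, p, o).

    body: list of (predicate, subject_var, object_var); head: same shape.
    Returns (count of body groundings, count of groundings whose head is absent).
    Predicate names are pasted into the SQL, so use trusted rule definitions only.
    """
    tables, conditions, bound = [], [], {}
    for i, (pred, subj, obj) in enumerate(body):
        alias = f"b{i}"
        tables.append(f"triples AS {alias}")
        conditions.append(f"{alias}.p = '{pred}'")
        for column, var in (("s", subj), ("o", obj)):
            ref = f"{alias}.{column}"
            if var in bound:
                conditions.append(f"{ref} = {bound[var]}")  # join on a shared variable
            else:
                bound[var] = ref
    head_pred, head_subj, head_obj = head
    head_absent = (
        "NOT EXISTS (SELECT 1 FROM triples AS h "
        f"WHERE h.p = '{head_pred}' AND h.s = {bound[head_subj]} AND h.o = {bound[head_obj]})"
    )
    base = f"SELECT COUNT(*) FROM {', '.join(tables)} WHERE {' AND '.join(conditions)}"
    return base, f"{base} AND {head_absent}"


def hard_rule_violation_rate(conn, body, head):
    """Hypothetical score: violated groundings / body groundings (0 if the body never fires)."""
    total_sql, violated_sql = horn_rule_queries(body, head)
    total = conn.execute(total_sql).fetchone()[0]
    violated = conn.execute(violated_sql).fetchone()[0]
    return violated / total if total else 0.0


# A model's predicted facts (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("ann", "parent", "bob"), ("bob", "parent", "cid"),
     ("bob", "parent", "dan"), ("ann", "grandparent", "cid")],
)
rule_body = [("parent", "x", "y"), ("parent", "y", "z")]
rule_head = ("grandparent", "x", "z")
print(hard_rule_violation_rate(conn, rule_body, rule_head))  # 0.5: grandparent(ann, dan) is missing
```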

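2606.20205 2026-06-19 cs.AI cs.CL cs.HC New submission

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

The psychological profiles of large language models are largely a measurement artifact

Jelena Meyer, David Garcia, Dirk U. Wulff

Affiliations: Max Planck Institute for Human Development; University of Konstanz; Barcelona Supercomputing Center; University of Basel

AI Summary: A psychometric analysis of 56 instruction-tuned LLMs finds that differences between models stem mainly from a directional response bias rather than from traits. This bias explains 81-90% of the variation and can be manipulated through item selection, indicating that LLM psychological profiles are a measurement artifact.

Details
AI Chinese Abstract

Psychometric instruments designed for humans are increasingly used to ascribe stable psychological profiles to large language models (LLMs). These profiles affect the models' usability, their safety assessment, and their use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. We administered a battery of personality and risk-preference instruments, spanning self-reports and behavioral tasks, to 56 instruction-tuned LLMs and to large human reference samples, and we report four findings. First, differences between models are driven not by the traits the instruments target but by a directional response bias: a tendency to respond toward one end of the scale, or toward one labeled option, regardless of item content. A variance decomposition attributes 81-90% of between-model variation to this bias, compared with 9-16% in humans. Second, the bias decreases as model capability increases but is not eliminated. Third, because responses are driven by bias rather than traits, an instrument's apparent reliability is almost entirely predicted by its response orthogonality, a term we propose for the proportion of items on which trait and bias point in opposite directions. Fourth, the profile a model presents changes with the items used and can be manufactured through item selection. These results show that the apparent psychological profiles of LLMs are artifacts of the instruments used to measure them rather than properties of the models themselves. Because instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, we call for dedicated assessments centered on response orthogonality.

English Abstract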

Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. Administering a battery of personality and risk-preference instruments spanning self-reports and behavioral tasks to 56 instruction-tuned LLMs alongside large human reference samples, we report four findings. First, differences between models are driven not by the traits an instrument targets but by a directional response bias, a tendency to respond toward one end of the scale, or one labeled option, regardless of item content; a variance decomposition attributes 81-90% of between-model variation to this bias, against 9-16% in humans. Second, the bias declines with model capability but is not eliminated by it. Third, because bias rather than trait drives responding, an instrument's apparent reliability is almost entirely predicted by its response orthogonality, a term we coin for the proportion of items for which trait and bias point in opposite directions. Fourth, the profile a model appears to have shifts with the items used and can be manufactured through item selection. These results demonstrate that the apparent psychological profiles of LLMs are artifacts of the instrument used to measure them, not properties of the models themselves. As instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, we call for dedicated assessments centered on response orthogonality.
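
A toy example shows why response orthogonality matters. The sketch below is an illustration rather than the authors' psychometric model. It scores a hypothetical respondent that answers "5" on every item of a 1-5 scale, which amounts to a pure agreement bias. `item_keys` marks each item as forward-keyed (+1) or reverse-keyed (-1); both the keys and the respondent are made up.

```python
def response_orthogonality(item_keys, bias_direction=1):
    """Share of items whose trait key points against the response bias.

    item_keys: +1 if agreeing signals more of the trait, -1 if reverse-keyed.
    bias_direction: +1 for a bias toward the high end of the scale, -1 for the low end.
    """
    opposed = sum(1 for key in item_keys if key == -bias_direction)
    return opposed / len(item_keys)


def trait_score(responses, item_keys, scale_min=1, scale_max=5):
    """Mean trait score after reverse-scoring the reverse-keyed items."""
    scored = [
        r if key > 0 else scale_min + scale_max - r
        for r, key in zip(responses, item_keys)
    ]
    return sum(scored) / len(scored)


if __name__ == "__main__":
    biased = [5, 5, 5, 5]          # hypothetical respondent: always picks the top option
    all_forward = [1, 1, 1, 1]     # no item opposes the bias
    balanced = [1, 1, -1, -1]      # half the items oppose the bias
    for keys in (all_forward, balanced):
        print(response_orthogonality(keys), trait_score(biased, keys))
```

The same bias-only respondent receives the maximum trait score (5.0) on the all-forward instrument, whose orthogonality is 0.0. On the balanced instrument, whose orthogonality is 0.5, it receives the scale midpoint (3.0). The apparent "profile" therefore comes from item selection rather than from any trait.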

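2606.20202 2026-06-19 cs.DS New submission

Tight Algorithm and Hardness for Submodular Linear Ordering

Tight algorithm and hardness for submodular linear ordering

Evan Abboud, Roy Schwartz

AI Summary: For the Minimum Linear Ordering Problem with general submodular functions, the authors give a polynomial-time $O(\sqrt{n/\ln n})$-approximation algorithm. They also prove a matching information-theoretic lower bound: no algorithm that evaluates $f$ a polynomial number of times can achieve an $o(\sqrt{n/\ln n})$-approximation.

Comments 25 pages. Accepted to the 53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026)

Details
AI Chinese Abstract

We consider the Minimum Linear Ordering Problem. Given a set $N$ of cardinality $n$ and a non-negative set function $f\colon 2^N\rightarrow \mathbb{R}_{\geq 0}$, the goal is to find a permutation $\pi$ of $N$ that minimizes the sum of the values of $f$ over all prefixes of $\pi$. The problem has been studied for various classes of set functions. The case of submodular $f$ is of particular interest because it covers classic problems, including Minimum Linear Arrangement and Minimum Containing Interval Graph. In this work, we settle the approximability of the Minimum Linear Ordering Problem for general submodular $f$ by establishing matching upper and lower bounds. We give $(1)$ a polynomial-time algorithm achieving an $O(\sqrt{n/\ln n})$-approximation, and $(2)$ a matching information-theoretic hardness result showing that no algorithm evaluating $f$ a polynomial number of times can achieve an $o(\sqrt{n/\ln n})$-approximation. Previously, the best known hardness of approximation was $2$, and an $O(\sqrt{n/\ln n})$-approximation was known only for the special case where $f$ is both submodular and symmetric.

English Abstract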

We consider the Minimum Linear Ordering Problem: given a ground set $N$ of cardinality $n$ and a non-negative set function $f\colon 2^N\rightarrow \mathbb{R}_{\geq 0}$, the goal is to find an ordering $\pi$ of $N$ that minimizes the sum of the values of $f$ over all prefixes of $\pi$. This problem has been studied for various classes of set functions, and the case of a submodular $f$ is of special interest, as it captures classic problems including Minimum Linear Arrangement and Minimum Containing Interval Graph. In this work, we resolve the approximability of the Minimum Linear Ordering Problem for a general submodular $f$ by establishing matching upper and lower bounds and present: $(1)$ a polynomial-time algorithm achieving an $O(\sqrt{n/\ln n})$-approximation; and $(2)$ a matching information-theoretic hardness result, showing that no algorithm evaluating $f$ a polynomial number of times can achieve an $o(\sqrt{n/\ln n})$-approximation. Previously, the best known hardness of approximation was $2$, and an $O(\sqrt{n/\ln n})$-approximation was known only for the special case where $f$ is both submodular and symmetric.
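
The objective is easy to state in code. The Python sketch below evaluates the prefix sum for a given ordering and, for tiny ground sets only, finds the optimum by brute force. It is not the paper's $O(\sqrt{n/\ln n})$-approximation algorithm. It uses a graph cut function as the submodular $f$, which recovers Minimum Linear Arrangement. An edge between the elements at positions $i<j$ is cut by exactly $j-i$ prefixes, so the prefix sum equals $\sum_{(u,v)} |\mathrm{pos}(u)-\mathrm{pos}(v)|$.

```python
from itertools import permutations


def prefix_cost(order, f):
    """Sum of f over all prefixes of `order` (lengths 1, ..., n)."""
    total, prefix = 0, set()
    for element in order:
        prefix.add(element)
        total += f(frozenset(prefix))
    return total


def cut_function(edges):
    """Graph cut function: f(S) = number of edges with exactly one endpoint in S."""
    def f(subset):
        return sum(1 for u, v in edges if (u in subset) != (v in subset))
    return f


def brute_force_ordering(ground_set, f):
    """Exact minimiser by enumerating all n! orderings; feasible only for tiny n."""
    return min(permutations(ground_set), key=lambda order: prefix_cost(order, f))


if __name__ == "__main__":
    f = cut_function([(0, 1), (1, 2), (2, 3)])  # path graph 0-1-2-3
    print(prefix_cost((0, 1, 2, 3), f), prefix_cost((0, 2, 1, 3), f))
    best = brute_force_ordering(range(4), f)
    print(best, prefix_cost(best, f))
```

On the path 0-1-2-3, the identity ordering costs 3 because every edge has length 1. The ordering `(0, 2, 1, 3)` costs 2 + 1 + 2 = 5.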

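2606.20199 2026-06-19 cs.CV New submission

Evaluation of Image Matching for Art Skills Assessment

Evaluation of image matching for art skill assessment

Asaad Alghamdi, Michael Poor, Trung-Nghia Le, Tam V. Nguyen

Affiliations: University of Dayton; University of Science, VNU-HCM; Vietnam National University, Ho Chi Minh City

AI Summary: Proposes assessing drawing skill by matching hand-drawn images against templates. The paper compares SIFT features with Siamese networks and finds SIFT keypoint matching more effective.

Comments MAPR 2024

Details
AI Chinese Abstract

Although some people are naturally gifted at drawing, mastering the skill requires dedicated training and practice. Determining a person's drawing skill requires a proper, comprehensive assessment. In this paper, we propose a method to measure drawing skill by matching a hand-drawn image against the original template. Existing techniques often involve complex processes. However, advances in computer vision let us train computers to perform these comparisons at a human-like level, which removes the need for the tedious and time-consuming traditional process. In computer vision applications, determining image similarity means identifying how similar an image is to a reference image. We implemented and analyzed SIFT features and a Siamese network to measure image similarity. Our results show that assessing art skill levels is feasible. Through feature analysis, we find that SIFT-based keypoint matching provides a more effective means of detecting drawing skill.

English Abstract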

While some individuals possess a natural talent for drawing, mastering this skill requires dedicated training and practice. Determining one's drawing skill requires a proper, comprehensive assessment. In this paper, we propose a method to measure drawing skill by matching the hand-drawn image with the original template. Existing techniques often involve complex processes. However, advances in computer vision allow us to train computers to perform these comparisons at a human-like level, thereby replacing the tedious and time-consuming traditional process. In computer vision, determining image similarity means measuring how closely an image resembles a reference image. We implemented and analyzed SIFT features and a Siamese network to measure image similarity. Our results indicate that it is feasible to assess art skill levels. Through feature analysis, we found that SIFT-based keypoint matching provides a more effective means of detecting drawing skills.
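
The SIFT side of the comparison can be summarized as descriptor matching with Lowe's ratio test. The NumPy sketch below assumes that descriptors have already been extracted from the drawing and the template, for example with OpenCV's SIFT implementation. It turns them into a similarity score: the fraction of drawing descriptors that have an unambiguous template match. That score and the 0.75 ratio are illustrative choices, not the paper's reported setup.

```python
import numpy as np


def ratio_test_score(desc_drawing, desc_template, ratio=0.75):
    """Fraction of drawing descriptors with an unambiguous match in the template.

    A descriptor counts as matched when its nearest template descriptor is clearly
    closer than the second nearest (Lowe's ratio test). Needs >= 2 template descriptors.
    """
    a = np.asarray(desc_drawing, dtype=float)
    b = np.asarray(desc_template, dtype=float)
    if len(b) < 2:
        raise ValueError("need at least two template descriptors")
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    two_nearest = np.sort(dists, axis=1)[:, :2]
    good = two_nearest[:, 0] < ratio * two_nearest[:, 1]
    return float(good.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.normal(size=(50, 128))            # stand-in for 128-d SIFT descriptors
    drawing = template[:30] + 0.1 * rng.normal(size=(30, 128))
    print(ratio_test_score(drawing, template))
```

The ratio test rejects ambiguous matches, such as repeated strokes that resemble several template regions. A drawing that only loosely follows the template therefore scores lower, even if every descriptor has some nearest neighbour.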

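2606.20198 2026-06-19 cs.CL New submission

Pitch Spelling Jazz Lead Sheets, Solo Transcriptions, Classical Piano and Monophonic Scores

Pitch spelling for jazz lead sheets, solo transcriptions, classical piano, and monophonic scores

Augustin Bouquillard, Florent Jacquemard

Affiliations: École polytechnique; INRIA

AI Summary: Proposes a pitch-spelling and key-estimation algorithm that jointly estimates note names, a global key signature, and a local scale per bar through two optimisation stages (modal and tonal). The algorithm is validated on datasets of diverse digital scores.

Details
AI Chinese Abstract

We present an algorithm for pitch spelling and key estimation. The input is in MIDI format and contains note pitches (in semitones, relative to the lowest reference note) and bar boundaries. From it, the algorithm estimates the appropriate note names, a global key signature, and a local scale for each bar. These related pieces of information are evaluated jointly in two optimisation stages. In the initial "modal" stage, a shortest-path search proposes a probable scale for each bar so as to minimise the number of accidentals in the printed score. Then, in a second stage called "tonal", these local scales are used to estimate the key signature and note names, producing the best musical notation for the whole piece. We evaluate on datasets containing a variety of digital scores: jazz lead sheets from the Real Book, transcriptions of recorded jazz solos and bass lines, traditional tunes, and classical scores for piano and monophonic instruments. Our procedure was originally designed for music transcription. In particular, it was built to create digital collections of jazz solos transcribed from audio recordings, for music analysis, teaching, and the preservation of cultural heritage. The method should also help with other tasks involving the processing of music notation. In addition, for this purpose we define new distances between various common jazz scales, which may be of some interest for musicological research.

English Abstract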

We present an algorithm for pitch spelling and key estimation. Given an input in MIDI-like format, containing information on note pitches (expressed in semitones relative to the lowest reference note) and bar boundaries, it estimates the appropriate note names, a global Key Signature, and a local scale for each bar. These related elements of information are evaluated jointly in two optimisation stages. During an initial 'modal' stage, a shortest-path search proposes a probable scale for each bar, minimising the number of accidentals in the printed score. Then, during a second stage called 'tonal', these local scales are used to estimate the Key Signature and note names that would result in the best musical notation for the entire piece. We present evaluations conducted on datasets comprising a variety of digital musical scores: jazz lead sheets taken from the Real Book, transcriptions of recordings of jazz soli and bass lines, traditional tunes, as well as classical scores for piano and monophonic instruments. Our procedure was originally designed for use in music transcription, specifically for building digital collections of jazz solos transcribed from audio recordings, for the purposes of music analysis, teaching and the preservation of cultural heritage. This method should also prove useful for other tasks related to the processing of musical notation. For this purpose, we have also defined new distances between various common jazz scales, which may be of some interest to musicological studies.
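
The modal stage can be pictured as a shortest-path (Viterbi) search over one candidate scale per bar. The Python sketch below is a simplified, hypothetical version with three restrictions. It considers only the 12 major scales. It counts a note as needing an accidental when its pitch class lies outside the bar's scale. It charges `change_penalty` whenever the scale changes between bars. The paper's jazz scales, its scale distances, and the tonal stage are not modelled.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)


def major_scale(tonic):
    """Pitch classes of the major scale on `tonic` (0 = C, 1 = C#/Db, ...)."""
    return frozenset((tonic + step) % 12 for step in MAJOR_STEPS)


def bar_cost(bar, scale):
    """Number of notes in the bar that would need an accidental relative to `scale`."""
    return sum(1 for pitch in bar if pitch % 12 not in scale)


def modal_stage(bars, change_penalty=1.0):
    """Pick one scale per bar by a shortest-path (Viterbi) search.

    Path cost = accidentals in every bar + change_penalty per scale change.
    Returns the chosen tonic for each bar.
    """
    scales = [major_scale(t) for t in range(12)]
    cost = [bar_cost(bars[0], s) for s in scales]
    backpointers = []
    for bar in bars[1:]:
        new_cost, pointers = [], []
        for j, scale in enumerate(scales):
            # Cheapest predecessor scale i for ending this bar in scale j.
            step = [cost[i] + (0 if i == j else change_penalty) for i in range(12)]
            best_i = min(range(12), key=step.__getitem__)
            new_cost.append(step[best_i] + bar_cost(bar, scale))
            pointers.append(best_i)
        cost = new_cost
        backpointers.append(pointers)
    # Trace the optimal path back from the cheapest final scale.
    j = min(range(12), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return path[::-1]


if __name__ == "__main__":
    bars = [[60, 62, 64, 65, 67], [67, 69, 71, 72, 74]]  # MIDI pitches, C-major material
    print(modal_stage(bars))
```

With the default penalty, two bars of plain C-major material stay in C (tonic 0) throughout. A lower `change_penalty` makes the search switch scales more readily when a bar's notes fit another scale better.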

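2606.20197 2026-06-19 cs.RO New submission

Stable Transformer-Actor-Critic Model Predictive Control: A Contraction Analysis Approach

Stable Transformer-Actor-Critic model predictive control: a contraction analysis approach

Antonio Marino, Valerio Modugno, Marco Cognetti

AI Summary: Proposes a Transformer-Actor-Critic MPC architecture. The authors prove that the Transformer satisfies incremental input-to-state stability and analyze the interconnected dynamics with Riemannian contraction theory. They then use the theoretical bounds as a training regularizer to obtain a provably robust control policy.

Details
AI Chinese Abstract

Actor-Critic model predictive control (MPC) effectively solves complex non-convex control problems, but guaranteeing closed-loop stability for the sequence-based learning models in these pipelines remains challenging. This paper introduces a novel Transformer-Actor-Critic MPC architecture with formal robustness guarantees. First, we prove that Transformer networks can satisfy global incremental input-to-state stability ($\delta$ISS). We then use Riemannian contraction theory to analyze the interconnected dynamics between the physical plant and the predictive neural network. Finally, we use these theoretical bounds as a training regularizer to produce a provably robust policy. The framework is validated on a nonlinear 3D drone model performing target-reaching and obstacle-avoidance maneuvers.

English Abstract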

Actor-Critic Model Predictive Control (MPC) effectively addresses complex, non-convex control problems, but guaranteeing the closed-loop stability of sequence-based learning models within these pipelines remains challenging. This paper introduces a novel Transformer-Actor-Critic MPC architecture with formal robustness guarantees. First, we prove that Transformer networks can satisfy global incremental Input-to-State Stability ($\delta$ISS). We then leverage Riemannian contraction theory to analyze the interconnected dynamics between the physical plant and the predictive neural network. Finally, we integrate these theoretical bounds as a training regularizer to yield a certifiably robust policy. The framework is validated on a nonlinear 3D drone model executing target-reaching and obstacle-avoidance maneuvers.

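2606.20196 2026-06-19 cs.CV New submission

Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation

Distill once, adapt life-long: exploring dataset distillation for continual test-time adaptation

Hyun-Kurl Jang, Jihun Kim, Hyeokjun Kweon, Kuk-Jin Yoon

Affiliations: KAIST, Visual Intelligence Lab; Chung-Ang University, FOV Lab

AI Summary: Proposes the DO-ALL framework, which uses dataset distillation to generate compact synthetic anchors. These anchors serve as stable references during continual test-time adaptation without retaining the raw source data, improving long-term robustness.

Comments ECCV 2026

Details
AI Chinese Abstract

Continual test-time adaptation (CTTA) aims to maintain model performance as target domains keep changing, by adapting online to unlabeled data. In practical deployments, however, the source dataset usually cannot be retained because of privacy or licensing restrictions. Meanwhile, purely source-free CTTA methods tend to become unstable under long-term distribution shift, suffering from accumulated self-training errors and catastrophic forgetting. We propose DO-ALL (Distill Once, Adapt Life-Long), a plug-and-play framework that reuses source information in a compact, privacy-preserving form via dataset distillation (DD). Before deployment, DO-ALL performs DD to generate a small set of synthetic distilled anchors that summarize the source distribution. During adaptation, each target sample is aligned with its semantically best-matching anchor. That anchor provides a stable reference for various CTTA methods through source replay, representation alignment, and manifold-smoothing regularization. DO-ALL integrates seamlessly into existing CTTA algorithms and consistently improves long-term robustness on the CIFAR100-C, ImageNet-C, and CCC benchmarks. This demonstrates the potential of DD for enabling stable continual adaptation without retaining raw source data. Code is available at https://github.com/blue-531/DOALL.

English Abstract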

Continual Test-Time Adaptation (CTTA) aims to maintain model performance under evolving target domains by adapting online without labeled data. However, practical deployments often cannot retain the source dataset due to privacy or licensing constraints, and purely source-free CTTA methods tend to become unstable under long-term distribution shift, suffering from compounding self-training errors and catastrophic forgetting. We introduce DO-ALL (Distill Once, Adapt Life-Long), a plug-and-play framework that revisits source information in a compact and privacy-conscious form via Dataset Distillation (DD). Before deployment, DO-ALL performs DD to produce a small set of synthetic distilled anchors that summarize the source distribution. During adaptation, each target sample is matched with its most semantically aligned anchor, which provides a stable reference for various CTTA methods via source replay, representation alignment, and manifold-smoothing regularization. DO-ALL can be seamlessly integrated into existing CTTA algorithms, consistently improving long-term robustness across CIFAR100-C, ImageNet-C, and the CCC benchmark. This demonstrates the potential of leveraging DD to enable stable and continuous adaptation without retaining raw source data. The code is available at https://github.com/blue-531/DOALL.
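
The anchor-matching step can be sketched as a nearest-neighbour search in feature space. The NumPy code below assumes cosine similarity between target features and distilled-anchor features, which the abstract does not specify. It also adds a hypothetical alignment term: the mean of 1 minus the cosine similarity to each target's matched anchor. Source replay and manifold-smoothing regularization are left out.

```python
import numpy as np


def match_to_anchors(target_feats, anchor_feats):
    """Index and cosine similarity of the closest anchor for each target feature."""
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    a = anchor_feats / np.linalg.norm(anchor_feats, axis=1, keepdims=True)
    sims = t @ a.T                              # (num_targets, num_anchors)
    idx = sims.argmax(axis=1)
    return idx, sims[np.arange(len(idx)), idx]


def anchor_alignment_loss(target_feats, anchor_feats):
    """Hypothetical alignment term: mean (1 - cosine) to each target's matched anchor."""
    _, best = match_to_anchors(target_feats, anchor_feats)
    return float(np.mean(1.0 - best))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(10, 64))         # e.g. 10 distilled anchors, 64-d features
    targets = anchors[[3, 7]] + 0.05 * rng.normal(size=(2, 64))
    idx, sims = match_to_anchors(targets, anchors)
    print(idx, sims.round(3), anchor_alignment_loss(targets, anchors))
```

Normalizing both sides first makes the match depend on feature direction only, so a target with a larger feature norm is not biased toward any particular anchor.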

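2606.20193 2026-06-19 cs.RO New submission

Belt-Finger: An Affordable Soft Belt-Driven Gripper for Dexterous In-Hand Manipulation

Belt-Finger: an affordable soft belt-driven gripper for dexterous in-hand manipulation

Boya Zhang, Andreas Zell, Georg Martius

Affiliations: University of Tübingen; Max Planck Institute for Intelligent Systems

AI Summary: Proposes a double-soft-belt finger module that adds three in-hand degrees of freedom (translation, pitch, roll) to a parallel gripper. The module improves dexterous manipulation while remaining low-cost and easy to integrate, and its effectiveness is validated with MPC and teleoperation.

Details
AI Chinese Abstract

Parallel grippers are the default manipulator choice in robotics because they are simple, robust, and inexpensive. However, their limited in-hand mobility often forces large arm motions and limits dexterous manipulation in confined workspaces. We propose an upgrade for parallel grippers: a double-soft-belt finger module that retains the standard opening and closing function while adding three in-hand degrees of freedom (DoF), namely translation, pitch, and roll. The mechanism is deliberately kept simple and is designed for economical manufacturing and straightforward integration. It retains the reliability and precise control of traditional parallel grippers while greatly broadening the range of manipulation capabilities. To demonstrate the usefulness of the added DoF, we integrate the gripper into two control pipelines. First, we adapt a model predictive controller for in-hand manipulation of known objects. Second, we introduce a lightweight teleoperation interface that can control the robot arm and gripper simultaneously (10 DoF in total) with minimal hardware. Across a series of challenging manipulation tasks performed via teleoperation, MPC, and trained policies, the proposed gripper consistently improves dexterity and task feasibility compared with a conventional parallel gripper.

English Abstract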

Parallel-jaw grippers are the default manipulator choice in robotics because they are simple, robust, and inexpensive. Their limited in-hand mobility, however, often forces large arm motions and restricts dexterous manipulation in confined workspaces. We present a parallel-gripper upgrade: a double-soft-belt-based finger module that preserves standard opening/closing while adding three in-hand degrees of freedom (DoF): translation, pitch, and roll. The mechanism is deliberately kept simple and engineered for inexpensive manufacturing and straightforward integration, preserving the reliability and precise control of traditional parallel grippers while greatly broadening the range of manipulation capabilities. To demonstrate the utility of the added DoFs, we integrate the gripper in two control pipelines. First, we adapt a model predictive controller for in-hand manipulation of known objects. Second, we introduce a lightweight teleoperation interface that enables simultaneous control of the robot arm and gripper (10 DoFs total) with minimal hardware. Across a suite of challenging manipulation tasks executed via teleoperation, MPC, and trained policies, the proposed gripper consistently improves dexterity and task feasibility compared to a conventional parallel gripper.

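2606.20189 2026-06-19 cs.CV cs.AI cs.RO New submission

HilDA: Hierarchical Distillation with Diffusion for Advancing Self-Supervised LiDAR Pre-training

HilDA: advancing self-supervised LiDAR pre-training with hierarchical distillation and diffusion

Maciej Wozniak, Jesper Ericsson, Hariprasath Govindarajan, Truls Nyberg, Thomas Gustafsson, Patric Jensfelt, Olov Andersson

Affiliations: KTH Royal Institute of Technology; Linköping University; TRATON AB; Qualcomm Auto Ltd Sweden Filial

AI Summary: Proposes the HilDA framework for self-supervised pre-training of LiDAR backbones. It combines hierarchical distillation (multi-layer distillation and global context distillation) with a temporal occupancy diffusion objective, and reaches state-of-the-art results on 3D detection, scene flow, and semantic occupancy prediction.

Comments Accepted to ECCV 2026. Maciej and Jesper contributed equally

Details
AI Chinese Abstract

Camera-to-LiDAR knowledge distillation with vision foundation models (VFMs) offers a promising way to address the scarcity of annotated data. Such data is needed to cover the enormous geometric and kinematic diversity of real-world autonomous driving. However, current methods usually treat VFMs as black-box teachers and rely only on frame-wise feature similarity. As a result, they do not fully exploit the teacher's layer-wise semantic structure and global context, nor the rich spatiotemporal information inherent in LiDAR sequences. We propose HilDA, a self-supervised pre-training framework for LiDAR backbones that better captures the semantic "what" and geometric "where" needed for driving tasks. HilDA combines hierarchical distillation with a temporal occupancy diffusion objective that promotes spatiotemporal consistency. The hierarchical distillation consists of multi-layer distillation for progressive semantic alignment and global context distillation for scene-level semantics. Models pre-trained with HilDA achieve state-of-the-art results on cross-modal distillation benchmarks. They also outperform models trained with earlier distillation methods on 3D object detection, scene flow, and semantic occupancy prediction. Code available at: https://maxiuw.github.io/hilda.

English Abstract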

Leveraging Vision Foundation Models (VFMs) for camera-to-LiDAR knowledge distillation offers a promising solution to the scarcity of annotated data needed to represent the immense geometric and kinematic diversity of real-world autonomous driving (AD). However, current approaches typically treat VFMs as black-box teachers, relying exclusively on frame-wise feature similarity. Consequently, they do not fully exploit the teacher's layer-wise semantic structure and global context, nor the rich spatiotemporal information inherent in LiDAR sequences. We propose HilDA, a self-supervised pretraining framework for LiDAR backbones that better captures the semantic "what" and geometric "where" needed for driving tasks. HilDA combines hierarchical distillation, comprising multi-layer distillation for progressive semantic alignment and global context distillation for scene-level semantics, with a temporal occupancy diffusion objective promoting spatiotemporal consistency. Models pre-trained with HilDA achieve state-of-the-art results on cross-modal distillation benchmarks and outperform models trained via prior distillation approaches on 3D object detection, scene flow, and semantic occupancy prediction. Code available at: https://maxiuw.github.io/hilda.
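
To give intuition for the multi-layer part of hierarchical distillation, here is a NumPy sketch of a weighted, per-layer cosine-distance loss. It compares student (LiDAR) features with teacher (VFM) features at corresponding positions. The per-layer linear projections, the weights, and the choice of cosine distance are assumptions. The global context term and the temporal occupancy diffusion objective are omitted.

```python
import numpy as np


def cosine_distance(student, teacher, eps=1e-8):
    """Mean (1 - cosine similarity) between matching rows of two (N, C) arrays."""
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def multi_layer_distillation_loss(student_layers, teacher_layers, projections, weights):
    """Weighted sum over layers of the cosine distance after projecting student features.

    student_layers[l]: (N, C_s) student features at N corresponding positions, layer l.
    teacher_layers[l]: (N, C_t) teacher (VFM) features at the same N positions.
    projections[l]:    (C_s, C_t) linear head; learned in practice, fixed here.
    """
    return sum(
        w * cosine_distance(s @ p, t)
        for s, t, p, w in zip(student_layers, teacher_layers, projections, weights)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, c_s, c_t = 256, 32, 64                   # points, student dim, teacher dim
    students = [rng.normal(size=(n, c_s)) for _ in range(3)]
    teachers = [rng.normal(size=(n, c_t)) for _ in range(3)]
    heads = [rng.normal(size=(c_s, c_t)) / np.sqrt(c_s) for _ in range(3)]
    print(multi_layer_distillation_loss(students, teachers, heads, [0.25, 0.25, 0.5]))
```

Each term is 0 when the projected student features point in the same direction as the teacher's, and 2 when they point in opposite directions. The layer weights control how strongly early and late layers are aligned.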

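2606.20183 2026-06-19 cs.LG New submission

Effective Dimension Governs Generalization in Quantum Kernel Vision Models

Effective dimension governs generalization in quantum kernel vision models

Jian Xu, Delu Zeng, John Paisley, Qibin Zhao

AI Summary: Uses the effective dimension $d_{\rm eff}$ to explain two phenomena in quantum vision models: entanglement structure improves generalization, and quantum noise raises test accuracy. The paper also presents a spectral decomposition and a regularization mechanism for noise-shaped kernels.

Details
AI Chinese Abstract

Recent quantum vision models, namely quantum vision transformers and quantum convolutional networks, report two striking but unexplained empirical phenomena. (i) Ansätze with more, or more uniformly distributed, entanglement generalize better. (ii) Injecting quantum noise can raise rather than lower test accuracy. These observations are currently treated as curiosities, found by grid search and, if explained at all, explained by hand. We show that both are manifestations of a single measurable quantity: the \emph{effective dimension} $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. We work mainly with quantum-kernel vision models, in which a quantum feature map is read out by a kernel classifier. For these models we give a spectral account in which entanglement structure and quantum noise are two knobs that adjust $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts like ridge regularization. We analyze the mechanism through four results. The first is an \emph{exact} decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$, with $d_{\rm eff}(K_p)\to1$. The others are a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition. The monotone contraction operating in our entanglement experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family, the collapse is instead exact by construction. We use it only to confirm the kernel decomposition to machine precision at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and raises test accuracy by up to $+13\%$ along an inverted-U sweet spot. The sign of the effect flips between the overfitting and underfitting regimes, and noise injection matches an explicit spectral-filtering frontier. Our results organize the two reported phenomena into a single measurable principle for designing quantum vision models.

English Abstract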

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansätze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the \emph{effective dimension} $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models (a quantum feature map read out by a kernel classifier), we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts as ridge-like regularization. We analyze the mechanism: an \emph{exact} decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\rm eff}(K_p)\to1$, a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition; the monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the effect's sign flips between the over- and under-fitting regimes; noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
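
The depolarized-kernel identity can be checked numerically in a few lines. Under a global depolarizing channel $\rho \mapsto (1-p)\rho + pI/D$ applied to every encoded state, expanding $\operatorname{tr}(\rho_p\rho'_p)$ gives $(1-p)^2\operatorname{tr}(\rho\rho') + 2p(1-p)/D + p^2/D$, which simplifies to the stated form. The NumPy sketch below uses Haar-random pure states as a stand-in for a quantum feature map. It compares the closed form for $K_p$ with the kernel computed from explicit density matrices. It also reports a participation-ratio effective dimension $(\operatorname{tr}K)^2/\operatorname{tr}(K^2)$, which is one common definition and an assumption here. As the abstract stresses, the collapse to $d_{\rm eff}\to1$ at $p=1$ is exact by construction, so this is a sanity check of the decomposition, not evidence about generalization.

```python
import numpy as np


def random_pure_states(n, dim, seed=0):
    """Haar-random pure states (normalized complex Gaussian vectors), one per row."""
    rng = np.random.default_rng(seed)
    psi = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)


def fidelity_kernel(psi):
    """K_ij = |<psi_i|psi_j>|^2 = tr(rho_i rho_j) for pure states."""
    return np.abs(psi.conj() @ psi.T) ** 2


def depolarized_kernel(K, p, dim):
    """Closed form K_p = (1 - p)^2 K + p(2 - p)/D * 11^T."""
    n = K.shape[0]
    return (1 - p) ** 2 * K + p * (2 - p) / dim * np.ones((n, n))


def depolarized_kernel_explicit(psi, p):
    """Same kernel from explicit density matrices rho -> (1 - p) rho + p I/D."""
    n, dim = psi.shape
    rhos = [(1 - p) * np.outer(v, v.conj()) + p * np.eye(dim) / dim for v in psi]
    return np.array([[np.trace(ri @ rj).real for rj in rhos] for ri in rhos])


def effective_dimension(K):
    """Participation ratio (tr K)^2 / tr(K^2): an assumed definition of d_eff."""
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()


if __name__ == "__main__":
    dim = 2 ** 3                      # 3 qubits
    psi = random_pure_states(20, dim)
    K = fidelity_kernel(psi)
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        Kp = depolarized_kernel(K, p, dim)
        gap = np.max(np.abs(Kp - depolarized_kernel_explicit(psi, p)))
        print(f"p={p:.2f}  max|closed-explicit|={gap:.1e}  d_eff={effective_dimension(Kp):.3f}")
```

At $p=1$ every state becomes $I/D$, so $K_p$ is the rank-one matrix $\tfrac{1}{D}\mathbf{1}\mathbf{1}^\top$ and the participation ratio is exactly 1, matching $d_{\rm eff}(K_p)\to1$.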

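2606.20179 2026-06-19 cs.CL New submission

ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

ReNikud: audio-supervised Hebrew grapheme-to-phoneme conversion

Maxim Melichov, Yakov Kolani, Morris Alper

AI Summary: Proposes ReNikud, which addresses missing vowels and pronunciation ambiguity in Hebrew G2P conversion. It combines audio supervision, through ASR pseudo-labels on unlabeled audio, with a pseudo-vocalization architecture that enforces character-level alignment. It achieves the best results on multiple benchmarks.

Details
AI Chinese Abstract

Grapheme-to-phoneme (G2P) conversion for Modern Hebrew is needed for applications such as text-to-speech (TTS). It is challenging because the language's consonantal writing system (abjad) leaves vowels mostly unwritten, which causes substantial ambiguity. Standard approaches first predict vowel diacritics (nikud) to generate International Phonetic Alphabet (IPA) transcriptions, but this has limitations. Vocalization data is scarce and laborious to produce, it does not specify features such as lexical stress, and it reflects formal grammatical rules rather than everyday spoken pronunciation. Meanwhile, direct sequence-to-sequence IPA prediction performs poorly on limited data and fails to exploit the character-level alignment characteristic of abjads. Our method, ReNikud, overcomes these limitations with two key insights. (1) Weak audio supervision: a phoneme-based automatic speech recognition (ASR) pseudo-labeling pipeline runs on thousands of hours of unlabeled Hebrew audio and produces phonemic transcriptions that reflect natural spoken norms without manual annotation. (2) A pseudo-vocalization architecture that predicts IPA phonemes at each character position, enforcing character-level alignment as an inductive bias. Results on existing Hebrew G2P benchmarks and on the new MILIM benchmark targeting spoken Hebrew show that ReNikud surpasses previous state-of-the-art methods. We will release our code and trained models to support further research on Hebrew TTS and speech technology.

English Abstract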

Grapheme-to-phoneme (G2P) conversion for Modern Hebrew is needed for applications like text-to-speech (TTS), but is challenging due to the language's abjad writing system, which leaves vowels largely unwritten, creating substantial ambiguity. Standard approaches first predict vowel diacritics (nikud) to produce International Phonetic Alphabet (IPA) transcriptions, but this is limited: vocalization data is scarce and laborious to produce, it does not specify features such as lexical stress, and it reflects formal grammatical rules rather than everyday spoken pronunciation. Direct sequence-to-sequence IPA prediction, meanwhile, struggles on limited data and fails to exploit the character-level alignment characteristic of abjads. Our method, ReNikud, overcomes these limitations with two key insights: (1) Weak audio supervision via a phoneme-based automatic speech recognition (ASR) pseudo-labeling pipeline on thousands of hours of unlabeled Hebrew audio, yielding phonemic transcriptions that reflect natural spoken norms without manual annotation. (2) A pseudo-vocalization architecture that predicts IPA phonemes at each character position, enforcing character-level alignment as an inductive bias. Results on existing Hebrew G2P benchmarks and the new targeted MILIM benchmark for spoken Hebrew show that ReNikud surpasses previous state-of-the-art methods. We will release our code and trained models to support further work on Hebrew TTS and speech technologies.
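
The pseudo-vocalization idea, one IPA prediction per written character, can be illustrated with a tiny decoder. The Python sketch below assumes that each character receives exactly one (possibly empty) phoneme string, so alignment holds by construction. The example labels for שלום (shalom) are hand-written for illustration and are not model output.

```python
def decode_pseudo_vocalization(word, per_char_phonemes):
    """Join per-character IPA predictions into one transcription.

    Every written character gets exactly one (possibly empty) phoneme string,
    so the output stays aligned with the input characters by construction.
    """
    if len(word) != len(per_char_phonemes):
        raise ValueError("expected exactly one prediction per character")
    return "".join(per_char_phonemes)


if __name__ == "__main__":
    word = "שלום"                          # shalom, written without nikud
    labels = ["ʃa", "ˈl", "o", "m"]        # hand-written illustration, not model output
    for char, phonemes in zip(word, labels):
        print(char, "->", phonemes)
    print(decode_pseudo_vocalization(word, labels))
```

Stress is just another symbol in a character's label (here `ˈ` on lamed), so the output can carry lexical stress, which nikud does not encode.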

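2606.20177 2026-06-19 cs.CV cs.AI New submission

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

Evaluating and enhancing negation comprehension in remote sensing multimodal large language models

Haochen Han, Jue Wang, Alex Jinpeng Wang, Fangming Liu

Affiliations: Peng Cheng Laboratory; Tsinghua University; Central South University

AI Summary: Proposes the RS-Neg benchmark for evaluating negation comprehension in remote sensing MLLMs. It also designs NeFo, a test-time learning method that substantially improves model performance using about 5% of unlabeled samples.

Comments ECCV 2026 Accepted

Details
AI Chinese Abstract

Multimodal large language models (MLLMs) have achieved remarkable success on a variety of remote sensing (RS) tasks. However, their ability to understand negation remains underexplored. This limits deployment in real-world applications where models must explicitly identify what is false or absent, for example when emergency responders need to locate non-flooded routes for evacuation. To study this limitation comprehensively, we introduce RS-Neg, the first benchmark for evaluating negation understanding on tasks ranging from region level to scene level. Specifically, we design an automated data generation pipeline for remote sensing imagery that uses LLMs to synthesize diverse negation queries, and we introduce a dynamic visual focus module for verification. Our evaluation shows that advanced remote sensing MLLMs struggle with negation, exhibiting hallucinations and significant performance degradation. To close this gap, we propose NeFo, a novel test-time learning method that explicitly incorporates the logical role of negation into model optimization. Notably, using about 5% of the unlabeled test samples, NeFo significantly improves the models' negation understanding and generalizes strongly to unseen tasks. Code and data will be released upon acceptance.

English Abstract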

Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sensing (RS) tasks. However, their ability to comprehend negation remains underexplored, limiting deployment in real-world applications where models must explicitly identify what is false or absent, e.g., when emergency responders need to locate non-flooded routes for evacuation. To comprehensively study this limitation, we introduce RS-Neg, the first benchmark to evaluate negation understanding on tasks ranging from region level to scene level. Specifically, we design an automated data generation pipeline for RS imagery, using LLMs to synthesize diverse negation queries, and introduce a dynamic visual focus module for verification. Our evaluation reveals that advanced RS MLLMs struggle with negation, exhibiting hallucinations and substantial performance degradation. To close this gap, we propose NeFo, a novel test-time learning method that explicitly incorporates the logical role of negation into the model optimization. Remarkably, using about 5% of unlabeled test samples, NeFo significantly improves the negation understanding of models and shows strong generalization to unseen tasks. Code and data will be released upon acceptance.

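2606.20174 2026-06-19 cs.LG New submission

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

Computational methods and challenges in cell-free DNA analysis for multi-cancer early detection

Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki

Affiliations: AGH University of Krakow; Norwegian Institute of Public Health

AI Summary: Reviews computational methods for cfDNA-based multi-cancer early detection from 2022 to 2025, focusing on fragmentomic and epigenetic feature extraction. It finds that multimodal ensemble approaches have the greatest potential for clinical integration, but that standardized evaluation protocols are needed.

Details
AI Chinese Abstract

Cell-free DNA (cfDNA) is a promising avenue for non-invasive multi-cancer early detection (MCED), because a single blood draw can detect multiple cancers simultaneously. It is particularly sensitive to cancers that currently lack established screening programs. This paper reviews computational methods for cfDNA-based MCED developed between 2022 and 2025. We focus on how fragmentomic and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals. We then review classical statistical and machine learning methods as well as deep learning frameworks, including autoencoder-based models. For each method, we discuss its biological interpretability, validation strategy, and readiness for clinical integration. In addition, we group current challenges into technical, computational, and methodological categories and outline open problems in the field. This review shows that multimodal ensemble approaches hold the strongest promise and the highest readiness for clinical integration. However, standardized evaluation protocols and reporting of results are essential for better assessing future work and enabling side-by-side comparison.

English Abstract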

Cell-free DNA (cfDNA) is a promising avenue for non-invasive multi-cancer early detection (MCED), as it can enable the detection of multiple cancers simultaneously from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomics and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks, including autoencoder-based models. For each method, we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges as technical, computational, and methodological, and outline open problems in the field. This review shows that multimodal ensemble approaches hold the strongest promise and the highest readiness for clinical integration. However, standardized evaluation protocols and reporting of results will be crucial for assessing future work and enabling side-by-side comparison.
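
As a concrete example of a fragmentomic feature, the NumPy sketch below computes the ratio of short to long cfDNA fragments, both overall and per genomic bin. The 100-150 bp and 151-220 bp windows are illustrative values of the kind used in fragmentation-profile studies. Treat them, and the binning interface, as assumptions.

```python
import numpy as np


def short_long_ratio(fragment_lengths, short_range=(100, 150), long_range=(151, 220)):
    """Short-to-long fragment ratio; ranges are inclusive, in base pairs, and illustrative."""
    lengths = np.asarray(fragment_lengths)
    n_short = np.count_nonzero((lengths >= short_range[0]) & (lengths <= short_range[1]))
    n_long = np.count_nonzero((lengths >= long_range[0]) & (lengths <= long_range[1]))
    return n_short / n_long if n_long else float("nan")


def binned_profile(fragment_lengths, bin_ids, n_bins):
    """Per-bin short/long ratios, giving a fixed-length fragmentation feature vector."""
    lengths = np.asarray(fragment_lengths)
    bins = np.asarray(bin_ids)
    return np.array([short_long_ratio(lengths[bins == b]) for b in range(n_bins)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lengths = rng.normal(loc=167, scale=25, size=10_000).round()  # synthetic fragment sizes
    bin_ids = rng.integers(0, 5, size=lengths.size)               # 5 hypothetical genomic bins
    print(short_long_ratio(lengths), binned_profile(lengths, bin_ids, 5).round(3))
```

A per-bin vector like this is a typical input for the classical and machine learning classifiers the review covers. Bins with no long fragments yield NaN rather than a misleading ratio.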

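2606.20173 2026-06-19 cs.SE New submission

Qiskit Code Migration with LLMs

Qiskit code migration using large language models

Jose Manuel Suarez, Luis Mariano Bibbo, Joaquin Bogado, Alenandro Fernandez

AI Summary: Addresses the code maintenance problems caused by the evolution of quantum software development kits with a hybrid approach that combines large language models and retrieval-augmented generation (RAG). An automatically generated taxonomy of migration scenarios guides the model in migrating Qiskit code across versions, reducing hallucinations and improving the quality of migration suggestions.

Details
AI Chinese Abstract

The rapid evolution of quantum development kits (QDKs) introduces a specific form of technical debt that undermines code maintainability and hinders software reuse. In the specialized field of quantum software engineering (QSE), this challenge is aggravated by the scarcity of high-quality training data and the high volatility of emerging frameworks. As a result, general-purpose large language models (LLMs) often produce unreliable or hallucinated results. This paper proposes a hybrid approach that combines LLMs with retrieval-augmented generation (RAG) to automate the migration of Qiskit code across versions. The method guides the model with an automatically generated taxonomy of migration scenarios, used as a structured, version-specific knowledge source, which improves the precision and reliability of migration suggestions. The approach is implemented as an automated, extensible workflow. This workflow evaluates LLMs (Google Gemini Flash-2.5 and OpenAI Gpt-oss-20b) under different retrieval schemes (unconstrained and restrictive). The results show that the taxonomy-based RAG architecture, especially under the restrictive scheme, significantly reduces hallucinations and improves descriptive quality. Google Gemini Flash-2.5 performs better at detecting complex refactoring scenarios. These findings confirm the potential of this data-centric approach to foster technological independence and to provide robust intelligent assistants that mitigate API obsolescence. Such assistants help ensure the long-term usability of quantum algorithms in a rapidly changing ecosystem and flatten the learning curve of quantum software engineering (QSE).

English Abstract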

The rapid evolution of Quantum Development Kits (QDKs) introduces a specific form of technical debt that compromises code maintainability and hinders software reuse. In the specialized domain of Quantum Software Engineering (QSE), this challenge is intensified by the scarcity of high-quality training data and the high volatility of emerging frameworks, which often lead general-purpose Large Language Models (LLMs) to produce unreliable or hallucinated results. This paper proposes a hybrid approach integrating LLMs with Retrieval-Augmented Generation (RAG) to automate the migration of Qiskit code across versions. The proposed methodology enhances the precision and reliability of migration suggestions by leveraging an automatically generated taxonomy of migration scenarios as the structured, version-specific knowledge source to guide the models. The approach is implemented through an automated, extensible workflow evaluating LLMs (Google Gemini Flash-2.5 and OpenAI Gpt-oss-20b) under different retrieval schemes (unconstrained and restrictive). Results demonstrate that the taxonomy-based RAG architecture, particularly under the restrictive scheme, significantly reduces hallucinations and improves descriptive quality, with Google Gemini Flash-2.5 showing superior performance in detecting complex refactoring scenarios. These findings confirm the potential of this data-centric methodology to foster technological independence and provide robust, intelligent assistants that mitigate API obsolescence, ensuring the long-term availability of quantum algorithms within a rapidly shifting ecosystem and flattening the learning curve within Quantum Software Engineering (QSE).
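
The difference between the two retrieval schemes can be sketched as a filter on a taxonomy of migration scenarios. In the Python sketch below, three elements are assumptions made for illustration: the `MigrationScenario` fields, the regex matching, and the reading of "restrictive" as "same source/target version pair only". The two entries reflect Qiskit 1.0 removing `qiskit.execute` and the top-level `Aer` import. Their guidance text is paraphrased and is not taken from the paper's taxonomy.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationScenario:
    """One entry of a hypothetical migration-scenario taxonomy."""
    category: str
    pattern: str       # regex that locates affected code
    source_major: int  # major version in which the old usage was valid
    target_major: int  # major version in which it must change
    guidance: str


TAXONOMY = [
    MigrationScenario("removed function", r"\bexecute\(", 0, 1,
                      "qiskit.execute() was removed in Qiskit 1.0; transpile the circuit and call backend.run()."),
    MigrationScenario("moved import", r"from qiskit import .*\bAer\b", 0, 1,
                      "Aer is no longer importable from qiskit in 1.0; import it from the qiskit_aer package."),
]


def retrieve(code, taxonomy, source_major, target_major, restrictive=True):
    """Return scenarios whose pattern occurs in `code`.

    restrictive=True keeps only scenarios for the requested version pair;
    restrictive=False ignores versions (an assumed reading of 'unconstrained').
    """
    hits = [s for s in taxonomy if re.search(s.pattern, code)]
    if restrictive:
        hits = [s for s in hits if (s.source_major, s.target_major) == (source_major, target_major)]
    return hits


def build_prompt(code, scenarios):
    """Assemble the retrieved guidance and the code into a single LLM prompt."""
    notes = "\n".join(f"- [{s.category}] {s.guidance}" for s in scenarios)
    return f"Migration notes:\n{notes}\n\nMigrate this code:\n{code}"


if __name__ == "__main__":
    legacy = "from qiskit import QuantumCircuit, Aer, execute\njob = execute(qc, Aer.get_backend('qasm_simulator'))"
    print(build_prompt(legacy, retrieve(legacy, TAXONOMY, 0, 1)))
```

The restrictive filter keeps guidance for the wrong version pair out of the prompt. This is one plausible way a taxonomy can curb hallucinated migration advice.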

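2606.20172 2026-06-19 cs.LG New submission

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

Predicting gestational age at birth in the context of preterm birth from multimodal fetal MRI

Diego Fajardo-Rojas, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter

Affiliations: Leibniz University Hannover

AI Summary: Proposes a machine learning pipeline on multimodal fetal MRI, comprising data imputation, feature selection, and regression models, to predict gestational age at birth. Evaluated on 333 control and 93 preterm cases, it achieves R² = 0.13, MAE = 2.74 weeks, and an accuracy of 0.77.

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:013

Journal ref Machine Learning for Biomedical Imaging 2026 (2026)

Details
AI Chinese Abstract

Preterm birth is associated with high mortality and a risk of lifelong morbidity. Its complex, multifactorial aetiology hampers accurate prediction and optimal care. We developed and evaluated a pipeline of bespoke machine learning methods for data imputation, feature selection, and regression to predict gestational age at birth. The pipeline uses comprehensive multimodal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The predicted gestational ages at birth were classified into term and preterm categories, and the accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the pipeline design. Performance was evaluated with stratified 10-fold cross-validation. The pipeline achieved an R² score of 0.13 and a mean absolute error of 2.74 weeks. Across cross-validation folds, accuracy was 0.77, sensitivity 0.59, and specificity 0.82. The main features selected by the pipeline include cervical length and statistics derived from placental T2* values. Combining fast, motion-robust multimodal fetal MRI techniques with machine learning makes it possible to predict gestational age at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth has so far been treated in the literature only as a classification problem. This work therefore provides a proof of concept. Future work will increase the cohort size to allow finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.

English Abstract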

Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated from comprehensive multi-modal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. Performance was evaluated using stratified 10-fold cross-validation. The pipeline achieves an R2 score of 0.13 and a mean absolute error of 2.74 weeks. It also achieves a 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity across folds. The predominant features selected by the pipeline include cervical length and statistics derived from placental T2* values. The confluence of fast, motion-robust and multi-modal fetal MRI techniques and machine learning prediction allowed the prediction of the gestation at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth had only been addressed as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.
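
The reported metrics combine a regression score with a derived term/preterm classification. The NumPy sketch below shows one way to compute both from true and predicted gestational ages. It assumes the conventional threshold of 37 weeks for preterm birth, because the abstract does not state the exact cut-off the authors used. It does not reproduce their stratified 10-fold cross-validation.

```python
import numpy as np

PRETERM_WEEKS = 37.0  # conventional definition: birth before 37 completed weeks


def evaluate_ga_predictions(y_true, y_pred, threshold=PRETERM_WEEKS):
    """Regression metrics on GA at birth plus term/preterm classification metrics."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Threshold both true and predicted GA to obtain preterm labels.
    true_pt = y_true < threshold
    pred_pt = y_pred < threshold
    tp = np.sum(true_pt & pred_pt)
    tn = np.sum(~true_pt & ~pred_pt)
    fp = np.sum(~true_pt & pred_pt)
    fn = np.sum(true_pt & ~pred_pt)
    return {
        "mae_weeks": float(mae),
        "r2": float(r2),
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)) if tp + fn else float("nan"),
        "specificity": float(tn / (tn + fp)) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    print(evaluate_ga_predictions([40, 39, 35, 30], [39, 36, 36, 33]))
```

A modest $R^2$ can coexist with usable classification accuracy, because the classification metrics depend only on which side of the threshold each prediction falls.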

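2606.20167 2026-06-19 cs.LG New submission

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

Multimodal contrastive learning for implicit Earth embeddings via location tying

Jonathan Hecht, Lukas Arzoumanidis, Ziyue Li, Youness Dehbi

Affiliations: Computational Methods Lab, HafenCity University Hamburg; Dept. of Operations & Technology, Technical University of Munich; Heilbronn Data Science Center; Munich Data Science Institute; Institute of Geodesy and Geoinformation, University of Bonn

AI Summary: Proposes two multimodal contrastive learning architectures, MELT and SALT, that integrate unpaired geographic data via location tying. They match the strongest two-modality baseline, SATCLIP, on four downstream tasks. However, adding modalities does not consistently improve performance, which suggests that the location encoder is the main bottleneck.

Details
AI Chinese Abstract

Spatial prediction tasks are often limited by a lack of high-quality labeled ground-truth observations. Self-supervised pre-training is one possible way to overcome this challenge, and contrastive learning is the dominant approach for location encoders. These methods usually align geographic coordinates with only one additional modality. We propose two multimodal contrastive learning architectures: Multimodal Embedding via Location Tying (MELT) and Sequential Alternating Location Training (SALT). By exploiting unpaired geospatial data, these architectures extend the framework beyond two modalities. Both methods are technically viable and match the performance of the strongest two-modality baseline (SATCLIP) on four downstream tasks. However, increasing the number of modalities does not consistently improve performance. This suggests that the chosen location encoder is the main limitation: the contrastive objective peaks early regardless of modality diversity or pre-training volume. MELT trains more stably than SALT and provides a stronger foundation for future scaling.

English Abstract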

Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. To overcome this challenge, self-supervised pre-training is a possible solution, with contrastive learning being the dominant approach for location encoders. Those approaches usually align geographic coordinates with just one additional modality. We propose two multimodal contrastive learning architectures: Multimodal Embedding via Location Tying (MELT) and Sequential Alternating Location Training (SALT). These architectures expand this framework beyond two modalities by utilising unpaired geospatial data. Both methods are technically viable and match the performance of the strongest two-modality baseline (SATCLIP) across four downstream tasks. However, increasing the number of modalities does not consistently improve performance, suggesting that the chosen location encoder is the main limitation: the contrastive objective reaches its peak early, regardless of modality diversity or pre-training volume. MELT provides more stable training than SALT and presents a stronger foundation for future scaling.
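
Both architectures rest on a contrastive loss between a shared location encoder and each modality encoder. The NumPy sketch below implements a symmetric CLIP-style InfoNCE loss. It adds a hypothetical MELT-like objective that simply sums one such term per modality, with each modality contributing its own (location, observation) pairs. The temperature and the plain summation are assumptions, and SALT's alternating schedule is not shown.

```python
import numpy as np


def info_nce(loc_emb, mod_emb, temperature=0.07):
    """Symmetric CLIP-style InfoNCE between location and modality embeddings.

    Row i of both arrays is assumed to describe the same location (a positive pair);
    all other rows in the batch act as negatives.
    """
    a = loc_emb / np.linalg.norm(loc_emb, axis=1, keepdims=True)
    b = mod_emb / np.linalg.norm(mod_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def cross_entropy_diag(z):
        z = z - z.max(axis=1, keepdims=True)       # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


def location_tied_loss(batches):
    """Hypothetical MELT-like objective: one InfoNCE term per modality, summed.

    batches: list of (location_embeddings, modality_embeddings) pairs, one per modality.
    Modalities need not be paired with each other, only with their own locations.
    """
    return sum(info_nce(loc, mod) for loc, mod in batches)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    loc = rng.normal(size=(8, 16))
    print(info_nce(loc, loc), info_nce(loc, rng.normal(size=(8, 16))))
```

Because each modality only needs its own coordinates, unpaired datasets (for example imagery at one set of locations and climate records at another) can still share one location encoder through the summed loss.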