arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.28697 2026-05-28 eess.IV cs.AI cs.CV

Deep Learning Strain Estimation: Is Physics-Based Simulation the Solution?

Thierry Judge, Nicolas Duchateau, Andreas Østvik, Khuram Faraz, Anders Austlid Taskén, Sigve Karlsen, Thor Edvardsen, Harald Brunvand, Md Abulkalam Azad, Havard Dalen, Bjørnar Grenne, Gabriel Kiss, Pierre-Yves Courand, Lasse Lovstakken, Pierre-Marc Jodoin, Olivier Bernard

AI summary: To address the lack of reliable motion references for strain estimation in echocardiography, the paper proposes a simulation strategy that combines speckle decorrelation measurements from real videos with an iterative refinement process, producing a realistic dataset for training motion estimation algorithms and achieving better global and regional strain performance than the clinical reference.

Comments: 10 pages

Abstract

Speckle tracking echocardiography (STE) is the clinical standard for myocardial strain estimation. Despite good performance on global strain (GLS), its accuracy for regional strain remains limited, even though this biomarker is highly relevant for early diagnosis and the characterization of subtle abnormalities. Deep learning is a promising alternative, but its development is constrained by the lack of reliable motion references. Existing solutions rely either on STE-derived labels or on simulations generated by physics-based models, but these synthetic sequences still have limited realism compared with clinical data. In this paper, we propose a novel simulation strategy that incorporates speckle decorrelation measures from real videos and uses an iterative refinement process to improve the motion realism in the simulations. We created an open-source photorealistic dataset of 1,478 videos with reference motion, which was used to train an echocardiographic motion estimation algorithm. The proposed method achieves unmatched performance on global and regional strain, notably reaching a GLS variability of 1.42% in an inter-expert setting compared to 1.78% for the clinical reference.
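
For readers unfamiliar with strain, the sketch below shows how a strain curve could be computed once myocardial points have been tracked. It assumes the common Lagrangian definition (relative change in contour length with respect to the first frame); the contours and function names are hypothetical and are not taken from the paper.

```python
import math


def contour_length(points):
    """Sum of Euclidean distances between consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def longitudinal_strain(contours):
    """Lagrangian strain (%) of each frame relative to the first frame."""
    l0 = contour_length(contours[0])
    return [100.0 * (contour_length(c) - l0) / l0 for c in contours]


# Hypothetical toy contours: a 3-point polyline that shortens by 20%.
end_diastole = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]
end_systole = [(0.0, 0.0), (0.0, 4.0), (0.0, 8.0)]

strain_curve = longitudinal_strain([end_diastole, end_systole])
peak_strain = min(strain_curve)  # most negative value, i.e. peak shortening
```

Errors in the tracked motion propagate directly into the contour lengths, which is why a reliable motion reference matters for training.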

2605.28674 2026-05-28 math.OC cs.DS cs.SY eess.SY math.AG

Disjunctive Sum of Squares

Amir Ali Ahmadi, Sanjeeb Dash, Yixuan Hua, Bartolomeo Stellato

AI summary: This paper proposes the concept of disjunctive sum of squares, which certifies polynomial nonnegativity through multiple algebraic identities that can be found in parallel, and builds a semidefinite programming hierarchy for polynomial optimization based on a disjunctive Positivstellensatz.

Abstract

We introduce the concept of disjunctive sum of squares for certifying nonnegativity of polynomials. Unlike the popular sum of squares approach where nonnegativity is certified by a single algebraic identity, the disjunctive sum of squares approach certifies nonnegativity with multiple algebraic identities which can be found in parallel. Our main result is a disjunctive Positivstellensatz proving that we can keep the degree of each algebraic identity as low as the degree of the polynomial whose nonnegativity is in question. Based on this result, we construct a semidefinite programming based converging hierarchy of lower bounds for the problem of minimizing a polynomial over a compact basic semialgebraic set, where the size of the largest semidefinite constraint is fixed throughout the hierarchy. We further prove a second disjunctive Positivstellensatz which leads to an optimization-free hierarchy for polynomial optimization. We specialize this result to the problem of proving copositivity of matrices. Finally, we describe how the disjunctive sum of squares approach can be combined with a branch-and-bound algorithm and we present numerical experiments on polynomial, copositive, and combinatorial optimization problems.
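
As background, the classical sum of squares approach mentioned above certifies nonnegativity with a single identity p = q1^2 + ... + qk^2. The sketch below only checks such an identity for a univariate polynomial given as a coefficient list (lowest degree first); it does not implement the disjunctive certificates of the paper, and the helper names are illustrative.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def trim(a):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(a) > 1 and a[-1] == 0:
        a = a[:-1]
    return a


def is_sos_certificate(p, squares):
    """Check the single algebraic identity p == sum of q*q over the given q."""
    total = [0]
    for q in squares:
        total = poly_add(total, poly_mul(q, q))
    return trim(total) == trim(p)


# p(x) = 2x^2 - 2x + 1 = x^2 + (x - 1)^2, hence p(x) >= 0 for all real x.
p = [1, -2, 2]
squares = [[0, 1], [-1, 1]]
certified = is_sos_certificate(p, squares)
```

In practice the squares are found by semidefinite programming rather than supplied by hand; this check only verifies a candidate certificate.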

2605.28665 2026-05-28 eess.SY cs.SY math.OC

On the Solvability of Quasi-Regulator Equations in Non-smooth Output Regulation

Zirui Niu, Daniele Astolfi, Giordano Scarciotti

AI summary: For output regulation of linear systems under non-smooth, non-periodic exogenous signals, the paper studies the solvability of the quasi-regulator equations; by reformulating them as differential-algebraic equations and introducing a non-smooth non-resonance condition, it gives a necessary and sufficient characterization of solvability.

Comments: 7 pages, accepted by MTNS 2026

Abstract

Motivated by the prevalence of non-smooth, possibly non-periodic signals in real-world applications, the output regulation of linear systems subject to non-smooth non-periodic exogenous signals has emerged as a challenging problem. A fundamental prerequisite for solving this problem is the existence of solutions to the so-called "quasi-regulator equations". In this paper, we investigate the solvability of these equations. To this end, we reformulate the quasi-regulator equations as differential-algebraic equations and highlight the critical role played by the system's relative degree. We finally propose a "non-smooth non-resonance condition" that, under specific relative degree requirements, provides a necessary and sufficient characterization of the solvability of the quasi-regulator equations.

2605.28654 2026-05-28 cs.RO cs.SY eess.SY math.OC

Integrated Exploration-Aware UAV Route Optimization and Path Planning

Jimin Choi, Grant Stagg, Cameron K. Peterson, Max Z. Li

AI summary: Proposes an integrated exploration-aware UAV route optimization and path planning framework that uses risk maps, uncertain region-of-interest modeling, B-spline trajectory optimization, and online replanning to balance visits to reported locations against exploration for new information in disaster monitoring, improving average KL reduction by 15.9% over the offline optimized planner.

Abstract

Uncrewed aerial vehicles (UAVs) are increasingly used for exploration-driven monitoring in hazardous environments such as disaster zones, contaminated sites, wildfire areas, and damaged infrastructure, where limited flight endurance must be allocated between visiting reported locations and gathering new information. In these settings, prior information regarding hazards is often incomplete, spatially imprecise, and subject to change during execution. For example, initial reports may identify a region where a hazard is likely to exist, but the actual hazard may be displaced, partially observed, or entirely unreported. We present an integrated exploration-aware UAV route optimization and path planning framework for hazard monitoring under uncertain and evolving prior information. The environment is represented as a spatial risk map, where each location has an associated belief of hazardous conditions. Reported hazards are modeled as uncertain regions of interest (ROIs) rather than confirmed target locations, requiring the UAV to inspect reported areas while also using its limited flight endurance to explore informative regions. The proposed method solves a vehicle routing problem over reported ROIs, augments the route with auxiliary pseudo-nodes to improve spatial coverage, allocates the remaining flight distance budget across route segments, and optimizes dynamically feasible B-spline trajectories for local exploration. During execution, UAV measurements update a grid-based belief map, and the remaining trajectory is replanned when new information and the remaining budget justify adaptation. Across 48 scenario configurations, online replanning improves average KL reduction by 15.9% over the offline optimized planner and 48.6% over straight-line traversal.
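
The grid-based belief update and the KL-based metric can be illustrated for a single cell. This is a minimal sketch under stated assumptions: each cell holds a Bernoulli hazard belief, the sensor has hypothetical detection and false-alarm rates (`p_hit`, `p_false`), and "KL reduction" is taken to mean the drop in KL divergence between the true cell state and the belief. The paper's exact sensor model and metric definition may differ.

```python
import math


def bayes_update(prior, detected, p_hit=0.9, p_false=0.1):
    """Posterior hazard probability of one cell after a binary measurement."""
    like_hazard = p_hit if detected else 1.0 - p_hit
    like_safe = p_false if detected else 1.0 - p_false
    evidence = like_hazard * prior + like_safe * (1.0 - prior)
    return like_hazard * prior / evidence


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, with q clamped away from 0 and 1."""
    q = min(max(q, eps), 1.0 - eps)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


# A cell that truly contains a hazard, observed once with a positive detection.
prior = 0.5
posterior = bayes_update(prior, detected=True)
kl_before = bernoulli_kl(1.0, prior)
kl_after = bernoulli_kl(1.0, posterior)
kl_reduction = kl_before - kl_after
```

Summing this quantity over all cells gives one plausible map-level score for comparing planners.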

2605.28618 2026-05-28 eess.AS

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Changhao Pan, Rui Yang, Han Wang, Zhuan Zhou, Xuming He, Wenxiang Guo, Ziyue Jiang, Ruiqi Li, Yu Zhang, Chenyuhao Wen, Ke Lei, Xiang Yin, Jingyu Lu, Zhiyuan Zhu, Zhou Zhao

AI summary: To address evaluations that are confined to limited domains and overlook consistency and coherence, the paper proposes the SwanBench-Speech benchmark, which decomposes long-form speech quality along acoustic, semantic, and expressiveness dimensions, covers 17 scenarios with 1,101 samples, defines seven automatic evaluation metrics, and reveals the shortcomings of current models in highly expressive scenarios and their gap from real recordings.

Comments: Accepted by ACL 2026 (Findings). 36 pages, 14 figures

Abstract

Recent advances in speech generation have enabled high-fidelity synthesis, yet systematic evaluation of models under long-context conditions remains largely underexplored. A comprehensive evaluation benchmark for long-form speech is indispensable for two reasons: 1) existing test scenarios are often confined to limited domains, creating a significant gap with the diverse downstream applications; 2) existing metrics overlook critical long-text factors such as consistency and coherence, failing to generalize reliably. To this end, we propose SwanBench-Speech, a comprehensive benchmark that decomposes long-form speech quality into specific, disentangled dimensions. SwanBench-Speech has three key properties: 1) Rich speech scenarios: focusing on long-form speech generation and dialog generation, SwanBench-Speech covers acoustics, semantics, and expressiveness challenges, and consists of 1,101 samples spanning 17 common speech scenarios; 2) Comprehensive evaluation dimensions: along the acoustics, semantics, and expressiveness axes, SwanBench-Speech defines an automated evaluation protocol with seven metrics to provide a comprehensive, accurate, and standardized assessment; 3) Valuable insights: through extensive experiments, we reveal that current models still struggle in highly expressive scenarios and exhibit a notable gap in consistency and hierarchy compared to real recordings.

2605.28583 2026-05-28 cs.RO cs.AI cs.LG cs.SY eess.SY

SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving

Kangyu Wu, Peng Cui, Guoxi Chen, Ya Zhang

AI summary: Proposes the SARAD framework, which combines large language models and deep reinforcement learning and uses retrieval-augmented generation and a collision prediction module to improve the safety and efficiency of autonomous driving.

Comments: 7 pages, 4 figures, accepted by IJCNN 2026

Abstract

Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Reinforcement Learning (DRL) suffers from unsafe random exploration and slow convergence, while Large Language Models (LLMs) demonstrate inherent latency in real-time inference operations. To address these limitations, this paper proposes SARAD, a novel safety-aware hybrid framework that synergizes LLMs and DRL for autonomous driving. SARAD substitutes the random exploration of DRL with Retrieval-Augmented Generation (RAG)-enhanced, LLM-guided decisions sourced from a dynamic expert knowledge repository. An attention discriminator is proposed to integrate the prior knowledge of LLMs into DRL policy optimization. A collision predictor module, fine-tuned with historical collision data, is further designed to improve vehicle safety. Extensive experiments show that SARAD achieves significant performance improvements in the Highway-Env simulator, validating the effectiveness of the proposed model in autonomous driving.

2605.28560 2026-05-28 eess.SP physics.optics

Unified Analytical Framework for SPAD Array Receivers with Dead-Time-Induced Blocking Loss and Inter-Symbol Interference in PAM-OWC Systems

Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu

AI summary: For SPAD array receivers in PAM-OWC systems affected by dead-time-induced blocking loss and inter-symbol interference, the paper proposes a unified analytical framework, builds statistical models, derives exact distributions, and proposes low-complexity, near-optimal threshold detection schemes.

Abstract

Optical wireless communication (OWC) leveraging single-photon avalanche diode (SPAD) arrays offers exceptional sensitivity for photon-starved links. However, the inherent dead time of SPADs critically limits achievable data rates by introducing non-linear photon-counting distortions: blocking loss within a symbol duration and inter-symbol interference (ISI) across symbol durations. This paper proposes a unified analytical framework capturing both distortions across all operational speed regimes for pulse-amplitude modulation (PAM), by establishing comprehensive statistical models for SPAD array receivers. For low- and medium-speed systems (symbol duration longer than dead time), we derive exact closed-form expressions for the photon-count probability distribution using renewal theory, explicitly incorporating blocking loss and ISI. For high-speed systems (symbol duration shorter than dead time), we develop a Markov chain model characterizing the steady-state operational states and integrate it with trigger probability to obtain the exact binomial photon-count distribution. Furthermore, we propose low-complexity, near-optimal threshold detection schemes based on these models. This work provides essential theoretical tools for designing and optimizing high-performance SPAD-based OWC systems employing PAM.
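
The blocking loss described above can be illustrated with a single detector. The sketch assumes a non-paralyzable dead-time model (a photon arriving during the dead time is lost and does not extend it), for which the textbook mean count rate is rate / (1 + rate * dead_time). The arrival times are made up, and this is not the renewal-theory or Markov-chain analysis of the paper.

```python
def count_detections(arrival_times, dead_time):
    """Count photons registered by one non-paralyzable detector."""
    count = 0
    ready_at = float("-inf")
    for t in sorted(arrival_times):
        if t >= ready_at:
            count += 1
            ready_at = t + dead_time  # detector is blind until this instant
    return count


def saturated_rate(photon_rate, dead_time):
    """Mean registered count rate for Poisson arrivals, non-paralyzable model."""
    return photon_rate / (1.0 + photon_rate * dead_time)


# Five photons arrive; two fall inside a dead-time window and are blocked.
arrivals = [0.0, 0.5, 1.2, 1.3, 3.0]
detected = count_detections(arrivals, dead_time=1.0)
blocked = len(arrivals) - detected
```

A detection near the end of one symbol keeps the detector blind at the start of the next, which is how dead time also produces inter-symbol interference.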

2605.28550 2026-05-28 math.OC cs.SY eess.SY

Model Predictive Control for Constrained Linear Positive Systems on Graphs

Roland Schurig, David Ohlin, Anders Rantzer, Emma Tegling, Rolf Findeisen

AI summary: For linear positive systems on graphs with state and input capacity constraints, the paper exploits the analytic structure of the unconstrained problem to construct an explicit suboptimal admissible controller, yielding graph-computable performance bounds and a minimum stabilising horizon length for a model predictive controller without terminal conditions.

Abstract

Positive systems describing networks with inherently non-negative states and inputs arise naturally in routing, logistics, and compartmental modelling. We consider problems modelled as positive linear systems in incidence form with linear cost. The addition of capacity constraints on states (storage) and inputs (flows between nodes) significantly increases the problem complexity. Leveraging the analytic structure of the unconstrained problem, an explicit suboptimal admissible controller is constructed. This yields graph-computable performance bounds and a minimum stabilising horizon length for a model predictive controller without terminal conditions. A convex program enables efficient computation of the optimal bound and horizon. These results highlight how system structure enables explicit MPC guarantees that are typically not available.

2605.28547 2026-05-28 eess.SP

On Unified CRLB Framework from Generic Signals to ISAC Waveforms with Virtual Array Sensing

Yanpeng Su, Norman Franchi, Maximilian Lübke

AI summary: Proposes a unified Cramér-Rao lower bound (CRLB) framework for signal-level parameter estimation in integrated sensing and communications (ISAC) radar systems, addressing the coupling between delay and Doppler and extending to virtual array sensing systems.

Abstract

This paper presents a unified Cramér-Rao lower bound (CRLB) framework for signal-level parameters in integrated sensing and communications (ISAC)-enabled radar systems. Starting from the generic signal model, we analyze the coupling between delay and Doppler in the Fisher information matrix (FIM), which is unsolved and often overlooked in relevant studies. Addressing this issue, we derive the conditions under which the coupling terms can be eliminated and demonstrate that these conditions are typically satisfied for ISAC-enabled waveforms. Afterward, the CRLBs of representative ISAC waveforms are derived within the unified framework, enabling consistent and comparable analysis across the waveforms and avoiding model-dependent discrepancies. Further, the framework is extended to virtual array (VA) sensing systems, where the impact of different multiplexing schemes is analyzed. Simulation results demonstrate the consistency between the CRLBs derived from the proposed framework and those obtained from waveform-specific analyses. The proposed framework shows strong generality, waveform-compatibility, and flexibility, offering a versatile tool for the CRLB analysis of various waveforms, including those lacking existing analytical results.
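
To see why the delay-Doppler coupling term matters, consider a two-parameter Fisher information matrix. The sketch inverts a 2x2 FIM in closed form and compares the bounds with and without the off-diagonal term. The numeric entries are arbitrary placeholders, not values derived from any waveform in the paper.

```python
def crlb_2x2(f_tt, f_dd, f_td):
    """CRLBs (delay, Doppler) from FIM [[f_tt, f_td], [f_td, f_dd]]."""
    det = f_tt * f_dd - f_td ** 2
    if det <= 0:
        raise ValueError("FIM must be positive definite")
    # Diagonal of the inverse of a symmetric 2x2 matrix.
    return f_dd / det, f_tt / det


# Placeholder information values for delay, Doppler, and their coupling.
coupled = crlb_2x2(f_tt=4.0, f_dd=9.0, f_td=3.0)
decoupled = crlb_2x2(f_tt=4.0, f_dd=9.0, f_td=0.0)
```

With a nonzero coupling term both bounds are looser than the reciprocals of the diagonal entries, so conditions under which the coupling vanishes simplify the analysis.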

2605.28514 2026-05-28 eess.SP

Channel Measurements and Characterization with Phase Drift Compensation for Outdoor 330-360 GHz MIMO Communications

Tian Qiu, Taihao Zhang, Cunhua Pan, Hong Re, Yongchao He, Chenzhou Lin, Bingchang Hua, Jiangzhou Wang

AI summary: The paper performs outdoor channel measurements in the 330-360 GHz band with a 128 × 4 virtual antenna array MIMO configuration, proposes a phase-drift-aware SAGE algorithm to improve delay resolution and parameter estimation accuracy, and characterizes key channel properties such as path loss, delay spread, and angular spread, along with near-field effects and MIMO spatial non-stationarity.

Abstract

In this paper, an outdoor channel measurement campaign at 330-360 GHz employing a 128 × 4 virtual antenna array (VAA)-based multiple-input multiple-output (MIMO) configuration is conducted. The transmitter (Tx) and receiver (Rx) location pairs are classified into line-of-sight (LoS) and obstructed-LoS (OLoS) scenarios to enable a detailed investigation of outdoor terahertz (THz) band channel characteristics. During the measurement process, the stationarity of the outdoor environment is carefully verified, and a linear phase drift (PD) effect is identified. Then, we propose a PD-aware Space-Alternating Generalized Expectation-Maximization (SAGE) algorithm, which significantly improves both delay resolution and channel parameter estimation accuracy. Based on the processed measurement data, we characterize key channel properties, including the power delay profile, path loss, shadow fading, delay spread, angular spread, Rician K-factor, as well as their cumulative distribution functions and correlation characteristics. In addition, near-field effects and MIMO-specific properties, including the spatial non-stationarity and the cluster birth-death property, are analyzed.

2605.28480 2026-05-28 eess.AS cs.SD

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

Yucheng Wang, Jing Peng, Hanqi Li, Chenghao Wang, Wenming Tu, Yu Xi, Zhaokai Sun, Kai Yu, Shuai Wang

AI summary: Proposes the Audio-Mind framework, which uses conditional evidence acquisition to dynamically combine a strong frontend with planner-guided tool use, addressing the question of when agentic evidence acquisition is worthwhile in audio understanding; it reaches 80.4% accuracy on MMAR and 82.8% on MSU-Bench and produces auditable reasoning traces.

Abstract

Audio agents extend large audio-language models (LALMs) by decomposing audio questions into tool calls, intermediate evidence, and iterative reasoning steps. However, as LALMs become stronger, the key challenge shifts from enabling tool use to determining when agentic evidence acquisition genuinely benefits audio understanding. We propose Audio-Mind, an auditable and pluggable framework for conditional evidence acquisition in audio understanding. Audio-Mind dynamically combines a strong frontend with planner-guided tool use, preserving frontend judgment when initial evidence is sufficient while acquiring bounded external evidence for questions with unresolved evidence gaps. Experiments on MMAR and MSU-Bench show that Audio-Mind outperforms prior audio-agent baselines, reaching 80.4% accuracy on MMAR and 82.8% accuracy on MSU-Bench. A matched-backbone comparison highlights why this design matters: under strong audio frontends, agentic decomposition can become an orchestration bottleneck when the workflow does not preserve the frontend's holistic audio-grounded judgment. Beyond accuracy, Audio-Mind produces higher-quality, auditable reasoning traces that expose uncertainty, tool evidence, and answer rationales, offering a potential basis for more reliable audio-QA annotation and error analysis.

2605.28478 2026-05-28 eess.SY cs.SY

Towards Autonomous Commissioning of Industrial Drives via Multi-Objective Bayesian Optimization

David Petrovic, Gian Antonio Susto, Angelo Cenedese

AI summary: Proposes a fully automated current-loop commissioning method based on multi-objective Bayesian optimization that needs no system model or firmware modifications and achieves performance comparable to expert tuning on real hardware.

Comments: Submitted to IEEE ETFA 2026

Abstract

The commissioning of industrial electric drives still relies heavily on manual tuning of cascaded control loops, requiring expert knowledge and significant time. In this paper, we propose a fully automated approach for tuning the current control loop of industrial drives using Bayesian Optimization (BO) directly on real hardware, without requiring a system model or firmware modifications. The drive is treated as a black-box system, and the controller parameters are iteratively updated through closed-loop experiments. The tuning problem is formulated as a multi-objective optimization task that directly minimizes tracking error, time-weighted error, overshoot, and oscillatory behavior, enabling the identification of Pareto-optimal controller configurations. To address discrete parameters, noisy evaluations, and limited budgets, we adopt a multivariate Tree-structured Parzen Estimator (TPE) as the underlying BO strategy. The proposed method operates under practical industrial constraints, including communication latency and limited evaluation budgets. The experimental validation on a real motor drive system under no-load conditions shows that the method achieves performance comparable to expert tuning within a few minutes and without human intervention. Results show that Gaussian Process (GP)-based BO can yield highly competitive final solutions, but TPE-based BO is better aligned with this setting due to faster convergence, richer Pareto-front approximation, and lower computational overhead.
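
Identifying Pareto-optimal controller configurations amounts to filtering the evaluated trials for non-dominated objective vectors. The sketch below shows that filtering step only; it is not the TPE sampler itself. The gains `kp`/`ki` and the two objectives (tracking error, overshoot) are hypothetical, and all objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(trials):
    """Keep the trials whose objectives are not dominated by any other trial."""
    return [
        t for t in trials
        if not any(dominates(o["objectives"], t["objectives"]) for o in trials)
    ]


# Hypothetical closed-loop experiments: (tracking error, overshoot) per gain pair.
trials = [
    {"kp": 1.0, "ki": 10.0, "objectives": (0.30, 0.02)},
    {"kp": 2.0, "ki": 20.0, "objectives": (0.20, 0.05)},
    {"kp": 4.0, "ki": 40.0, "objectives": (0.25, 0.08)},  # dominated by kp=2.0
    {"kp": 8.0, "ki": 80.0, "objectives": (0.10, 0.20)},
]
front = pareto_front(trials)
```

A commissioning engineer would then pick one point from `front` according to how much overshoot is acceptable for the application.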

2605.28456 2026-05-28 cs.AI cs.CV eess.AS

Diffusion Large Language Models for Visual Speech Recognition

Jeong Hun Yeo, Chae Won Kim, Hyeongseop Rha, Yong Man Ro

AI summary: Proposes DLLM-VSR, presented as the first visual speech recognition framework based on a diffusion large language model (DLLM); it uses iterative masked denoising with flexible-order decoding, a confidence-guided unmasking strategy, and two-stage training, and introduces length-guided candidate decoding to reduce target-length uncertainty, achieving a word error rate of 19.5% on LRS3.

Comments: Code: https://github.com/JeongHun0716/dllm-vsr

Abstract

Existing Visual Speech Recognition (VSR) systems commonly rely on left-to-right autoregressive decoding, which can force premature decisions on visually ambiguous tokens before sufficient context is available. We propose DLLM-VSR, to the best of our knowledge, the first Diffusion Large Language Model (DLLM)-based VSR framework, formulating transcription as iterative masked denoising with flexible-order decoding. With confidence-based unmasking, DLLM-VSR commits high-confidence positions early and uses the committed tokens as bidirectional context to refine ambiguous ones. To adapt DLLMs to VSR, we introduce a two-stage masked-denoising training strategy that separates visual-to-text content alignment from length modeling. We further observe a performance gap with oracle-length decoding, which assumes access to the true transcript length, indicating that reducing target-length uncertainty can improve DLLM-based VSR. To reduce this gap, we develop length-guided candidate decoding, which uses video duration to construct plausible transcript-length hypotheses, decodes under multiple hypotheses, and reranks candidates using length plausibility and decoding confidence. The proposed method achieves a state-of-the-art WER of 19.5% on LRS3 using only its labeled training data.

2605.28453 2026-05-28 eess.SP

A Unified Framework for Unbiased Non-Coherent Over-the-Air Computation

Martin Dahl, Zheng Chen, Erik G. Larsson

AI summary: For non-coherent over-the-air computation (NC-OAC), the paper proposes a three-step framework that includes a data-to-codeword mapping, studies unbiased estimation under two kinds of channel amplitude knowledge, compares two affine mappings, shows that the Augmented Affine mapping has an order of magnitude lower estimation variance, and proposes a new mapping with better performance.

Comments: Accepted to IEEE Transactions on Communications

Abstract

Over-the-Air Computation (OAC) enables efficient data aggregation in large-scale distributed systems by exploiting the superposition property of wireless multiple-access channels. In contrast to most existing studies on OAC assuming exact channel state information, we consider non-coherent OAC (NC-OAC) where the channel phase is unknown at the transmitters. A three-step framework for NC-OAC with a mapping between source data and codewords is proposed: 1) Devices encode their data to non-negative codewords; 2) Devices transmit a sequence of symbols with amplitude proportional to their codewords, such that the receiver can estimate the codeword sum. Estimation of the codeword sum is studied under two scenarios of global channel amplitude knowledge: statistical or instantaneous; 3) The estimated codeword sum is decoded to the desired source data sum at the receiver. With the proposed framework, we first study prior work on NC-OAC and map these to the framework. Next, we define and compare the two most commonly (often implicitly) used mappings for NC-OAC: the Affine and the Augmented Affine mappings. Under the constraint of unbiased estimation, we show that with uniformly distributed data and standard channel assumptions, the Augmented Affine mapping exhibits an order of magnitude lower estimation variance than the Affine mapping with both statistical and instantaneous channel knowledge. This result is validated by extensive simulations. Finally, we propose and analyze a new mapping, which demonstrates superior performance over the previous two affine mappings.

2605.28434 2026-05-28 eess.SP

Experimental Characterization of a Multifunction X-Band AESA Radar Demonstrator

Francesco Mancuso, Giulio Meucci, Matteo Pardi, Giulio Giovannetti, Alberto Lupidi

AI summary: Through coastal field experiments, the paper characterizes a compact X-band active electronically scanned array (AESA) radar demonstrator on three core functions (direction-of-arrival estimation, adaptive jammer suppression, and high-resolution inverse synthetic aperture radar imaging), validating its suitability for advanced naval surveillance tasks.

Comments: 6 pages, 7 figures. Accepted for publication in the Proceedings of the 2026 IEEE Radar Conference (RadarConf26)

Abstract

Modern naval surveillance demands multifunction radar systems capable of operating in cluttered and contested environments. This paper presents the experimental characterization of a compact X-band Active Electronically Scanned Array (AESA) radar demonstrator. The system was evaluated with real maritime targets and an active noise jammer in a realistic coastal field environment at the Naval Support and Experimentation Centre (CSSN), specifically at its specialized G. Vallauri Institute, which has historical expertise in testing and evaluating the performance of operational sensors as well as those under development. The trials assessed three core functions: direction-of-arrival (DoA) estimation, adaptive jammer suppression using MVDR beamforming, and high-resolution Inverse Synthetic Aperture Radar (ISAR) imaging. The results confirm that the demonstrator successfully detects and localizes targets, effectively suppresses high-power interference, and generates clear ISAR images of non-cooperative vessels. These findings validate the multifunction performance of the AESA demonstrator, confirming its suitability for advanced naval surveillance applications.

2605.28432 2026-05-28 eess.SP

Transformer-Based Heartbeat Monitoring with FMCW Radar Under Random Body Motion

Matteo Pardi, Amir Hosein Oveis, Saba Kharabadze, Ajeet Kumar

AI summary: Proposes a hybrid framework that combines model-driven signal processing with a CNN-Transformer network to reconstruct PPG signals from 77 GHz FMCW radar data, enabling reliable heart rate and heart rate variability estimation under random body motion.

Comments: 6 pages, 5 figures. Accepted for publication in the Proceedings of the 2026 IEEE Radar Conference (RadarConf26)

Abstract

Millimeter-wave Frequency Modulated Continuous Wave (FMCW) radar enables contactless cardiac monitoring, but heartbeat estimation becomes challenging when respiration and random body motion (RBM) distort the radar signal. In this paper, we propose a hybrid framework for 77 GHz FMCW radar that combines model-based signal processing with a Convolutional Neural Network (CNN)-Transformer network. The first block extracts chest displacement and constructs meaningful high-level motion features from raw radar data, while the second block reconstructs a photoplethysmography (PPG)-like signal from the extracted features. In this study, a synchronized PPG signal is used as the ground truth for heartbeat monitoring in supervised training. The method is evaluated following the IEEE AESS Radar Challenge Problem I protocol using the official datasets and figures of merit across three motion scenarios: stationary, deep breathing, and RBM. Results show that the proposed architecture reliably reconstructs the PPG signal in all scenarios, achieving high fidelity in controlled conditions and maintaining robust performance under motion. This enables reliable average heart rate (AHR) and heart rate variability (HRV) estimation even where benchmark methods fail, and leads to the highest total score among the compared approaches.
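
Once a PPG-like waveform has been reconstructed and its pulse peaks located, AHR and HRV follow from the inter-beat intervals. The sketch assumes the peak times are already available and uses RMSSD as one common HRV measure; the challenge protocol may define its figures of merit differently, and the peak times here are invented.

```python
import statistics


def inter_beat_intervals(peak_times):
    """Time differences (s) between consecutive pulse peaks."""
    return [b - a for a, b in zip(peak_times, peak_times[1:])]


def average_heart_rate(peak_times):
    """Average heart rate in beats per minute."""
    return 60.0 / statistics.mean(inter_beat_intervals(peak_times))


def rmssd_ms(peak_times):
    """Root mean square of successive interval differences, in milliseconds."""
    ibis = inter_beat_intervals(peak_times)
    diffs = [(b - a) * 1000.0 for a, b in zip(ibis, ibis[1:])]
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5


# Invented peak times (s) from a reconstructed PPG-like signal.
peaks = [0.0, 0.8, 1.6, 2.5, 3.3]
ahr = average_heart_rate(peaks)
hrv = rmssd_ms(peaks)
```

Because RMSSD depends on differences of differences, small timing errors in the reconstructed peaks affect HRV far more than they affect AHR.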

2605.28431 2026-05-28 eess.SP

A Unified Maximum-Likelihood Framework for 3D InISAR Phase Unwrapping with Outlier Rejection

一种用于3D InISAR相位解缠的统一最大似然框架及异常值剔除

Matteo Pardi, Francesco Mancuso, Elisa Giusti, Marco Martorella

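AI summary: Proposes a unified maximum-likelihood framework, based on mixed-integer least squares theory, for 3D InISAR phase unwrapping; it requires no spatial continuity assumption and naturally yields an a posteriori quality metric for rejecting outliers.

Details
Comments
6 pages, 4 figures. Accepted for publication in the Proceedings of the 2026 IEEE Radar Conference (RadarConf26)
AI Chinese abstract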

本文提出了一种用于三维干涉ISAR(3D InISAR)成像中相位解缠的新型数学框架。该方法基于逐散射体工作,不依赖任何空间连续性假设,因此适用于稀疏点云。该公式源自混合整数最小二乘(MILS)理论,这是一个在高斯噪声存在下联合估计整数和实数未知量的最优最大似然框架。它提供了一种统一的方式来处理通用传感器几何、多基线、多频率或混合设置。该方法还为每个解缠相位自然产生一个后验质量度量,可用于构建统计测试以剔除异常值。该算法易于实现,计算成本适合操作化系统。本文介绍了该框架的理论基础,并使用蒙特卡洛模拟在标准L形双频设置上进行了首次验证研究。结果表明,所提出的框架能够在具有挑战性的模糊条件下实现可靠的三维重建。

English abstract

This paper presents a novel mathematical framework for phase unwrapping in three-dimensional interferometric ISAR (3D InISAR) imaging. The approach works on a scatterer-by-scatterer basis and does not rely on any spatial continuity assumptions, making it suitable for sparse point clouds. The formulation is derived from the Mixed-Integer Least Squares (MILS) theory, an optimal maximum-likelihood framework for joint estimation of integer and real unknowns in the presence of Gaussian noise. This provides a unified way to handle generic sensor geometries, multi-baseline, multi-frequency, or hybrid setups. The method also produces a natural a posteriori quality metric for each unwrapped phase, which can be used to build a statistical test to reject outliers. The algorithm is simple to implement and has a computational cost suitable for operational systems. This paper presents the theoretical foundations of the framework and a first validation study on a standard L-shaped dual-frequency setup using Monte Carlo simulations. Results show that the proposed framework enables reliable 3D reconstruction in challenging ambiguity conditions.

2605.28403 2026-05-28 eess.SP

A Gray-Box Approach for Decentralized Grid-Equivalent Model Identification

一种用于分散式电网等效模型辨识的灰盒方法

Sanjay Chandrasekaran, Florian Dörfler, Silvia Mastellone

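AI summary: Proposes a decentralized frequency-domain identification algorithm that decouples the effects of the equivalent impedance and the equivalent voltage, estimates the grid-equivalent model with constrained least squares and a Kalman filter, and validates its accuracy and performance on an interconnected 5-converter system.

Details
AI Chinese abstract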

我们提出了一种分散式频域辨识算法,从本地变流器的角度估计电网等效模型。由于多变流器设置中的本地电信号受到其他变流器(视为电网)电压输入的影响,估计单一视在阻抗会产生偏差和不准确的结果。为克服这一问题,我们设计了一个框架,将等效阻抗(被动)的影响与等效电压(主动)的影响解耦。然后,分别使用约束最小二乘和卡尔曼滤波算法,在频率样本上估计参数和等效电网电压。我们随后在一个以构网模式运行的互联5变流器系统中,在最小电压激励和非理想运行条件下,展示了我们算法的准确性和性能。

English abstract

We propose a decentralized, frequency-domain identification algorithm that estimates the grid-equivalent model from the perspective of local converters. Since local electric signals in a multi-converter setup are affected by voltage inputs from other converters, considered as the grid, estimating a single apparent impedance yields biased and inaccurate results. To overcome this, we design a framework that decouples the effect of the equivalent impedance (passive) from that of the equivalent voltage (active). The parameters and equivalent grid voltages are then estimated using a constrained least squares and a Kalman filter algorithm, respectively, applied across frequency samples. We then demonstrate the accuracy and performance of our algorithm on an interconnected 5-converter system in grid-forming mode, with minimal voltage excitations and non-ideal operating conditions.

2605.28399 2026-05-28 eess.SY cs.IT cs.SY math.IT

Information Age-Controllability Trade-offs in Communication-Constrained Networks

通信受限网络中信息年龄-可控性权衡

Songita Das, Gourab Ghatak, Chen Quan, Geethu Joseph

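AI summary: Studies the trade-offs among controllability, channel access, and age of information in wireless control networks, proposes an adaptive access probability policy, and derives closed-form expressions for the block controllability probability, peak latency, and peak age of information.

Details
AI Chinese abstract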

我们研究了无线控制网络中可控性、信道接入和年龄相关性能之间的权衡。控制器共享一个随机接入信道,在分时隙的块上向执行器传输控制输入。我们通过块可控性来衡量可靠控制,如果一个块包含所需数量的连续成功传输,则该块是可控的。同时,我们通过信息年龄来捕获信息的新鲜度。为了随时间有效分配信道资源,我们在块级别引入自适应接入概率,优先考虑尚未实现可控性的控制器。然后,我们推导出块可控性概率、块间连续成功之间的峰值延迟以及峰值信息年龄的闭式表达式。我们进一步定义了峰值控制延迟,即连续可控块之间的时间。最后,我们优化接入概率以联合平衡可控性和年龄相关指标。数值结果说明了所提出的自适应接入策略在管理干扰受限无线控制网络中这种权衡的有效性。

English abstract

We investigate the trade-off between controllability, channel access, and age-related performance in a wireless network of control systems. Controllers share a random-access channel to transmit control inputs to actuators over slotted blocks. We measure reliable control via block controllability, where a block is controllable if it contains a required number of consecutive successful transmissions. In parallel, we capture information freshness via the age of information. To enable efficient allocation of channel resources over time, we introduce adaptive access probabilities at the block level, prioritizing controllers that have not yet achieved controllability. We then derive closed-form expressions for block controllability probability, the peak latency between inter-block consecutive successes, and peak age of information. We further characterize the peak control latency, defined as the time between consecutive controllable blocks. Finally, we optimize access probabilities to jointly balance controllability and age-related metrics. Numerical results illustrate the effectiveness of the proposed adaptive access policies in managing this trade-off in interference-limited wireless control networks.
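
As a rough illustration of the block controllability notion in this abstract (a block is controllable if it contains a required number of consecutive successful transmissions), the sketch below computes that probability under a simplifying assumption that is not from the paper: every slot succeeds independently with the same fixed probability. The function name and its parameters are hypothetical, and the paper's adaptive access policy and closed-form expressions are not reproduced here.

```python
def block_controllability_prob(n_slots, k_consec, p_success):
    """Probability that a block of n_slots contains at least k_consec
    consecutive successes, assuming i.i.d. slot successes (an assumption
    made for this sketch only)."""
    if k_consec <= 0:
        return 1.0
    # run[r] = probability that the current success run has length r
    # and the block has not yet become controllable.
    run = [0.0] * k_consec
    run[0] = 1.0
    done = 0.0  # probability mass that has already reached k_consec successes
    for _ in range(n_slots):
        nxt = [0.0] * k_consec
        for r, mass in enumerate(run):
            nxt[0] += mass * (1.0 - p_success)  # a failure resets the run
            if r + 1 == k_consec:
                done += mass * p_success        # run completed: controllable
            else:
                nxt[r + 1] += mass * p_success  # run grows by one slot
        run = nxt
    return done


print(block_controllability_prob(3, 2, 0.5))  # 0.375
```

With three slots, two required consecutive successes, and a success probability of 0.5, exactly three of the eight equally likely outcomes qualify, which gives 0.375.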

2605.28361 2026-05-28 eess.SP

A Lightweight Method for Multiple Signal Direction Estimation with Adaptive Notch Filters

一种基于自适应陷波滤波器的多信号方向估计轻量方法

Burak Soner

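AI summary: Addresses the degradation of conventional Capon beamforming when the number of transmitters exceeds the number of receive antennas minus one; proposes a method that uses only two receive antennas, separates the signals with cascaded adaptive notch filters, and applies Capon to estimate each signal's direction, at low computational cost and with simulated performance close to an ideal method with prior knowledge.

Details
Comments
Original article language is Turkish. This author version has the English-translated version before the original Turkish version. Accepted for presentation at SİU 2026. Final Turkish version to appear in IEEE Xplore
AI Chinese abstract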

对于多信号检测和到达方向(DoA)估计,当发射器数量超过接收天线数减一时,传统的Capon波束形成性能下降。本文提出了一种轻量方法,使用仅有两个接收天线的自适应陷波滤波器(ANF),用于同时估计两个或更多窄带信号的DoA。级联的ANF阶段为每个信号形成隔离通道,Capon在每个通道上估计方向。该方法计算成本非常低,在仿真中,对于在时间、频率和角度上可分离的发射器,其性能接近于具有先验信号知识的理想方法。与ANF本身一样,频率间隔较近的多个分量难以分辨,这是该方法的主要局限性。仿真还辅以低成本软件定义无线电(ADALM-Pluto)上的实验实现。

English abstract

For multi-signal detection and direction-of-arrival (DoA) estimation, conventional Capon beamforming degrades when there are more transmitters than receive antennas minus one. This paper proposes a lightweight method that uses adaptive notch filters (ANFs) with only two receive antennas to estimate the DoAs of two or more narrowband signals simultaneously. Cascaded ANF stages form an isolated channel for each signal, and Capon estimates the direction on each channel. The method has very low computational cost, and in simulation, for transmitters separable in time, frequency, and angle, its performance approaches that of an oracle with prior signal knowledge. As with ANFs themselves, multiple components at closely spaced frequencies are poorly resolved, which is the main limitation of the proposed method. Simulations are complemented by an experimental implementation on a low-cost software-defined radio (ADALM-Pluto).
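
For readers unfamiliar with the Capon estimator mentioned in this abstract, here is a minimal sketch of the Capon (minimum-variance) spatial spectrum for a two-antenna array and a single narrowband source. It builds an idealized covariance matrix analytically, assumes half-wavelength antenna spacing, and does not implement the adaptive notch filter cascade. All function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def steering(theta_deg, spacing=0.5):
    """Steering vector of a 2-element array (spacing in wavelengths, assumed 0.5)."""
    phase = 2.0 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return (1.0 + 0.0j, cmath.exp(1j * phase))


def covariance(theta_deg, power=1.0, noise=0.1):
    """Ideal covariance R = power * a a^H + noise * I for one source."""
    a = steering(theta_deg)
    return tuple(
        tuple(power * a[i] * a[j].conjugate() + (noise if i == j else 0.0)
              for j in range(2))
        for i in range(2)
    )


def capon_spectrum(R, theta_deg):
    """Capon spectrum 1 / (a^H R^{-1} a), using an explicit 2x2 inverse."""
    (r00, r01), (r10, r11) = R
    det = r00 * r11 - r01 * r10
    inv = ((r11 / det, -r01 / det), (-r10 / det, r00 / det))
    a = steering(theta_deg)
    quad = sum(a[i].conjugate() * inv[i][j] * a[j]
               for i in range(2) for j in range(2))
    return 1.0 / quad.real


def estimate_doa(R, grid):
    """Return the grid angle (degrees) that maximizes the Capon spectrum."""
    return max(grid, key=lambda theta: capon_spectrum(R, theta))


R = covariance(20.0)
print(estimate_doa(R, range(-90, 91)))  # 20
```

With only two antennas this estimator resolves a single source per covariance matrix, which is consistent with the abstract's motivation for first isolating each signal into its own channel.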

2605.28345 2026-05-28 cs.AI cs.LG eess.SP

Picid: A Modular Evaluation Infrastructure for Reproducible PHM Across Tasks and Domains

Picid: 一种跨任务和领域的可复现PHM模块化评估基础设施

Lev Telyatnikov, Raffael Theiler, Leandro Von Krannichfeldt, Olga Fink

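AI summary: Proposes Picid, a modular evaluation infrastructure that standardizes data contracts and evaluation boundaries to enable reproducible and fair comparison of fault detection, diagnostics, and prognostics across tasks and datasets.

Details
AI Chinese abstract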

预测与健康管理(PHM)领域的进展受到跨任务、数据集和应用领域缺乏标准化和可复用评估实践的阻碍。报告的结果往往难以复现和比较,因为关键协议选择(如数据划分、预处理、标签对齐、时间窗口和指标)通常是隐式的或临时实现的。我们引入了Picid,一个模块化评估基础设施,将PHM评估流程形式化为显式、可执行和可复现的协议。通过定义良好的抽象,Picid在保持对不同PHM设置的灵活性的同时,强制执行确定性、无泄漏的数据集构建。该框架通过统一接口支持故障检测、诊断和预测,并且可以扩展到新的数据集和模型类别,而不违反协议不变性。通过标准化数据契约和评估边界,Picid还实现了跨诊断(分类)和预测(回归)的公平任务比较,允许相同的模型系列在不同设置中一致地进行评估。我们通过对跨越电池、轴承、涡轮风扇发动机、液压系统、过滤系统和建筑的十二个数据集上的十三个模型进行实证评估来展示Picid。这项工作为PHM中标准化、公平和可复现的评估建立了可复用的基础。

English abstract

Progress in Prognostics and Health Management (PHM) is hindered by the lack of standardized and reusable evaluation practices across tasks, datasets, and application domains. Reported results are often difficult to reproduce and compare, as key protocol choices, such as data splits, preprocessing, label alignment, temporal windowing, and metrics, are often implicit or implemented ad hoc. We introduce Picid, a modular evaluation infrastructure that formalizes the PHM evaluation pipeline as an explicit, executable, and reproducible protocol. Through well-defined abstractions, Picid enforces deterministic, leakage-safe dataset construction while remaining flexible across diverse PHM settings. The framework supports fault detection, diagnostics, and prognostics through a unified interface and can be extended to new datasets and model classes without violating protocol invariants. By standardizing data contracts and evaluation boundaries, Picid also enables fair cross-task comparisons across diagnostics (classification) and prognostics (regression), allowing identical model families to be evaluated consistently across heterogeneous settings. We demonstrate Picid through an empirical evaluation of thirteen models on twelve datasets spanning batteries, bearings, turbofan engines, hydraulics, filtration systems, and buildings. This work establishes a reusable foundation for standardized, fair, and reproducible evaluation in PHM.

2605.28325 2026-05-28 cs.IT cs.CR cs.SY eess.SP eess.SY math.IT

ISAC Privacy: Challenges and Solutions for 6G

ISAC隐私:6G的挑战与解决方案

Onur Günlü, Stefano Tomasin, João P. Vilela, Francesco Chiti, Prajnamaya Dass, Angeliki Alexiou, Utz Roedig

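AI summary: Addresses privacy in integrated sensing and communication (ISAC) for 6G by organizing sensitive data into three sensing levels (location and environment, behavioral, and physiological) and using this classification to discuss applications, challenges, and solutions.

Details
AI Chinese abstract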

集成感知与通信(ISAC)是未来通信网络的一个有前景的特性。虽然空间感知可以提高网络性能并支持外部服务,但它也带来了超出通信内容机密性的隐私挑战。使用毫米波(mmWave)和亚太赫兹(THz)频率的未来网络可能收集或推断第六代(6G)部署区域内的人员、设备、旁观者、被动物体和环境的详细信息。这种感知可以揭示位置和环境数据,支持行为分析(如运动或活动识别),并在高级情况下暴露生理信息(如呼吸频率或心率相关数据)。因此,必须控制空间感知的能力以满足隐私要求。在这项工作中,我们将隐私敏感的ISAC数据组织为三个感知层级:位置和环境数据、行为数据以及生理数据,并将此分类作为全文的组织原则。基于此分类,我们讨论了内部和外部ISAC应用,识别了与同意、透明度、数据所有权、分析、旁观者暴露以及敏感感知数据相关的隐私挑战,回顾了代表性的解决方案方向,并概述了隐私保护ISAC的未来研究方向。

English abstract

Integrated sensing and communication (ISAC) is a promising feature of future communication networks. While spatial sensing can improve network performance and enable external services, it also creates privacy challenges that go beyond the confidentiality of communication content. Future networks using millimeter-wave (mmWave) and sub-terahertz (THz) frequencies may collect or infer detailed information about people, devices, bystanders, passive objects, and environments in a sixth-generation (6G) deployment area. Such sensing can reveal location and environment data, support behavioral profiling such as movement or activity recognition, and, in advanced cases, expose physiological information such as breathing frequency or heart-rate-related data. Thus, the capabilities of spatial sensing must be controlled to satisfy privacy requirements. In this work, we organize privacy-sensitive ISAC data into three sensing levels: location and environment data, behavioral data, and physiological data, and use this classification as the organizing principle throughout the paper. Based on this classification, we discuss internal and external ISAC applications, identify privacy challenges related to consent, transparency, data ownership, profiling, bystander exposure, and sensitive sensing data, review representative solution directions, and outline future research directions for privacy-preserving ISAC.

2605.28254 2026-05-28 cs.RO cs.SY eess.SY math.DS

Natural Locomotion: Principle and Method

自然运动:原理与方法

Mirado Mortel, Luc Jaulin, Lionel Lapierre, Simon Rohou

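AI summary: Formulates natural locomotion as an exchange principle for systems whose motion is mediated by environmental constraints or interactions, constructs the Natural Locomotion Manifold (NLM) with a closed/open construction, and demonstrates the principle on ideal nonholonomic no-slip systems.

Details
Comments
Preprint. 20 pages, 7 figures
AI Chinese abstract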

当机构利用被动动力学、柔顺性和共振而非跟踪预定轨迹时,机器人运动可以变得高效。本文将自然运动表述为一种交换原理,适用于运动由环境约束或相互作用介导的系统。当内部振荡器周期性返回、身体姿态漂移且平均推进-振荡器交换功率(POE功率)在一个周期内为零时,运动是自然的。所选族是自然运动流形(NLM)。我们针对连续理想环境约束发展了该原理的保守实现:约束不做外部功,总机械能守恒,零平均POE功率是与环境介导的推进通道的内部交换,而非外部能量输入。该方法是一种闭/开构造。首先关闭推进通道以揭示有效的内部振荡器,该振荡器由一个有效自由度中的标量作用-角结构或多个自由度中的非线性模态扇区组织。然后重新打开通道,重建姿态,接受的周期必须保持内部递归和零平均POE功率。我们在两个理想非完整无滑移系统上演示了该原理:一个Chaplygin雪橇/摆驱动小车和一个三体扩展。在标量情况下,POE闭合等价于缺失的内部返回条件,从而给出NLM族的定理支持计算。在多自由度情况下,POE闭合仍然是必要的,但必须由模态恒等性、内部返回、动力学一致性、相同的固定被动架构和非零位移来补充。自然运动成为一个设计问题:哪些被动架构支持零个、一个或多个经过认证的NLM族?

English abstract

Robotic locomotion can become efficient when mechanisms exploit passive dynamics, compliance, and resonance rather than track prescribed trajectories. This paper formulates natural locomotion as an exchange principle for systems whose motion is mediated by environmental constraints or interactions. A motion is natural when an internal oscillator returns periodically, the body pose drifts, and the mean Propulsion-Oscillator Exchange power (POE power) vanishes over one cycle. The selected family is a Natural Locomotion Manifold (NLM). We develop the conservative realization of this principle for continuous ideal environmental constraints: the constraints do no external work, total mechanical energy is conserved, and zero mean POE power is an internal exchange with the environment-mediated propulsive channel, not external energy input. The method is a closed/open construction. The propulsive channel is first closed to reveal an effective internal oscillator, organized by scalar action-angle structure in one effective degree of freedom or by nonlinear modal sectors in several degrees of freedom. The channel is then reopened, pose is reconstructed, and accepted cycles must preserve internal recurrence and zero mean POE power. We demonstrate the principle on two ideal nonholonomic no-slip systems: a Chaplygin-sleigh / pendulum-driven car and a three-body extension. In the scalar case, POE closure is equivalent to the missing internal return condition, giving a theorem-backed computation of the NLM family. In the multi-degree case, POE closure remains necessary but must be completed by modal identity, internal return, dynamics consistency, same fixed passive architecture, and nonzero displacement. Natural locomotion becomes a design question: which passive architectures support no, one, or several certified NLM families?

2605.28252 2026-05-28 eess.SY cs.SY

Digital-Based Potentiostat and Mesoporous Microelectrode Co-Design for Non-Enzymatic Glucose Detection at 0.3V-VDD and 1.65nW-Power

基于数字的恒电位仪与介孔微电极协同设计用于0.3V电源电压和1.65nW功率的非酶葡萄糖检测

Andrea De Gregorio, Mara Serrapede, Danilo Kaddouri, Paolo Angelini, Giuseppe Bruno, Simone Luigi Marasso, Salvatore Guastella, Andrea Lamberti, Paolo Crovetti

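AI summary: Proposes a non-enzymatic glucose detection chip that co-designs an ultra-low-voltage, ultra-low-power digital-based potentiostat with mesoporous microelectrodes, implemented in 130 nm CMOS; an equivalent linearized model analyzes signal transfer and noise in the frequency domain for the first time, and the design detects currents from 600 pA to 650 nA, consumes as little as 1.65 nW, and successfully detects glucose at physiological levels.

Details
AI Chinese abstract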

本文提出了一种概念验证的超低电压和超低功率计时安培电化学传感器,用于130nm CMOS检测中的非酶葡萄糖读出集成电路(IC),其特点是可重构的数字基(DB)恒电位仪。通过等效线性化模型首次在频域中解析描述了新数字基架构的信号传输和噪声特性,该模型通过仿真和实验验证。基于实验,所提出的DB恒电位仪能够检测从600pA到650nA的宽电化学电流范围,线性度R²=0.991,在Vdd=300mV(Vdd=500mV)时仅消耗1.65nW(53.5nW)。所提出的DB读出在概念验证平台上与纳米结构微电极一起用于非酶葡萄糖检测,在最低报告电压和功率下成功实现了生理水平的非酶葡萄糖检测,即使在干扰物(抗坏血酸)存在和有氧条件下也是如此,从而揭示了新兴即时诊断(PoC)应用的巨大潜力。

English abstract

This paper presents a proof-of-concept, ultra-low-voltage, ultra-low-power chronoamperometric electrochemical sensor readout integrated circuit (IC) in 130 nm CMOS for non-enzymatic glucose detection, featuring a reconfigurable Digital-Based (DB) potentiostat. The signal transfer and noise characteristics of the new digital-based architecture are analytically described in the frequency domain for the first time by an equivalent linearized model that is validated by simulations and experiments. In experiments, the proposed DB potentiostat detects a wide electrochemical current range, from 600 pA to 650 nA, with a linearity of R² = 0.991, and consumes only 1.65 nW (53.5 nW) at Vdd = 300 mV (Vdd = 500 mV). The proposed DB readout is tested in a proof-of-concept platform for non-enzymatic glucose detection with nanostructured microelectrodes, demonstrating successful non-enzymatic glucose detection at physiological levels at the lowest reported voltage and power, even in the presence of an interferent (ascorbic acid) and under aerobic conditions, thus revealing strong potential for emerging Point of Care (PoC) diagnostics applications.

2605.28191 2026-05-28 eess.SP

Spatiotemporal Tracking in Cooperative ISAC Networks: A Stochastic Geometry Framework

协作ISAC网络中的时空跟踪:随机几何框架

Bowen Wang, Nanxi Li, Jingzhou Wu, Zheng Jiang, Jianchi Zhu

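AI summary: Uses a stochastic geometry framework to study continuous target tracking in integrated sensing and communication networks; a dynamic clustering model overcomes the shortcomings of static clustering, achieving linear density scaling and an order-of-magnitude capacity gain.

Details
AI Chinese abstract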

我们采用随机几何框架研究集成感知与通信(ISAC)网络中的连续目标跟踪,其中基站位置建模为泊松点过程。单基站分析表明,天线能量守恒恒等式迫使平均基站间耦合增益为1,使得密集化成为单站感知的天线不可约负担,而首次通过时间分析揭示了依赖于目标距离的波束宽度陷阱。这些发现排除了密集化下的单站跟踪,从而促使多基站协作处理。静态聚类协作平均跟踪寿命表现出尖锐的渗流相变,导致感知容量上限在临界宏密度以上饱和。然而,静态聚类理想化本身歪曲了现代网络部署,其中协作聚类随着目标漂移而动态重新选择;因此,我们通过动态聚类模型消除了这一假设,该模型将K近邻切换映射为具有随机重置的二维布朗运动,并得到了动态平均跟踪寿命的贝塞尔函数闭式解,该解在任何正切换率下消除了相变。在每链路可靠性下限下,动态聚类框架在整个现实6G范围内保持了经典线性密度缩放,并在小小区密度下实现了数量级的容量提升。蒙特卡洛模拟证实了所有理论预测。

English abstract

We adopt a stochastic-geometry framework to study continuous target tracking in integrated sensing and communication (ISAC) networks, with base-station locations modelled as a Poisson point process. The single-BS analysis shows that the antenna energy-conservation identity forces the mean inter-BS coupling gain to unity, making densification an antenna-irreducible liability for monostatic sensing, while a first-passage-time analysis reveals a target-distance-dependent beamwidth trap. These findings rule out single-BS tracking under densification, motivating a multi-BS cooperative treatment. The static-cluster cooperative mean tracking lifetime is then shown to exhibit a sharp percolation phase transition, with the resulting sensing-capacity ceiling saturating above a critical macro density. Yet the static-cluster idealisation itself misrepresents modern network deployments, where the cooperating cluster is dynamically re-selected as the target drifts; we therefore lift this assumption with a dynamic clustering model that maps the K-nearest-neighbour handover onto a 2D Brownian motion with stochastic resetting, and obtain a Bessel-function closed form for the dynamic mean tracking lifetime that dissolves the phase transition under any positive handover rate. With a per-link reliability floor, the dynamic clustering framework preserves classical linear density scaling throughout the realistic 6G regime and delivers an order-of-magnitude capacity lift at small-cell densities. Monte-Carlo simulations corroborate all theoretical predictions.

2605.28182 2026-05-28 eess.SP

Cross-Predictive Sparse Bayesian Learning with Application to XL-MIMO Channel Estimation

交叉预测稀疏贝叶斯学习及其在XL-MIMO信道估计中的应用

Arttu Arjas, Italo Atzeni

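AI summary: Proposes cross-predictive sparse Bayesian learning (CP-SBL), which learns sparsity-inducing weights by minimizing a randomized cross-predictive objective instead of maximizing the likelihood, for near-field XL-MIMO channel estimation; it outperforms conventional SBL under a range of conditions.

Details
Comments
To be presented at the European Signal Processing Conference (EUSIPCO) 2026
AI Chinese abstract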

准确的信道估计是超大规模多输入多输出(XL-MIMO)系统的一个关键要求。稀疏贝叶斯学习(SBL)是一个成熟的利用信道稀疏性的框架,但其性能依赖于参数先验假设和基于边际似然的超参数优化,这可能对噪声、有限的导频观测和模型失配敏感。在这项工作中,我们提出了交叉预测SBL(CP-SBL),一种数据驱动的SBL变体,其中通过最小化随机交叉预测目标而非似然最大化来学习稀疏诱导权重。所提出的方法保留了SBL的层次贝叶斯结构,同时用来自随机数据分割的预测一致性准则替代了参数先验学习。近场XL-MIMO信道估计的数值结果表明,CP-SBL在广泛的信噪比、导频长度、天线数量和传播路径数量范围内,始终比基线SBL实现更低的归一化均方误差,且复杂度相当,无需手动超参数调整。

English abstract

Accurate channel estimation is a key requirement in extremely large-scale multiple-input multiple-output (XL-MIMO) systems. Sparse Bayesian learning (SBL) is a well-established framework for exploiting channel sparsity, but its performance depends on parametric prior assumptions and hyperparameter optimization based on marginal likelihood, which may be sensitive to noise, limited pilot observations, and model mismatch. In this work, we propose cross-predictive SBL (CP-SBL), a data-driven variant of SBL in which the sparsity-inducing weights are learned by minimizing a randomized cross-predictive objective rather than through likelihood maximization. The proposed method preserves the hierarchical Bayesian structure of SBL while replacing parametric prior learning with a predictive consistency criterion derived from random data splitting. Numerical results for near-field XL-MIMO channel estimation show that CP-SBL consistently achieves lower normalized mean squared error than the baseline SBL across a wide range of signal-to-noise ratios, pilot lengths, numbers of antennas, and numbers of propagation paths, with comparable complexity and without requiring manual hyperparameter tuning.

2605.28180 2026-05-28 eess.SP

Tensor Train Decomposition Based Noise Reduction and Enhanced Parameter Estimation for FMCW MIMO Radar Systems

基于张量列分解的FMCW MIMO雷达系统降噪与增强参数估计

Luoyan Zhu, Sergiy A. Vorobyov, Jie Wang, Yinsheng Liu, Zhangdui Zhong

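AI summary: Addresses mixed-noise interference in FMCW MIMO radar at low SNR with a tensor train decomposition framework that exploits low-rank structure and multidimensional correlations to separate the noise subspace, then jointly estimates range, velocity, and angle via data smoothing and a rotational subspace algorithm, significantly improving SNR and parameter estimation accuracy.

Details
AI Chinese abstract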

调频连续波(FMCW)雷达因其高分辨率目标定位和速度估计能力,广泛应用于自动驾驶和工业检测。然而,汽车应用中大量连接设备引入电磁干扰,并因混合噪声污染导致的低信噪比(SNR)问题给位置感知服务带来挑战。传统的基于矩阵的信号处理方法在处理低信噪比条件下的高阶信号时性能下降。为解决这一挑战,本文提出一种基于张量分解的框架,用于FMCW多输入多输出(MIMO)雷达系统中四维信号的联合降噪和参数估计。具体而言,该框架通过张量列分解利用接收信号固有的低秩结构和多维相关性,有效分离噪声子空间。然后,数据平滑处理器重构增广信号张量,以解决相干信号引起的秩亏问题。最后,采用增强旋转子空间算法,通过利用对恢复信号的结构拟合,联合解耦距离、速度和角度参数。在真实噪声环境下的仿真和现场实验表明,所提框架在提高目标信噪比和参数估计精度的同时,实现了显著的降噪效果。这些进步使得所提框架成为动态、噪声污染环境中高精度MIMO FMCW雷达应用的鲁棒解决方案。

English abstract

Frequency modulated continuous wave (FMCW) radar is widely used in autonomous driving and industrial inspection due to its high-resolution target location and velocity estimation capability. However, the plethora of connected devices in automotive applications introduces electromagnetic interference and brings challenges to location-aware services, primarily due to the low signal-to-noise ratio (SNR) caused by mixed noise contamination. Conventional matrix-based signal processing methods exhibit performance deterioration when handling higher-order signals under low SNR conditions. To address this challenge, this paper proposes a tensor decomposition-based framework that jointly performs noise reduction and parameter estimation for four-dimensional signals in FMCW multiple-input multiple-output (MIMO) radar systems. Specifically, the framework exploits the inherent low-rank structure and multidimensional correlations of the received signals through tensor train decomposition to effectively separate the noise subspace. A data smoothing processor then reconstructs an augmented signal tensor to resolve the rank deficiency caused by coherent signals. Finally, an enhanced rotational subspace algorithm is employed to jointly decouple the distance, velocity, and angle parameters by exploiting the structural fitting to the restored signal. Both simulations and field experiments under real-world noise demonstrate that the proposed framework achieves significant noise reduction while improving target SNR and parameter estimation accuracy. These advancements make the proposed framework a robust solution for high-precision MIMO FMCW radar applications in dynamic, noise-polluted environments.

2605.26875 2026-05-28 eess.SP

G-iMUSIC: Greedy Iterative MUSIC Algorithms for Multi-Target DoA Estimation

G-iMUSIC:用于多目标DoA估计的贪婪迭代MUSIC算法

Martin Willame, Gilles Monnoyer, François Horlin, Jérôme Louveaux

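AI summary: Proposes two greedy iterative MUSIC algorithms (OMP-iMUSIC and OLS-iMUSIC) within a unified framework linking subspace and greedy estimation; they need only one initial eigenvalue decomposition, avoid per-iteration eigendecompositions, improve detection and precision over conventional OMP, OLS, and MUSIC, and reduce processing time compared to greedy baselines.

Details
Comments
12 pages; This work has been submitted to the IEEE for possible publication
AI Chinese abstract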

本文提出了用于阵列信号处理中多目标波达方向(DoA)估计的新算法。尽管最大似然估计器(MLE)渐近达到Cramér-Rao界,但其指数复杂度促使了实用替代方案的出现,例如贪婪方法或子空间方法。在此背景下,贪婪方法如正交匹配追踪(OMP)和正交最小二乘(OLS)对早期选择误差敏感,特别是对于角度接近的目标,而子空间方法如多信号分类(MUSIC)具有角度超分辨能力,但在强目标间信号相关性下性能下降。为克服这些限制,我们提出了两种贪婪迭代MUSIC(G-iMUSIC)算法,即OMP-iMUSIC和OLS-iMUSIC,它们源自一个连接子空间和贪婪估计的统一框架。与先前的iMUSIC方法不同,所提方法仅需一次初始特征值分解(EVD),并避免在每次迭代中计算特征分解。它们还允许对均匀线性阵列(ULA)采用快速傅里叶变换(FFT)加速实现,从而实现低复杂度操作。蒙特卡洛模拟表明,与传统OMP、OLS和MUSIC相比,所提方法具有改进的检测和精度,并且与贪婪基线相比减少了处理时间。最后,我们引入了诊断指标,用于解释跨信号相关性和角度接近性区域的性能,支持推广到所考虑的正交频分复用(OFDM)雷达场景之外。

English abstract

This paper presents novel algorithms for multi-target direction-of-arrival (DoA) estimation in array signal processing. Although the maximum likelihood estimator (MLE) asymptotically attains the Cramér-Rao bound, its exponential complexity motivates practical alternatives, such as greedy or subspace-based methods. In this context, greedy methods such as orthogonal matching pursuit (OMP) and orthogonal least squares (OLS) are sensitive to early selection errors, especially for angularly proximate targets, whereas subspace-based methods such as multiple signal classification (MUSIC) offer angular super-resolution capabilities but degrade under strong inter-target signal correlation. To overcome these limitations, we propose two greedy iterative MUSIC (G-iMUSIC) algorithms, namely OMP-iMUSIC and OLS-iMUSIC, derived from a unified framework that links subspace and greedy estimations. Unlike prior iMUSIC approaches, the proposed methods require only one initial eigenvalue decomposition (EVD) and avoid computing an eigendecomposition at each iteration. They also admit Fast Fourier Transform (FFT)-accelerated implementations for uniform linear arrays (ULAs), enabling low-complexity operation. Monte Carlo simulations demonstrate improved detection and precision over conventional OMP, OLS, and MUSIC, as well as reduced processing time compared to greedy baselines. Finally, we introduce diagnostic metrics that interpret performance across signal correlation and angular proximity regimes, supporting generalization beyond the specific orthogonal frequency-division multiplexing (OFDM) radar scenario considered.

2605.28143 2026-05-28 cs.LG cs.IT eess.SP math.IT

Sequential Neural Probabilistic Amplitude Shaping: Learning the Channel's Language

序列神经概率幅度整形:学习信道的语言

Mohammad Taha Askari, Lutz Lampe, Amirhossein Ghazisaeidi

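AI summary: Proposes the first neural probabilistic amplitude shaping method that accounts for all implementation losses, using a block-less, easily implementable sequential autoregressive encoder with arithmetic distribution matching to reduce rate loss and raise achievable information rates.

Details
Comments
4 pages, 2 figures, Submitted to the 52nd European Conference on Optical Communications
AI Chinese abstract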

我们提出了首个神经概率幅度整形方法,它在考虑所有实现损耗的情况下优于现有方法,采用无块、易于实现的序列自回归编码器,与算术分布匹配兼容,从而降低了速率损失并提高了可达信息率。

English abstract

We present the first neural probabilistic amplitude shaping scheme that outperforms existing methods while accounting for all implementation losses, using a block-less, easily implementable sequential autoregressive encoder compatible with arithmetic distribution matching, yielding reduced rate loss and higher achievable information rates.

2605.28064 2026-05-28 eess.AS cs.AI cs.HC

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors

我听见,故我信任:人类作为合成语音检测器的社会技术研究

Lelia Erscoi, Tomi Kinnunen

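AI summary: Uses a localization-task experiment to study how well humans detect voice deepfakes as a perceptual and contextual process; utterance class is the primary determinant of detection accuracy and perceptual quality, trust cues have no main effect but influence detection behavior, and fully synthetic speech is detected at below-chance levels.

Details
Comments
To be included in Odyssey 2026: The Speaker and Language Recognition Workshop, Session 4.2, 23-26 June, Lisbon, Portugal
AI Chinese abstract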

自动深度伪造检测已受到大量研究关注,然而人类实际遇到合成语音的社会技术环境仍知之甚少。我们将语音深度伪造检测作为感知和语境过程进行研究,呈现一个定位任务,其中47名参与者在三种操纵的信任线索下(指导框架、情感启动和来源标注)标记真实、完全合成和部分合成话语中的疑似合成片段。参与者提供了关于机械性、表现力、可懂度、清晰度、平静度和评估自信度的质量评分。话语类别是检测准确性和感知质量的主要决定因素;信任线索未产生主效应,但激发了检测行为。完全合成语音的检测低于随机水平。质量评分与话语类型相关,表明在显性检测失败时存在隐性区分。

English abstract

Automatic deepfake detection has received considerable research attention, yet the socio-technical environment in which humans actually encounter synthetic speech remains poorly understood. We investigate voice deepfake detection as a perceptual and contextual process, presenting a localization task in which 47 participants marked suspected synthetic segments across authentic, fully synthetic, and partially synthetic utterances under three manipulated trust cues: instructional framing, affective priming, and provenance labeling. Participants provided quality ratings on mechanicalness, expressiveness, intelligibility, clarity, calmness, and confidence of evaluation. Utterance class was the primary determinant of detection accuracy and perceptual quality; trust cues produced no main effects but motivated detection behavior. Fully synthetic speech was detected at below-chance levels. Quality ratings tracked utterance type, indicating implicit discrimination where overt detection failed.