arXivDaily arXiv每日学术速递 周一至周五更新
EESS电气与系统 115
2606.19203 2026-06-18 eess.AS 新提交

DASH: Dual-View Self-Distillation with Multi-Layer Hidden Representations for Robust Speech Recognition

DASH: 基于多层隐藏表示的双视角自蒸馏用于鲁棒语音识别

Jaeeun Baik, Ui-Hyeop Shin, Jiwoon Lee, Woocheol Jeong, Hyung-Min Park

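AI summary Proposes DASH, a self-distillation framework that learns clean-noisy consistency from dual views, distills hidden representations from multiple encoder layers, and minimizes the KL divergence between prototype assignment distributions. It improves noise robustness while preserving clean accuracy, at an additional cost of only about 4% of fine-tuning time.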

Comments Accepted to Interspeech 2026

AI中文摘要

自动语音识别(ASR)在现实噪声环境中常常性能下降,因此噪声鲁棒性对于部署至关重要。有监督的噪声增强微调是一种常见的补救措施,但它可能引入鲁棒性与干净性能之间的权衡,并过度拟合特定噪声,导致干净条件下的识别性能下降。我们提出了DASH,一种自蒸馏框架,通过从配对视图中学习干净-噪声一致性来提高鲁棒性。DASH从多个编码器层蒸馏隐藏表示,以捕获从低级声学到高级语义的特征,并通过最小化干净视图和噪声视图的原型分配分布之间的KL散度来稳定训练。在LibriSpeech上的实验表明,DASH在保持干净准确率的同时,在各种噪声条件下持续提高识别性能,这是通过在标准微调之外增加一个无标签的预训练阶段实现的,额外开销极小(约为微调时间的4%)。

英文摘要

Automatic Speech Recognition (ASR) often degrades in real-world noisy environments, making noise robustness essential for deployment. Supervised noise-augmented fine-tuning is a common remedy, but it can introduce a robustness-clean trade-off and overfit to specific corruptions, degrading recognition in clean conditions. We propose DASH, a self-distillation framework that improves robustness by learning clean--noisy consistency from paired views. DASH distills hidden representations from multiple encoder layers to capture features from low-level acoustics to high-level semantics, and stabilizes training by minimizing KL divergence between prototype assignment distributions of clean and noisy views. Experiments on LibriSpeech show that DASH consistently improves recognition under diverse noisy conditions while preserving clean accuracy, achieved by a label-free pre-training stage with minimal additional overhead (about 4% of fine-tuning time) beyond standard fine-tuning.
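
The consistency term described in the abstract above, a KL divergence between the prototype assignment distributions of the clean and noisy views, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the dot-product similarity, the softmax temperature, the direction of the KL term, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores, temperature=1.0):
    # Numerically stable softmax over a list of floats.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototype_assignment(hidden, prototypes, temperature=0.1):
    # Soft assignment of one hidden vector to each prototype.
    # Dot-product similarity and the temperature value are assumptions.
    return softmax([dot(hidden, p) for p in prototypes], temperature)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical prototypes and hidden vectors for one frame.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
clean_hidden = [0.9, 0.1]   # representation of the clean view
noisy_hidden = [0.6, 0.3]   # representation of the noisy view

p_clean = prototype_assignment(clean_hidden, prototypes)
p_noisy = prototype_assignment(noisy_hidden, prototypes)

# Consistency loss: pull the noisy assignment toward the clean one.
loss = kl_divergence(p_clean, p_noisy)
print(f"KL(clean || noisy) = {loss:.4f}")
```

In a multi-layer setting, a loss of this kind would presumably be computed per encoder layer and summed, but how DASH weights layers is not stated in the abstract.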

2606.19182 2026-06-18 eess.IV 新提交

Optimized Multi-Contrast Self-Supervised MRI Reconstruction using Learned k-space Partitioning

使用学习型k空间划分的优化多对比度自监督MRI重建

Brenden Kadota, Charles Millard, Mark Chiew

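AI summary Proposes a multi-contrast self-supervised learning framework that learns the optimal k-space data partitioning end to end, improving MRI reconstruction quality without fully sampled data.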

AI中文摘要

目的:深度学习在通过从欠采样数据重建高质量图像来加速MRI方面显示出前景。虽然最近的工作利用多对比度信息来提高重建性能,但这些方法依赖于监督学习,需要全采样k空间进行训练。一种方法,通过数据欠采样的自监督学习(SSDU),通过将k空间划分为两个集合,并在两者之间进行网络映射,从而能够直接在欠采样k空间上进行训练。在这项工作中,我们通过两项修改改进了MRI自监督重建。方法:我们提出了一个多对比度自监督学习框架,该框架联合训练多个欠采样对比度,无需全采样k空间数据作为参考。此外,我们以端到端的方式为每个对比度学习最优的自监督数据划分,进一步提高了重建质量。具体来说,我们学习一个最优的划分概率分布,对其进行采样以生成用于划分的掩码。结果:在两个公开可用的多对比度MRI数据集上的实验表明,与当前的单对比度自监督学习方法相比,我们提出的自监督多对比度学习划分方法提高了重建质量。我们还证明了学习k空间数据的划分进一步增强了重建的保真度。结论:多对比度重建与学习划分相结合,比单对比度自监督MRI重建提高了重建保真度。意义:与之前的自监督方法相比,我们的方法可以实现更高的图像保真度和/或加速MRI协议时间,并且无需全采样k空间进行训练。

英文摘要

Objective: Deep learning has shown promise in accelerating MRI by reconstructing high-quality images from under-sampled data. While recent work has leveraged multi-contrast information to improve reconstruction performance, these methods rely on supervised learning, which requires fully sampled k-space for training. One method, self-supervised learning via data undersampling (SSDU), enables direct training on under-sampled k-space by partitioning it into two sets, with a network mapping between the two. In this work, we improve self-supervised MRI reconstruction with two modifications. Methods: We propose a multi-contrast self-supervised learning framework that jointly trains on multiple under-sampled contrasts without requiring fully sampled k-space data as a reference. Moreover, we learn an optimal self-supervised data partitioning for each contrast in an end-to-end manner, further enhancing reconstruction quality. Specifically, we learn an optimal partitioning probability distribution, which is sampled to generate a mask for partitioning. Results: Experiments on two publicly available multi-contrast MRI datasets demonstrate the improved reconstruction quality of our proposed self-supervised multi-contrast learned partitioning method compared to current single-contrast self-supervised learning methods. We also demonstrate that learning the partitioning of k-space data further enhances the fidelity of reconstructions. Conclusion: Multi-contrast reconstruction combined with learned partitioning improves reconstruction fidelity over single-contrast self-supervised MRI reconstruction. Significance: Our method can facilitate higher image fidelity and/or accelerated MRI protocol times compared to previous self-supervised methods, without requiring fully sampled k-space for training.

2606.19102 2026-06-18 eess.SP 新提交

Decentralized Power Control for Over-the-Air Computation with Phase Noise

含相位噪声的空中计算去中心化功率控制

Martin Dahl, Erik G. Larsson

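AI summary For over-the-air computation in which channel estimates are available only locally, proposes a decentralized power control scheme based on truncated channel inversion, gives an approximate closed-form solution and an exact numerical solver, proves that the mean-square error is independent of the number of receiver antennas, and reveals its relationship to the aggregate phase error.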

Comments SPAWC 2026

AI中文摘要

相干空中计算(OAC)需要上行信道估计。当使用校准互易性进行信道估计时,估计值仅对设备本地可用。这对预编码和解码构成了挑战,因为无法集中协调。为此,我们使用截断信道反转(TCI),并提出了一个近似闭式解和一个精确数值求解器来优化TCI参数。重要的是,我们证明了所提出的TCI方案在均方误差(MSE)方面与接收天线数量无关。此外,我们的分析揭示了MSE与设备间预期聚合相位误差之间的明确联系,这有助于理解OAC的可扩展性。最后,与先前工作中使用全局可用无误差信道估计的参考方法进行的仿真比较表明,所提出的方法在某些条件下甚至优于这些参考方法的MSE。

英文摘要

Estimation of uplink channels is required for coherent over-the-air computation (OAC). When channel estimation is done using calibrated reciprocity, the estimates are only available locally to the devices. This poses a challenge for precoding and decoding, which cannot be coordinated centrally. To this end, we use truncated channel inversion (TCI) and propose an approximate closed-form solution and an exact numerical solver to optimize the TCI parameters. Importantly, we prove that the mean-square error (MSE) of the proposed TCI scheme is independent of the number of receiver antennas. Furthermore, our analysis reveals a clear connection between the MSE and the expected aggregate phase error across devices, which gives insight into the scalability of OAC. Finally, simulations comparing against reference methods from prior work that assume globally available error-free channel estimates show that the proposed method comes close to these references and even outperforms them in MSE under some conditions.
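
To make the idea of truncated channel inversion concrete, the following is a minimal sketch under strong simplifying assumptions: a single receive antenna, perfect local channel knowledge, and no noise or phase error. The parameter names `threshold` and `eta` and the exact form of the rule are hypothetical; the paper's parameterization may differ.

```python
def tci_precoder(h, threshold, eta):
    # Truncated channel inversion for one device (illustrative form):
    # invert the channel when its gain is strong enough, stay silent otherwise.
    if abs(h) ** 2 < threshold:
        return 0j
    return (eta ** 0.5) / h

# Hypothetical complex channel coefficients and the values to be summed.
channels = [0.8 + 0.6j, 0.1 - 0.05j, -0.3 + 1.1j]
symbols = [1.0, 2.0, 3.0]
eta, threshold = 0.5, 0.2

# Each device applies its own precoder using only its local channel estimate;
# the channel then adds the transmitted signals "over the air".
received = sum(h * tci_precoder(h, threshold, eta) * s
               for h, s in zip(channels, symbols))

# The receiver rescales by the common factor sqrt(eta).
estimate = received / eta ** 0.5

# The second device is truncated (|h|^2 = 0.0125 < 0.2), so the estimate
# equals the sum over the two active devices.
print(estimate.real)
```

The sketch shows why no central coordination is needed: each device's decision depends only on its own channel, at the cost of dropping devices in deep fades.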

2606.19010 2026-06-18 eess.SP 新提交

Channel Charting With Physical Channel Fingerprints For Massive MIMO-OFDM Channel Acquisition

基于物理信道指纹的大规模MIMO-OFDM信道获取的信道图构建

Jinke Tang, Xiqi Gao, Li You, Xiang-Gen Xia, Cheng-Xiang Wang

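AI summary Proposes a channel charting method based on physical channel fingerprints, extracting parameters with a cluster-based geometric stochastic channel model to acquire channel state information efficiently in massive MIMO-OFDM systems. Applied to beam-domain statistical channel estimation, it performs close to traditional online probing methods.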

Comments 15 pages, 9 figures

AI中文摘要

6G移动通信和定位技术的发展增强了位置感知工具的重要性,如位置索引信道指纹(CFs)和信道图,它们正成为大规模MIMO-OFDM系统的关键使能技术。本文提出一种基于物理CFs(PCFs)的新型信道图构建方法,并展示其在信道状态信息(CSI)获取中的有效性。首先,基于簇几何随机信道模型(GBSM)定义PCF,使其能够用紧凑参数集全面表示物理信道特性。然后,开发了大规模MIMO-OFDM系统中PCF获取的方法。通过利用PCF与空频时(SFT)域信道的关系,所提方法从多位置信道测量中提取PCF,并构建具有位置索引PCF的结构化信道图。此外,我们提出一种低复杂度算法,利用信道图中的PCF获取波束域统计CSI(sCSI)。所得sCSI可直接用作信道估计的先验信息。仿真结果表明,所提方法提供的sCSI性能与传统在线探测技术相当,且生成的sCSI可作为可靠先验知识显著提升信道估计精度。这些结果验证了所提PCF作为下一代移动通信信道获取和系统设计的强大且多功能工具。

英文摘要

The advancement of 6G mobile communication and positioning technologies has amplified the significance of location-aware tools, such as location-indexed channel fingerprints (CFs) and channel charting, which are becoming key enablers for massive MIMO-OFDM systems. In this paper, we propose a novel channel charting with physical CFs (PCFs) and demonstrate its effectiveness in channel state information (CSI) acquisition. First, we define the PCF based on a cluster-based geometric stochastic channel model (GBSM), enabling a comprehensive representation of physical channel characteristics using a compact set of parameters. We then develop a methodology for PCF acquisition in massive MIMO-OFDM systems. By exploiting the relationship between PCFs and the space-frequency-time (SFT) domain channel, the proposed method extracts PCFs from multi-location channel measurements and constructs a structured channel charting with location-indexed PCFs. Furthermore, we propose a low-complexity algorithm to acquire beam domain statistical CSI (sCSI) using the PCFs in the channel charting. The resulting sCSI can be directly employed as prior information for channel estimation. Simulation results show that the proposed method delivers sCSI performance comparable to traditional online probing techniques, and the generated sCSI can serve as reliable prior knowledge to significantly enhance the accuracy of channel estimation. These results validate the proposed PCF as a powerful and versatile tool for channel acquisition and system design of the next-generation mobile communication.

2606.18985 2026-06-18 eess.AS 新提交

SingFox: A Multi-Lingual Singfake Detection Corpus

SingFox: 多语言歌唱深度伪造检测语料库

Arth J. Shah, Devanshi K. Trivedi, Himanshi U. Borad, Hemant A. Patil

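AI summary To address the diversity of languages and forgery methods in singing deepfake detection, builds SingFox, a large-scale multilingual corpus of about 113,000 audio clips in 20 languages, and designs six evaluation tracks. Experiments show a highest cross-dataset accuracy of 77.84%.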

Comments Accepted at INTERSPEECH 2026

AI中文摘要

在这项工作中,我们介绍了SingFox,这是一个全面且大规模的数据集,专门设计用于支持歌唱深度伪造检测和源追踪系统的稳健评估。SingFox分为六个不同的轨道(T1--T6),每个轨道针对一种独特的新颖性形式,涵盖从语言多样性(全球和印度)到特定类型音乐和替代伪造生成方法。该数据集包含超过113,802个音频片段,涵盖20种语言,总计超过126.32小时的音频数据,并包含1,150位歌手。每个轨道旨在模拟真实场景,并评估模型在不同条件下的可靠性,从而评估其稳健性。SingFox旨在通过为歌唱深度伪造检测任务和源验证任务(模型可解释性)提供可靠的基准,促进可重复性并加速歌唱深度伪造检测的研究。实验结果显示,在跨数据集评估设置中最高准确率为77.84%。所有重现数据集所需的代码和资源均可在 https://github.com/Arth-Shah/SingFox 公开获取。

英文摘要

In this work, we introduce SingFox, a comprehensive and large-scale dataset specifically designed to support robust evaluation of singing deepfake detection and source tracing systems. SingFox is divided into six distinct tracks (T1--T6), each targeting a unique form of novelty, ranging from language diversity (global and Indian) to genre-specific music and alternative fake generation methods. The dataset encompasses over 113,802 audio clips across 20 languages, totaling more than 126.32 hours of audio data and featuring 1,150 singers. Each track is designed to emulate real-world scenarios and evaluate how reliably models perform under different conditions, thereby assessing their robustness. SingFox aims to foster reproducibility and accelerate research in singing deepfake detection by providing a reliable benchmark for both the singfake detection task and the source verification task (model explainability). Experimental results show a highest accuracy of 77.84% in cross-dataset evaluation settings. All code and resources required to reproduce the dataset are publicly available at https://github.com/Arth-Shah/SingFox.

2606.18968 2026-06-18 eess.AS 新提交

Audio-to-Audio via Diffusion Warm Initialization

通过扩散热初始化实现音频到音频的转换

Cristóbal Andrade, Sebastian J. Schlecht

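AI summary Proposes diffusion warm initialization for tasks such as timbre transfer, MIDI-to-Real synthesis, and audio enhancement. With a suitably chosen initialization time $t_\text{init}$, a single pretrained diffusion model supports multiple transformation objectives without task-specific training.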

AI中文摘要

在本文中,我们提出扩散热初始化作为一种简单而有效的方法,用于一系列音频到音频的转换任务。为了说明该方法的通用性,我们展示了其在音色迁移、MIDI到真实合成以及多种音频增强任务中的应用。我们对音色迁移进行了详细的实证分析,以研究初始化时间$t_\text{init}$的作用。使用基于音高的Jaccard距离和Fréchet音频距离评估$t_\text{init}$的效果,以量化对输入信号的忠实度和与目标分布的对齐程度。我们的结果为选择$t_\text{init}$提供了实用指导,并表明一旦适当选择,单个预训练扩散模型结合热初始化即可支持多种转换目标,无需任务特定训练或条件化。尽管方法简单,但与专门为这些任务设计的更复杂流程相比,该方法已取得有竞争力的结果。我们进一步观察到,热初始化不一定需要显式噪声注入,因为引导信号本身通常可以作为反向扩散过程的有效初始化状态。总之,这些发现表明热初始化提供了一种简单而有效的框架,可作为更复杂音频转换流程的基本构建块。

英文摘要

In this paper, we propose diffusion warm initialization as a simple yet effective approach for a range of audio-to-audio transformation tasks. To illustrate the generality of the approach, we demonstrate its use in timbre transfer, MIDI-to-Real synthesis, and multiple audio enhancement tasks. We conduct a detailed empirical analysis on timbre transfer to investigate the role of the initialization time $t_\text{init}$. The effect of $t_\text{init}$ is evaluated using pitch-based Jaccard Distance and Fréchet Audio Distance to quantify faithfulness to the input signal and alignment with the target distribution. Our results provide practical guidance for selecting $t_\text{init}$ and show that, once properly chosen, a single pretrained diffusion model combined with warm initialization can support multiple transformation objectives without task-specific training or conditioning. Despite its simplicity, this approach already achieves competitive results when compared with more complex pipelines designed specifically for these tasks. We further observe that warm initialization does not necessarily require explicit noise injection, as the guide signal itself can often serve as a valid initialization state for the backward diffusion process. Together, these findings show that warm initialization provides a simple and effective framework that serves as a fundamental building block for more complex audio transformation pipelines.

2606.18917 2026-06-18 eess.SP 新提交

Spaceborne SAR Change Detection and Coherence Analysis for Maritime Port Monitoring

星载SAR变化检测与相干分析在海事港口监测中的应用

Necati Kagan Erkek, Kudret Esmer

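AI summary Uses spaceborne SAR amplitude and interferometric coherence to detect structural changes in a port through multitemporal analysis, with an estimated range resolution of about 0.42 m.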

Comments 6 pages

AI中文摘要

星载合成孔径雷达(SAR)提供相干微波图像,适用于在光照和天气无关的采集条件下进行海事基础设施监测。针对中国天津港的SAR幅度和地理编码多时相数据,进行了学术会议风格的分析。处理流程包括幅度可视化、辐射定标、视角方向解释、距离和方位分辨率评估、斑点噪声抑制、基于幅度的变化检测、用于地理检查的GeoTIFF导出以及干涉相干性估计。直方图引导的显示限制提高了复杂SAR幅度图像的可解释性,而对阴影和亮叠掩响应的放大检查支持对光照几何的定性解释。使用二维傅里叶分析来表征主导频谱内容,并在可用图像坐标校准下估计出约0.42米的距离分辨率和0.19度的方位角间隔。随后,通过滤波后的幅度差和多个空间平均窗口计算的相干图,对多时相主从图像进行比较。结果突出了SAR幅度和相干产品在检测密集港口环境中(包括船舶、储罐、码头结构、工业场地和水陆过渡带)的结构和表面条件变化的相关性。

英文摘要

Spaceborne synthetic aperture radar (SAR) provides coherent microwave imagery suitable for maritime infrastructure monitoring under illumination-independent and weather-independent acquisition conditions. An academic conference-style analysis is presented for SAR amplitude and geocoded multitemporal data over Tianjin Port, China. The processing chain includes amplitude visualization, radiometric scaling, view-direction interpretation, range and azimuth resolution assessment, speckle reduction, amplitude-based change mapping, GeoTIFF export for geographic inspection, and interferometric coherence estimation. Histogram-guided display limits improve the interpretability of the complex SAR magnitude images, while zoomed inspection of shadows and bright layover responses supports qualitative interpretation of illumination geometry. A two-dimensional Fourier analysis is used to characterize dominant spectral content and to estimate an approximate range resolution of 0.42 m and an azimuth angular separation of 0.19 degrees under the available image-coordinate calibration. Multitemporal master and slave images are subsequently compared through filtered amplitude differences and coherence maps computed with multiple spatial averaging windows. The results highlight the relevance of SAR amplitude and coherence products for detecting structural and surface-condition variations in dense port environments characterized by vessels, storage tanks, quay structures, industrial yards, and water-land transitions.
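
The coherence maps mentioned in the abstract above rest on a standard estimator: the magnitude of the normalized complex cross-correlation of two co-registered images over a spatial averaging window. The sketch below computes it for a single window. The pixel values are made up for illustration, and the function is a generic textbook form, not the authors' processing chain.

```python
import cmath

def coherence(master, slave):
    # Sample coherence over one averaging window:
    # |sum(m * conj(s))| / sqrt(sum|m|^2 * sum|s|^2), a value in [0, 1].
    cross = sum(m * s.conjugate() for m, s in zip(master, slave))
    power_m = sum(abs(m) ** 2 for m in master)
    power_s = sum(abs(s) ** 2 for s in slave)
    if power_m == 0 or power_s == 0:
        return 0.0
    return abs(cross) / (power_m * power_s) ** 0.5

# Hypothetical complex pixels from one window of the master image.
master = [1 + 1j, 2 - 1j, -1 + 0.5j, 0.5 + 2j]

# Unchanged scene: the same pixels with a common interferometric phase offset.
unchanged = [m * cmath.exp(0.3j) for m in master]

# Changed scene: unrelated complex pixels.
changed = [1 - 1j, -2 - 1j, 0.5 + 1j, 2 - 0.5j]

print(round(coherence(master, unchanged), 3))  # a pure phase offset keeps coherence at 1
print(round(coherence(master, changed), 3))    # decorrelation lowers coherence
```

Larger averaging windows reduce the bias and variance of the estimate at the cost of spatial resolution, which is presumably why the abstract reports several window sizes.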

2606.18827 2026-06-18 eess.SP 新提交

Controlled Out-of-Band Device-to-Device Communication in Cellular Networks Using a Backup Channel in Television White Space

蜂窝网络中利用电视白空间备份信道的受控带外设备到设备通信

Saifur Rahman, Syed Luqman Shah, Salim Nasar Faraj Mursal, Ziaul Haq Abbas, Muhammad Usman, Muhammad Irfan, Fazal Muhammad

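AI summary To address spectrum scarcity in cellular networks, proposes using cognitive radio to detect a backup channel in television white space and, when the regular channels are busy, establishing a controlled out-of-band D2D link, which reduces the blocking and call-delay probabilities.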

Comments Published in: 2023 18th International Conference on Emerging Technologies (ICET)

AI中文摘要

本文解决了蜂窝网络(CN)中的频谱稀缺问题。我们为位于同一宏小区内、受单个宏基站(eNB)控制的蜂窝用户(CU)提出了一种备份信道(BuC)。该BuC在电视白空间中运行,并通过认知无线电能量检测信道感知技术以一定的成功概率被CU检测到。当所有与蜂窝eNB的常规信道都被占用时,同一宏eNB覆盖区域内的CU可以利用感知到的BuC建立受控的带外设备到设备链路进行通信。BuC绕过eNB进行数据通信,减轻了CN核心的负担,从而提高了蜂窝eNB的容量。在所提出的系统模型中,每个CU和eNB配备两根天线,用于在两个独立的频段(即蜂窝频段和电视频段)进行通信。仿真结果表明,阻塞概率和呼叫延迟概率显著降低。

英文摘要

In this article, we address the problem of spectrum scarcity in cellular networks (CNs). We propose a backup channel (BuC) for cellular users (CUs) located in the same macro-cell under the control of a single macro base station (eNB). This BuC operates in television white space and is detected by the CUs through a cognitive radio energy-detection channel-sensing technique with a certain probability of success. When all regular channels with the cellular eNB are occupied, the CUs within the same coverage area of the macro eNB can utilize the sensed BuC to establish a controlled out-of-band device-to-device link for communication. The BuC bypasses the eNB for data communication and reduces the burden on the core of the CN. This leads to improved cellular eNB capacity. In the proposed system model, each CU and eNB is equipped with two antennas for communication in two separate bands, i.e., cellular and TV bands. Simulations show significant reductions in the blocking probability and probability of call delay.

2606.18766 2026-06-18 eess.SP 新提交

Rotatable Antenna-Enhanced Secure Integrated Sensing and Communications Under Imperfect CSI

不完美信道状态信息下可旋转天线增强的安全集成感知与通信

Qi Yang, Kai Liu, Jingjing Zhao, Xidong Mu, Tianqi Mao, Kaiquan Cai

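AI summary Under imperfect eavesdropper channel state information, proposes a rotatable antenna-enhanced secure integrated sensing and communications system that maximizes the minimum data rate by jointly optimizing the transmit beamforming, the artificial noise covariance matrix, and the antenna boresights, subject to maximum information leakage and minimum sensing power constraints.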

AI中文摘要

研究了一种可旋转天线增强的安全集成感知与通信系统,其中基于RA的收发器同时与合法用户通信并感知被视为潜在窃听者的目标。在不完美窃听信道状态信息下,通过联合优化发射波束成形、人工噪声协方差矩阵以及RA的发射/接收指向,在最大信息泄露和最小感知功率约束下,制定了一个最大-最小数据速率优化问题。为了解决高度非凸的问题,分别通过S-Procedure方法和Cauchy-Schwarz不等式将信息泄露和感知功率约束转化为凸约束。随后,开发了一种交替优化算法,将重新表述的问题分解为两个子问题。具体地,利用逐次凸逼近和半定松弛方法优化发射波束成形和人工噪声协方差矩阵,而通过粒子群优化获得RA指向。仿真结果表明,基于RA的方案显著优于基准方案,并且随着最大旋转范围的增加,对不完美CSI的鲁棒性增强。

英文摘要

A rotatable antenna (RA)-enhanced secure integrated sensing and communications system is investigated, where an RA-based transceiver simultaneously communicates with legitimate users and senses a target that is regarded as a potential eavesdropper. Under imperfect eavesdropping channel state information (CSI), a max-min data rate optimization problem is formulated by jointly optimizing the transmit beamforming, artificial noise (AN) covariance matrix, and transmit/receive boresights of RAs, subject to the maximum information leakage and minimum sensing power constraints. To address the highly non-convex problem, the information leakage and sensing power constraints are transformed into convex ones via the S-procedure and the Cauchy-Schwarz inequality, respectively. Subsequently, an alternating optimization algorithm is developed to decompose the reformulated problem into two subproblems. In particular, the transmit beamforming and AN covariance matrix are optimized using successive convex approximation and semi-definite relaxation, while the RA boresights are obtained by particle swarm optimization. Simulation results show that the RA-based scheme significantly outperforms the benchmarks and becomes more robust to imperfect CSI as the maximum rotation range increases.

2606.18758 2026-06-18 eess.SP 新提交

EH-FedSAG: Variance-Reduced Federated Learning with Energy-Aware Participation in Energy-Harvesting IoT

EH-FedSAG:能量采集物联网中具有能量感知参与的方差缩减联邦学习

Shahab Jahanbazi, Mateen Ashraf, Richard Demo Souza, Onel L. A. Lopez

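AI summary To address unstable device participation and high communication cost in energy-harvesting networks, proposes EH-FedSAG, a server-memory-based variance-reduced method, and compares it with EH-FedAvg in a unified simulation framework. EH-FedSAG achieves higher test accuracy and lower training variance, with a larger advantage under scarce energy and non-IID data.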

AI中文摘要

能量采集网络中的联邦学习面临两大挑战:间歇性和随机性能量到达导致训练轮次中设备参与不稳定,以及有限能量预算下的高通信成本降低了整体训练效率。本文研究了基于时隙能量采集模型的联邦学习,并提出了EH-FedSAG,一种基于服务器存储的方差缩减方法。我们在相同的多信道正交多址接入上行链路模型下,并在一个统一的仿真框架内比较了EH-FedSAG与原始EH-FedAvg,该框架捕获了不同能量到达概率下的电池充电、本地计算成本和传输成本。性能根据同质和异质数据分布下训练轮次的测试准确率进行评估。结果表明,在所考虑的设置中,EH-FedSAG始终比EH-FedAvg获得更高的测试准确率,同时表现出显著更低的训练方差。在能量稀缺和非独立同分布数据下,EH-FedSAG的优势更为明显。

英文摘要

Federated learning (FL) in energy-harvesting (EH) networks is challenged by intermittent and stochastic energy arrivals that lead to unstable device participation across training rounds, and by high communication costs under limited energy budgets, reducing overall training efficiency. This paper studies FL under a slot-based EH model and proposes EH-FedSAG, a server-memory-based variance-reduced method. We compare EH-FedSAG with vanilla EH-FedAvg under the same multi-channel orthogonal multiple-access uplink model and within a unified simulation framework that captures battery charging, local computation cost, and transmission cost under different energy-arrival probabilities. Performance is assessed in terms of test accuracy over training rounds for both homogeneous and heterogeneous data distributions. The results show that EH-FedSAG consistently achieves higher test accuracy than EH-FedAvg in the considered settings, while exhibiting substantially lower training variance. The advantage of EH-FedSAG is more pronounced under scarce energy availability and non-independent/identically-distributed data.

2606.18615 2026-06-18 eess.AS eess.SP 新提交

A Survey of Methods for the Discretization of Phonograph Record Playback Filters

留声机唱片回放滤波器离散化方法综述

Benjamin R. Thompson, Tre DiPassio, Jenna Rutowski, Michael C. Heilemann

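AI summary Surveys methods for discretizing phonograph record playback equalization curves from continuous time into digital filters, comparing them in performance, computational cost, and latency as a reference for developers of digital playback equalization systems.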

Comments Presented at the AES 157th Convention, Best Student Paper Winner

Journal ref
2024 Journal of the Audio Engineering Society, AES Convention Paper 10191
AI中文摘要

自1924年留声机唱片电气录音问世以来,唱片故意采用非均匀频率响应进行刻录,以最大化唱片上的信息密度并提高信噪比。为了在可用带宽内再现名义上平坦的信号,必须在回放时应用逆曲线来消除这种刻录曲线的影响。直到1953年引入所谓的RIAA曲线之前,任何特定唱片所需的回放曲线可能因唱片公司和时间而异。因此,任何想要聆听或恢复唱片信息的人必须拥有能够实现多种回放均衡的设备。这种校正可以通过模拟硬件或数字处理来实现。数字方法具有成本低和灵活性高的优点,但需要从连续时间(原始曲线定义域)到离散时间的变换。这种变换不可避免地会在奈奎斯特频率附近产生与连续时间响应的偏差。有许多成熟的方法用于离散化连续时间滤波器,这些方法在性能、计算成本和固有延迟方面各不相同。本文在留声机回放均衡的背景下探讨了执行这种变换的几种方法,并量化了每种方法的性能。本文旨在为开发数字回放均衡系统或类似需要数字近似连续时间滤波器响应的应用的人员提供参考。

英文摘要

Since the inception of electrical recording for phonograph records in 1924, records have been intentionally cut with a non-uniform frequency response to maximize the information density on a disc and to improve the signal-to-noise ratio. To reproduce a nominally flat signal within the available bandwidth, the effects of this cutting curve must be undone by applying an inverse curve on playback. Until 1953, with the introduction of what has become known as the RIAA curve, the playback curve required for any particular disc could vary by record company and over time. As a consequence, anyone seeking to hear or restore the information on a disc must have access to equipment that is capable of implementing multiple playback equalizations. This correction may be accomplished with either analog hardware or digital processing. The digital approach has the advantages of reduced cost and expanded versatility, but requires a transformation from continuous time, where the original curves are defined, to discrete time. This transformation inevitably comes with some deviations from the continuous-time response near the Nyquist frequency. There are many established methods for discretizing continuous-time filters, and these vary in performance, computational cost, and inherent latency. In this work, several methods for performing this transformation are explored in the context of phonograph playback equalization, and the performance of each approach is quantified. This work is intended as a resource for anyone developing systems for digital playback equalization or similar applications that require approximating the response of a continuous-time filter digitally.
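
The deviation near the Nyquist frequency described in the abstract above can be seen with one familiar discretization method, the bilinear transform, applied to the RIAA playback curve. This is a sketch for illustration only: the abstract does not say which methods the paper evaluates, and the 44.1 kHz sampling rate is an assumption. The RIAA time constants (3180 µs, 318 µs, 75 µs) are the standard ones.

```python
import cmath
import math

# Standard RIAA playback time constants in seconds.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6

def analog_response(f):
    # H(s) = (1 + s*T2) / ((1 + s*T1) * (1 + s*T3)) evaluated at s = j*2*pi*f.
    s = 2j * math.pi * f
    return (1 + s * T2) / ((1 + s * T1) * (1 + s * T3))

def poly_mul(p, q):
    # Multiply two polynomials in z^-1 given as coefficient lists.
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

def bilinear_riaa(fs):
    # Substitute s = 2*fs*(1 - z^-1)/(1 + z^-1); each factor (1 + s*T)
    # becomes ((1 + k*T) + (1 - k*T) z^-1) / (1 + z^-1) with k = 2*fs.
    k = 2.0 * fs
    def factor(T):
        return [1 + k * T, 1 - k * T]
    b = poly_mul([1.0, 1.0], factor(T2))   # one (1 + z^-1) term is left over
    a = poly_mul(factor(T1), factor(T3))
    return b, a

def digital_response(b, a, f, fs):
    zinv = cmath.exp(-2j * math.pi * f / fs)
    num = sum(c * zinv ** n for n, c in enumerate(b))
    den = sum(c * zinv ** n for n, c in enumerate(a))
    return num / den

def deviation_db(f, fs):
    # Digital magnitude relative to the analog target, in dB.
    b, a = bilinear_riaa(fs)
    return 20 * math.log10(abs(digital_response(b, a, f, fs)) / abs(analog_response(f)))

fs = 44100.0  # assumed sampling rate
for f in (100.0, 1000.0, 10000.0, 20000.0):
    print(f"{f:8.0f} Hz: deviation {deviation_db(f, fs):+.2f} dB")
```

The deviation is negligible at low frequencies and grows toward Nyquist because the bilinear transform compresses the entire analog frequency axis into the band below fs/2. Running at a higher sampling rate or adding a corrective filter are common ways to reduce it.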

2606.18573 2026-06-18 eess.AS eess.SP 新提交

Evaluating Dynamic Range Compressor Models Using Control-Voltage Measurements: an Approach and Dataset

使用控制电压测量评估动态范围压缩器模型:一种方法和数据集

Benjamin R. Thompson, Michael C. Heilemann

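AI summary Proposes evaluating dynamic range compressor models by directly comparing the gain control voltage signals of the model and the hardware. Experiments show that models trained with proxy losses fall short of directly trained models on the control trajectory, and a dataset containing gain control voltage signals is released.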

Comments Accepted to DAFx 2026

AI中文摘要

定义动态范围压缩器行为的量是作为输入电平函数的时变增益。然而,由于从现有数据集中包含的音频输入输出数据中隔离增益降低信号会产生病态逆问题,这些设备的模型通常使用代理指标进行评估。目前尚不清楚这些指标在多大程度上准确描述了模型需要模拟的行为,尤其是当基于波形的指标可能受到模拟处理和捕获引入的次要影响(即使这些影响听不见)时。我们研究了一种评估方法,其中模型产生的增益降低信号直接与硬件产生的增益降低控制电压信号进行比较。为了评估该指标作为学习目标的有效性,我们训练了一个灰盒模型,其损失直接基于增益控制信号计算,同时训练了两个使用常见代理损失的模型。在底层控制轨迹评估中,使用代理损失训练的模型未能达到与直接基于增益控制信号训练的模型相当的性能,并且波形域指标为直接指标明显区分的模型分配了相似的误差。为了促进这种评估方法的进一步探索,我们发布了一个Solid State Logic总线压缩器数据集,其中包含与音频输出一起捕获的增益控制电压信号。

英文摘要

The quantity that defines the behavior of a dynamic range compressor is the time-varying gain applied to the signal as a function of the input level. However, models of these devices are typically evaluated using proxy metrics because isolating the gain reduction signal from the audio input-output data included in existing datasets creates an ill-conditioned inverse problem. It is unclear how accurately these metrics describe the behavior the model is tasked with emulating, particularly as waveform-based metrics can be influenced by secondary effects introduced by analog processing and capture, even when those effects are inaudible. We investigate a method of evaluation in which the gain-reduction signal produced by a model is measured directly against a gain-reduction control voltage signal produced by the hardware. To evaluate the efficacy of this metric as a learning objective, a gray-box model is trained using loss computed directly over the gain control signals alongside two models trained using common proxy losses. The models trained using proxy losses did not achieve parity with models trained directly on the gain control signal when evaluated with respect to the underlying control trajectory, and the waveform-domain metrics assigned similar errors to models that were clearly separated by the direct metric. To facilitate further exploration of this method of evaluation, we present a Solid State Logic bus compressor dataset that includes the gain control voltage signal captured alongside the audio output.

2606.18492 2026-06-18 eess.IV 新提交

Dense Holographic Associative Memories

密集全息联想记忆

David J. Brady, Gregory Neory

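AI summary Proposes a cascade of two volume holograms that performs modern Hopfield dense associative memory retrieval as a parallel optical computation. A one-dimensional coded layer introduces the nonlinearity and removes the Bragg degeneracy, and a nonlocal gradient-responsive medium is designed to achieve linear efficiency scaling.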

AI中文摘要

联想回忆——将入射模式映射到存储中最相似的模式——是高维视觉前端的自然计算原语,正是体全息图原生执行的操作。我们证明,由一维编码层分隔的两级体全息图级联,通过并行光学计算精确地实现了现代Hopfield(密集联想记忆)检索映射 $\eta = V \text{softmax}(\lambda K^T x)$,其中逆温度通过编码层中的光学寻址空间光调制实现。通过一维编码路由输入和输出,而非直接在二维平面间路由,提供了原始Hopfield模型所缺乏的分离非线性,并通过平衡光栅波矢维度数($2+1=3$),消除了直接二维到二维全息图中强制分形采样的布拉格简并。忠实的密集存储进一步要求记录介质能够捕获神经元间连接,同时抑制导致均匀光折变材料效率下降 $M^{-2}$ 的场自能。我们提出一种非局部、梯度响应的介质,其与照明无关的衰变在原位恢复了线性 $M^{-1}$ 缩放,并在离散的对向二极管单元中展示了其接收、组合和存储功能。概述了实现OASLM堆叠和体积分子/纳米晶体的途径。

英文摘要

Associative recall -- mapping an incident pattern to the stored one it most resembles -- is the natural computational primitive of a high-dimensional vision front end, and it is precisely the operation a volume hologram performs natively. We show that a cascade of two volume holograms separated by a one-dimensional coded layer physically evaluates the modern Hopfield (dense associative memory) retrieval map, $\eta = V \text{softmax}(\lambda K^T x)$, exactly as a parallel optical computation, with the inverse temperature realized via optically addressed spatial light modulation in the coded layer. Routing the input and output through a 1D code rather than directly between 2D planes supplies the separating nonlinearity the original Hopfield model lacked and, by balancing the grating-wavevector dimension count ($2+1=3$), removes the Bragg degeneracy that otherwise forces fractal sampling on a direct 2D-to-2D hologram. Faithful dense storage further demands a recording medium that captures inter-neuron connections while rejecting the field self-energy responsible for the $M^{-2}$ efficiency falloff of homogeneous photorefractives. We propose a nonlocal, gradient-responsive medium whose illumination-independent decay recovers the linear $M^{-1}$ scaling in situ, and demonstrate its reception, combination, and storage functions in a discrete opposing-diode cell. Routes to OASLM-stack and volume molecular/nanocrystal realizations are outlined.
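
The retrieval map in the abstract above, V softmax(λ Kᵀx), is easy to evaluate numerically, which helps show what the optical cascade is claimed to compute. The sketch below is a plain software evaluation with made-up stored patterns. Storing keys and values as lists of vectors (the columns of K and V) is a representation chosen here for convenience.

```python
import math

def softmax(z):
    # Numerically stable softmax.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def hopfield_retrieve(keys, values, x, lam):
    # keys[i] and values[i] are the i-th stored key and value vectors,
    # i.e. the i-th columns of K and V in  V softmax(lam * K^T x).
    scores = [lam * sum(kj * xj for kj, xj in zip(k, x)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Three hypothetical stored patterns (auto-associative: values equal keys).
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = keys

probe = [0.9, 0.2, 0.1]  # a corrupted version of the first pattern

sharp = hopfield_retrieve(keys, values, probe, lam=20.0)
flat = hopfield_retrieve(keys, values, probe, lam=0.0)

print([round(v, 3) for v in sharp])  # close to the first stored pattern
print([round(v, 3) for v in flat])   # uniform average of all stored patterns
```

The inverse temperature λ controls how sharply the softmax separates the best match from the rest: a large λ returns nearly the single closest stored pattern, while λ = 0 returns an unweighted mixture.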

2606.18489 2026-06-18 eess.IV 新提交

GHOST-CAT: An Efficient and Practical Network for Mesh Generation from 3D Echocardiography

GHOST-CAT: 一种高效实用的三维超声心动图网格生成网络

Edward Ferdian, Debbie Zhao, Alistair A. Young, Martyn P. Nash

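AI summary Proposes GHOST-CAT, a two-stage network combining CNNs, graph convolutions, and transformers to generate topologically consistent, temporally coherent left-ventricle meshes from 3D echocardiography. On a 100-case test set it reaches Dice coefficients of 0.87 (cavity) and 0.75 (myocardium), outperforming existing methods.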

AI中文摘要

深度学习的最新进展显著加速了心脏成像工作流程,从分割到用于计算建模的网格生成。然而,由于3D超声心动图的低对比度噪声比、锥形视野以及对声影的敏感性,其分析面临独特挑战。在此,我们提出了一种专为3D超声心动图定制的高效实用网络。我们的方法由一个两阶段网络组成,结合了卷积神经网络、图卷积网络和Transformer,以创建准确的时间变化3D左心室网格,这些网格在整个心动周期中拓扑一致且时间连贯。我们的模型在100张3D超声图像的保留测试数据集上实现了比当前最先进方法更优越的网格重建精度,与心脏磁共振成像导出的参考分割相比,Dice系数为0.87±0.05(腔室)和0.75±0.07(心肌),平均±标准差表面距离为3.3±0.6毫米(心内膜)和3.5±0.5毫米(心外膜)。重建的网格能够自动计算常规临床指标,如体积、质量和应变,并支持生物物理数字孪生的高级应用。源代码在 https://github.com/EdwardFerdian/ghost-cat 公开共享。

英文摘要

Recent advances in deep learning have significantly accelerated cardiac imaging workflows, from segmentation to the generation of meshes for computational modelling. Nevertheless, analysis of 3D echocardiograms presents unique challenges due to their low contrast-to-noise ratio, conical field of view, and susceptibility to acoustic shadowing. Here, we present an efficient and practical network tailored for 3D echocardiograms. Our method consists of a two-stage network that combines convolutional neural networks, graph convolutional networks, and transformers, to create accurate time-varying 3D meshes of the left ventricle that are topologically consistent and temporally coherent throughout the cardiac cycle. Our model achieved superior mesh reconstruction accuracy compared to current state-of-the-art methods on a held-out test dataset of 100 3D echo images, with a Dice coefficient of 0.87 +/- 0.05 (cavity) and 0.75 +/- 0.07 (myocardium), and mean +/- SD surface distances of 3.3 +/- 0.6 mm (endocardium) and 3.5 +/- 0.5 mm (epicardium), against reference segmentations derived from cardiac magnetic resonance imaging. The reconstructed mesh enables automated calculation of routine clinical indices, such as volume, mass, and strain, and enables advanced applications with biophysical digital twins. Source code is openly shared at https://github.com/EdwardFerdian/ghost-cat.
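
For readers unfamiliar with the Dice coefficient reported in the abstract above, the following sketch computes it for two small voxel sets. It shows the overlap metric only. How the paper converts meshes to voxel labels for comparison against the MRI-derived reference is not described in the abstract, and the voxel coordinates below are made up.

```python
def dice_coefficient(pred, ref):
    # Dice = 2 * |pred ∩ ref| / (|pred| + |ref|), where pred and ref are
    # sets of voxel indices labelled as the structure of interest.
    if not pred and not ref:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & ref) / (len(pred) + len(ref))

# Hypothetical predicted and reference cavity voxels.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
reference = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}

print(round(dice_coefficient(predicted, reference), 3))  # 2 shared of 3 + 3 voxels
```

A value of 1 means the two masks coincide and 0 means they do not overlap at all. Thin structures such as the myocardium tend to score lower than compact ones such as the cavity for the same boundary error.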

2606.18488 2026-06-18 eess.SP 新提交

Cell-Free Integrated Sensing and Communication

无蜂窝一体化感知与通信

Diluka Galappaththige, Chintha Tellambura

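AI summary Surveys the integration of cell-free architectures with sensing and communication, covering key topics such as distributed access points, multi-static sensing, and resource optimization, and outlines future directions.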

AI中文摘要

无蜂窝(CF)一体化感知与通信(ISAC)将CF架构与ISAC功能相结合。CF-ISAC利用分布式接入点,消除小区边界,提升覆盖、频谱效率和可靠性。它还提高了能效,实现了鲁棒的多用户通信、分布式多站感知和无缝资源优化。目前缺乏对CF-ISAC的全面综述。本专著填补了这一空白,涵盖了基本原理、协作传输、雷达散射截面、目标参数估计、ISAC集成级别、感知指标和关键应用。它还探讨了多站感知的优势。讨论了性能分析、资源分配、安全性和用户/目标中心设计。最后,讨论了同步、多目标检测、干扰管理和前传限制。介绍了先进天线技术、网络辅助系统、近场CF-ISAC、跨技术集成和机器学习方法。

英文摘要

Cell-free (CF) integrated sensing and communication (ISAC) merges the CF architecture with ISAC functionalities. CF-ISAC leverages distributed access points, removes cell boundaries, and enhances coverage, spectral efficiency, and reliability. It also improves energy efficiency, enabling robust multi-user communication, distributed multi-static sensing, and seamless resource optimization. A comprehensive survey on CF-ISAC has been lacking. This monograph addresses that gap by covering the foundational principles, cooperative transmission, radar cross-section, target parameter estimation, ISAC integration levels, sensing metrics, and key applications. It also explores the advantages of multi-static sensing. Performance analysis, resource allocation, security, and user/target-centric designs are discussed. Finally, synchronization, multi-target detection, interference management, and fronthaul limitations are discussed. Advanced antenna technologies, network-assisted systems, near-field CF-ISAC, cross-technology integration, and machine learning approaches are presented.

2606.18435 2026-06-18 eess.SP 新提交

Covert Multi-Hop Communications for Heterogeneous Networks With Multiple Wardens

异构网络中多监听者场景下的隐蔽多跳通信

Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu

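AI summary For heterogeneous wireless networks monitored by multiple passive wardens, jointly optimizes routing, modality selection, and transmit power to maximize network covertness under an end-to-end rate requirement, and proposes a low-complexity KL-divergence-based routing metric and a two-stage optimization algorithm.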

AI中文摘要

本文研究了由多个被动监听者监控的异构无线网络中的隐蔽多跳通信。为了在满足严格的端到端速率要求的同时最大化网络范围的隐蔽性,我们联合优化了路由、模态选择和发射功率。在同步多跳传输方案下,我们分析了两种不同的监听者模型的检测能力:采用中央融合中心的合谋监听者和独立运行的非合谋监听者。对于这两种模型,我们推导了最优检测器和检测错误概率(DEP)的精确表达式。此外,为了降低评估DEP的复杂度,我们基于伽马矩匹配开发了高精度的闭式近似,并使用Kullback-Leibler(KL)散度建立了严格的DEP下界。在此理论基础上,我们提出了一种高效的两阶段优化算法,将链路级资源分配与网络级路径选择解耦。通过将KL散度界转化为一种新颖的低复杂度路由度量(该度量普遍简化为信噪比的线性求和),与传统的基于每跳检测的度量相比,我们显著降低了计算开销。最后,数值仿真验证了理论分析,并展示了所提框架的接近最优的性能。

英文摘要

This paper investigates covert multi-hop communications in heterogeneous wireless networks monitored by multiple passive wardens. To maximize network-wide covertness while satisfying a strict end-to-end rate requirement, we jointly optimize routing, modality selection, and transmit power. Under a simultaneous multi-hop transmission scheme, we analyze the detection capabilities of two distinct warden models: colluding wardens employing a central fusion center, and non-colluding wardens operating independently. For both models, we derive optimal detectors and exact expressions for the detection error probability (DEP). In addition, to reduce the complexity of evaluating the DEP, we develop highly accurate closed-form approximations based on gamma moment matching and establish rigorous DEP lower bounds using Kullback-Leibler (KL) divergence. Building on this theoretical foundation, we propose an efficient two-stage optimization algorithm that decouples link-level resource allocation from network-level path selection. By translating the KL divergence bounds into a novel, low-complexity routing metric, which universally simplifies to a linear summation of signal-to-noise ratios, we substantially reduce the computational overhead compared to conventional per-hop detection-based metrics. Finally, numerical simulations validate the theoretical analysis and demonstrate the near-optimal performance of the proposed framework.

2606.19125 2026-06-18 eess.AS stat.ME 新提交

Continuous-Speech Parkinson's Disease Detection Using Acoustic and Inharmonicity Features

连续语音帕金森病检测:基于声学和非谐和性特征

Rujia Li, Niloofar Momeni, Susanna Whitling, Andreas Jakobsson

AI总结 提出一种基于连续语音的帕金森病检测方法,利用传统声学特征和新型非谐和性特征,实验表明连续语音模型优于持续元音模型。

详情
AI中文摘要

已有研究主要利用持续元音发声从语音数据中识别帕金森病(PD)。本文在此基础上,提出了一种针对连续语音的PD识别方法,从而实现对语音数据的实用背景监测,以检测指示PD的语音变化。使用两个不同的数据集,我们比较了最佳持续元音模型与所提出的连续语音模型的性能,清晰展示了后者的优越性能。我们研究了说话人级别评估和数据泄漏预防的方法,以及如何从连续语音中可靠提取元音信息。所提出的方法框架同时利用传统声学表示和一种有前景的新型基于非谐和性的框架,展示了后者如何提供互补信息以改善其中一个数据集的性能;然而,对于另一个数据集,该信息并未显著改善(或降低)性能,表明在得出其使用结论前需要进一步研究。总体而言,本文清晰展示了使用连续语音进行PD分类相比使用持续元音声音的优势。

英文摘要

Notable efforts have been made to identify Parkinson's disease (PD) from vocal data, primarily using sustained vowel phonations. In this work, we extend these efforts by introducing a PD identification approach for continuous speech, enabling practical background monitoring of voice data to detect vocal changes indicative of PD. Using two distinct data sets, we compare the best sustained vowel model with the proposed continuous speech model, clearly illustrating the superior performance of the latter. We examine approaches for speaker-level evaluation and data leakage prevention, as well as how vowel information may be reliably extracted from continuous speech. The proposed framework exploits both traditional acoustic representations and a promising novel inharmonicity-based representation, showing how the latter provides complementary information that improves performance for one of the data sets; for the other data set, however, this information did not significantly improve (or reduce) performance, suggesting that further studies are required before firm conclusions can be drawn about its use. Overall, the work clearly illustrates the benefit of performing PD classification using continuous speech compared to using sustained vowel sounds.
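
Speaker-level evaluation and data leakage prevention usually mean that no speaker contributes recordings to both the training and the test partition, and that segment scores are pooled per speaker before a decision is made. The sketch below illustrates that general idea only; it is not the authors' protocol, and the function names, the `(speaker_id, item)` record layout, and the mean-pooling rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(recordings, test_fraction=0.2, seed=0):
    """Split (speaker_id, item) pairs so that no speaker appears in both sets.

    Hypothetical helper: the split is made over speakers, not over recordings,
    which is what prevents speaker identity from leaking into the test set.
    """
    speakers = sorted({speaker_id for speaker_id, _ in recordings})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [(s, x) for s, x in recordings if s not in test_speakers]
    test = [(s, x) for s, x in recordings if s in test_speakers]
    return train, test


def speaker_level_scores(segment_scores):
    """Average segment-level scores per speaker (assumed pooling rule)."""
    pooled = defaultdict(list)
    for speaker_id, score in segment_scores:
        pooled[speaker_id].append(score)
    return {s: sum(v) / len(v) for s, v in pooled.items()}
```

A random split over individual recordings would typically place segments of the same speaker on both sides, which tends to inflate reported accuracy; splitting over speakers avoids that.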

2606.19157 2026-06-18 eess.AS cs.CL 新提交

IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages

IndicContextEval:评估8种印度语言音频大语言模型上下文利用能力的基准

Sakshi Joshi, Dhruv Subhash Rathi, Sanskar Singh, Eldho Ittan George, R J Hari, Kaushal Bhogale, Mitesh M. Khapra

AI总结 提出IndicContextEval基准,包含8种印度语言555位说话人的56小时自然语音,通过7级提示框架评估音频大语言模型是否真正利用上下文而非依赖参数化知识。

Comments Accepted at Interspeech 2026

详情
AI中文摘要

音频大语言模型(AudioLLMs)能够基于文本提示(如领域描述或实体列表)进行语音识别。然而,尚不清楚这些模型是真正利用此类上下文,还是依赖预训练期间学到的参数化知识。现有基准无法回答这个问题,因为它们仅在固定提示条件下评估转录,且很少包含明确的上下文输入。我们引入IndicContextEval,这是一个56小时的多语言基准,包含来自8种印度语言和23个专业领域的555位说话人的自然语音。我们设计了一个7级提示框架,逐步引入上下文信号,包括元数据、自然语言描述、英语和本地文字的实体列表,以及包含错误实体的对抗性提示。评估五个模型揭示了上下文利用行为的显著差异,凸显了对音频大语言模型中上下文基础进行显式评估的必要性。

英文摘要

AudioLLMs enable speech recognition conditioned on textual prompts such as domain descriptions or entity lists. However, it remains unclear whether these models genuinely utilise such context or rely on parametric knowledge learned during pretraining. Existing benchmarks cannot answer this question because they evaluate transcription under fixed prompting conditions and rarely include explicit contextual inputs. We introduce IndicContextEval, a 56-hour multilingual benchmark of natural speech from 555 speakers across 8 Indian languages and 23 professional domains. We design a 7-level prompting framework that progressively introduces contextual signals, including metadata, natural-language descriptions, entity lists in English and native script, and adversarial prompts with incorrect entities. Evaluating five models reveals substantial differences in context utilisation behaviour, highlighting the need for explicit evaluation of contextual grounding in AudioLLMs.

2606.19101 2026-06-18 eess.SP cs.LG 新提交

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

结构优于非线性:面向动力学学习的显式交互架构

Augusto Sarti

AI总结 提出基于波启发交互结构的显式动力学单元,通过结构化组织而非非线性表达实现建模能力,在非线性系统辨识中深度提升表示质量与泛化性能。

Comments 11 pages, 2 figures, 2 tables

详情
AI中文摘要

大多数动力学系统的学习架构依赖于通用非线性函数逼近,通常需要高模型复杂度来捕获结构化行为。在这项工作中,我们提出了一种替代范式,其中建模能力主要来源于结构而非表达性非线性。我们引入了一类基于波启发交互结构和内部状态的显式结构化动力学单元。受波计算原理启发,所提出的单元采用严格的因果组织,消除了代数循环,产生无需隐式求解器即可评估的完全显式模型。堆叠此类单元可产生具有涌现层次行为的分层动力学架构。通过非线性系统辨识任务的实验,我们表明即使在有限的参数优化下,深度也能提高表示质量和泛化能力。特别地,所提出的架构即使在仅进行读出层拟合时也能产生信息丰富的内部表示,这表明有用的动力学结构在大量参数优化之前就已从交互的组织中涌现。这些结果表明,结构优先的设计为学习动力学系统提供了一种可行且有效的替代传统黑箱方法,突出了交互结构作为模型表达性主要来源的作用。

英文摘要

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

2606.18979 2026-06-18 eess.AS cs.CL cs.SD 新提交

Mitigating Scoring Errors and Compensating for Nonverbal Subtests in Speech-Based Dementia Assessment

缓解基于语音的痴呆评估中的评分错误并补偿非语言子测试

Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

AI总结 研究通过融合转录分数和Whisper嵌入减少语音评估中的评分错误,并利用融合表示近似专家整体评分以补偿缺失的运动子测试,有效区分认知状态组。

Comments Accepted at INTERSPEECH 2026

详情
AI中文摘要

认知障碍的早期检测依赖于神经心理学测试,通过评估多个认知领域来最小化主观性。基于语音的评估可以支持诊断并提高可及性,但转录错误和非语言子测试(如运动技能)的遗漏限制了准确性。除了传统的测试分数,语音衍生特征可以提供对认知状态的额外见解。本研究调查了德国“综合征短测试”的语音评估,这是一种标准化的痴呆筛查测试,包含语言和运动子测试。我们训练模型,整合每个语言子测试的转录衍生分数和Whisper嵌入,以减少评分错误。为了补偿缺失的运动子测试,我们利用这些融合表示来近似专家整体评分。尽管省略了子测试,我们的模型与专家评分高度相关,并能有效且准确地区分认知状态组。

英文摘要

Early detection of cognitive impairment relies on neuropsychological tests to minimize subjectivity by assessing multiple cognitive domains. Speech-based evaluation can support diagnostics and improve accessibility, but transcription errors and the omission of nonverbal subtests (e.g., motor skills) limit accuracy. Beyond conventional test scores, speech-derived features can provide additional insights into cognitive status. This study investigates the speech-based evaluation of the German "Syndrom-Kurz-Test," a standardized dementia screening test comprising verbal and motor subtests. We train models that integrate transcript-derived scores and Whisper embeddings per verbal subtest to reduce scoring errors. To compensate for missing motor subtests, we then leverage these fused representations to approximate expert overall ratings. Despite omitting subtests, our models strongly correlate with expert ratings and efficiently and accurately discriminate between cognitive status groups.

2606.18734 2026-06-18 eess.SP cs.LG 新提交

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

点云辅助的切线高斯溅射局部统计信道预测

Ye Xue, Yiheng Wang, Xinhua Shao, Qi Yan, Shutao Zhang, Tsung-Hui Chang

AI总结 提出点云辅助切线高斯溅射(PC-TGS)框架,通过融合稀疏无线电测量与密集LiDAR几何数据,将角功率谱外推到未测量网格,实现大规模无线数字孪生中的高效信道预测。

详情
AI中文摘要

准确、特定地点的信道信息对于优化下一代无线网络至关重要。在各种方法中,局部统计信道建模(LSCM)通过从参考信号接收功率(RSRP)测量中建模信道多径角功率谱(APS),已成为一种针对高效网络优化的最先进方法。然而,尽管其有效性,LSCM无法在绝大多数没有测量值的位置预测APS,这严重限制了其在大规模真实场景中的适用性。为了解决这一挑战,我们提出了点云辅助切线高斯溅射(PC-TGS),这是第一个通过将稀疏无线电测量与密集的基于LiDAR的几何信息相结合,将APS外推到未测量室外网格的框架。PC-TGS将环境散射体表示为各向异性的3D高斯分布,通过原始点云的松弛均值重新参数化进行初始化和细化。切线平面投影将每个高斯分布精确映射到局部角度域,而深度感知的电磁溅射过程聚合它们的贡献。为了确保实际部署,我们推导了用于APS bin积分的闭式高斯加权平均(GWA),并提供了可证明的误差界。在LiDAR扫描的城市规模数据集(500万个点,6310个RSRP样本)上的评估表明,与最先进的基线相比,PC-TGS在APS和RSRP预测性能上更优,并且在外推APS任务中推理时间更快。这些结果突显了PC-TGS在大规模无线数字孪生中实现几何感知和数据高效信道预测的潜力。

英文摘要

Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from reference signal received power (RSRP) measurements, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present point-cloud-assisted tangent Gaussian splatting (PC-TGS), the first framework to extrapolate APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance than state-of-the-art baselines and faster inference for the APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.

2606.18645 2026-06-18 eess.AS cs.AI 新提交

Augmenting Dysarthric Speech Severity Assessment with MOS Supervision

通过MOS监督增强构音障碍语音严重程度评估

Kaimeng Jia, Minzhu Tu, Zengrui Jin, Siyin Wang, Chao Zhang

发表机构 * Tsinghua University(清华大学) Beijing University of Posts and Telecommunications(北京邮电大学)

AI总结 提出利用语音合成评估数据(QualiSpeech语料库的MOS标签)增强构音障碍语音评估,微调提升可懂度和自然度预测,联合训练主要提升自然度,减少对临床标注的依赖。

详情
AI中文摘要

构音障碍是一种以可懂度和交际有效性降低为特征的言语障碍。自动的构音障碍语音话语级评估可以支持可扩展的语音监测和治疗相关分析。然而,训练此类系统受到临床标注构音障碍语音稀缺的瓶颈限制。本工作提出利用语音合成评估数据,特别是来自QualiSpeech语料库的带有平均意见得分(MOS)标签的人工标注话语,来增强构音障碍语音评估。实验表明,在语音合成评估数据上微调持续提高了可懂度和自然度预测的性能,而联合训练主要在自然度上带来提升。这些结果表明,合成伪影和构音障碍语音共享感知共性,语音合成评估语料库提供了一种实用的增强来源,减少了对稀缺临床标注的依赖。

英文摘要

Dysarthria is a speech disorder marked by reduced intelligibility and communicative effectiveness. Automatic utterance-level assessment of dysarthric speech can support scalable speech monitoring and therapy-related analysis. Yet training such systems is bottlenecked by the scarcity of clinically annotated dysarthric speech. This work proposes to augment dysarthric speech assessment using data from speech synthesis evaluations, specifically human-annotated utterances with Mean Opinion Score (MOS) labels from the QualiSpeech corpus. Experiments show that fine-tuning on speech synthesis assessment data consistently improves performance on both intelligibility and naturalness prediction, while joint training yields gains primarily on naturalness. These results suggest that synthesis artifacts and dysarthric speech share perceptual commonalities, and speech synthesis evaluation corpora offer a practical augmentation source that reduces reliance on scarce clinical annotations.

2606.18480 2026-06-18 eess.AS cs.SD 新提交

Generalised Transcoding Framework for Arbitrary Spatial Audio Capture and Playback Formats

任意空间音频采集与回放格式的通用转码框架

Archontis Politis, Janani Fernandez, Leo McCormack

发表机构 * Faculty of Information Technology and Communication Sciences, Tampere University(信息技术与通信科学学院,坦佩雷大学) Department of Information and Communications Engineering, Aalto University(信息与通信工程系,阿尔托大学)

AI总结 提出一种统一框架,通过估计时频域空间元数据(包括主成分和环境成分的角功率分布),实现从Ambisonic或原始麦克风阵列信号到任意目标回放格式的转码,支持独立旋转,实验证明其优于现有参数化渲染器。

Comments This work has been submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing for possible publication

详情
AI中文摘要

本文介绍了一种统一框架,用于对以Ambisonic信号或原始麦克风阵列信号形式捕获的空间声场景进行参数化分析和再现。所提出的方法估计时频相关的空间元数据,该元数据表征可变数量的主源分量和具有自身角功率分布的环境分量,其参数拟合捕获信号的观测空间协方差。该元数据用于构建目标回放格式的空间协方差,然后用于推导最优混合矩阵,以将场景转码用于目标再现系统上的回放。该方法还独立处理采集和回放设置的旋转。在听力测试中,使用来自Ambisonic、球形和头戴式阵列的模拟场景,比较了该方法的实时实现和其他现有的最先进参数化渲染器。结果突出了所提出框架在多种内容和接收器配置下的感知优势,特别是对于低阶和几何约束的麦克风阵列。

英文摘要

This article introduces a unified framework for the parametric analysis and reproduction of spatial sound scenes captured either as Ambisonic signals or as raw microphone array signals. The proposed method estimates time-frequency-dependent spatial metadata that characterises a variable number of primary source components and an ambience component with its own angular power distribution, whose parameters fit the observed spatial covariances of the captured signals. This metadata is used to construct spatial covariances of the target playback formats, which are then used to derive optimal mixing matrices for transcoding the scene for playback over the target reproduction system. The method additionally handles independent rotations of both capture and playback setups. Real-time implementations of the method and other existing state-of-the-art parametric renderers are compared in a listening test using simulated scenes from Ambisonic, spherical, and head-worn arrays. The results highlight perceptual benefits of the proposed framework across a diverse range of content and receiver configurations, particularly for lower-order and geometrically constrained microphone arrays.

2606.18402 2026-06-18 eess.SP cs.AI cs.AR cs.SY eess.SY 新提交

Deep-Learning-Based Pixelated Microwave Filter Design and Characterization using Electro-Optical Electric-Field Measurements

基于深度学习的像素化微波滤波器设计与表征:利用电光电场测量

Han Zhou, Richard Bannister, Caspar Pierce, Haojie Chang, David Widen, Ludvig Fornstedt, Gabriel Melin, Alexander Bohlin, Pontus Lindeberg Fredriksson, Dilbagh Singh, Christian Fager, Koen Buisman

发表机构 * Chalmers University of Technology(查尔姆斯理工大学) Advanced Technology Institute, University of Surrey(萨里大学先进科技研究所) National Physical Laboratory(国家物理实验室)

AI总结 提出结合卷积神经网络与遗传算法的深度学习方法,自动合成像素化微波滤波器,通过S参数和空间电场测量实验验证,实现7 GHz通带和9.5 GHz以上超过20 dB抑制,首次用电光测量揭示AI生成设计的电场模式。

详情
AI中文摘要

传统微波滤波器设计通常依赖迭代参数调整和预定义拓扑,这限制了设计空间并增加了开发时间。本研究采用深度学习方法,结合卷积神经网络与遗传算法,自动合成像素化微波滤波器。为实验验证该方法,分析了S参数和空间电场测量。合成的低通滤波器在仿真与实测性能之间表现出极好的一致性,实现了7 GHz通带,并在9.5 GHz以上具有超过20 dB的抑制。电光测量首次揭示了类似于耦合传输线或短截线结构的电场模式,为AI生成设计的涌现特性提供了见解。

英文摘要

Traditional microwave filter design typically relies on iterative parameter tuning and predefined topologies, which limits design space and increases development time. This study uses a deep learning approach combining convolutional neural networks with genetic algorithms to automate pixelated microwave filter synthesis. To validate the approach experimentally, both S-parameter and spatial electric-field measurements were analyzed. The synthesized low-pass filter demonstrated excellent agreement between simulated and measured performance, achieving a 7 GHz passband with over 20 dB suppression beyond 9.5 GHz. Electro-optical measurements, for the first time, revealed electric field patterns that resemble coupled transmission-lines or stub structures, providing insight into the emergent characteristics of AI-generated designs.

2606.18395 2026-06-18 eess.SP cs.AI cs.AR cs.SY eess.SY 新提交

Deep Learning-Driven Inverse Design of Doherty Power Amplifiers Using Pixelated Combiners and Dual-State Impedance Synthesis

基于深度学习的Doherty功率放大器逆向设计:使用像素化合成器和双态阻抗合成

Han Zhou, Haojie Chang, David Widen, Christian Fager

发表机构 * Tampere University(坦佩雷大学) Chalmers University of Technology(查尔姆斯理工大学)

AI总结 提出一种结合深度卷积神经网络、像素化布局和遗传算法的三端口Doherty合成器设计方法,实现峰值和回退功率条件下的双态阻抗合成,在2.6-2.8 GHz频段内饱和输出功率>44.2 dBm,峰值漏极效率>71.2%。

详情
AI中文摘要

Doherty功率放大器(PA)的输出合成器将负载调制、阻抗匹配和相位补偿集成在一个网络中,使其设计和合成极具挑战性。本文提出了一种三端口Doherty合成器设计方法,结合深度卷积神经网络(CNN)、像素化布局表示和遗传算法(GA)与双态阻抗合成,以同时处理峰值和回退功率条件。作为概念验证,设计并制作了两款采用三端口像素化合成器的GaN HEMT Doherty PA原型。两款原型在2.6-2.8 GHz范围内均实现了超过44.2 dBm的实测饱和输出功率,峰值漏极效率高于71.2%。此外,在6-dB回退水平下测得的漏极效率高达64%。应用数字预失真后,每个原型的邻道泄漏比(ACLR)优于-51.3 dBc。

英文摘要

The output combiner of a Doherty power amplifier (PA) integrates load modulation, impedance matching, and phase compensation within a single network, making its design and synthesis highly challenging. In this paper, we propose a three-port Doherty combiner design methodology that combines deep convolutional neural networks (CNNs), pixelated layout representations, and genetic algorithms (GA) with dual-state impedance synthesis to address both peak and back-off power conditions. As a proof of concept, two GaN HEMT Doherty PA prototypes incorporating three-port pixelated combiners are designed and fabricated. Both prototypes achieve a measured saturated output power exceeding 44.2 dBm with peak drain efficiency above 71.2% within 2.6-2.8 GHz. Furthermore, a drain efficiency as high as 64% is measured at the 6-dB back-off level. After applying digital predistortion, each prototype achieves an adjacent channel leakage ratio (ACLR) better than -51.3 dBc.
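
The power figures in the abstract are given in dBm, and the back-off level in dB relative to saturation. As a small worked conversion (not taken from the paper), the sketch below turns those numbers into watts; the helper names are hypothetical, and the only fact used is the definition of dBm as decibels relative to 1 mW.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm (dB relative to 1 mW) to watts."""
    return 10 ** (p_dbm / 10) / 1000.0


def back_off_power_dbm(p_sat_dbm, back_off_db):
    """Output power at a given back-off below saturation, in dBm."""
    return p_sat_dbm - back_off_db


p_sat_w = dbm_to_watts(44.2)                           # about 26.3 W
p_bo_w = dbm_to_watts(back_off_power_dbm(44.2, 6.0))   # about 6.6 W
```

So the reported 44.2 dBm saturated output corresponds to roughly 26 W, and the 6-dB back-off point to roughly a quarter of that.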

2606.18354 2026-06-18 eess.IV cs.LG 新提交

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

基于解剖掩膜条件扩散的阿尔茨海默病结构MRI合成

Muge Zhang, Muhammad Ali Khaliq, Jamal Alsakran, Byeong Kil Lee, Jeeho Ryoo

发表机构 * Fairleigh Dickinson University(Fairleigh Dickinson大学) University of Colorado at Colorado Springs(科罗拉多大学科罗拉多斯普林斯分校)

AI总结 针对阿尔茨海默病结构MRI合成中细微解剖变化难以捕捉的问题,本文扩展Med-DDPM条件扩散模型,以解剖分割掩膜为条件生成3D结构MRI,实验表明合成数据训练的模型Dice分数与真实数据相当,混合数据训练则显著提升性能。

详情
Journal ref 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)
AI中文摘要

生成式机器学习模型的最新进展显著改善了医学成像,为数据增强、隐私保护和模型泛化提供了有前景的解决方案。然而,由于神经退行性病变相关的细微、区域特异性和渐进性解剖变化,合成阿尔茨海默病(AD)的高质量结构MRI数据仍然具有挑战性。在本文中,我们将最初为脑肿瘤合成设计的Med-DDPM条件扩散模型扩展,以生成专门针对AD的3D结构MRI。我们采用Med-DDPM,因为与其他生成模型相比,它具有稳定的结构和保真度,特别适合捕捉AD特征的细微解剖变化。我们的方法以来自ADNI数据集的解剖分割掩膜为条件,将关键的AD相关脑结构纳入生成过程。我们通过在真实、合成和混合数据集上训练分割模型,系统评估了合成图像的质量和实用性。实验结果表明,仅在合成数据上训练的分割模型达到了与真实数据训练(0.6513)相当的Dice分数(0.6532),同时召回率显著提高。值得注意的是,在混合数据集(混合真实和合成图像)上训练的模型优于真实和纯合成基线,Dice分数达到0.7244。这些发现强调了条件扩散模型在生成解剖准确、AD特异性合成MRI方面的成功应用,并突出了它们在增强训练数据可用性、提高诊断准确性和促进神经影像研究可重复性方面的潜力。

英文摘要

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.
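
The Dice scores quoted above (0.6513, 0.6532, 0.7244) measure overlap between a predicted and a reference segmentation mask. A minimal sketch of the standard Dice coefficient follows; it assumes binary masks already flattened into sequences of 0/1 values, and the convention of returning 1.0 for two empty masks is an assumption, not something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

For a 3D volume, the voxels would be flattened first; a multi-structure segmentation would typically report this value per label and then average.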

2606.19311 2026-06-18 eess.SY cs.SY 新提交

A Lyapunov-Based Perspective on Absolute Stability

基于李雅普诺夫的绝对稳定性视角

Tessina H. Scholl

AI总结 本文提出一个李雅普诺夫类解释框架,用于非标量圆判据及其小增益和严格无源特例,通过加强扇区约束避免严格正定条件,并利用LMI、代数Riccati方程和矩阵方程推导二次解。

Comments 16 pages, 7 figures; Preprint version of a manuscript accepted for at-Automatisierungstechnik (special issue 08/2026)

详情
AI中文摘要

本文提出了绝对稳定性概念的统一视角。特别地,它为具有小增益和严格无源特例的非标量圆判据开发了一个李雅普诺夫类解释框架。为此,提出了一个避免严格正定条件的李雅普诺夫类函数的一般定义不等式,这是通过加强扇区约束实现的。我们讨论了推导二次解的不同方法:通过线性矩阵不等式(LMI)、代数Riccati方程和矩阵方程。利用卡尔曼-雅库博维奇-波波夫(KYP)引理,恢复了经典的频域结果。推导了一个基于无源指数的结果,简化了评估。总体而言,所呈现的相互关系可能对分析和教学都有用。

英文摘要

This article presents a unifying perspective on absolute stability concepts. In particular, it develops a Lyapunov-like explanatory framework for a nonscalar circle criterion with its small-gain and strict-passivity special cases. To this end, a general defining inequality for a Lyapunov-like function is proposed that avoids strict definiteness conditions, enabled by a strengthening of the sector constraint. We discuss different ways to derive a quadratic solution: via a linear matrix inequality (LMI), an algebraic Riccati equation, and a matrix equation. By exploiting the Kalman-Yakubovich-Popov (KYP) lemma, classical frequency-domain results are recovered. A passivity-index-based result is derived that simplifies the evaluation. Overall, the presented interrelations may be useful for both analysis and teaching.

2606.19267 2026-06-18 cs.RO cs.SY eess.SY 新提交

A Mixed-Reality Testbed for Autonomous Vehicles

自动驾驶汽车的混合现实测试平台

H. M. Sabbir Ahmad, Ehsan Sabouni, Emrullah Celik, Zean Wan, Damola Ajeyemi, Christos G. Cassandras, Wenchao Li

发表机构 * Boston University(波士顿大学)

AI总结 提出一种混合现实硬件在环测试平台,集成物理移动机器人与高保真仿真环境,用于验证感知、规划和控制算法,并支持多智能体系统研究。

Comments 9 pages, 7 figures, 1 table

详情
AI中文摘要

我们提出了一种用于自动驾驶汽车的混合现实、硬件在环(HIL)测试平台,该平台将物理移动机器人测试平台与高保真仿真环境无缝集成。虚拟仿真能够创建多样化的、安全关键的驾驶场景,以验证最先进的感知、规划和控制算法,同时通过配备多模态传感器的物理机器人在逼真的虚拟环境中增强仿真,进一步促进严格的验证。我们的测试平台还利用无线通信实现车辆连接,并通过物理机器人和虚拟仿真代理的组合容纳大量代理,支持包括网联自动驾驶汽车(CAV)在内的多智能体系统研究。最后,我们提出了一种结合感知、规划和一种新颖的基于控制障碍函数(CBF)的在线学习控制器的安全保证框架,用于CAV。使用所提出框架的实验用于验证和展示测试平台的关键功能以及其在弥合仿真与真实世界硬件部署之间差距方面的整体效用。

英文摘要

We propose a mixed-reality, hardware-in-the-loop (HIL) testbed for autonomous vehicles that seamlessly integrates a physical testbed of mobile robots with a high-fidelity simulation environment. The virtual simulation enables the creation of diverse, safety-critical driving scenarios to validate state-of-the-art perception, planning, and control algorithms, while augmenting the simulation with physical robots equipped with multimodal sensors in photorealistic virtual environments further facilitates rigorous validation. Our testbed also features vehicular connectivity using wireless communication and can accommodate a large number of agents through the combination of physical robots and virtual simulated agents, supporting research on multi-agent systems including Connected and Autonomous Vehicles (CAVs). Finally, we present a safety-guaranteed framework for CAVs that combines perception, planning, and a novel online learning-based controller using Control Barrier Functions (CBFs). Experiments with the proposed framework validate and demonstrate the key functionalities of the testbed and its overall utility in bridging the gap between simulation and real-world hardware deployment.

2606.19240 2026-06-18 cs.RO cs.CV cs.HC cs.SY eess.SY 新提交

Seeing Through Occlusion: Deterministic Arm Kinematic Correction for Robot Teleoperation

透过遮挡:机器人遥操作的确定性手臂运动学校正

Thomas M. Kwok, Nicholas Koenig, Yue Hu

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 提出手臂运动学校正方法,利用恒定臂长几何约束和勾股定理确定性地重建遮挡关节深度,无需复杂建模,经Vicon验证有效,并成功应用于遥操作。

详情
AI中文摘要

无标记、单RGB-D相机动作捕捉为机器人遥操作提供了一种低成本、非侵入性的替代传统标记系统的方法;然而,在自遮挡存在时,特别是上肢运动期间,深度估计常常退化。本文提出了一种手臂运动学校正(AKC)方法,通过基于恒定臂长施加几何约束来改进深度估计。所提出的方法利用手腕位置和预定义臂长,基于勾股定理的确定性公式重建遮挡关节深度,从而避免了对复杂概率建模或参数调整的需求。针对Vicon参考系统的实验验证表明,该方法在静态和动态关节运动下均表现出可靠的性能,通过均方根误差(RMSE)和皮尔逊相关性进行评估。此外,在模拟和物理机器人环境中成功演示了运动映射遥操作。结果表明,AKC在长时间、严重自遮挡下增强了鲁棒性并保持了解剖一致性,即使与不太可靠的时间滤波器配对时也是如此,突显了其在机器人遥操作和人机交互等实时应用中的实用性。

英文摘要

Markerless, single-RGB-D-camera motion capture provides a low-cost and non-invasive alternative to conventional marker-based systems for robot teleoperation; however, depth estimation often degrades in the presence of self-occlusion, particularly during upper-limb motion. This paper presents an Arm Kinematic Correction (AKC) method that improves depth estimation by enforcing geometric constraints based on constant arm lengths. The proposed approach reconstructs occluded joint depths by leveraging wrist positions and predefined arm lengths via a deterministic formulation based on the Pythagorean theorem, thereby avoiding the need for complex probabilistic modeling or parameter tuning. Experimental validation against a Vicon reference system demonstrates reliable performance for both static and dynamic joint motions, evaluated using root-mean-square error (RMSE) and Pearson correlation. Furthermore, motion-mapping teleoperation is successfully demonstrated in both simulated and physical robot environments. The results show that AKC enhances robustness and preserves anatomical consistency under long-duration, severe self-occlusion, even when paired with less reliable temporal filters, highlighting its practicality for real-time applications such as robot teleoperation and human-robot interaction.
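
The constant-arm-length constraint can be illustrated with a short sketch. The abstract only states that occluded joint depths are reconstructed from wrist positions and predefined arm lengths via the Pythagorean theorem; everything else below is an assumption for illustration: the function name, the camera-frame coordinates in metres, the choice to trust the occluded joint's lateral (x, y) position, and the flag selecting which side of the wrist the joint lies on.

```python
import math


def correct_joint_depth(wrist_xyz, joint_xy, segment_length, joint_behind_wrist=True):
    """Recover the depth of an occluded joint from a fixed segment length.

    With the wrist at (wx, wy, wz) and the joint's lateral position (jx, jy)
    assumed reliable, the rigid-segment constraint
        (jx - wx)^2 + (jy - wy)^2 + (jz - wz)^2 = segment_length^2
    gives the depth offset directly.
    """
    wx, wy, wz = wrist_xyz
    jx, jy = joint_xy
    lateral_sq = (jx - wx) ** 2 + (jy - wy) ** 2
    # Clamp at zero: noisy lateral estimates can exceed the segment length.
    depth_offset = math.sqrt(max(segment_length ** 2 - lateral_sq, 0.0))
    return wz + depth_offset if joint_behind_wrist else wz - depth_offset
```

For example, with the wrist at a depth of 1.0 m, a lateral offset of 0.15 m, and a 0.25 m forearm, the joint lands 0.2 m behind the wrist at 1.2 m. The formula is closed-form, so there is nothing to tune, which matches the deterministic character the abstract emphasizes.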

2606.19176 2026-06-18 cs.RO cs.AI cs.SY eess.SY 新提交

Hardware- and Vision-in-the-Loop Validation of Deep Monocular Pose Estimation for Autonomous Maritime UAV Flight

用于自主海上无人机飞行的深度单目位姿估计的硬件与视觉在环验证

Maneesha Wickramasuriya, Beomyeol Yu, Jaden Shin, Mason Huslig, Taeyoung Lee, Murray Snyder

发表机构 * George Washington University(乔治华盛顿大学)

AI总结 提出硬件验证的视觉在环框架,结合深度变换器单目位姿估计器和延迟卡尔曼滤波器,在模拟逼真海上环境中实现自主室内飞行,验证了感知延迟等嵌入式效应。

Comments 6 pages, 9 figures

详情
AI中文摘要

船舶上的自主无人机操作需要可靠的基于视觉的相对位姿估计,然而海上验证成本高、依赖天气且风险大。本文提出一个硬件验证的视觉在环框架,能够在模拟逼真海上环境的同时实现完全自主的室内飞行。渲染的海上视图由板载的基于深度变换器的单目位姿估计器处理。延迟的视觉测量与高频率IMU数据通过延迟卡尔曼滤波器融合,为几何控制提供一致的状态估计。该系统捕捉了纯仿真中缺失的关键嵌入式效应,包括感知延迟、异步更新和计算约束。自主起飞、轨迹跟踪和着陆实验证明了稳定的闭环飞行。结果建立了一个安全且硬件真实的中间阶段,用于在船上部署之前开发海上无人机自主性。

英文摘要

Autonomous UAV operations on ships require reliable vision-based relative pose estimation, yet at-sea validation is costly, weather-dependent, and risky. This paper presents a hardware-validated vision-in-the-loop framework that enables fully autonomous indoor flight while emulating photorealistic maritime environments. Rendered maritime views are processed onboard by a deep transformer-based monocular pose estimator. Delayed vision measurements are fused with high-rate IMU data using a delayed Kalman filter to provide consistent state estimates for geometric control. The system captures critical embedded effects, including perception latency, asynchronous updates, and computational constraints, that are absent in pure simulation. Autonomous takeoff, trajectory tracking, and landing experiments demonstrate stable closed-loop flight. The results establish a safe and hardware-realistic intermediate stage for developing maritime UAV autonomy prior to shipboard deployment.