arXivDaily: arXiv daily academic digest, updated Monday through Friday
EESS: Electrical Engineering and Systems Science (110)
2606.07463 2026-06-08 eess.SP cs.CE cs.LG 新提交

Amortized Neural Optimization for Pre-Layout Signal Integrity Design Space Exploration using Differentiable Surrogates

基于可微代理的布局前信号完整性设计空间探索的摊销神经优化

Julian Withöft, Werner John, Emre Ecik, Ralf Brüning, Jürgen Götze

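AI summary: Proposes the amortized neural optimization (ANO) framework, which replaces iterative black-box optimization with differentiable neural network surrogate models so that near-optimal design parameters are obtained in a single forward pass, yielding speedups of three to four orders of magnitude in scenarios such as DDR5 DFE and SerDes equalization.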

详情
Comments
16 pages, 20 figures, 8 tables
AI中文摘要

高速信号完整性(SI)分析的布局前设计空间探索(DSE)通常受限于现代电子设计自动化(EDA)工作流程中仿真和迭代优化算法的计算成本。虽然机器学习代理模型加速了仿真步骤,但优化设计仍需利用迭代黑盒搜索方法。这种迭代性质扩展性差,使得多角点扫描计算成本高昂。作为解决方案,本文提出了用于布局前SI设计的摊销神经优化(ANO)。ANO通过利用完全可微的神经网络代理模型,完全消除了迭代黑盒推理。ANO从代理中提取解析梯度,以训练全局优化策略。推理时不再重复求解优化问题,而是离线学习优化过程,从而实现摊销。一旦ANO策略训练完成,它就能在单个确定性前向传播中直接将不同的通道上下文映射到近最优设计参数。基于三个复杂的SI设计场景展示了ANO框架的效率和准确性,包括DDR5决策反馈均衡(DFE)、9维SerDes Tx/Rx联合均衡以及DDR3 DQS差分对布线(在内部对偏斜约束下优化眼图指标)。与实例特定的黑盒算法相比,在牺牲约10%最优性的代价下,实现了三到四个数量级的加速。对于大规模32万实例多角点SerDes扫描优化,ANO将原本需要数天计算时间的迭代搜索算法压缩为一次批量前向传播,毫秒级完成。这将计算昂贵的SI优化转变为实时、交互式的布局前DSE。

英文摘要

Pre-layout design space exploration (DSE) for high-speed signal integrity (SI) analysis is often limited by the computational cost of simulations and iterative optimization algorithms within modern electronic design automation (EDA) workflows. While machine learning surrogate models accelerate the simulation step, optimizing designs still requires utilizing iterative black-box search methods. This iterative nature scales poorly, making multi-corner sweeps computationally expensive. As a solution, this paper proposes amortized neural optimization (ANO) for pre-layout SI design. ANO entirely eliminates iterative black-box inference by utilizing fully differentiable neural network surrogate models. ANO extracts analytical gradients from the surrogate to train a global optimization policy. Instead of solving the optimization problem repeatedly at inference, the optimization process is learned offline and therefore amortized. Once the ANO policy is trained, it maps different channel contexts directly to near-optimal design parameters in a single deterministic forward pass. The efficiency and accuracy of the ANO framework are demonstrated based on three complex SI design scenarios, including DDR5 decision feedback equalization (DFE), 9-dimensional SerDes Tx/Rx co-equalization, and DDR3 DQS differential pair routing to optimize eye diagram metrics under intra-pair skew constraints. By trading roughly 10% in optimality compared to instance-specific black-box algorithms, it realizes speedups of three to four orders of magnitude. For a large-scale 320,000-instance multi-corner SerDes sweep optimization, ANO collapses what would have taken days of computation using iterative search algorithms into a single batched forward pass that completes in milliseconds. This transforms computationally expensive SI optimization into real-time and interactive pre-layout DSE.
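The amortization idea in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the quadratic `surrogate_score`, the scalar channel context `c`, and the linear policy are hypothetical stand-ins, not the paper's surrogate or network. The sketch only shows the pattern: the surrogate's analytic gradient trains a policy offline, and inference is then a single forward pass with no per-instance search.

```python
import random

def surrogate_score(x, c):
    """Toy differentiable surrogate (hypothetical): score peaks at x = 2*c + 1."""
    return -(x - (2.0 * c + 1.0)) ** 2

def surrogate_grad_x(x, c):
    """Analytic gradient of the toy surrogate score with respect to the design x."""
    return -2.0 * (x - (2.0 * c + 1.0))

def policy(c, w, b):
    """Linear policy mapping a channel context c to a design parameter x."""
    return w * c + b

def train_policy(steps=2000, batch=32, lr=0.05, seed=0):
    """Offline training: gradient ascent on the mean surrogate score over sampled contexts."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for _ in range(batch):
            c = rng.uniform(-1.0, 1.0)      # sample a context
            x = policy(c, w, b)             # policy proposes a design
            g = surrogate_grad_x(x, c)      # gradient from the surrogate
            grad_w += g * c                 # chain rule: dx/dw = c
            grad_b += g                     # chain rule: dx/db = 1
        w += lr * grad_w / batch
        b += lr * grad_b / batch
    return w, b

w, b = train_policy()
# Inference: one deterministic forward pass per context, no iterative search.
designs = [policy(c, w, b) for c in (-0.5, 0.0, 0.5)]
```

In this toy problem the learned policy recovers the optimum of the surrogate for every context; with a real neural surrogate and policy one would expect only an approximation, which is consistent with the optimality gap the abstract reports.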

2606.07381 2026-06-08 eess.IV cs.AI cs.CV 新提交

Impact of Synthetic Lesional MR Images in Automated Focal Cortical Dysplasia Detection in Low-Data Scenarios

合成病灶MR图像在低数据场景下自动局灶性皮质发育不良检测中的影响

Prabhjot Kaur, Hakim Ouaalam, Sedat Kandemirli, Sanjay P. Prabhu, Simon K. Warfield

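AI summary: This study synthesizes FCD-lesion MRI data with a conditional generative network and evaluates its realism and its effect on automated detection; synthetic data can reduce annotation needs by about 20%, but real data remain more effective.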

详情
AI中文摘要

背景与目的:自动检测局灶性皮质发育不良(FCD)需要大量体素级病灶勾画的MRI数据,这些数据难以获取。本研究旨在生成呈现FCD的合成MRI数据,评估其真实性,并评估其对自动FCD检测的影响,特别是在减少手动标注需求方面。方法:回顾性研究了来自多个(3个)中心的131例FCD患者和90例健康对照的T1加权(T1w)和T2加权液体衰减反转恢复(FLAIR)MRI扫描。通过将生成网络以二元FCD掩膜为条件生成合成MRI。两位神经放射科医生从14张真实和14张合成扫描的随机集合中识别真实图像。训练了三个nnU-Net模型用于检测FCD,分别使用:(i)仅真实数据(35例FCD/35例对照),(ii)真实数据(35例FCD/35例对照)加合成增强,以及(iii)扩展的真实数据(70例FCD/70例对照)。结果:专家区分真实与合成图像的能力有限,T1w分类准确率为60%,FLAIR为70%(评分者间一致性kappa=0.86)。用合成数据增强自动FCD检测使灵敏度提高8.14%(p=0.12),并改善了模型在真实病灶部位的置信度(0.83±0.11至0.89±0.12;p=0.02)。扩展真实数据模型进一步将灵敏度提高至73.8%(p<0.001),置信度提高至0.90±0.14(p=0.01)。结论:条件生成网络可以生成逼真的合成FCD-MRI,在保持同等灵敏度的情况下减少约20%的标注数据需求。当可用时,等量的真实数据仍比合成增强更有效。

英文摘要

Background and Purpose: Automated detection of focal cortical dysplasia (FCD) requires large volumes of voxelwise lesion-delineated MRI data, which are difficult to acquire. This study aims to generate synthetic MRI data exhibiting FCD, assess their realism, and evaluate their impact on automated FCD detection, particularly in reducing the need for manual annotations. Methods: T1-weighted (T1w) and T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR) MRI scans from 131 FCD patients and 90 healthy controls from multiple (3) sites were retrospectively studied. Synthetic MRIs were generated by conditioning a generative network on binary FCD masks. Two neuroradiologists identified real images from a random set of 14 real and 14 synthetic scans. Three nnU-Net models were trained to detect FCD using: (i) real-only (35 FCD / 35 controls), (ii) real (35 FCD / 35 controls) plus synthetic augmentation, and (iii) expanded real data (70 FCD / 70 controls). Results: Experts showed limited ability to distinguish real from synthetic images, with classification accuracy of 60% for T1w and 70% for FLAIR (inter-rater agreement kappa = 0.86). Augmenting automated FCD detection with synthetic data increased sensitivity by 8.14% (p = 0.12) and improved model confidence at true lesion sites (0.83 +/- 0.11 to 0.89 +/- 0.12; p = 0.02). The expanded real-data model further improved sensitivity to 73.8% (p < 0.001) and confidence to 0.90 +/- 0.14 (p = 0.01). Conclusion: Conditional generative networks can generate realistic synthetic FCD-MRIs, reducing labeled data needs by approximately 20% while maintaining equivalent sensitivity. Equivalent amounts of real data, when available, remain more effective than synthetic augmentation.

2606.07374 2026-06-08 eess.SP cs.CV 新提交

Beyond Backscatter: InSAR coherence from detected SAR images

超越后向散射:来自检测SAR图像的InSAR相干性

Francescopaolo Sica, Andrea Pulella, Michael Schmitt

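AI summary: Proposes a deep learning framework that regresses coherence directly from detected SAR images without accurate coregistration, using a Residual U-Net to learn the relationship between backscatter magnitude and coherence; improved accuracy of high-resolution coherence regression and good generalization are validated on multiple datasets.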

详情
Comments
27 pages, 20 figures
AI中文摘要

在这项工作中,我们提出了一个深度学习框架,用于直接从检测SAR图像进行相干性回归,无需精确配准。使用从精确配准的Sentinel-1 SLC数据导出的相干性图训练Residual U-Net,以学习后向散射幅度与相干性之间的关系。模型在12天SLC对上训练,并在不同数据集上进行评估,包括配准的SLC产品和开放存取的分析就绪数据,覆盖不同的辐射特性、几何形状和位置。实验结果表明,与现有的基于强度的方法相比,所提出的方法实现了高分辨率相干性回归,且准确性更高。该网络在多样化的地理位置以及训练时从未见过的不同时间基线之间都能很好地泛化。此外,能够在全球可用的分析就绪数据(例如通过Google Earth Engine分发的地距检测数据)上运行,使其在任务设计、变化监测和多种制图任务中能够大规模应用。

英文摘要

In this work, we propose a deep learning framework for coherence regression directly from detected SAR images, without the need for accurate coregistration. A Residual U-Net is trained using coherence maps derived from precisely coregistered Sentinel-1 SLC data to learn the relationship between backscatter magnitudes and coherence. The model is trained on 12-day SLC pairs and evaluated across different datasets, including coregistered SLC products and open access analysis-ready data, covering diverse radiometric properties, geometries, and locations. Experimental results demonstrate that the proposed method achieves high-resolution coherence regression with improved accuracy compared to existing intensity-based approaches. The network generalizes well across diverse geographical locations and even across different temporal baselines that were never seen at training time. Additionally, the ability to operate on globally available analysis-ready data, such as ground range detected data, e.g., distributed through Google Earth Engine, enables its large-scale application in mission design, change monitoring, and diverse mapping tasks.
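For context, the conventional sample coherence estimator can be written as below. This is the standard textbook definition, not a formula quoted from the paper, and the symbols are assumptions: s_1 and s_2 denote two coregistered complex SLC images and N the number of pixels in the estimation window.

```latex
\hat{\gamma} =
\frac{\left| \sum_{n=1}^{N} s_1[n]\, s_2^{*}[n] \right|}
     {\sqrt{\sum_{n=1}^{N} \left| s_1[n] \right|^{2} \, \sum_{n=1}^{N} \left| s_2[n] \right|^{2}}},
\qquad 0 \le \hat{\gamma} \le 1
```

The numerator requires the complex phase of both images, which detected (magnitude-only) products no longer carry. That is why coherence cannot be computed directly from such data and has to be regressed from backscatter instead.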

2606.07347 2026-06-08 eess.SP cs.ET 新提交

CSI Phase Averaging for High-Sensitivity Wi-Fi Sensing in Low-Multipath Environments

低多径环境下的高灵敏度Wi-Fi感知的CSI相位平均

Toshinori Suzuki, Shin-ichiro Ogura, Yu Morishima, Hiroshi Matsuura

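AI summary: Proposes a model-driven, low-complexity motion detection method that exploits the structural characteristics of CSI phase to suppress phase offset errors and uses phase averaging to reduce noise; experiments show it can detect birds flying several meters away in a low-multipath outdoor environment.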

详情
Comments
13 pages, 11 figures, 3 tables
AI中文摘要

本文提出一种基于模型驱动的低复杂度运动检测方法,用于户外Wi-Fi感知。该方法利用低多径传播环境下信道状态信息(CSI)相位分量的结构特性(通常被认为不利于Wi-Fi感知),以减轻源自无线设备的相位偏移误差。此外,相位平均提供了处理增益,降低了包括量化噪声和热噪声在内的随机噪声分量。描述了该方法的理论基础,并使用从商用IEEE 802.11ac设备获取的压缩波束成形帧进行了实验评估。实验主要关注户外果园环境中飞行的野生乌鸦。实验结果表明,即使鸟类在距离发射和接收天线之间的直接视距路径数米外飞行,该方法也能检测到它们。此外,结果表明当风速低于3 m/s时,植被运动引起的波动可忽略不计。所提出的方法预计不仅适用于果园监测,也适用于低多径环境下的其他户外Wi-Fi感知应用。

英文摘要

This paper presents a low-complexity motion detection method for outdoor Wi-Fi sensing based on a model-driven approach. The method exploits the structural characteristics of the phase components in channel state information (CSI) for low-multipath propagation environments, which are generally considered disadvantageous for Wi-Fi sensing, to mitigate the phase offset errors originating from wireless devices. In addition, phase averaging provides a processing gain that reduces the random noise components, including quantization and thermal noise. The theoretical basis of the method is described and its effectiveness is experimentally evaluated using Compressed Beamforming frames obtained from commercial IEEE 802.11ac devices. The experiments primarily focus on wild crows flying in an outdoor orchard environment. The experimental results demonstrate that the method can detect birds even when they fly several meters away from the direct line-of-sight path between the transmitter and receiver antennas. Furthermore, the results indicate that fluctuations caused by vegetation movement are negligible when the wind speed is less than 3 m/s. The proposed approach is expected to be applicable not only to orchard monitoring but also to other outdoor Wi-Fi sensing applications in low-multipath environments.
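The processing gain attributed to phase averaging can be illustrated with a small simulation. This is a generic sketch under stated assumptions, not the paper's algorithm: the abstract does not say which samples are averaged, so the code simply averages `n_avg` independent noisy measurements of one phase, and the Gaussian noise level is hypothetical. Averaging is done on unit phasors (a circular mean) so that phase wrap-around does not bias the result.

```python
import cmath
import math
import random

def circular_mean(phases):
    """Average phases by summing unit phasors and taking the angle of the sum."""
    total = sum(cmath.exp(1j * p) for p in phases)
    return cmath.phase(total)

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def simulate(true_phase=0.3, noise_std=0.2, n_avg=16, trials=2000, seed=1):
    """Compare the RMS phase error of a single sample with that of an n_avg-sample average."""
    rng = random.Random(seed)
    raw_errors, avg_errors = [], []
    for _ in range(trials):
        samples = [true_phase + rng.gauss(0.0, noise_std) for _ in range(n_avg)]
        raw_errors.append(samples[0] - true_phase)
        avg_errors.append(circular_mean(samples) - true_phase)
    return rms(raw_errors), rms(avg_errors)

raw_rms, avg_rms = simulate()
gain = raw_rms / avg_rms  # expected to be close to sqrt(n_avg) for independent noise
```

For independent noise the error shrinks roughly with the square root of the number of averaged samples, so 16 samples give a gain of about 4 here. Correlated noise would reduce that gain.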

2606.07328 2026-06-08 eess.SP 新提交

Implementation and Calibration of 3GPP-Compliant ISAC Channel Simulator

符合3GPP标准的ISAC信道模拟器的实现与校准

Chien-Han Wu, Ming-Chun Lee, Ta-Sung Lee

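AI summary: Implements a simulator for the ISAC channel model specified in 3GPP TR 38.901 and calibrates it against the reference results reported by companies in 3GPP, providing key details on simulator implementation and calibration.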

详情
Comments
6 pages, Codes and other source files are open on GitHub
AI中文摘要

集成感知与通信(ISAC)已成为6G系统的关键技术。为了支持ISAC系统的开发,用于性能评估的精确信道建模与仿真至关重要。最近,3GPP为此引入了标准化的ISAC信道模型及其相关的校准程序。然而,由于建模方法的复杂性以及3GPP报告中缺乏完全明确的实现细节,不同的实现可能导致不一致或不同步的仿真结果。为了解决这个问题,在本工作中,我们实现了TR 38.901中指定的3GPP ISAC信道模型模拟器,并进行了全面的校准分析。我们将仿真结果与3GPP中公司报告的参考结果进行比较,并讨论了几个关键的实现细节,以提供对模拟器实现和校准的见解。为了促进可重复性和进一步研究,所开发的模拟器以及相关数据集和校准结果已作为开源项目在GitHub上发布。

英文摘要

Integrated sensing and communication (ISAC) has emerged as a key technology for 6G systems. To support the development of ISAC systems, accurate channel modeling and simulation for performance evaluation is essential. Recently, 3GPP introduced a standardized ISAC channel model and its associated calibration procedure for this purpose. However, due to the complexity of the modeling methodology and the lack of fully explicit implementation details in the 3GPP reports, different implementations may lead to inconsistent or unsynchronized simulation results. To address this issue, in this work, we implement the 3GPP ISAC channel model simulator specified in TR 38.901 and conduct a comprehensive calibration analysis. We compare the simulation results with the reference results reported by companies in 3GPP and discuss several key implementation details to provide insights into the implementation and calibration of the simulator. To facilitate reproducibility and further research, the developed simulator, together with the relevant datasets and calibration results, has been released as an open-source project on GitHub.

2606.07284 2026-06-08 eess.SP 新提交

RSMA Enabled Hierarchical UAV Networks with Non Linear Energy Harvesting: Outage Probability Analysis and UAV Placement Optimization

具有非线性能量收集的RSMA赋能分层无人机网络:中断概率分析与无人机部署优化

Faicel Khennoufa, Khelil Abdellatif, Metin Ozturk, Halim Yanikomeroglu, Safwan Alfattani

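AI summary: To address energy constraints and hardware impairments in hierarchical UAV networks, proposes a scheme that combines non-linear energy harvesting with rate-splitting multiple access, derives outage probability expressions, and optimizes UAV placement, significantly improving reliability.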

详情
Comments
Accepted in IEEE Transactions on Vehicular Technology
AI中文摘要

无人机有望增强第六代蜂窝网络的连接性、扩展网络覆盖并支持高级通信服务,特别是在公共和民用应用中。尽管多无人机系统比单无人机部署具有更高的效率和成本效益,但其实现仍面临若干基本挑战,限制了其可靠性、可持续性和可扩展性。有限的机载能量限制了任务持续时间和通信连续性。因此,无线能量收集成为克服这一限制的有前景的解决方案。然而,地面能源存在路径损耗,使得从周围无人机收集能量更具可持续性。此外,在硬件损伤和不完美信道状态信息下,速率分割多址接入在分层无人机网络中尚未得到充分探索。本文提出一种具有非线性能量收集和RSMA的分层自组织无人机网络,以提高能量和成本效率,其中无人机从周围无人机收集能量。针对实际场景,我们在所提系统中考虑了HWI和ICSI的影响。据作者所知,本研究是文献中首次对此类场景进行探讨。推导了地面物联网设备、每个CMU以及所提系统总中断概率的表达式,基于Nakagami-$m$衰落信道,同时考虑了HWI、ICSI和非线性EH等实际约束。此外,还推导了高发射功率区域下的近似中断概率表达式。随后,我们制定了两个优化问题以提高可靠性和性能。结果表明,所提系统在中断概率方面优于所有基准方案。

英文摘要

Uncrewed aerial vehicles (UAVs) are expected to enhance connectivity, extend network coverage, and support advanced communication services in sixth-generation (6G) cellular networks, particularly in public and civil applications. Although multi-UAV systems offer greater efficiency and cost-effectiveness than single-UAV deployments, their implementation still faces several fundamental challenges that limit their reliability, sustainability, and scalability. The limited onboard energy restricts mission duration and communication continuity. Therefore, wireless energy harvesting (EH) emerges as a promising solution to overcome this limitation. However, terrestrial energy sources experience path loss, making EH from surrounding UAVs more sustainable. Moreover, rate-splitting multiple access (RSMA) remains insufficiently explored in hierarchical UAV networks under hardware impairments (HWI) and imperfect channel state information (ICSI). This paper proposes a hierarchical ad hoc UAV network with non-linear EH and RSMA to enhance both energy and cost efficiency, where UAVs harvest energy from surrounding UAVs. For a practical scenario, we consider the effect of HWI and ICSI in our proposed system. To the best of the authors' knowledge, this study is the first to investigate such a scenario in the literature. The outage probability expressions for ground Internet of Things (IoT) devices, each CMU, and the overall outage probability of the proposed system are derived over Nakagami-$m$ fading channels while considering practical constraints such as HWI, ICSI, and non-linear EH. Additionally, approximate outage probability expressions are derived for high transmit power regimes. Subsequently, we formulate two optimization problems to enhance reliability and performance. Our findings indicate that the proposed system outperforms all benchmarks in terms of outage probability.
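As background for the outage analysis, the textbook outage probability of a single Nakagami-$m$ link can be sketched as follows. This is not the paper's expression: it ignores HWI, ICSI, RSMA, and non-linear EH, and it assumes an integer fading parameter so the incomplete gamma function reduces to a finite sum. The function name and arguments are hypothetical.

```python
import math

def nakagami_outage(m, snr_threshold, mean_snr):
    """Outage probability P(SNR < snr_threshold) for a single Nakagami-m link.

    Assumes integer m >= 1, for which the gamma-distributed SNR has the
    closed-form CDF 1 - exp(-a) * sum_{k=0}^{m-1} a^k / k!, with
    a = m * snr_threshold / mean_snr (all SNRs in linear scale).
    """
    if not (isinstance(m, int) and m >= 1):
        raise ValueError("this closed form assumes an integer m >= 1")
    a = m * snr_threshold / mean_snr
    partial_sum = sum(a ** k / math.factorial(k) for k in range(m))
    return 1.0 - math.exp(-a) * partial_sum

# m = 1 reduces to Rayleigh fading; larger m means milder fading.
p_rayleigh = nakagami_outage(1, snr_threshold=1.0, mean_snr=10.0)
p_mild = nakagami_outage(2, snr_threshold=1.0, mean_snr=10.0)
```

With a mean SNR well above the threshold, a larger m gives a lower outage probability, which matches the usual interpretation of m as a measure of fading severity.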

2606.07264 2026-06-08 eess.AS 新提交

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

VISA:面向Interspeech 2026 ARC智能体赛道的视觉信息增强音频推理系统

Wenming Tu, Jian Gao, Yanru Huo, Yixuan Wang, Jing Peng, Bohan Li, Ziyang Ma, Tao Liu, Shuai Fan, Kai Yu, Xie Chen, Zilong Zheng

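AI summary: Presents the VISA system, which strengthens the audio reasoning of large audio language models through multi-modal feature extraction, model-voting inference, and fine-grained category-aware routing, achieving a 66.23% Rubrics score and 77.40% accuracy on the ARC Agent Track.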

详情
Comments
Submitted to INTERSPEECH 2026
AI中文摘要

音频推理需要对时变动态和声学混合信号进行多步骤、基于证据的推理,超越了传统感知任务如ASR或字幕生成。我们提出VISA,作为提交至Interspeech 2026音频推理挑战赛(智能体赛道)的系统,通过MMAR评分标准评估正确性和推理质量。在“LALM作为工具”范式下,VISA利用辅助多模态证据增强大音频语言模型,同时避免繁重的编排。该系统集成三个组件:多模态特征提取以获取互补的音频和声学-视觉线索,带一致性检查的模型投票推理以获得稳定预测,以及细粒度类别感知路由以解决分歧并选择符合评分标准的推理链。在官方智能体赛道排行榜上,VISA以66.23%的评分排名第二。它还达到了77.40%的准确率,是单模型和智能体赛道所有系统中最高的。

英文摘要

Audio reasoning requires multi-step, evidence-grounded inference over temporally dynamic and acoustically mixed signals, exceeding conventional perception tasks such as ASR or captioning. We present VISA, our submission to the Interspeech 2026 Audio Reasoning Challenge (Agent Track), evaluated via the MMAR Rubrics for correctness and reasoning quality. Under a "LALM as a Tool" paradigm, VISA strengthens large audio language models with auxiliary multi-modal evidence while avoiding heavy orchestration. The system integrates three components: multi-modal feature extraction for complementary audio and acoustic-visual clues, model-voting inference with consistency checking for stable predictions, and fine-grained category-aware routing to resolve disagreements and select rubric-aligned reasoning chains. On the official Agent Track leaderboard, VISA ranks 2nd overall with a 66.23% Rubrics score. It also achieves 77.40% Accuracy, the highest among all systems listed across both the Single Model and Agent tracks.

2606.07259 2026-06-08 eess.AS cs.SD 新提交

Assessing True Generalisability of Audio-Visual Speech Recognisers

评估音视频语音识别器的真正泛化能力

Zhaofeng Lin, Stavros Petridis, Maja Pantic, Naomi Harte

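AI summary: By constructing an evaluation set strictly matched to the LRS3 test set, finds that current state-of-the-art audio-visual speech recognition models collapse across the board on unseen data, revealing insufficient generalization, and analyzes the causes of degradation, lexical bias, and error patterns.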

详情
Comments
Accepted to Interspeech 2026 Long paper track. 9 pages, 4 figures
AI中文摘要

当前的音视频语音识别(AVSR)模型在标准LRS3基准上实现了近乎完美的性能,引发了对自适应过拟合的担忧。为了系统评估真正的泛化能力,我们从大规模MultiVSR数据集中构建了一个高度可控、未见过的评估子集。与标准的分布外基准不同,我们的子集在声学、视觉和人口统计分布上与LRS3测试集严格匹配。评估五种最先进的架构揭示了普遍的性能崩溃,证明当前系统即使在严格对齐的条件下也无法泛化。通过跨七个因素的细粒度属性分析,我们隔离了这种退化的具体驱动因素。此外,我们发现了深刻的词汇偏差,揭示了不同的错误模式,并令人惊讶地发现音视频性能甚至落后于纯音频设置。我们发布了匹配的测试集,用于未来的基准测试。

英文摘要

Current Audio-Visual Speech Recognition (AVSR) models achieve near-perfect performance on the standard LRS3 benchmark, raising concerns of adaptive overfitting. To systematically assess true generalisability, we construct a highly controlled, unseen evaluation set subsampled from the massive MultiVSR dataset. Unlike standard out-of-distribution benchmarks, our subset strictly matches the acoustic, visual, and demographic distributions of the LRS3 test set. Evaluating five state-of-the-art architectures reveals a universal performance collapse, proving that current systems fail to generalise even under strictly aligned conditions. Through a fine-grained attribute analysis across seven factors, we isolate the specific drivers of this degradation. Furthermore, we uncover a profound lexical bias, expose distinct error patterns, and surprisingly reveal that audio-visual performance even lags behind audio-only settings. We release our matched test set for future benchmarking.

2606.07182 2026-06-08 eess.AS 新提交

Audio Imitator: Controlling Timbre and Tempo in Video2Audio Synthesis with Audio Reference

Audio Imitator: 通过音频参考控制视频到音频合成中的音色和节奏

Jiahui Zhao, Tianrui Wang, Chunyu Qiang, Cheng Gong, Xijuan Zeng, Feng Deng, Longbiao Wang

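AI summary: Proposes the AudioIM framework, which models timbre and tempo separately with dual encoders for fine-grained style control, improving style similarity while preserving semantic consistency.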

详情
AI中文摘要

视频到音频生成在实现无声视频的语义一致性和时间对齐方面取得了显著进展。然而,音频包含丰富的风格属性,如音色和节奏,这些很难仅从视觉和文本输入中推断出来。虽然参考音频可以作为额外的条件,但它通常被视为整体信号,限制了细粒度的风格控制。我们提出AudioIM,一个属性感知框架,明确将音色和节奏建模为独立的控制因素,而不是依赖整体提示条件。双编码器提取互补的音色相关和节奏相关表示,并通过全局条件注入。基于掩码的训练策略使得在推理时能够进行有效的潜在提示条件。在VGGSound上的实验表明,在保持语义对齐和同步的同时,风格相似度得到了提升。音频样本可在以下网址获取:this https URL。

英文摘要

Video-to-audio generation has made significant progress in achieving semantic consistency and temporal alignment from silent videos. However, audio contains rich stylistic attributes such as timbre and tempo that are difficult to infer from visual and textual inputs alone. While reference audio can serve as additional conditioning, it is typically treated as a holistic signal, limiting fine-grained style control. We propose AudioIM, an attribute-aware framework that explicitly models timbre and tempo as separate control factors rather than relying on holistic prompt conditioning. Dual encoders extract complementary timbre-related and tempo-related representations, which are injected through global conditioning. A masking-based training strategy enables effective latent prompt conditioning at inference. Experiments on VGGSound show improved style similarity while preserving semantic alignment and synchronization. Audio samples are available at: https://anonymousdemo757.github.io/.

2606.07104 2026-06-08 eess.SP 新提交

Robust Secure Beamforming for Movable Antenna Enhanced Integrated Sensing and Communications

可移动天线增强集成感知与通信的鲁棒安全波束赋形

Yuan Chen, Ning Wei, Ahmad Bazzi, Xiangyu Dong, Ran Yang, You Li, Yue Xiu

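AI summary: For imperfect eavesdropper channel state information, proposes a robust beamforming design that jointly optimizes transmit beamforming and antenna positions to maximize the radar SINR while guaranteeing communication security, using a block coordinate descent algorithm combined with successive convex approximation and fractional programming.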

详情
AI中文摘要

在这封信中,我们研究了可移动天线增强的安全集成感知与通信系统中,在存在不完美窃听信道状态信息情况下的鲁棒波束赋形设计。为了提升雷达感知性能,我们通过联合优化发射波束赋形和天线位置,同时确保通信数据安全,提出了一个雷达信干噪比最大化问题。然而,由于天线位置到信道系数的非线性映射以及窃听者信道的不确定性,所得到的优化问题本质上是难以处理的。为了应对这些挑战,我们提出了一种基于块坐标下降的算法,结合了逐次凸近似和分数规划技术。仿真结果表明,我们提出的算法具有快速收敛性,并在保证通信安全的同时显著提升了雷达信干噪比。

英文摘要

In this letter, we investigate robust beamforming design for a movable antenna (MA)-enhanced secure integrated sensing and communications (ISAC) system with imperfect eavesdropping channel state information (CSI). To improve radar sensing performance, we formulate a radar signal-to-interference-plus-noise ratio (SINR) maximization problem by jointly optimizing the transmit beamforming and antenna placement while ensuring communication data security. However, the resulting optimization problem is inherently intractable due to the nonlinear mapping from antenna positions to channel coefficients, as well as the eavesdropper (Eve) channel uncertainty. To handle these challenges, we propose a block coordinate descent (BCD)-based algorithm incorporating successive convex approximation (SCA) and fractional programming (FP) techniques. Simulation results show that our proposed algorithm exhibits fast convergence and achieves a significant improvement in the radar SINR while guaranteeing communication security.

2606.07091 2026-06-08 eess.SP 新提交

Rate-Splitting--Inspired Uplink Near-Field ISAC

速率分裂启发的上行近场ISAC

Anup Mishra, Israel Leyva-Mayorga, Petar Popovski

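AI summary: Proposes a rate-splitting (RS)-inspired uplink near-field ISAC framework that splits the communication message across the sensing operation, derives closed-form expressions for the communication and sensing rates, characterizes the achievable rate region, and shows that the RS-inspired boundary contains the NOMA-inspired time-sharing region.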

详情
AI中文摘要

集成感知与通信(ISAC)使感知和通信(S&C)功能共享频谱、硬件和信号处理资源,但由此产生的功能间干扰带来了基本的接收机设计挑战,特别是在上行链路操作中。本文开发了一个速率分裂(RS)启发的上行近场ISAC框架。该框架通过将通信消息分裂到感知操作中,推广了非正交多址(NOMA)启发ISAC的感知中心(S-C)和通信中心(C-C)端点顺序。推导了通信速率(CR)和感知速率(SR)的闭式表达式,考虑了来自目标响应估计不确定性的残余感知干扰。在感知匹配照明下表征了可达CR-SR速率区域,其中所提出的单帧RS启发边界包含NOMA启发的时间共享区域。与经典高斯上行多址信道(其中RS恢复时间共享主导面)不同,上行ISAC中的分裂因子也重塑了感知阶段的干扰,使得RS启发边界匹配或严格扩大S&C折衷。高信噪比分析表明,对于非对齐的S&C信道,残余感知干扰改变速率偏移但不改变主导S&C斜率,而在完全对齐的情况下,它变得斜率受限。使用孔径感知的近场信道模型,推导了大阵列极限,表明随着阵列增长,可达速率保持有限。数值结果验证了分析,并展示了RS启发方案的优势、残余感知干扰的影响以及由物理一致近场建模引起的有限大阵列行为。

英文摘要

Integrated sensing and communication (ISAC) enables sensing and communication (S&C) functionalities to share spectrum, hardware, and signal-processing resources, but the resulting inter-functionality interference creates a fundamental receiver-design challenge, particularly in uplink operation. This paper develops a rate-splitting (RS)-inspired framework for uplink near-field ISAC. The framework generalizes the sensing-centric (S-C) and communication-centric (C-C) endpoint orders of non-orthogonal multiple access (NOMA)-inspired ISAC by splitting the communication message across the sensing operation. Closed-form expressions are derived for the communication-rate (CR) and sensing-rate (SR), accounting for residual sensing interference from target-response estimation uncertainty. The achievable CR-SR rate region is characterized under sensing-matched illumination, where the proposed single-frame RS-inspired boundary contains the NOMA-inspired time-sharing region. Unlike the classical Gaussian uplink multiple access channel, where RS recovers the time-sharing dominant face, the split factor in uplink ISAC also reshapes the sensing-stage interference, allowing the RS-inspired boundary to match or strictly enlarge the S&C tradeoff. High-SNR analysis shows that, for non-aligned S&C channels, residual sensing interference changes the rate offsets but not the leading S&C slopes, whereas in the fully-aligned case it becomes slope-limiting. Using an aperture-aware near-field channel model, large-array limits are derived, showing that achievable rates remain finite as the array grows. Numerical results validate the analysis and demonstrate the benefits of the RS-inspired scheme, the impact of residual sensing interference, and the bounded large-array behaviour induced by physically consistent near-field modelling.

2606.07063 2026-06-08 eess.IV cs.CV 新提交

Beyond Universality: The GCC-FER Dataset and Culture-Aware Adaptation for Dynamic Facial Expression Recognition

超越普遍性:GCC-FER数据集及面向动态面部表情识别的文化感知适应

Sonalika Singh, Jyotirindra Dandapat, Avishi Razdan, Kshipra V. Moghe, Puneet Gupta, Lalan Kumar

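AI summary: To address the neglect of cultural differences in dynamic facial expression recognition, introduces GCC-FER, the first large-scale global cross-cultural dataset, and designs the culture-aware system CA-FER, which mitigates cultural bias by adaptively recalibrating facial representations; experiments demonstrate its effectiveness.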

详情
AI中文摘要

动态面部表情识别(DFER)是情感计算、人机交互和智能多媒体系统中的关键使能技术。尽管文化细微差别对FER性能有显著影响,但大多数现有FER系统假设情感表达在人群中普遍一致。这种差异可归因于不同文化中面部肌肉激活模式的系统性差异。推进跨文化FER的主要挑战在于缺乏文化多样性的基准数据集。为解决这一问题,本文引入了一个名为全球跨文化面部表情识别(GCC-FER)的新型混合多元文化视频数据集。GCC-FER包含跨越四种文化群体(非洲、高加索、东亚和南亚)的23,934个视频样本,涵盖七种基本表情,结合了对代表性不足人群的心理学家监督内部数据收集以及对现有来源的严格种族过滤。据我们所知,GCC-FER是首个旨在解决这些人口统计差距的大规模全球跨文化DFER数据集。利用该数据集,为每个文化群体推导出基于行为的文化先验,并为实际部署推导出全局先验。提出了一种文化感知FER(CA-FER)系统,通过自适应重新校准潜在面部表示来减轻文化偏差。在GCC-FER和DFEW上的大量实验表明,所提系统在多文化环境下持续提高了FER性能。

英文摘要

Dynamic Facial Expression Recognition (DFER) is a key enabling technology in affective computing, human-computer interaction, and intelligent multimedia systems. Despite the significant influence of cultural nuances on FER performance, most existing FER systems assume that emotional expressions are universally consistent across populations. This variation can be attributed to systematic differences in facial muscle activation patterns across cultures. A major challenge in advancing cross-cultural FER lies in the scarcity of culturally diverse benchmark datasets. To address this, a new hybrid multicultural video dataset termed Global Cross-Cultural Facial Expression Recognition (GCC-FER) is introduced. GCC-FER comprises 23,934 video samples spanning four cultural groups (African, Caucasian, East Asian, and South Asian) across seven basic expressions, combining psychologically supervised in-house data collection for underrepresented populations with rigorous ethnicity filtering of existing sources. To the best of our knowledge, GCC-FER is the first large-scale global cross-cultural DFER dataset designed to address these demographic gaps. Leveraging this dataset, behaviorally grounded cultural priors are derived for each cultural group and a global prior for practical deployment. A Culture-Aware FER (CA-FER) system is proposed to mitigate cultural bias by adaptively recalibrating latent facial representations. Extensive experiments on GCC-FER and DFEW demonstrate that the proposed system consistently improves FER performance across multicultural settings.

2606.07050 2026-06-08 eess.SP 新提交

Optimized Sampling of Angle-Resolved Scatterometry Data Using End-to-End Compressed Learning Model for Nanograss Deficiency Detection

使用端到端压缩学习模型优化角度分辨散射测量数据采样用于纳米草缺陷检测

Mehdi Abdollahpour, Carsten Bockelmann, Armin Dekorsy

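AI summary: Proposes an end-to-end compressed learning framework that integrates a learnable latitude-based sampling layer with a CNN to jointly optimize sampling and classification, matching the 94.2% full-image five-level deficiency classification accuracy with up to 90% fewer sampling points.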

详情
Comments
Preprint. 13 pages, 11 figures
AI中文摘要

纳米表面的可靠检测对于确保纳米结构制造质量至关重要。角度分辨散射测量提供了一种非侵入式检测方法,可在线使用,但由于密集的角度采样,通常采集时间较长。本文针对数据采集挑战,提出了一种端到端压缩学习框架,用于使用ARS图像检测氧化锌纳米草中的5级空位缺陷。该框架将可学习的基于纬度的采样层与卷积神经网络集成,使得采样和分类可以在训练过程中联合优化。采样层利用ARS模式的物理结构,学习信息丰富的纬度区域,从而减少采样搜索空间并提高收敛性。评估结果表明,所提方法在不同噪声条件下实现了高且稳定的缺陷级分类性能。使用完整ARS图像,模型在五级缺陷分类中达到94.2%的准确率,在区分缺陷与非缺陷纳米表面时达到98.6%的准确率。所提采样模型在使用多达90%更少的角度采样点时,性能与全图像相当。即使采样点减少99.7%,分类准确率下降不到10个百分点。为了进一步改善有限数据下的训练,我们还研究了基于GAN的增强方法,并使用GAN生成的数据进行模型预训练。增强数据使得仅需少量微调轮次即可快速收敛。

英文摘要

Reliable inspection of nanosurfaces is essential to ensure the quality of nanostructure manufacturing. Angle-resolved scatterometry provides a non-invasive inspection method that can be used in-line but often suffers from long acquisition times due to dense angular sampling. This paper addresses the data acquisition challenge by proposing an end-to-end compressed learning framework for 5-level vacancy deficiency detection in zinc oxide nanograss using ARS images. The proposed framework integrates a learnable latitude-based sampling layer with a convolutional neural network, allowing sampling and classification to be jointly optimized during training. The sampling layer exploits the physical structure of ARS patterns and learns informative latitudinal regions, which reduces the sampling search space and improves convergence. Evaluation results show that the proposed approach achieves high and stable deficiency-level classification performance under different noise conditions. Using full ARS images, the model achieves 94.2% accuracy for five-level deficiency classification and 98.6% accuracy for separating deficient from non-deficient nanosurfaces. The proposed sampling model matches full-image performance while using up to 90% fewer angular sampling points. Even when sampling points are reduced by 99.7%, the classification accuracy decreases by less than 10 percentage points. To further improve training with limited data, we also studied a GAN-based augmentation approach and used GAN-generated data for model pretraining. Augmented data resulted in fast convergence within only a few fine-tuning epochs.

2606.07026 2026-06-08 eess.SP 新提交

A Novel Stripe-based RIS Optimization for UAV Communications and Sensing in Low-Altitude Wireless Networks

基于条带的可重构智能表面优化用于低空无线网络中的无人机通信与感知

Burak Ahmet Celebi, Sefa Kayraklik, Onur Salan, Ibrahim Hokelek, Ali Emre Pusane, Ali Gorcin

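AI summary: Proposes a low-complexity stripe-based RIS phase optimization framework that exploits the structural phase gradient of adjacent elements to shrink the search space, enhancing communication reliability and providing passive sensing under 3D mobility; simulations and experiments confirm its fast convergence and robustness.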

详情
Comments
13 Pages, 14 figures
AI中文摘要

低空无线网络(LAWN)设想了一种可重构的3D网络,能够支持关键任务的空中操作。本文提出了一种可重构智能表面(RIS)辅助的LAWN,以在变化的无线信道条件和信号阻塞下与无人机(UAV)建立可靠通信。提出了一种低复杂度的条带式RIS相移优化框架,以同时增强通信可靠性并为3D移动下的UAV跟踪提供被动感知能力。与高复杂度的优化方法不同,所提方法利用RIS相邻元素固有的结构相位梯度,显著减少了随UAV移动计算和更新RIS配置的搜索空间。分析和仿真结果表明,所提框架在收敛速度和计算效率上优于传统基准,即使在存在相位估计误差和低信噪比(SNR)的情况下,也能保持稳健的高SNR连接。此外,在室外校园环境中使用真实RIS原型进行了测量实验,以证明所提方法的实际可行性。

英文摘要

Low-altitude wireless networks (LAWN) envision a reconfigurable 3D network capable of supporting mission-critical aerial operations. This paper presents a reconfigurable intelligent surface (RIS)-assisted LAWN to establish a reliable communication with an unmanned aerial vehicle (UAV) across varying wireless channel conditions and signal blockages. A low complexity stripe-based RIS phase shift optimization framework is proposed to simultaneously enhance communication reliability and provide passive sensing capability for UAV tracking under 3D mobility. Unlike high-complexity optimization approaches, the proposed method leverages the inherent structural phase-gradient of the RIS adjacent elements to significantly reduce the search space for calculating and updating the RIS configuration as the UAV moves. The analysis and simulation results demonstrate that the proposed framework outperforms conventional benchmarks in convergence speed and computational efficiency, while maintaining robust, high signal-to-noise-ratio (SNR) connectivity even in the presence of phase estimation errors and low SNR regimes. In addition, the measurement experiments using a real RIS prototype in an outdoor campus environment are performed to demonstrate the practical viability of the proposed approach.

2606.06983 2026-06-08 eess.IV cs.AI cs.CV 新提交

DaX: Learning General Pathology Representations Across Scales

DaX: 跨尺度的通用病理学表示学习

Bokai Zhao, Yiyang Zhang, Long Bai, Tai Ma, Hanqing Chao, Minfeng Xu

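AI summary: Presents DaX, a pathology vision foundation model that adapts DINOv3 self-supervised learning with continuous magnification training, cross-scale tissue views, and related designs, achieving the best mean performance on 161 clinical tasks from 44 public datasets.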

详情
AI中文摘要

计算病理学需要能够跨不同临床终点迁移且对放大倍数、染色、扫描仪类型、切片制备和输入分辨率变化保持鲁棒的视觉表示。我们提出DaX,一个病理视觉基础模型,它将DINOv3风格的自监督学习适应到全切片组织病理学。DaX从自然图像DINOv3权重初始化,并融合了连续放大训练、跨尺度组织视图、方向无关和采集鲁棒的数据增强、多输入尺寸训练以及Gram锚定的密集一致性。这些设计旨在连接局部细胞形态与全局组织结构,同时稳定跨输入尺度的密集token级表示。我们进一步构建了一个WSI级基准,包含来自44个公共数据集的161项临床有意义任务,涵盖28,182名患者和34,394张切片,跨越四个临床领域和九个任务类别。所有模型在固定的患者级交叉验证协议下进行评估,并采用折叠级统计排名,从而实现可重复的比较,对分割依赖的变异性不敏感。在该基准上,DaX在任务中取得了最高的平均性能,并持续获得强大的任务级排名分数,其增益涵盖诊断病理学、生物标志物和分子谱分析、组织/标本背景以及风险、反应和预后。这些结果支持DaX作为计算病理学的可迁移视觉编码器,并为未来的病理基础模型提供了标准化的评估框架。项目页面:此https URL。

英文摘要

Computational pathology requires visual representations that transfer across diverse clinical endpoints and remain robust to variation in magnification, staining, scanner type, slide preparation, and input resolution. We present DaX, a pathology vision foundation model that adapts DINOv3-style self-supervised learning to whole-slide histopathology. DaX is initialized from natural-image DINOv3 weights and incorporates continuous magnification training, cross-scale tissue views, orientation-agnostic and acquisition-robust augmentation, multi-input-size training, and Gram-anchored dense consistency. These designs aim to connect local cellular morphology with global tissue architecture while stabilizing dense token-level representations across input scales. We further construct a WSI-level benchmark comprising 161 clinically meaningful tasks from 44 public datasets, covering 28,182 patients and 34,394 slides across four clinical domains and nine task categories. All models are evaluated under a fixed patient-level cross-validation protocol with fold-level statistical ranking, enabling reproducible comparisons that are less sensitive to split-dependent variation. Across this benchmark, DaX achieves the highest mean performance across tasks and consistently strong task-level ranking scores, with gains spanning diagnostic pathology, biomarker and molecular profiling, tissue/specimen context, and risk, response, and prognosis. These results support DaX as a transferable visual encoder for computational pathology and provide a standardized evaluation framework for future pathology foundation models. Project page: https://alibaba-damo-academy.github.io/DaX/benchboard/.

2606.06962 2026-06-08 eess.AS 新提交

FSC-Net: Integrating Fast Fourier Convolutions and Progressive Learning for Speech Bandwidth Extension

FSC-Net:融合快速傅里叶卷积与渐进学习的语音带宽扩展

Xinan Chen, Xiaobin Rong, Qinwen Hu, Kai Chen, Jing Lu

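AI summary: Proposes FSC-Net, which integrates fast Fourier convolutions and frequency-progressive learning to efficiently model cross-band harmonic dependencies and reconstruct high-fidelity wideband speech from narrowband input, achieving leading LSD and PESQ scores on the VCTK 4 kHz-to-48 kHz task with 1.54 M parameters.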

详情
Comments
5 pages, 2 figures
AI中文摘要

语音带宽扩展(BWE)旨在从窄带输入重建高保真宽带音频。尽管近期方法取得了显著进展,但它们通常难以重建真实的高频相位和谐波结构,导致感知伪影。本文提出FSC-Net(全频谱上下文网络),一种参数高效的架构,旨在显式建模跨频段谐波依赖。通过将快速傅里叶卷积(FFCs)集成到复频谱映射框架中,FSC-Net将其感受野扩展到整个频谱,有效捕获长程频率交互。为解决高频生成的不适定性,我们新颖的频率渐进学习课程引导网络从粗到细地重建频谱细节。在VCTK和未见过的EARS数据集上的实验结果表明,FSC-Net提供了持续强劲的重建质量和泛化能力,尤其在具有挑战性的VCTK 4 kHz至48 kHz任务中。与规模更大的基线相比,我们的模型在保持高度紧凑的参数规模(1.54 M)的同时,取得了领先的LSD和PESQ分数。

英文摘要

Speech bandwidth extension (BWE) aims to reconstruct high-fidelity wideband audio from narrowband inputs. While recent approaches have made significant progress, they often struggle to reconstruct realistic high-frequency phase and harmonic structures, leading to perceptual artifacts. In this paper, we propose FSC-Net (Full-Spectrum Context Network), a parameter-efficient architecture designed to explicitly model cross-band harmonic dependencies. By integrating Fast Fourier Convolutions (FFCs) into a complex spectral mapping framework, FSC-Net expands its receptive field to the entire spectrum, capturing long-range frequency interactions effectively. To address the ill-posed nature of high-frequency generation, our novel frequency-progressive learning curriculum guides the network to reconstruct spectral details from coarse to fine. Experimental results on the VCTK and unseen EARS datasets demonstrate that FSC-Net delivers consistently strong reconstruction quality and generalization, particularly in the challenging VCTK 4 kHz-to-48 kHz task. Compared to scaled-up baselines, our model attains leading LSD and PESQ scores while maintaining a highly compact parameter footprint (1.54 M).
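
The LSD score reported above is commonly computed as a log-spectral distance between reference and reconstructed power spectrograms. The sketch below uses one widespread definition (per-frame RMS difference of log10 power spectra, averaged over frames). The abstract does not say which variant the authors use, so the function name, the `eps` floor, and the input layout are assumptions for illustration only.

```python
import math


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """Mean over frames of the RMS difference between log10 power spectra.

    ref_power, est_power: lists of frames, each a list of non-negative
    power values |S(t, k)|^2. Both inputs must have the same shape.
    eps is an assumed floor that keeps log10 finite for empty bins.
    """
    if len(ref_power) != len(est_power):
        raise ValueError("frame counts differ")
    total = 0.0
    for ref_frame, est_frame in zip(ref_power, est_power):
        if len(ref_frame) != len(est_frame):
            raise ValueError("bin counts differ")
        squared = 0.0
        for r, e in zip(ref_frame, est_frame):
            diff = math.log10(r + eps) - math.log10(e + eps)
            squared += diff * diff
        # RMS over frequency bins for this frame
        total += math.sqrt(squared / len(ref_frame))
    # Average over frames
    return total / len(ref_power)
```

Under this definition, a reconstruction whose power is off by a constant factor of 10 in every bin scores an LSD of about 1, and a perfect reconstruction scores 0.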

2606.06954 2026-06-08 eess.SP 新提交

Learn to Access and Backhaul the Sky: Multi-Scale Radio Map Guided Multi-UAV Cooperation

学会接入和回传天空:多尺度无线电地图引导的多无人机协作

Yifeng Yuan, Shijian Gao

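AI summary: To address the end-to-end bottlenecks that user mobility and building blockage cause for UAV swarms in 3D scenarios, proposes the multi-scale radio map-guided (MRMG) framework, which combines global, local, and link-level map information and uses multi-agent reinforcement learning to jointly optimize UAV movement, next-hop selection, and power control, significantly improving network throughput and cell-edge user rates.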

详情
Comments
6 pages, 4 figures
AI中文摘要

受新兴低空经济的驱动,无人机群提供了灵活的集成空地接入和回传。然而,由于用户移动和建筑遮挡在这些三维场景中的相互依赖动态性,提供无缝连接是困难的。这些因素在端到端路径中造成快速变化的瓶颈。此外,联合控制的多维性质限制了传统启发式方法的有效性。为了应对这些挑战,提出了一个多尺度无线电地图引导(MRMG)框架。MRMG框架通过整合三个不同层次的无线电信息来处理异构动态:全局地图提供区域覆盖洞察,局部地图捕获邻域尺度服务条件,链路级地图表征高分辨率信道特征。这种设计有效地解耦了宏观移动和微观链路自适应。为了实现长期性能提升,一个多智能体强化学习(MARL)控制器学习无人机移动、下一跳选择和发射功率控制的协作策略。仿真结果表明,MRMG框架不仅提高了网络吞吐量,还显著增强了小区边缘服务,几乎将第5百分位用户速率翻倍。

英文摘要

Driven by the emerging low-altitude economy, uncrewed aerial vehicle (UAV) swarms offer flexible integrated air-ground access and backhaul. However, providing seamless connectivity is difficult due to the interdependent dynamics of user mobility and building blockages in these 3D scenarios. These factors create rapidly shifting bottlenecks in end-to-end paths. Furthermore, the multi-dimensional nature of joint control limits the effectiveness of traditional heuristics. To address these challenges, a Multi-Scale Radio Map-Guided (MRMG) framework is proposed. The MRMG framework handles heterogeneous dynamics by integrating three distinct levels of radio information: global-level maps provide regional coverage insights, local-level maps capture neighborhood-scale service conditions, and link-level maps characterize high-resolution channel features. This design effectively decouples macro-movement from micro-link adaptation. To yield long-term performance improvements, a multi-agent reinforcement learning (MARL) controller learns cooperative policies for UAV movement, next-hop selection, and transmit-power control. Simulation results show that the MRMG framework not only improves network throughput but also significantly bolsters cell-edge service, nearly doubling the 5th-percentile user rate.

2606.06933 2026-06-08 eess.IV 新提交

A 3D Formulation of the Extended Phaseless Rytov Approximation

扩展无相位Rytov近似的三维公式

Wanqin Ma, Zan Li, Amartansh Dubey, Alikhan Umirbayev, Yijun Chen, Junhui Rao, Ross Murch

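AI summary: Proposes the extended three-dimensional phaseless Rytov approximation (x3DPRA), which extends a 2D phaseless RF imaging method to 3D while keeping its implementation simple, enables volumetric imaging, and validates localization, shape reconstruction, and material attenuation estimation in simulation.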

详情
Comments
12 pages, 6 figures, In processing for IEEE Trans
AI中文摘要

扩展无相位Rytov近似(xPRA)是一种最近提出的无设备射频成像技术,仅使用无相位测量(如接收信号强度RSS)即可提供成像区域的高分辨率重建。由于其无相位公式,可以利用现有无线通信基础设施直接实现。它也优于著名的无设备无相位RF成像方法,如无线电断层成像(RTI)。xPRA(和RTI)中使用的线性无相位公式使得这些方法可能对下一代无线网络中的集成感知与通信(ISAC)系统有用,因为它们不需要宽带宽。然而,到目前为止,xPRA和RTI主要是在二维(2D)中提出的。本文介绍了xPRA的三维扩展,我们称之为扩展三维无相位Rytov近似(x3DPRA)。我们方法的新颖之处在于,它保留了RTI和xPRA的直接实现优势,同时实现了体积(3D)成像。仿真结果表明,x3DPRA提供了良好的位置和形状估计,并且还可以重建物体材料衰减。我们提出了三维公式,通过与二维模型比较进行验证,并报告了展示其性能的仿真结果。

英文摘要

The extended Phaseless Rytov Approximation (xPRA) is a recently proposed device-free RF imaging technique that provides high-resolution reconstructions of the imaging region using only phaseless measurements, such as received signal strength (RSS). Because of its phaseless formulation, it can be implemented straightforwardly using existing wireless communication infrastructure. It also outperforms well-known device-free phaseless RF imaging methods such as Radio Tomographic Imaging (RTI). The linear phaseless formulation used in xPRA (and RTI) makes these methods potentially useful for integrated sensing and communication (ISAC) systems in next-generation wireless networks since they do not require wide bandwidths. However, so far, both xPRA and RTI have primarily been formulated in two dimensions (2D). This paper introduces a 3D extension of xPRA, which we call the extended three-dimensional phaseless Rytov approximation (x3DPRA). The novelty of our approach is that it preserves the straightforward implementation advantages of RTI and xPRA while enabling volumetric (3D) imaging. Simulation results show that x3DPRA provides good estimates of location and shape and can also reconstruct object material attenuation. We present the 3D formulation, validate it with a 2D model comparison, and report simulation results demonstrating its performance.

2606.06907 2026-06-08 eess.AS cs.AI cs.SD 新提交

SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models

SpectCount: 通过合成信号进行频谱时间计数改进大型音频语言模型

Seonuk Kim, Yonghyeon Jun, Ju Yeon Kang, Jimin Hong, Yoonhyeong Lee, Nam Soo Kim

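AI summary: Targeting the weak spectrotemporal perception of large audio language models, proposes SpectCount, a data-efficient fine-tuning method that uses fully synthetic audio signals generated on the fly, needs no real audio or annotations, and significantly improves performance on a range of auditory benchmarks.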

详情
Comments
5 pages, 5 figures
AI中文摘要

大型音频语言模型(LALMs)通过音频编码器和大规模音频数据扩展了大型语言模型。然而,高质量标注音频数据的稀缺性仍然是扩展的根本瓶颈。通过探测信号可检测性分析,我们识别出基础LALM在细粒度频谱时间感知上的弱点。为了解决这些挑战,我们提出频谱时间计数(SpectCount),一种基于动态生成的完全合成音频信号的数据高效微调方法,无需依赖真实世界音频、标注或预训练生成模型。SpectCount不仅解决了观察到的弱点,还在微调期间未见的声音、音乐和语音等多种听觉基准上提升了性能。这些结果表明,针对弱点的合成信号为LALMs增强听觉理解能力提供了一条数据高效的途径。

英文摘要

Large audio language models (LALMs) extend large language models with an audio encoder and large-scale audio data. However, the scarcity of high-quality annotated audio data remains a fundamental bottleneck for scaling. Through probing signal detectability analysis, we identify fine-grained spectrotemporal perceptual weaknesses in a foundation LALM. To address these challenges, we propose Spectrotemporal Counting (SpectCount), a data-efficient fine-tuning approach based on fully synthetic audio signals generated on-the-fly, without relying on real-world audio, annotations, or pretrained generative models. SpectCount not only resolves the observed weaknesses but also improves performance on diverse auditory benchmarks spanning sound, music, and speech, unseen during fine-tuning. These results suggest that weakness-targeted synthetic signals provide a data-efficient path toward enhanced auditory understanding capabilities in LALMs.
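
To make the idea of on-the-fly synthetic counting data concrete, the sketch below generates a waveform containing a random number of windowed tone bursts together with a question and the count as the target answer. The abstract does not describe the actual signal family or prompt format, so the burst design, frequency range, question text, and the function name `make_counting_example` are all hypothetical.

```python
import math
import random


def make_counting_example(sample_rate=16000, duration_s=2.0, max_events=5, seed=None):
    """Return (waveform, question, answer) for one synthetic counting item.

    Hypothetical design: `count` sine bursts, each placed inside its own
    time slot so that bursts never overlap. No real audio is involved.
    """
    rng = random.Random(seed)
    n_samples = int(sample_rate * duration_s)
    signal = [0.0] * n_samples
    count = rng.randint(1, max_events)
    slot = n_samples // count      # one slot per burst
    burst_len = slot // 2          # each burst fills half of its slot
    for i in range(count):
        freq = rng.uniform(200.0, 4000.0)
        start = i * slot + rng.randint(0, slot - burst_len)
        for n in range(burst_len):
            # Hann window tapers the burst to zero at both edges
            window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (burst_len - 1))
            signal[start + n] = window * math.sin(2.0 * math.pi * freq * n / sample_rate)
    question = "How many tone bursts are in this audio?"
    return signal, question, str(count)
```

Because the label is known by construction, each call yields a fresh training pair without annotation; passing `seed` makes an item reproducible.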

2606.06847 2026-06-08 eess.IV cs.CV 新提交

Physics-Driven Semantic Scattering Structure Understanding of Aircraft Target in SAR Images

SAR图像中飞机目标的物理驱动语义散射结构理解

Yifei Yin, Xiaogang Yu, Hao Shi, Liang Chen, Wei Li

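AI summary: To address unstable scattering-center representations and missing weakly scattering components for aircraft targets in SAR images, proposes the physics-driven framework S3U-SAR, which defines semantic scattering keypoints and applies multi-dimensional physical prior constraints to reconstruct the complete topology, achieving the best performance on the benchmark dataset.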

详情
AI中文摘要

合成孔径雷达(SAR)因其全天时、全天候观测能力,已成为目标解译不可或缺的手段。在SAR目标解译中,电磁散射信息提供了超越视觉纹理的物理基础线索,并被广泛用于目标解译。然而,现有方法仍以局部散射中心表示为主。这种无序且与部件无关的表示对飞机目标极不稳定。因此,物理存在的弱散射响应部件常被遗漏,导致重建的拓扑结构不完整。为解决这一局限,我们建立了语义散射结构理解作为SAR飞机解译的新范式。定义语义散射关键点以将局部电磁响应与物理上有意义的飞机部件关联,同时引入可见性感知属性以保留弱可观测但物理存在的部件。关键点进一步组织为稳定的语义散射结构。基于此,我们提出S3U-SAR,一个物理驱动框架,用于定位语义散射关键点并构建由多维物理先验(包括散射异质性、刚体拓扑、散斑不确定性)约束的完整表示。进一步引入置信门控联合监督策略以缓解优化冲突。我们构建了KP-SAR-Aircraft-1.0,首个用于语义散射结构理解的细粒度基准。大量实验表明,S3U-SAR相比基线取得了最佳性能。跨类别和跨数据集评估进一步验证了其鲁棒性和可迁移性。

英文摘要

Synthetic aperture radar (SAR) has become indispensable for target interpretation owing to its all-day and all-weather observation capability. In SAR target interpretation, electromagnetic scattering information provides a physically grounded cue beyond visual texture and has been widely exploited. However, existing methods remain dominated by local scattering center representations. Such unordered and component-agnostic representations are highly unstable for aircraft targets. As a result, physically present components with weak scattering responses are often missed, leaving the reconstructed topology incomplete. To address this limitation, we establish Semantic Scattering Structure Understanding as a new paradigm for SAR aircraft interpretation. Semantic scattering keypoints are defined to associate local electromagnetic responses with physically meaningful aircraft components, while visibility-aware attributes are introduced to retain weakly observable yet physically present components. The keypoints are further organized into a stable semantic scattering structure. Building upon this, we propose S3U-SAR, a physics-driven framework that localizes semantic scattering keypoints and constructs a complete representation constrained by multi-dimensional physical priors, including scattering heterogeneity, rigid-body topology, and speckle uncertainty. A confidence-gated joint supervision strategy is further introduced to alleviate optimization conflicts. We construct KP-SAR-Aircraft-1.0, the first fine-grained benchmark for semantic scattering structure understanding. Extensive experiments demonstrate that S3U-SAR achieves the best performance compared with baselines. Cross-category and cross-dataset evaluations further verify its robustness and transferability.

2606.06846 2026-06-08 eess.SP 新提交

Variable-Length Finite-Rate CSI Feedback With Generative Priors

变长有限速率CSI反馈与生成先验

Yangxuan Cheng, Fanyang Meng, Jian Zou, Jiacheng Xie, Zhongqiang Zhang, Ye Wang, Yongsheng Liang

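AI summary: Proposes CsiCoGen, a variable-length CSI feedback structure based on a generative diffusion model, whose transferable codebook enables flexible sequence length and quantization precision without joint training, reaching an NMSE of about -31 dB indoors and -20 dB outdoors on COST2100 in the high-rate regime.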

详情
AI中文摘要

本文从结构角度研究了变长有限速率CSI反馈,并提出了CsiCoGen,一种新颖的生成式反馈结构,具有无需联合训练的可迁移码本机制。UE将$H_0$映射为有序的码本索引序列,而BS利用共享的去噪先验从接收到的任意部分反馈索引序列中递归恢复CSI。这通过码本大小实现了反馈序列长度和每步量化精度的灵活控制。CsiCoGen不需要联合训练特定任务的反馈编码器或码本与重构器,且相同的在线结构可以搭配不同的预训练去噪器。在本文中,我们使用生成扩散模型实例化解码器。在COST2100上的仿真结果表明,与代表性基线相比,CsiCoGen在速率-NMSE和速率-$\rho$权衡上表现优异,在高码率下达到约-31 dB室内NMSE和-20 dB室外NMSE,同时展示了可扩展的解码复杂度和可调节的每步量化精度。

英文摘要

This letter studies variable-length finite-rate CSI feedback from a structural perspective and proposes CsiCoGen, a novel generative feedback structure with a transferable codebook mechanism without joint training. The UE maps $H_0$ into an ordered sequence of codebook indices, while the BS recursively recovers CSI from any received partial sequence of feedback indices using a shared denoising prior. This enables flexible control of feedback sequence length and per-step quantization precision through codebook size. CsiCoGen does not require jointly training a task-specific feedback encoder or codebook with the reconstructor, and the same online structure can be paired with different pretrained denoisers. In this work, we instantiate the decoder with a generative diffusion model. Simulation results on COST2100 show favorable rate-NMSE and rate-$\rho$ tradeoffs against representative baselines, with CsiCoGen reaching about -31 dB indoor NMSE and -20 dB outdoor NMSE in the high-rate regime while demonstrating scalable decoding complexity and adjustable per-step quantization precision.
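
For readers less familiar with the NMSE figures quoted above, the sketch below computes the normalized mean squared error in decibels between a true and a reconstructed channel matrix, using the usual definition $10\log_{10}(\lVert H-\hat H\rVert_F^2 / \lVert H\rVert_F^2)$. The function names and the nested-list matrix layout are assumptions for illustration and are not taken from the letter.

```python
import math


def nmse_db(h_true, h_est):
    """NMSE in dB between two complex matrices given as nested lists."""
    err = 0.0
    ref = 0.0
    for row_true, row_est in zip(h_true, h_est):
        for t, e in zip(row_true, row_est):
            err += abs(t - e) ** 2   # squared error energy
            ref += abs(t) ** 2       # reference energy
    # A perfect reconstruction (err == 0) would make log10 undefined
    return 10.0 * math.log10(err / ref)


def db_to_ratio(value_db):
    """Convert a dB value back to a linear energy ratio."""
    return 10.0 ** (value_db / 10.0)
```

On this scale, -20 dB corresponds to a residual error energy of 1% of the channel energy, and -31 dB to roughly 0.08%.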

2606.06837 2026-06-08 eess.AS cs.LG 新提交

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

SEAM:面向面试防护栏的脚本化与自发语音的快捷方式感知实时检测

Vsevolod Kovalev, Pranay Manocha

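AI summary: Proposes the SEAM framework, which combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone to reach 0.971 ROC-AUC with 8 s windows, and exposes the shortcut learning problem.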

详情
Comments
Accepted to Interspeech 2026
AI中文摘要

脚本化与自发语音检测对面试防护栏具有吸引力,但基准性能可能因与语料库身份、信道条件和录音伪影相关的快捷方式(而非说话风格本身)而膨胀。我们提出SEAM,一个用于实时脚本化检测的快捷方式感知框架,结合了统一预处理、接缝感知采样、非语音增强和紧凑的DistilHuBERT骨干。使用8秒窗口,该模型在外部面试领域评估集上达到0.971 ± 0.004的ROC-AUC。移除快捷方式预防组件可改善内部留出指标,但急剧降低外部性能,表明存在快捷方式学习。训练后量化将模型占用减少至41.8MB,且外部性能损失很小。结果表明,鲁棒的实时脚本化检测不仅依赖于骨干网络,还依赖于快捷方式感知的数据设计和评估。我们发布代码和模型检查点。

英文摘要

Scripted vs. spontaneous speech detection is appealing for interview guardrails, but benchmark performance can be inflated by shortcuts tied to corpus identity, channel conditions, and recording artifacts rather than speaking style itself. We present SEAM, a shortcut-aware framework for real-time scriptedness detection that combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone. With 8 s windows, the model achieves 0.971 ± 0.004 ROC-AUC on an external interview-domain evaluation set. Removing the shortcut-prevention components improves internal held-out metrics but sharply reduces external performance, indicating shortcut learning. Post-training quantization reduces the model footprint to 41.8 MB with little loss in external performance. The results demonstrate that robust real-time scriptedness detection depends not only on the backbone, but also on shortcut-aware data design and evaluation. We release code and model checkpoints.

2606.06795 2026-06-08 eess.AS cs.SD 新提交

BiEAR: A Human Auditory-Inspired Adaptive Binaural Front-end for Multi-Speaker Localisation and Distance Estimation

BiEAR: 一种受人类听觉启发的自适应双耳前端,用于多说话人定位和距离估计

Hanyu Meng, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Qiquan Zhang, Haizhou Li

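AI summary: Proposes BiEAR, a human auditory-inspired adaptive binaural front-end in which a neural controller dynamically adjusts the frequency selectivity of the filterbank, improving the accuracy and robustness of multi-speaker localisation and distance estimation.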

详情
Comments
Accepted to INTERSPEECH 2026
AI中文摘要

我们提出BiEAR,一种受人类听觉启发的自适应双耳前端,用于多说话人定位和距离估计。受人类听觉中内侧橄榄耳蜗(MOC)反馈的启发,BiEAR使用神经控制器在推理过程中自适应调整双耳听觉滤波器组的频率选择性。这为双耳产生时频自适应表示,使模型能够响应变化的声学条件。我们在消声和真实房间环境中评估了BiEAR在多说话人定位和距离估计上的性能。结果表明,与常用的固定双耳前端相比,自适应前端提高了定位准确性以及对未见说话人和房间的鲁棒性。对学习到的滤波器自适应的可视化和分析表明,BiEAR随时间强调信息丰富的频带。这些发现表明,自适应的、受生物启发的双耳前端可以改善机器在复杂声学场景中的听觉鲁棒性。

英文摘要

We present BiEAR, a human auditory-inspired adaptive binaural front-end for multi-speaker localisation and distance estimation. Inspired by medial olivocochlear (MOC) feedback in human hearing, BiEAR uses a neural controller to adaptively adjust the frequency selectivity of a binaural auditory filterbank during inference. This yields time-frequency adaptive representations for both ears, enabling the model to respond to changing acoustic conditions. We evaluate BiEAR on multi-speaker localisation and distance estimation in anechoic and real-room environments. Results show that the adaptive front-end improves localisation accuracy and robustness to unseen speakers and rooms compared with commonly used fixed binaural front-ends. Visualisation and analysis of learned filter adaptations show that BiEAR emphasises informative frequency bands over time. These findings suggest that adaptive, biologically inspired binaural front-ends can improve machine hearing robustness in complex acoustic scenes.

2606.06792 2026-06-08 eess.SP 新提交

Copula Function Parameter Regions in Analyzing Wireless Communications Performances

无线通信性能分析中的Copula函数参数区域

Mona Mohsenzadeh, Saeid Pakravan, Ghosheh Abed Hodtani

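AI summary: Introduces the concept of copula dependence parameter regions and, using the FGM copula in a two-user MAC as an example, derives parameter regions from communication-theoretic and probabilistic perspectives, showing that practical requirements can significantly shrink the classical admissible interval.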

详情
AI中文摘要

Copula函数已广泛应用于无线通信分析中,用于建模依赖结构和评估系统性能。然而,现有研究通常用Copula依赖参数表达性能指标,而未明确表征其可容许区域。本文介绍了Copula依赖参数区域的概念,并研究了其在无线通信中的重要性。考虑一个由双变量Farlie--Gumbel--Morgenstern (FGM) Copula建模的相关瑞利衰落的两用户无线多址接入信道 (MAC),从中断概率和皮尔逊相关系数 (PCC) 约束出发,从通信理论和概率角度推导出显式参数区域。结果表明,实际通信和统计要求可以显著缩小经典的Copula可容许区间,使得一些理论上可容许的依赖结构变得不可行。数值示例说明了所提出的概念及其实际意义。

英文摘要

Copula functions have been widely employed in wireless communication analysis to model dependence structures and evaluate system performance. However, existing studies generally express performance metrics in terms of copula dependence parameters without explicitly characterizing their admissible regions. This letter introduces the concept of copula dependence parameter regions and investigates its significance in wireless communications. Considering a two-user wireless multiple access channel (MAC) with correlated Rayleigh fading modeled by the bivariate Farlie--Gumbel--Morgenstern (FGM) copula, explicit parameter regions are derived from communication-theoretic and probabilistic perspectives using outage probability and Pearson correlation coefficient (PCC) constraints. The results show that practical communication and statistical requirements can significantly shrink the classical copula admissible interval, rendering some theoretically admissible dependence structures infeasible. Numerical examples illustrate the proposed concept and its practical implications.
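
As background for the "classical admissible interval", the bivariate FGM copula is $C(u,v)=uv\,[1+\theta(1-u)(1-v)]$, which is a valid copula only for $\theta\in[-1,1]$, and its Spearman rank correlation is $\theta/3$. The sketch below checks that relation numerically and shows how a correlation requirement can shrink the interval. The regions in the letter come from outage-probability and PCC constraints under Rayleigh fading, which are not reproduced here; `restricted_interval` and its bounds are hypothetical illustrations.

```python
def fgm_cdf(u, v, theta):
    """FGM copula C(u, v) for dependence parameter theta."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_density(u, v, theta):
    """FGM copula density; non-negative on [0,1]^2 iff |theta| <= 1."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def is_admissible(theta):
    """Classical admissible interval of the FGM parameter."""
    return -1.0 <= theta <= 1.0


def spearman_rho(theta, grid=200):
    """Midpoint-rule estimate of 12 * E[UV] - 3, which equals theta / 3."""
    h = 1.0 / grid
    e_uv = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            e_uv += u * v * fgm_density(u, v, theta) * h * h
    return 12.0 * e_uv - 3.0


def restricted_interval(rho_min, rho_max):
    """Hypothetical example: theta values whose Spearman rho lies in
    [rho_min, rho_max], intersected with the classical interval [-1, 1]."""
    lo = max(-1.0, 3.0 * rho_min)
    hi = min(1.0, 3.0 * rho_max)
    return (lo, hi) if lo <= hi else None
```

For instance, requiring a rank correlation of at least 0.1 already removes every $\theta$ below 0.3 from the classical interval, which is the kind of shrinkage the abstract describes.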

2606.06732 2026-06-08 eess.SP 新提交

Angular Sector-Based Sparse Array Design for Adaptive Beamforming Using Deep Learning

基于深度学习的角扇区稀疏阵列设计用于自适应波束成形

John Kobak, Ethan Atiyeh, Syed A Hamza

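AI summary: Proposes a deep-learning-based sparse array design framework that uses an angular-sector classification strategy with a CNN and ResNet 50 to select array configurations with high accuracy, keeping SINR deviation below 1% for most classes.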

详情
Comments
Presented at the IEEE Radar Conference 2026
AI中文摘要

高效的稀疏阵列可重构性对于动态射频环境中的认知感知至关重要,其中快速干扰变化需要适应性和稳定性。本文提出一个框架,用于设计在宽角扇区上优化的稀疏阵列,实现接近最优的波束成形,从而在干扰角度范围内最大化信号与干扰加噪声比(SINR)。计算候选配置的完整数据相关矩阵,并应用基于角扇区的类别缩减策略合并由相同配置主导的相邻扇区,得到56个代表性类别。通过受控的上采样和下采样生成四个数据集变体,包括高样本数和低样本数、平衡和不平衡数据集,以系统评估数据集大小和类别分布对神经网络性能的影响。使用这些数据集训练和评估轻量级卷积神经网络(CNN)和更深的ResNet 50架构。结果表明,分类准确率高,ResNet 50达到97.3%,而大多数类别的SINR偏差保持在1%以下,即使对于接近法线的挑战性干扰角度,偏差也低于5%。所提出的方法实现了鲁棒的稀疏阵列选择,保持了强大的SINR性能,减少了不必要的重配置,并为实时认知感知和自适应干扰缓解提供了有效框架。

英文摘要

Efficient sparse array reconfigurability is essential for cognitive sensing in dynamic radio frequency environments, where rapid interference variations require both adaptability and stability. This work presents a framework for designing sparse arrays optimized over broad angular sectors, enabling near-optimal beamforming that maximizes the signal-to-interference-plus-noise ratio (SINR) across a range of interferer angles. Full data correlation matrices are computed for candidate configurations, and an angular-sector-based class reduction strategy is applied to merge adjacent sectors dominated by the same configuration, resulting in 56 representative classes. Controlled up- and down-sampling produce four dataset variants, covering high and low sample counts and balanced and unbalanced class distributions, to systematically evaluate the effects of dataset size and class distribution on neural network performance. A lightweight convolutional neural network (CNN) and a deeper ResNet 50 architecture are trained and evaluated using these datasets. Results demonstrate high classification accuracy, with ResNet 50 achieving up to 97.3%, while SINR deviations remain below 1% for most classes and below 5% even for challenging interference angles near broadside. The proposed approach enables robust sparse array selection, maintains strong SINR performance, reduces unnecessary reconfigurations, and provides an effective framework for real-time cognitive sensing and adaptive interference mitigation.
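
The labels such a classifier learns are the SINR-maximizing sensor subsets. As a minimal sketch of how one label could be produced, the code below evaluates the optimum output SINR of a K-sensor subset for one source and one interferer and picks the best subset by exhaustive search. It assumes a half-wavelength grid, narrowband far-field signals, and white noise, and uses the closed form obtained from the Sherman-Morrison identity. The function names and power levels are illustrative, and the paper's actual pipeline (correlation matrices, sector merging) is not reproduced.

```python
import cmath
import math
from itertools import combinations


def steering(positions, theta_deg):
    """Steering vector for sensors at half-wavelength grid positions."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * p * s) for p in positions]


def max_sinr(positions, theta_s, theta_i, p_s=1.0, p_i=100.0, p_n=1.0):
    """Optimum (MVDR) output SINR for one source and one interferer.

    SINR = p_s * a_s^H R^{-1} a_s with R = p_i a_i a_i^H + p_n I,
    expanded with the Sherman-Morrison identity.
    """
    a_s = steering(positions, theta_s)
    a_i = steering(positions, theta_i)
    k = len(positions)
    inner = sum(x.conjugate() * y for x, y in zip(a_i, a_s))
    return (p_s / p_n) * (k - p_i * abs(inner) ** 2 / (p_n + k * p_i))


def best_configuration(n_grid, k, theta_s, theta_i):
    """Exhaustively pick the k-of-n_grid subset with the highest SINR."""
    return max(combinations(range(n_grid), k),
               key=lambda pos: max_sinr(pos, theta_s, theta_i))
```

Exhaustive search grows combinatorially with the grid size, which is one reason to replace it at run time with a trained classifier that maps the observed scenario directly to a configuration.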

2606.06725 2026-06-08 eess.IV cs.CV 新提交

Compute-Optimal Network Design for Echocardiography Myocardial Segmentation and Perfusion Quantification using Neural Scaling Laws

基于神经缩放定律的超声心动图心肌分割与灌注量化的计算最优网络设计

Clara Rodrigo González, Matthieu Toulemonde, Lasha Gvinianidze, Cameron A. B. Smith, Oscar Bates, Roxy Senior, Fu Siong Ng, Meng-Xing Tang

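AI summary: Applies neural scaling laws to predict myocardial segmentation performance and determine the optimal network size on the CAMUS and CEUS datasets, reaching state-of-the-art performance on CAMUS with a 240-fold reduction in parameters; the automatic segmentation is equivalent to a senior cardiologist in myocardial perfusion quantification.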

详情
Comments
15 pages, 4 figures, 5 tables, journal
AI中文摘要

使用对比增强超声进行心肌灌注量化提供了一种床旁非电离替代核成像模态的方法。然而,其临床采用受到耗时的手动标注的限制。由于域内训练数据匮乏,自动分割已被证明具有挑战性。我们应用当前用于优化大数据集上大型语言模型的策略,将神经缩放定律应用于预测心肌分割的网络性能。我们在数据子集上外推性能,以确定CAMUS超声心动图数据集和25名患者的对比增强超声(CEUS)数据集上的最优网络大小。最后,通过将最终心肌灌注参数与资深心脏病专家获得的参数进行比较,验证了我们模型的临床实用性。基于缩放定律的外推能够预测完整数据集大小下的测试损失,使我们能够选择两个网络,在CAMUS上以240倍的参数减少获得最先进性能。我们观察到缩放定律的梯度从CAMUS迁移到CEUS数据集,但预测损失存在偏差。自动分割的掩膜在心肌灌注量化中与资深心脏病专家表现相当。这些结果确立了神经缩放定律作为小成像数据集上数据驱动计算最优模型设计的实用工具。

英文摘要

Myocardial perfusion quantification using contrast-enhanced ultrasound offers a bedside non-ionizing alternative to nuclear imaging modalities. However, its clinical adoption is hindered by time-consuming manual labelling. Automated segmentation has proved challenging due to a paucity of in-domain training data. Adapting strategies currently used to optimise large language models for large datasets, we apply neural scaling laws to predict network performance for myocardial segmentation. We extrapolate performance on subsets of the data to determine optimal network size on the CAMUS echocardiography dataset and a 25-patient contrast-enhanced ultrasound (CEUS) dataset. Finally, we validate the clinical utility of our models by comparing the final myocardial perfusion parameters with those obtained by a senior cardiologist. Extrapolation based on the scaling law is predictive of test loss at the full dataset size, allowing us to select two networks that obtained state-of-the-art performance on CAMUS with a 240-fold reduction in parameter count. We observe the gradient of the scaling law transfers from CAMUS to the CEUS dataset with a bias in the predicted losses. The automatically segmented masks perform equivalently to a senior cardiologist in myocardial perfusion quantification. These results establish neural scaling laws as a practical tool for data-driven compute-optimal model design for small imaging datasets.
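
The extrapolation step can be illustrated with the simplest scaling-law form, $L(D) = a\,D^{-b}$, fitted by linear regression in log-log space on small data subsets and then evaluated at the full dataset size. This is a sketch under that assumed functional form (no irreducible-loss term); the abstract does not specify the exact law or fitting procedure, and the function names are illustrative.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - b * log D. Returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # slope is -b in log-log space; intercept is log a
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Extrapolated test loss at a dataset size outside the fitted range."""
    return a * size ** (-b)
```

In this form, the exponent `b` is the slope on a log-log plot. The observation that the slope transfers between datasets while predicted losses are biased corresponds to reusing `b` and re-estimating `a`.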

2606.06723 2026-06-08 eess.SP 新提交

Deep Learning Based Sparse Array Design with Pre-Steering for Adaptive Beamforming

基于预导向的深度学习稀疏阵列设计用于自适应波束形成

Ian Straub, Syed A Hamza

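AI summary: Proposes learning sparse array configurations with a convolutional neural network, using a pre-steering strategy to avoid retraining for each source angle, enabling fast reconfiguration that maximizes SINR and achieving over 90% test accuracy in dynamic environments.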

详情
Comments
Accepted for presentation at the IEEE Radar Conference 2026
AI中文摘要

本文研究了使用卷积神经网络(CNN)学习稀疏阵列配置,在变化的源和干扰角度下实现接近最优的波束形成。与传统的或基于凸优化的算法不同,所提出的深度学习方法能够在高度动态的传播环境中快速重配置稀疏阵列。本文考虑单个期望源和单个干扰信号位于任意角度,分析了固定和变化期望源方向两种情况。为避免针对每个可能的源角度重新训练,引入了阵列预导向策略,即网络仅在侧射方向训练,而测试输入被预导向以对齐侧射方向。为考虑实际不完美性,研究了预导向误差的影响,并采用了鲁棒的误差增强训练。该方法在训练过程中系统地引入小的、结构化的预导向扰动,使网络即使在角度不确定下也能保持高分类精度并最大化信干噪比(SINR)。结果表明,所提出的方法在广泛的源和干扰角度范围内实现了超过90%的测试精度,突显了其在动态环境中实时、鲁棒稀疏阵列配置的潜力。

英文摘要

This paper investigates the use of convolutional neural networks (CNNs) for learning sparse array configurations that achieve near-optimal beamforming under varying source and interference angles. Unlike conventional or convex optimization based algorithms, the proposed deep learning approach enables rapid reconfiguration of sparse arrays in highly dynamic propagation environments. The paper considers a single desired source and a single interference signal at arbitrary angles, analyzing scenarios with both fixed and varying desired source directions. To avoid retraining for each possible source angle, an array pre-steering strategy is introduced, whereby the network is trained only at broadside, while test inputs are pre-steered to align with the broadside direction. To account for practical imperfections, the effect of pre-steering errors is examined, and a robust error-augmented training is adopted. The approach systematically incorporates small, structured pre-steering perturbations during training, enabling the network to maintain high classification accuracy and maximize the signal-to-interference-plus-noise ratio (SINR) even under angular uncertainty. The results demonstrate that the proposed method achieves over 90% test accuracy across wide ranges of source and interference angles, highlighting its potential for real-time, robust sparse array configuration in dynamic environments.
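
Pre-steering can be sketched as an element-wise phase rotation: multiplying each sensor output by the conjugate of the source steering phase makes the desired source look as if it arrived from broadside, while the interferer moves to an equivalent angle whose sine is $\sin\theta_i - \sin\theta_s$. The code assumes a half-wavelength grid and narrowband signals; the function names are illustrative, and the paper's exact pre-processing of network inputs may differ.

```python
import cmath
import math


def steering(positions, theta_deg):
    """Steering vector for sensors at half-wavelength grid positions."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * p * s) for p in positions]


def pre_steer(snapshot, positions, theta_s_deg):
    """Rotate each sensor output so the source at theta_s appears at broadside."""
    a_s = steering(positions, theta_s_deg)
    return [x * a.conjugate() for x, a in zip(snapshot, a_s)]


def equivalent_angle_deg(theta_i_deg, theta_s_deg):
    """Apparent interferer angle after pre-steering toward theta_s."""
    shifted = math.sin(math.radians(theta_i_deg)) - math.sin(math.radians(theta_s_deg))
    if abs(shifted) > 1.0:
        # The shifted spatial frequency has no real arrival angle
        raise ValueError("shifted direction falls outside the visible region")
    return math.degrees(math.asin(shifted))
```

An error in the assumed source angle leaves a residual phase ramp across the array after pre-steering, which is the kind of perturbation the error-augmented training described above is meant to tolerate.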

2606.06672 2026-06-08 eess.SP 新提交

Variational Bayes Estimation for Affine-Precoded Superimposed Pilots in Partially Connected Dual-Wideband Tera-Hertz MU-MIMO Systems

部分连接双宽带太赫兹MU-MIMO系统中仿射预编码叠加导频的变分贝叶斯估计

Abhisha Garg, Suraj Srivastava, Aditya K. Jagannatham

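AI summary: For partially connected dual-wideband terahertz MU-MIMO systems, proposes two affine-precoding models that use superimposed pilots and variational Bayesian inference for joint channel estimation and sparsity-structure learning, and analyzes the performance trade-offs between them.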

详情
AI中文摘要

本工作构思了两种基于仿射预编码的系统模型:联合信道估计的公共预编码(CP-JCE)和用于解耦信道估计的用户特定预编码(USPDCE)。考虑到受双宽带影响的部分连接架构,我们通过结合吸收、反射和自由空间损耗,严格建模了每个用户对应子阵列的太赫兹(THz)多输入多输出(MIMO)信道。接下来,为了解决传统基于导频的信道估计带来的显著带宽开销,我们采用了叠加导频。在此基础上,我们构建了一个结构化稀疏信道模型,并开发了一种变分贝叶斯推断算法,该算法通过超参数推断联合估计信道系数并学习底层稀疏结构,从而在严重的模型不确定性下实现鲁棒且高精度的叠加导频信道估计。最后,我们比较了两种系统的结果,并提供了它们之间的权衡分析。

英文摘要

This work conceives two affine-precoding-based system models: common precoding with joint channel estimation (CP-JCE) and user-specific precoding for decoupled channel estimation (USPDCE). Considering a partially connected architecture affected by the dual-wideband effect, we rigorously model the terahertz (THz) multiple-input multiple-output (MIMO) channel for each subarray corresponding to each user by incorporating the absorption, reflection, and free-space losses. Next, to address the significant bandwidth overhead associated with conventional pilot-based channel estimation, we employ superimposed pilots. Building on this, we formulate a structured sparse channel model and develop a variational Bayesian inference algorithm that jointly estimates the channel coefficients and learns the underlying sparsity structure through hyperparameter inference, thereby enabling robust and high-precision superimposed pilot-based channel estimation under severe model uncertainty. Lastly, we compare our results for both systems and provide a trade-off analysis between them.

2606.06640 2026-06-08 eess.SP 新提交

SEMIKHORN: Globally balanced affinities for mmWave Localization in MU mMIMO systems

SEMIKHORN:用于MU mMIMO系统中毫米波定位的全局平衡亲和度

Abhisha Garg, Raghav Shukla, Suraj Srivastava, Aditya K. Jagannatham

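AI summary: Proposes the SEMIKHORN framework, which uses the globally balanced similarities of t-SNEkhorn for semi-supervised channel charting and fuses local dissimilarity matrices from distributed base stations for mmWave localization, reaching a mean localization error of 6.86% in a simulated environment with fewer than 15% labeled samples.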

详情
AI中文摘要

本工作提出了SEMIKHORN,一种用于毫米波定位的半监督信道图(CC)框架,它利用t-SNEkhorn——t分布随机邻域嵌入(t-SNE)的双随机变体,该变体利用熵最优传输来构建成对相似性。与标准t-SNE(对每个数据点独立归一化亲和度)不同,t-SNEkhorn生成全局平衡的相似性,确保一致的邻域表示。我们考虑配备多天线的分布式基站(BS)的无线网络,每个基站从信道状态信息(CSI)构建局部不相似矩阵。然后将这些局部不相似矩阵融合以获得单个全局不相似矩阵,通过流形学习处理,将用户嵌入到几何地图上。在模拟室外环境中评估性能,并采用贝叶斯优化对框架超参数进行优化,以最小化平均定位误差(MLE)。实验结果表明,所提出的框架在半径100m的圆形区域内实现了6.86%的MLE,所需标记CSI样本少于15%。

英文摘要

This work conceives SEMIKHORN, a semi-supervised channel charting (CC) framework for mmWave localization, which leverages t-SNEkhorn, a doubly stochastic variant of t-distributed Stochastic Neighbor Embedding (t-SNE) that utilizes entropic optimal transport to construct pairwise similarities. Unlike standard t-SNE, which normalizes affinities independently for each data point, t-SNEkhorn generates globally balanced similarities, ensuring consistent neighborhood representation. We consider wireless networks with distributed base stations (BSs) equipped with multiple antennas, where each BS constructs a local dissimilarity matrix from the channel state information (CSI). These local dissimilarity matrices are then fused to obtain a single global dissimilarity matrix, which is processed through manifold learning to embed users onto a geometric map. The performance is evaluated in a simulated outdoor environment, and Bayesian optimization is employed on the framework hyperparameters to minimize the mean localization error (MLE). Experimental results demonstrate that the proposed framework achieves an MLE of 6.86% in a circular vicinity of radius 100 m, requiring less than 15% of labeled CSI samples.
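
The contrast between per-point normalization and globally balanced affinities can be illustrated with plain Sinkhorn-Knopp scaling, which alternately normalizes rows and columns of a kernel until the matrix is (approximately) doubly stochastic. This is a simplified stand-in, not the t-SNEkhorn algorithm itself: the Gaussian-style kernel, the fixed `bandwidth`, and the function names are assumptions for illustration.

```python
import math


def gaussian_affinity(dissimilarity, bandwidth=1.0):
    """Symmetric kernel exp(-d_ij / bandwidth) with a zero diagonal."""
    n = len(dissimilarity)
    return [[0.0 if i == j else math.exp(-dissimilarity[i][j] / bandwidth)
             for j in range(n)] for i in range(n)]


def sinkhorn_knopp(kernel, n_iter=500):
    """Alternate row and column normalization toward a doubly stochastic matrix."""
    n = len(kernel)
    p = [row[:] for row in kernel]
    for _ in range(n_iter):
        # Row step: every point distributes a total affinity of 1
        for i in range(n):
            row_sum = sum(p[i])
            p[i] = [value / row_sum for value in p[i]]
        # Column step: every point also receives a total affinity of 1
        for j in range(n):
            col_sum = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= col_sum
    return p
```

With row-only normalization, as in standard t-SNE, an outlying user can receive almost no affinity from others; the column step forces every user to receive the same total mass, which is the "globally balanced" property referred to above.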

2606.06540 2026-06-08 eess.IV cs.CV 新提交

ErA: Error-Aware Deep Unrolling Network for Single Image Defocus Deblurring

ErA:用于单图像散焦去模糊的误差感知深度展开网络

Tu Vo, Chan Y. Park

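AI summary: Proposes the ErA network, which jointly learns a compact kernel basis and per-pixel weights and uses an error-aware term in augmented Lagrangian unrolling, with alternating updates and ResUNet denoisers, to correct kernel estimation errors, achieving state-of-the-art performance on multiple datasets.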

详情
AI中文摘要

我们提出了ErA(误差感知深度展开网络),一个用于单图像散焦去模糊的端到端框架。ErA联合学习一个紧凑的核基和逐像素权重,同时增广拉格朗日展开中的一个误差感知项通过交替更新和ResUNet去噪器校正核估计误差。它在DPDD、RealDOF和RTF上达到了最先进的PSNR/SSIM,并在没有真实数据的CUHK上显示出强大的泛化能力。

英文摘要

We introduce ErA (Error-Aware Deep Unrolling Network), an end-to-end framework for single-image defocus deblurring. ErA jointly learns a compact kernel basis and per-pixel weights, while an error-aware term in augmented Lagrangian unrolling corrects kernel estimation errors via alternating updates and ResUNet denoisers. It achieves state-of-the-art PSNR/SSIM on DPDD, RealDOF, and RTF, and shows strong generalization on CUHK without ground truth.