arXivDaily: daily arXiv academic digest, updated Monday to Friday
2605.31579 2026-06-01 eess.SP cs.IT math.IT math.ST stat.TH

Functional Multi-Target Detection via Bispectrum Inversion

Anna Little, Daniel Sanz-Alonso, Mikhail Sweeney, Ruiyi Yang

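AI summary: For multi-target detection with unknown translations, the paper proposes initialization-free recovery algorithms based on autocorrelation analysis. The bispectrum is estimated from a debiased third-order empirical autocorrelation, and the signal is then recovered by frequency marching or a Kotlarski deconvolution formula, with non-asymptotic recovery guarantees.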

English abstract

This paper develops a functional theory for multi-target detection, where a compactly supported signal is recovered from a single noisy observation containing many unknown translations of the signal. Our formulation allows continuous, off-grid translations and correlated stationary Gaussian process noise, extending beyond the discrete, grid-aligned, white-noise models common in prior work. We analyze two uninitialized recovery algorithms based on autocorrelation analysis; in particular, both algorithms first estimate the signal's bispectrum via a debiased third-order empirical autocorrelation. The signal is then recovered from the estimated bispectrum using either a functional frequency marching scheme or a Kotlarski-type deconvolution formula. For both algorithms, we prove non-asymptotic recovery guarantees for compactly supported signals without bandlimiting assumptions. The resulting error bounds depend on the smoothness of the signal and the accuracy of bispectrum estimation, with the latter governed by the noise characteristics and the number of signal occurrences. Numerical experiments validate our theory and demonstrate accurate recovery in low-SNR regimes.
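A discrete toy setting shows two things: why the bispectrum survives unknown translations, and how the basic marching idea works. The sketch below assumes a periodic, noise-free, real discrete signal whose DFT coefficients are all nonzero. It also takes the Fourier magnitudes as known, for instance from the power spectrum. It is not the paper's functional, off-grid algorithm or its Kotlarski-type alternative, and the helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bispectrum(X):
    """B[k1][k2] = X[k1] X[k2] conj(X[k1 + k2]), indices taken mod n.

    A circular shift multiplies X[k] by exp(-2*pi*i*k*s/n); those phase
    factors cancel in B, so B is invariant to translations.
    """
    n = len(X)
    return [[X[k1] * X[k2] * X[(k1 + k2) % n].conjugate() for k2 in range(n)]
            for k1 in range(n)]


def frequency_march(B, mags, phi1=0.0):
    """Recover Fourier coefficients from known magnitudes and the bispectrum.

    Uses angle(B[k][1]) = phi[k] + phi[1] - phi[k+1], i.e.
    phi[k+1] = phi[k] + phi[1] - angle(B[k][1]). Requires len(mags) >= 2.
    """
    n = len(mags)
    phases = [0.0] * n
    phases[0] = cmath.phase(B[0][0])  # B[0][0] = |X[0]|^2 X[0]
    phases[1] = phi1                  # free parameter (translation ambiguity)
    for k in range(1, n - 1):
        phases[k + 1] = phases[k] + phases[1] - cmath.phase(B[k][1])
    return [m * cmath.exp(1j * p) for m, p in zip(mags, phases)]
```

Because B[k][1] only constrains phase differences, the starting phase `phi1` is free. Any other choice multiplies X[k] by exp(i k d) for a constant d, which corresponds to a translation. This is the ambiguity inherent to multi-target detection, so recovery is only possible up to a shift.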

2605.31549 2026-06-01 cs.IT eess.SP math.IT

Microwave Linear Analog Computer (MiLAC) for Simultaneous Active and Passive Beamforming

Matteo Nerini, Bruno Clerckx

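AI summary: The paper proposes a dual-functional microwave linear analog computer (MiLAC) framework that performs active beamforming while also acting as a reconfigurable intelligent surface (RIS) for passive beamforming. It derives the optimal reconfiguration strategy and the fundamental trade-off limits between active and passive rates.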

Comments
Submitted to IEEE for publication
English abstract

Microwave linear analog computers (MiLACs) have recently emerged to enable high-performance and efficient beamforming in the analog domain. In this paper, we introduce a dual-functionality framework for MiLAC-aided transceivers. Beyond analog-domain precoding/combining (active beamforming), a MiLAC and its antenna array can simultaneously act as a reconfigurable intelligent surface (RIS) (passive beamforming). This allows the MiLAC to execute beamforming for transmission/reception while reflecting external incident signals. We provide an optimal reconfiguration strategy for this dual-functional MiLAC, and characterize the fundamental limits on the trade-off between active and passive rates, namely the capacity region bounds and the sum-rate capacity.

2605.31526 2026-06-01 cs.IT eess.SP math.IT

Distributionally Robust Physical-Layer Security for Satellite Communication via Aerial Reconfigurable Intelligent Surface

Zhaole Wang, Xiao Tang, Naijin Liu, Jinxin Liu, Qinghe Du, Lei Chen, Tingwu Lin

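AI summary: To counter eavesdropping on satellite communications, the paper uses an aerial reconfigurable intelligent surface (ARIS) to enhance physical-layer security. It jointly optimizes transmit and reflection beamforming and models channel uncertainty with moment-based ambiguity sets, yielding distributionally robust secrecy-rate optimization.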

Comments
Accepted @ IEEE TCOM
English abstract

Satellite communications are envisioned as a key enabler of ubiquitous coverage in future 6G networks, yet their broadcast nature renders them vulnerable to eavesdropping, especially given the long-distance transmissions and the associated high uncertainties. In this paper, we propose enhancing the physical-layer security of multi-beam satellite communications with the assistance of an aerial reconfigurable intelligent surface (ARIS). Considering the high dynamics and uncertainties of the channels, we characterize the channel distribution with moment-based ambiguity sets. Accordingly, a distributionally robust secrecy rate optimization problem is formulated through joint design of transmit and reflection beamforming. We then introduce a conditional value-at-risk (CVaR)-based reformulation to convert the probabilistic constraints into deterministic forms. An alternating optimization framework is subsequently employed to iteratively update the transmit and reflection beamforming vectors until convergence. Simulation results demonstrate that the proposed distributionally robust scheme significantly enhances secrecy performance and maintains reliable performance across various channel error distributions.
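A sample-based stand-in shows why a CVaR constraint is a deterministic, conservative surrogate for a chance constraint. The sketch uses the Rockafellar-Uryasev formula CVaR_eps(f) = min_t { t + E[(f - t)^+] / eps } on a finite sample. Since VaR never exceeds CVaR, CVaR_eps(f) <= 0 implies P(f > 0) <= eps. The paper instead takes the worst case over moment-based ambiguity sets, which this empirical version does not reproduce, and the function names are hypothetical.

```python
def empirical_cvar(samples, eps):
    """Empirical CVaR at level eps via min_t t + E[(f - t)^+] / eps.

    The objective is convex and piecewise linear in t with kinks at the
    samples, so its minimum is attained at one of them. When eps * n is an
    integer, the result is the mean of the largest eps * n samples.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    n = len(samples)

    def objective(t):
        return t + sum(max(s - t, 0.0) for s in samples) / (eps * n)

    return min(objective(t) for t in samples)


def violation_rate(samples):
    """Fraction of samples with f > 0, i.e. the empirical chance-constraint violation."""
    return sum(1 for s in samples if s > 0) / len(samples)
```

Here `f` plays the role of a constraint function, for example a hypothetical secrecy-rate shortfall, evaluated under sampled channel errors. Requiring `empirical_cvar(f_samples, eps) <= 0` enforces the probabilistic constraint with some conservatism.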

2605.31510 2026-06-01 eess.SP

Cooperative Uplink Channel Estimation in User-Centric Cell-free Massive MIMO Communication Networks

Pourya Behmandpoor, Marc Moonen

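AI summary: For user-centric cell-free massive MIMO networks, the paper proposes a cooperative MMSE-based uplink channel estimation method in which access points share linearly compressed signals. This achieves optimal estimation with lower communication overhead, and improves estimation accuracy and convergence speed.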

English abstract

Cell-free massive multi-input-multi-output (CFmMIMO) communication networks aim to provide uniform quality of service by distributing access points (APs) across a coverage area. In user-centric variants, each user equipment (UE) can choose a cluster of APs with the best channel conditions (e.g., the closest APs) for accessing service. This approach eliminates the notion of cells with dedicated regions and APs, as found in cellular mMIMO communication networks. Estimating uplink channels between UEs and APs is a crucial step in CFmMIMO communication networks; however, existing channel estimation (CE) approaches typically originate from mMIMO systems without considering the unique properties of CFmMIMO communication networks. For instance, shorter AP-UE distances in CFmMIMO systems result in Rician channel models with prominent line-of-sight (LoS) components between APs and UEs, motivating cooperation between APs for improved performance. In this paper, we propose a cooperative minimum-mean-squared-error (MMSE)-based uplink CE approach in which APs share their linearly compressed signals as fused signals with other APs in the same cluster. The proposed approach is optimal, i.e., its performance is equivalent to that of the centralized CE approach, in which APs share their uncompressed raw signals. Notably, this optimality is achieved in one shot; that is, given the required correlation matrices, the optimal fusion filters and estimators are derived non-iteratively. Consequently, the proposed approach guarantees lower communication overhead for cooperative CE than the centralized approach. Numerical experiments corroborate the superior performance of the proposed cooperative CE approaches in terms of CE accuracy and convergence rate.
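A scalar toy model shows why sharing a linearly compressed signal can lose nothing relative to sharing raw samples. Assume, for illustration only, the following:

- Each AP i observes a real scalar channel coefficient h through M antennas: y_i = a_i h + n_i, with known a_i and white Gaussian noise of variance v_i.
- h is Gaussian with a nonzero mean (standing in for the LoS part of a Rician channel) and variance P.

The posterior mean then depends on each AP's data only through the scalar a_i^T y_i / v_i, so each AP can send one number instead of M. The precision term needs only second-order statistics, loosely mirroring "given the required correlation matrices". This is a simplified illustration of the principle, not the authors' fusion-filter design, and all names are hypothetical.

```python
def centralized_mmse(raw, steering, noise_vars, mean, prior_var):
    """Posterior mean of h computed from every raw antenna sample of every AP."""
    info = mean / prior_var
    precision = 1.0 / prior_var
    for y, a, v in zip(raw, steering, noise_vars):
        for yk, ak in zip(y, a):
            info += ak * yk / v
            precision += ak * ak / v
    return info / precision


def compress(y, a, v):
    """Hypothetical fused signal: one scalar per AP, the matched-filter output a^T y / v."""
    return sum(ak * yk for ak, yk in zip(a, y)) / v


def cooperative_mmse(fused, steering, noise_vars, mean, prior_var):
    """Same posterior mean from the shared scalars plus known statistics."""
    info = mean / prior_var + sum(fused)
    precision = 1.0 / prior_var + sum(
        sum(ak * ak for ak in a) / v for a, v in zip(steering, noise_vars))
    return info / precision
```

The two estimators agree up to floating-point rounding, while the cooperative one exchanges 1 value per AP instead of M.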

2605.31499 2026-06-01 eess.SP

Perceptual-Quality based AMC for Enhanced mmWave Spectral Efficiency: Concept and Experiment

Kıvanç Değirmenci, Hasan Atalay Günel, Mohaned Chraiti, Özgür Erçetin, Ali Ghrayeb, Ali Görçin

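AI summary: For mmWave systems, the paper incorporates an SSIM-based perceptual quality indicator into the adaptive modulation and coding (AMC) framework and uses a DnCNN denoiser to improve image quality. Experiments show up to a twofold spectral-efficiency gain while preserving perceptual fidelity.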

Comments
To be published in IEEE WCNC 2026
English abstract

For high-throughput applications such as ultra-high-definition video streaming and immersive extended reality, perceptual quality rather than bit-level accuracy defines the primary performance criterion and provides a more informative and spectrally efficient objective than strict bitwise reconstruction. This is particularly relevant in millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems, where path loss, short channel coherence times, and phase noise introduce severe fluctuations that degrade link spectral efficiency. We propose an extension to the conventional Adaptive Modulation and Coding (AMC) framework that incorporates perceptual quality awareness into link adaptation. In this framework, the decision metric is a Perceptual Quality Indicator (PQI) derived from the Structural Similarity Index Measure (SSIM). The receiver employs a Denoising Convolutional Neural Network (DnCNN) to enhance post-decoding image quality before feedback estimation. The resulting perceptual metric replaces the standard Channel Quality Indicator (CQI) in the AMC loop, enabling adaptation that maximizes spectral efficiency while satisfying a perceptual-fidelity constraint. Experiments on a 5G-compliant mmWave testbed demonstrate up to a twofold gain in spectral efficiency while maintaining perceptual fidelity, underscoring the potential of perception-optimized link adaptation.
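The link-adaptation rule can be sketched as follows: score each candidate MCS with an SSIM-based quality value, then pick the highest-efficiency MCS whose score meets a fidelity threshold. The sketch makes several simplifying assumptions:

- It uses a single global SSIM window over flattened pixel lists, whereas the standard index averages over local windows.
- The MCS labels, efficiency values, threshold and function names are hypothetical stand-ins, not the paper's PQI mapping.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length pixel lists."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def select_mcs(pqi_by_mcs, efficiency_by_mcs, pqi_threshold):
    """Highest-efficiency MCS whose perceptual score meets the threshold.

    Falls back to the most robust (lowest-efficiency) MCS if none qualifies.
    """
    feasible = [m for m, q in pqi_by_mcs.items() if q >= pqi_threshold]
    if not feasible:
        return min(efficiency_by_mcs, key=efficiency_by_mcs.get)
    return max(feasible, key=lambda m: efficiency_by_mcs[m])
```

In a CQI-driven loop, the feasibility test would be a BLER target instead. Swapping it for a perceptual threshold is what lets the link run a more aggressive MCS whenever the denoised image still looks good enough.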

2605.31478 2026-06-01 cs.SE cs.CL cs.SY eess.SY

Knowledge Boundary Probing and Demand-Guided Intervention for LLM-Based Power System Code Generation

Hui Wu, Xiaoyang Wang, Zhong Fan

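AI summary: LLMs often fail at power-system code generation because of API-knowledge boundary errors. To address this, the paper proposes the PowerCodeBench benchmark, L0-L3 documentation-driven probing, and a boundary-aware intervention that substantially improves model accuracy.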

Comments
43 pages, 12 figures, includes supplementary material
English abstract

Large language models (LLMs) are increasingly used to automate power-system analysis, but many utilities and energy-research labs require on-premise serving for confidentiality, regulatory, reproducibility, and cost reasons. This makes the reliability of open-weight models a deployment issue. We show that first-pass failures in power-system code generation are dominated not by reasoning alone, but by structured API-knowledge boundary errors: hallucinated function names, misused parameters, and mishandled result tables in versioned simulation libraries. We introduce PowerCodeBench, an execution-validated benchmark generator that pairs natural-language operator queries with pandapower code and numerical ground truth; an L0-L3 documentation-driven probing procedure that measures per-model API knowledge profiles; and a boundary-aware intervention that combines query-side API demand estimation with targeted proactive documentation injection and routed reactive correction. On a 2,000-task frozen release, we evaluate ten open-weight LLMs (1.5B-480B parameters) and four commercial mid-tier APIs. The intervention improves every evaluated open-weight model of at least 7B parameters and every commercial API by 32 to 56 accuracy points. Open-weight models in the 70B-120B range match the commercial mid-tier accuracy range, while Llama-3.1-405B and Qwen3-Coder-480B lead the panel. The targeted prompts preserve the full-context accuracy ceiling while using 41% of the prompt-token cost. The result is an accuracy-side, deployment-time path toward reliable on-premise LLM assistance for grid-analysis workflows without fine-tuning or cloud inference.

2605.31469 2026-06-01 cs.CL cs.AI cs.SD eess.AS

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

Máté Gedeon, Piroska Zsófia Barta, Péter Mihajlik, Katalin Mády

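AI summary: To address the scarcity of training data for conversational Hungarian speech recognition, the paper extends the BEA-Dialogue corpus to 200 hours by relaxing its split criteria. It evaluates Whisper- and FastConformer-based models and shows that fine-tuning based on serialized output training consistently improves recognition.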

English abstract

Conversational automatic speech recognition in Hungarian is constrained by the limited amount of publicly available dialogue-style training data. The BEA-Dialogue corpus addresses this need, but its strictly speaker-disjoint train/dev/eval split reduces the usable material to only 85 hours. In this paper, we introduce BEA-Dialogue+, an expanded version of the corpus that relaxes the split criterion for experimenters and dialogue partners while preserving complete separation of the primary speakers. This results in 200 hours of transcribed natural conversations and enables a controlled study of the trade-off between additional training data and speaker overlap across the splits. We evaluate several Whisper- and FastConformer-based models on both corpus versions, including Serialized Output Training (SOT)-based fine-tuning for dialogue transcription. Our results show that the larger corpus is more challenging for models without fine-tuning, whereas SOT-based adaptation yields consistent improvements in WER, CER, cpWER, and cpCER. Overall, BEA-Dialogue+ provides a substantially larger yet still demanding benchmark for Hungarian dialogue ASR, and a practical resource for training and evaluating dialogue transcription systems.
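WER is a word-level edit distance normalized by the reference length. cpWER (concatenated minimum-permutation WER) first concatenates each speaker's words. It then scores the speaker assignment between reference and hypothesis that minimizes the total number of errors. Assumptions for this minimal sketch:

- The words are already tokenized.
- References are non-empty.
- Reference and hypothesis contain the same number of speakers; full scoring toolkits also handle unequal counts.

The function names are illustrative. CER and cpCER follow the same pattern with characters in place of words.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)


def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Each argument is a list of per-speaker word lists (already concatenated)."""
    if len(ref_by_speaker) != len(hyp_by_speaker):
        raise ValueError("this sketch assumes equal speaker counts")
    total_ref = sum(len(r) for r in ref_by_speaker)
    best = min(
        sum(edit_distance(r, h) for r, h in zip(ref_by_speaker, perm))
        for perm in permutations(hyp_by_speaker)
    )
    return best / total_ref
```

Because of the permutation search, a transcript that is correct except for swapped speaker labels scores a cpWER of zero. Only genuine recognition errors are counted.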

2605.31460 2026-06-01 cs.RO cs.SY eess.SY

On-Device Robotic Planning: Eliminating Inference Redundancy for Efficient Decision-Making

Joonhee Lee, Hyunseung Shin, Hyunmi Kim, Pei Zhang, Jeonggil Ko

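AI summary: The paper proposes the REIS framework, which reduces inference redundancy through scene gating, KV-steered affordance routing, and deliberative reasoning. This accelerates robot control while preserving semantic adaptability.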

Comments
19 pages
English abstract

Reasoning-based robotic policies using large language and vision-language models achieve strong semantic planning capabilities but largely suffer from high inference latency, which limits practical real-time deployment. In this work, we observe that robotic reasoning workloads contain substantial temporal redundancy: consecutive observations frequently produce identical actions and subgoals. Based on this insight, we present REIS, a human-cognition-inspired robotic decision-making framework that minimizes unnecessary reasoning while preserving semantic adaptability. REIS combines lightweight scene gating, KV-steered affordance routing, and deliberative reasoning to accelerate robotic control under embodied constraints. Experiments on ALFRED and real-world robotic tasks demonstrate that REIS significantly reduces reasoning overhead while maintaining competitive task performance.
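The temporal-redundancy observation suggests a simple gate. Reuse the last decision while the scene representation barely changes, and call the expensive reasoner only when it changes. The sketch below assumes scene features are plain vectors compared by cosine similarity against a hypothetical threshold. It illustrates only the gating idea, not REIS's KV-steered affordance routing or deliberative reasoning, and every name in it is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class SceneGate:
    """Reuse the cached action while the scene stays similar; otherwise re-plan."""

    def __init__(self, planner, threshold=0.98):
        self.planner = planner        # expensive callable: features -> action
        self.threshold = threshold
        self.last_features = None     # features at the last *planned* step
        self.last_action = None
        self.planner_calls = 0

    def act(self, features):
        if (self.last_features is not None
                and cosine(features, self.last_features) >= self.threshold):
            return self.last_action   # redundant step: skip reasoning
        self.planner_calls += 1
        self.last_action = self.planner(features)
        self.last_features = features
        return self.last_action
```

The gate compares against the features from the last step that actually ran the planner, not the previous observation. As a result, slow drift across many similar frames still eventually triggers re-planning.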

2605.31458 2026-06-01 cs.SE cs.SY eess.SY

Ladder Logic Translation using Large Language Models in Industrial Automation

Oluwatosin Ogundare, Promise Ekpo, Nathanial Wiggins

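AI summary: Switching PLC vendors makes ladder-logic translation prone to semantic inconsistency. To address this, the paper proposes an LLM-based mathematical formulation and engineered pipeline for automated Rockwell-to-Siemens S7 translation with high semantic consistency.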

English abstract

Ladder logic translation is an important problem in industrial automation because, without it, switching Programmable Logic Controller (PLC) vendors is difficult. The translation problem involves mismatched programming environments, incompatible ladder logic constructs, limitations arising from differences in the semantic expressiveness of the vendor formalisms, and integrated black-box proprietary engineering tools, all of which are exemplified in our case study: Rockwell-to-Siemens PLC code translation. This work presents a mathematical formulation of the problem and the detailed architecture of a solution: a rigorously engineered pipeline that supports XML extraction, structural normalization, a constrained generative function (LLM), and system integration via the TIA Portal Openness API for the automated translation of Rockwell ladder programs to Siemens S7 ladder programs. Finally, we present results showing that the translations retain high semantic consistency across instruction categories.
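Structural normalization can be illustrated on the simplest case, a series rung in Rockwell neutral text such as `XIC(Start)XIO(Stop)OTE(Motor);`. The sketch maps it to a vendor-neutral list of contacts and coils that a Siemens LAD generator could consume. Its scope is deliberately limited:

- It handles only XIC, XIO and OTE in series, with no branches, timers or counters.
- The normalized tuple format and the function name are hypothetical.
- Nothing here touches the TIA Portal Openness API.

```python
import re

# Rockwell mnemonic -> hypothetical vendor-neutral element kind
ELEMENT_KIND = {
    "XIC": "contact_no",  # Examine If Closed -> normally open contact
    "XIO": "contact_nc",  # Examine If Open   -> normally closed contact
    "OTE": "coil",        # Output Energize   -> output coil
}

INSTRUCTION = re.compile(r"([A-Z]+)\(([^()]*)\)")


def normalize_series_rung(rung_text):
    """Parse a series-only rung into [(kind, operand), ...]; reject anything else."""
    body = rung_text.strip().rstrip(";")
    elements = []
    pos = 0
    for match in INSTRUCTION.finditer(body):
        if match.start() != pos:  # e.g. branch brackets "[...,...]"
            raise ValueError(f"unsupported syntax near: {body[pos:match.start()]!r}")
        mnemonic, operand = match.groups()
        if mnemonic not in ELEMENT_KIND:
            raise ValueError(f"unsupported instruction: {mnemonic}")
        elements.append((ELEMENT_KIND[mnemonic], operand))
        pos = match.end()
    if pos != len(body):
        raise ValueError(f"trailing syntax: {body[pos:]!r}")
    return elements
```

Failing loudly on unsupported constructs, rather than guessing, keeps the normalized representation trustworthy. Anything the parser rejects can be routed to the constrained LLM step or to a human.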

2605.31426 2026-06-01 eess.IV cs.CV math.OC

Self-Tuning Regularization for Image Scanning Microscopy

Sofia Agostoni, Lisa Cuneo, Christian Daniele, Giacomo Garré, Laurent Le, Alessandro Zunino, Giuseppe Vicidomini, Luca Calatroni

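AI summary: The paper proposes a self-tuning explicit regularization framework for two reconstruction methods in image scanning microscopy (ISM): multi-image deconvolution (MID) and super-resolution sectioning ISM (s²ISM). A Bayesian maximum a posteriori formulation combines a multi-frame Poisson data-fidelity term with an ℓ1 or smoothed total-variation penalty. The regularization parameter is selected adaptively using the residual whiteness principle, which removes the need for empirical stopping rules. The result is stable super-resolution and optical sectioning under low-photon conditions.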

English abstract

Image Scanning Microscopy (ISM) is a fluorescence imaging technique that combines detector-array acquisition and computational reconstruction to achieve the theoretical resolution of an ideal confocal microscope, i.e., one operating with an infinitesimally small pinhole, while maintaining a high signal-to-noise ratio. Among the reconstruction methods for obtaining the super-resolved image, multi-image deconvolution (MID) and its extension aimed at preserving the optical sectioning capability of confocal microscopy, known as super-resolution sectioning ISM (s$^2$ISM), are among the most widely used approaches. Both methods rely on Richardson--Lucy-type iterative schemes, whose semi-convergent behavior requires early stopping and often leads to noise amplification and reconstruction artifacts. In this work, we introduce a self-tuning explicit regularization framework for both MID and s$^2$ISM reconstruction. Within a Bayesian maximum a posteriori formulation, we combine a multi-frame Poisson data fidelity term with explicit regularization, considering $\ell_1$ and smoothed total variation penalties as representative examples. We further develop an automatic and ground-truth-free strategy for regularization parameter selection by adapting the residual whiteness principle to the multi-frame Poisson setting and introducing a spectral high-pass extension tailored to s$^2$ISM. The resulting framework enables stable reconstructions without empirical stopping rules. To demonstrate the proposed framework, we consider first-order optimization schemes based on proximal gradient and mirror descent methods with adaptive backtracking strategies. Experiments on simulated and real fluorescence ISM datasets demonstrate improved reconstruction stability and image quality with respect to unregularized approaches, while enabling robust super-resolution and optical sectioning in low-photon conditions.
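For context, the sketch below shows the unregularized multi-image Richardson-Lucy baseline that the paper's framework replaces. It updates a single object estimate using all detector-element images at once:

x <- x / (sum_i H_i^T 1) * sum_i H_i^T (y_i / H_i x)

The sketch is 1-D and uses circular convolution. The PSFs, the flat initialization and the function names are illustrative assumptions. On noisy data, running it for many iterations can show the semi-convergence the abstract describes.

```python
def conv_circ(x, h):
    """Circular convolution (H x)[t] = sum_j h[j] x[t - j]."""
    n = len(x)
    return [sum(h[j] * x[(t - j) % n] for j in range(len(h))) for t in range(n)]


def conv_circ_adjoint(y, h):
    """Adjoint operator (H^T y)[s] = sum_j h[j] y[s + j]."""
    n = len(y)
    return [sum(h[j] * y[(s + j) % n] for j in range(len(h))) for s in range(n)]


def mid_richardson_lucy(images, psfs, iters=100, eps=1e-12):
    """Multi-image Richardson-Lucy: one estimate x, one PSF per detector element."""
    n = len(images[0])
    x = [1.0] * n  # flat, strictly positive initialization
    norm = [0.0] * n
    for h in psfs:  # sum_i H_i^T 1
        norm = [a + b for a, b in zip(norm, conv_circ_adjoint([1.0] * n, h))]
    for _ in range(iters):
        correction = [0.0] * n
        for y, h in zip(images, psfs):
            ratio = [yi / max(bi, eps) for yi, bi in zip(y, conv_circ(x, h))]
            correction = [c + b for c, b in zip(correction, conv_circ_adjoint(ratio, h))]
        x = [xi * c / nm for xi, c, nm in zip(x, correction, norm)]
    return x
```

The multiplicative update keeps the estimate nonnegative. Its only built-in regularization is the iteration count, and the self-tuned explicit penalty is designed to remove that dependence.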

2605.31396 2026-06-01 eess.SY cs.SY

Current Practices in Electricity Demand and Charging Scheduling for On-Road Electric Fleet Operations: An Industry-Wide Review

Joost Commandeur, Bart De Schutter, Neil Yorke-Smith

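AI summary: Using grey-literature analysis, the paper reviews the current state of digital systems in electric-truck fleet operations. It identifies key trends and gaps to guide future research and development.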

English abstract

The electrification of on-road fleet logistics promises improved air quality, lower noise emissions, major climate benefits, increased energy flexibility through the use of locally generated electricity, and reduced dependence on imported fuels. However, battery electric vehicles can introduce operational planning challenges not present with internal combustion engine vehicles, including heterogeneous charging speeds, exposure to volatile electricity prices, and scarce infrastructure. Managing these complexities requires solutions that balance cost efficiency and robustness, supported by sector coupling between transport and electricity systems. This paper reviews the current state of digital systems for operational decision-making in electric fleet management through a grey literature analysis, drawing on practitioner-oriented sources such as industry reports, company documentation, and technical blogs that reflect real-world practices and developments. We identify key trends and gaps, providing insights to guide future research and development.

2605.31366 2026-06-01 eess.SP

ISAC-Enabled Grant-Free Uplink via Artificial-Path Delay Modulation

Ruiqi Kong, He Chen

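AI summary: The paper proposes an ISAC grant-free uplink framework based on artificial-path delay modulation. Uplink information is conveyed through the controllable delay of an artificial path derived from the downlink waveform, enabling SIC-free uplink-downlink coexistence.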

Comments
6 pages
English abstract

This paper proposes an integrated sensing and communication (ISAC)-enabled grant-free uplink framework based on artificial-path delay modulation. A grant-free user equipment (g-UE) conveys uplink information by modulating the delay of a controllable artificial path derived from the scheduled downlink waveform. In contrast to conventional superposition-based schemes with successive interference cancellation, the proposed method enables uplink-downlink coexistence in the delay-sensing domain. By introducing a single weak artificial path confined within the cyclic prefix (CP), the g-UE allows the access point (AP) to decode uplink symbols from CSI perturbations while causing only limited degradation to the scheduled user equipment (s-UE) in the downlink. To support reliable finite-alphabet delay detection under unknown path gain and off-grid leakage, we develop a baseline delay calibration procedure and a normalized matched-filter detector. Results show that reflection power determines the reliability trade-off between the g-UE and the s-UE, whereas the delay step mainly controls the g-UE reliability-efficiency trade-off with little additional impact on the downlink s-UE. Even with an artificial path 15 dB weaker than the scheduled downlink signal, the g-UE achieves lower BER than the s-UE at an effective modulation order of 16-QAM. The proposed framework thus offers a low-complexity, SIC-free, and downlink-friendly solution for grant-free uplink in ISAC systems.
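Delay-domain detection of the uplink symbol can be sketched as a normalized matched filter over a finite delay alphabet. The following are illustrative assumptions only:

- The AP has already subtracted a calibrated baseline CSI.
- The remaining per-subcarrier perturbation is g * exp(-j 2 pi k tau / K), with an unknown complex gain g and an on-grid delay tau in samples.
- Candidate delays are tau_m = m * step.

The names, the alphabet and the noise-free setup are hypothetical. The paper's detector also handles off-grid leakage, which this sketch ignores.

```python
import cmath
import math


def steering(delay, num_subcarriers):
    """Per-subcarrier phase ramp of a path delayed by `delay` samples."""
    return [cmath.exp(-2j * math.pi * k * delay / num_subcarriers)
            for k in range(num_subcarriers)]


def detect_delay_symbol(csi, baseline, step, alphabet_size):
    """Return the index m whose delay m*step best explains the CSI perturbation.

    The score |a_m^H d| / (||a_m|| ||d||) lies in [0, 1] and does not depend
    on the unknown complex gain of the artificial path.
    """
    k = len(csi)
    d = [c - b for c, b in zip(csi, baseline)]  # calibrated perturbation
    d_norm = math.sqrt(sum(abs(v) ** 2 for v in d))
    best_m, best_score = 0, -1.0
    for m in range(alphabet_size):
        a = steering(m * step, k)
        corr = abs(sum(ai.conjugate() * di for ai, di in zip(a, d)))
        score = corr / (math.sqrt(k) * d_norm) if d_norm > 0 else 0.0
        if score > best_score:
            best_m, best_score = m, score
    return best_m
```

A 16-entry alphabet carries log2(16) = 4 bits per symbol, matching the "effective 16-QAM" comparison in spirit. A larger `step` separates the candidate delays further, which makes detection more reliable but uses up more of the cyclic prefix.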

2605.31329 2026-06-01 eess.AS

Improving acoustic drone detection generalization through pretraining and data augmentation

Paul M. Reuter, Mattes Ohlenbusch, Christian Rollwage

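AI summary: To address generalization in acoustic drone detection, the paper trains a compact DNN detector with pretraining and on-the-fly data augmentation, substantially improving cross-domain detection performance.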

Comments
Accepted to Quiet Drones 2026
English abstract

Detecting unauthorized UAV flights is critical for surveillance, security, and airspace management. Acoustic drone detection, which relies on the distinctive propeller and motor sounds of UAVs, provides a low-cost, passive solution that requires no line of sight. A central challenge is generalization: reliably distinguishing drone signatures from ambient noise across unseen recording setups, environments, and UAV types (out-of-domain). Inspired by advances in large-scale audio pretraining, we develop a compact DNN-based detector and improve its generalization by (1) pretraining the model for broad sound-event classification before fine-tuning on diverse in-house and public drone recordings, and (2) applying on-the-fly augmentations (pitch shifting, noise mixing, microphone transfer function simulation, spectrogram augmentation) to expose the model to varied acoustic conditions. An ablation study quantifies the impact of each augmentation. For evaluation, we set target false-positive rates (FPR) aligned with real-world surveillance needs and report true-positive rates (TPR) on both in-domain data (public IDMT Berne 2022) and out-of-domain data (public AuDroK). Our results show that pretraining is the dominant factor for robust detection, yielding substantial TPR improvements over training from scratch on all benchmarks. The full augmentation chain provides additional gains on acoustically mismatched out-of-domain data, achieving the best mean TPR on the AuDroK subsets and the largest improvements on the most challenging scenarios. We further validate real-world applicability by measuring false positives on public non-drone corpora (IDMT-TRAFFIC and ESC-50), demonstrating equally low FPR on unfamiliar backgrounds. A distance-dependent analysis on IDMT Berne 2022 shows effective detection at distances up to 150 m.
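Reporting TPR at a fixed target FPR amounts to two steps: calibrate the decision threshold on non-drone clips, then count detections on drone clips. A minimal sketch follows, assuming one detector score per clip and a strict "score above threshold" decision rule. The paper's exact thresholding protocol may differ, and the names are illustrative.

```python
import math


def threshold_at_fpr(negative_scores, target_fpr):
    """Threshold such that at most target_fpr of negatives score strictly above it."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # false positives we can afford
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def tpr_at_fpr(positive_scores, negative_scores, target_fpr):
    """Flag a clip as 'drone' when its score is strictly above the calibrated threshold."""
    thr = threshold_at_fpr(negative_scores, target_fpr)
    tpr = sum(s > thr for s in positive_scores) / len(positive_scores)
    fpr = sum(s > thr for s in negative_scores) / len(negative_scores)
    return tpr, fpr, thr
```

Ties at the threshold can only lower the realized FPR, so the target acts as an upper bound. Reusing a threshold calibrated on one background corpus for another is exactly the out-of-domain question probed by IDMT-TRAFFIC and ESC-50.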

2605.31310 2026-06-01 eess.SY cs.SY

Model-free LQG Control with Chance Constraints

Arunava Naha, Subhrakanti Dey

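AI summary: For linear time-invariant systems with probabilistic risk constraints, the paper proposes a model-free, natural-policy-gradient-based actor-critic algorithm that achieves linear convergence and guarantees closed-loop stability. Numerical results confirm its effectiveness.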

Comments
Under review at IEEE OPEN JOURNAL OF CONTROL SYSTEMS
English abstract

This paper studies model-free optimal control design and its convergence properties for linear time-invariant systems subject to probabilistic risk or chance constraints. In particular, we study a two-timescale natural policy gradient (NPG)-based actor-critic (AC) algorithm that uses a Lagrangian primal-dual framework to enforce the constraint. The risk is defined as the probability that a function of the one-step-ahead state exceeds a user-specified threshold. To our knowledge, this is the first work to study the analytical convergence properties of NPG-based AC in a chance-constrained linear-quadratic Gaussian (LQG) regulator setting without model knowledge. We establish the coercivity and gradient dominance properties of the Lagrangian function, which ensure linear convergence of the actor and closed-loop stability during training. For the critic, we analyse the convergence properties of temporal difference (TD(0)) learning using stochastic approximation theory. We also show that the constrained optimisation problem has no duality gap. Finally, we numerically analyse the convergence properties and accuracy of the proposed method, comparing it with model-based chance-constrained LQR and scenario-based MPC. Results show that our approach effectively limits risk while maintaining near-optimal performance, without requiring full model knowledge or real-time optimisation.

2605.31306 2026-06-01 math.OC cs.SY econ.EM eess.SY stat.ME

Posterior and Likelihood Sensitivity in Bayesian Distributionally Robust Optimization

Jun-ya Gotoh, Andrew E. B. Lim, Michael Jong Kim

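AI summary: The paper introduces worst-case posterior and likelihood sensitivities to quantify how robust a Bayesian model is to perturbations of its posterior and likelihood. It shows that distributionally robust optimization achieves a near-Pareto-optimal trade-off between performance and robustness.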

English abstract

We introduce the notions of worst-case posterior sensitivity and worst-case likelihood sensitivity. These measure, respectively, the sensitivity of the expected cost to worst-case perturbations of the posterior distribution and to worst-case perturbations of the likelihood of a Bayesian model. Each defines a quantitative measure of robustness. A decision maker concerned about the sensitivity of the out-of-sample expected cost to deviations from her assumptions will want a decision for which both sensitivities are small. We derive posterior and likelihood sensitivities for uncertainty sets defined in terms of deviation measures. Posterior sensitivity vanishes when the posterior variance shrinks to zero, which occurs when parameter uncertainty is eliminated through learning. Parameter learning does not eliminate likelihood sensitivity. A distributionally robust formulation of a Bayesian optimization problem makes a near-Pareto-optimal tradeoff between performance (expected cost) and robustness (posterior and likelihood sensitivity).

2605.31302 2026-06-01 eess.IV cs.CV eess.SP

MoE-dqINR: A Unified Mixture-of-Experts Implicit Neural Representation Framework for Scan-Specific Dynamic and Quantitative MRI Reconstruction

Yinzhe Wu, Fanwen Wang, Zhenxuan Zhang, Zi Wang, Chengyan Wang, Guang Yang

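AI summary: The paper proposes the MoE-dqINR framework, which uses shared spatial experts and a state-conditioned routing pathway for efficient, unified, scan-specific multicoil dynamic and quantitative MRI reconstruction. Optimization takes about 30 seconds per scan.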

English abstract

Undersampled magnetic resonance imaging (MRI) reconstruction seeks to recover temporally or contrast-varying image series from incomplete multicoil k-space data while preserving state-dependent fidelity for dynamic and quantitative MRI (qMRI). Existing scan-specific implicit neural representations (INRs) often use monolithic spatiotemporal coordinate fields, explicit subspaces, motion or deformation models, calibration variables, or sequence-specific quantitative signal models. These design choices can limit flexibility in sharing spatial information while adapting image synthesis across acquisition states. Moreover, many INR-based baselines remain computationally demanding, typically requiring per-scan optimization times on the order of hundreds to thousands of seconds. We propose MoE-dqINR, a scan-specific multicoil MRI reconstruction framework that factorizes the image-domain representation into shared spatial experts and a state-conditioned routing pathway. Spatial experts encode reusable coordinate-dependent image content, whereas routing weights, conditioned on ordered acquisition states, synthesize each dynamic frame or contrast state from a common expert bank. The representation is coupled to a multicoil MRI forward model and uses the normalized state index to drive routing in both dynamic and quantitative MRI. By separating shared spatial representation from state-dependent synthesis, the framework provides an image-first architecture for dynamic and quantitative MRI while reducing scan-specific INR optimization to approximately 30 s per scan in our experiments. The proposed formulation establishes state-conditioned mixture-of-experts INR as a scan-specific multicoil MRI reconstruction prior that unifies shared spatial representation, dynamic- and qMRI-specific synthesis, and practical per-scan efficiency.
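The factorization into shared spatial experts and a state-conditioned router can be sketched as a forward pass:

- Each expert maps a spatial coordinate to an image value.
- A router maps the normalized state index s in [0, 1] to softmax weights.
- A frame is the weighted sum of the expert outputs.

In this sketch the experts and router are fixed toy functions standing in for trainable networks, and the multicoil forward model and training loop are omitted. The Gaussian-bump router and all names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_router(centers, sharpness=10.0):
    """Hypothetical router: each expert 'owns' a region of the normalized state axis."""
    def route(state):
        return softmax([-sharpness * (state - c) ** 2 for c in centers])
    return route


def normalized_state(index, num_states):
    """Map a frame/contrast index 0..num_states-1 to [0, 1]."""
    return index / (num_states - 1) if num_states > 1 else 0.0


def synthesize_frame(coords, experts, route, state):
    """Image value at each coordinate = sum_e w_e(state) * expert_e(coord)."""
    weights = route(state)  # one weight vector per frame, shared by all pixels
    return [sum(w * expert(xy) for w, expert in zip(weights, experts)) for xy in coords]
```

Spatial content lives only in the experts and state dependence lives only in the router. As a result, every frame or contrast reuses the same expert bank, which is the kind of sharing the abstract attributes to MoE-dqINR.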

2605.31279 2026-06-01 eess.SP cs.AI cs.NI

Practical Cross-Band Channel Prediction for AI-RAN via Physics-Guided Deep Unfolding

Ruiqi Kong, He Chen, Xiaojun Lin

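AI summary: Proposes the GUIDE framework, which embeds wireless channel physics into differentiable layers so that cross-band channel prediction both generalizes and runs in real time. In unseen environments it achieves 2.75x higher beamforming gain than the deep learning baseline FIRE and 1.39x higher beamforming gain than the model-based baseline R2F2, while running more than 1610x faster than R2F2.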

Comments
2 pages
Abstract

To make cross-band channel prediction practical for AI-native RAN, algorithms must generalize across diverse environments and support real-time inference. Existing approaches achieve one but not both. To bridge this gap, we introduce GUIDE, a physics-guided deep unfolding framework that embeds wireless channel physics into differentiable layers. Without retraining in unseen environments, GUIDE achieves 2.75x higher beamforming gain than the deep learning-based baseline FIRE with only a slight increase in inference time, and 1.39x higher beamforming gain than the strongest model-based baseline R2F2 while running over 1610x faster.
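Cross-band prediction relies on the physical fact that propagation paths (delays and complex gains) are shared across nearby bands. The NumPy sketch below assumes a noiseless discrete-path model with frequency-independent gains and known delays. It fits the gains on one band and re-evaluates the same paths on another. The band plan and path parameters are hypothetical, and the sketch shows the generic model-based principle, not the GUIDE architecture.

```python
import numpy as np

def path_matrix(freqs, delays):
    """Columns exp(-j 2 pi f tau_l): the response of each path over the given frequencies."""
    return np.exp(-2j * np.pi * np.outer(freqs, delays))

# Hypothetical multipath channel: 5 paths with fixed delays and random complex gains.
rng = np.random.default_rng(1)
delays = np.array([20, 75, 140, 210, 290]) * 1e-9               # seconds
gains = (rng.standard_normal(5) + 1j * rng.standard_normal(5)) / np.sqrt(2)

f_ul = 2.50e9 + np.arange(256) * 120e3   # hypothetical measured band (Hz)
f_dl = 2.65e9 + np.arange(256) * 120e3   # hypothetical target band (Hz)

h_ul = path_matrix(f_ul, delays) @ gains          # channel observed on the measured band
h_dl_true = path_matrix(f_dl, delays) @ gains     # ground truth on the target band

# Fit the path gains on the measured band (delays assumed known) ...
gains_hat, *_ = np.linalg.lstsq(path_matrix(f_ul, delays), h_ul, rcond=None)
# ... and re-evaluate the same physical paths on the target band.
h_dl_pred = path_matrix(f_dl, delays) @ gains_hat
```

In practice the delays are unknown and the measurements are noisy, so the path parameters would have to be estimated as well. The idealized case only shows why physics makes extrapolation across bands possible at all.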

2605.31270 2026-06-01 eess.SY cs.SY

Steering Fractional-Order Network Dynamics via Joint Parameter and State Control

Alessandro Varalda, Sergio Pequito

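AI summary: Studies the control of discrete-time linear fractional-order networks. By jointly adjusting the fractional exponents and the network coupling matrix, it achieves coordinated steering of network parameters and state, and it proposes a quadratic-programming formulation for the energy-constrained problem.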

Abstract

This paper studies the control of discrete-time linear fractional-order networks, a flexible modeling framework for systems with long-range memory such as power grids, biological networks, and neuronal circuits. In contrast to the common view that fractional exponents (time-scales) are fixed parameters, we show that they can be systematically steered, together with the network coupling matrix, by appropriately designed input sequences. We first derive algebraic conditions under which the coupling matrix and the vector of fractional exponents of a given network can be reconfigured to desired values, and we characterize how truncating the infinite-memory term impacts the resulting dynamics. Building on these results, we construct an equivalent linear representation that isolates the contribution of memory, and we introduce a fractional reachability matrix that provides explicit conditions for jointly steering both network parameters and state in a finite number of steps. To address practical implementations, we further formulate an energy-constrained steering problem that incorporates actuator bounds and finite-memory approximations as a quadratic program. The framework is illustrated on low-dimensional toy examples, on larger networks with Erdos-Renyi, Barabasi-Albert, and Watts-Strogatz topologies, and on a brain network model inferred from electrocorticography recordings of an epilepsy patient, where we showcase transitions between pre-seizure and seizure configurations.
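A common way to write a discrete-time linear fractional-order network uses the Grünwald-Letnikov difference: Δ^α x[k+1] = A x[k] + B u[k]. Here the fractional difference at node i has weights ψ(α_i, j) = Γ(j − α_i) / (Γ(−α_i) Γ(j + 1)). The sketch below assumes this model, which is not taken from the paper. It simulates the network with either the full memory or a truncated one, the kind of finite-memory approximation mentioned in the abstract. Steering of the exponents and the reachability analysis are not shown.

```python
import numpy as np

def gl_weights(alpha, J):
    """psi(alpha, j) for j = 0..J, via psi_j = psi_{j-1} * (j - 1 - alpha) / j and psi_0 = 1."""
    psi = np.empty(J + 1)
    psi[0] = 1.0
    for j in range(1, J + 1):
        psi[j] = psi[j - 1] * (j - 1 - alpha) / j
    return psi

def simulate(A, B, alphas, x0, u, memory=None):
    """Simulate x[k+1] = A x[k] + B u[k] - sum_{j=1}^{m} diag(psi(alpha, j)) x[k+1-j].

    memory=None keeps the whole history (m = k + 1); an integer keeps only the
    most recent `memory` terms, i.e. a finite-memory approximation.
    """
    n, T = len(x0), u.shape[1]
    psi = np.stack([gl_weights(a, T) for a in alphas])    # (n, T + 1), one row per node
    x = np.zeros((n, T + 1))
    x[:, 0] = x0
    for k in range(T):
        m = k + 1 if memory is None else min(k + 1, memory)
        mem = sum(psi[:, j] * x[:, k + 1 - j] for j in range(1, m + 1))
        x[:, k + 1] = A @ x[:, k] + B @ u[:, k] - mem
    return x

rng = np.random.default_rng(0)
n, T = 3, 50
A = 0.1 * rng.standard_normal((n, n))
B = np.eye(n)
u = 0.1 * rng.standard_normal((n, T))
x0 = np.ones(n)
alphas = [0.6, 0.8, 1.0]                          # hypothetical fractional exponents
x_full = simulate(A, B, alphas, x0, u)
x_trunc = simulate(A, B, alphas, x0, u, memory=10)
print(np.max(np.abs(x_full - x_trunc)))           # effect of truncating the memory
```

As a sanity check, setting every exponent to 1 makes ψ(1, 1) = −1 and all higher weights zero. The model then reduces to the ordinary linear system x[k+1] = (A + I) x[k] + B u[k].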

2605.31267 2026-06-01 eess.SP

Super-Resolution Experimental Validation and Polarimetric Extension of the Effective Roughness Diffuse Scattering Models

Giacomo Melloni, Jack Chuang, Samuel Berweger, Enrico M. Vitucci, Vittorio Degli-Esposti, Camillo Gentile, Nada Golmie

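AI summary: Combines super-resolution multipath component extraction with digital-twin-assisted geometry to provide the first measurement-based validation of the Effective Roughness model. It also proposes an angle-dependent cross-polarization discrimination model, enabling high-fidelity mmWave channel modeling.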

Abstract

The experimental validation of diffuse scattering models has long been limited by the inability to spatially separate specular and diffuse contributions in measured channels. This paper overcomes this limitation by combining super-resolution multipath component (MPC) extraction, which resolves individual propagation paths including the specular component, with digital-twin-assisted geometry, enabling the spatial separation of specular and diffuse contributions from bistatic measurements at 28 GHz. Using this framework, we provide the first measurement-driven validation of the Effective Roughness (ER) model with independent characterization of diffuse scattering across ten common building materials, each measured over 266 angular configurations and all polarization combinations (HH, HV, VH, VV). Furthermore, we extend the ER framework by proposing a novel angle-dependent cross-polarization discrimination (XPD) model, capturing the geometry-dependent nature of depolarization that is neglected in existing approaches. The proposed method reproduces the measured diffuse power trends, achieving RMSE values as low as 3 dB across the tested materials, and improves XPD prediction over the baseline constant-XPD model for nearly all material-polarization cases. These results establish a physically consistent and practically viable approach for high-fidelity channel modeling in mmWave systems.
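Two figures of merit appear in this abstract. XPD compares co-polar and cross-polar received power, and model error is reported as an RMSE between power curves in dB. The sketch below uses synthetic, made-up powers rather than measurement data. A hypothetical straight-line fit in angle stands in for an angle-dependent XPD model. The paper's actual XPD model and angle definitions are not reproduced here.

```python
import numpy as np

def xpd_db(p_co, p_cross):
    """Cross-polarization discrimination in dB: co-polar over cross-polar power (linear inputs)."""
    return 10 * np.log10(np.asarray(p_co, dtype=float) / np.asarray(p_cross, dtype=float))

def rmse_db(a_db, b_db):
    """Root-mean-square error between two curves already expressed in dB."""
    d = np.asarray(a_db, dtype=float) - np.asarray(b_db, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Synthetic illustration only: co- and cross-polar diffuse powers (linear) versus angle.
angles = np.arange(10, 90, 10, dtype=float)            # degrees
p_co = np.full_like(angles, 1e-6)
p_cross = 1e-6 * 10 ** (-(18 - 0.1 * angles) / 10)     # XPD falls from 17 dB to 10 dB
xpd = xpd_db(p_co, p_cross)

const_pred = np.full_like(xpd, xpd.mean())             # constant-XPD baseline
slope, intercept = np.polyfit(angles, xpd, 1)          # hypothetical angle-dependent fit
lin_pred = slope * angles + intercept
print(rmse_db(xpd, const_pred), rmse_db(xpd, lin_pred))
```

A constant XPD cannot follow any geometry-dependent trend, so its RMSE is bounded below by the spread of the measured XPD values across angles.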

2605.31243 2026-06-01 eess.SY cs.SY

Safe Arrival Scheduling at Constraint Waypoints in UAM Corridors

Sasinee Pruekprasert, Shinji Nakadai

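AI summary: Proposes two methods, one based on NMAC rules and one based on RSS rules, for computing the minimum arrival time gap that ensures safe scheduling of vehicles at constrained waypoints in UAM corridors.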

Journal ref
Proceedings of the AIAA SciTech 2025 Forum, January 2025
Abstract

This study introduces a novel Air Traffic Control (ATC) concept to support self-separation between vehicles in Urban Air Mobility (UAM) corridors. Our proposed scheme involves sharing intended arrival schedules at Constrained Waypoints (CWPs) among UAM operators. We propose two approaches to assist arrival scheduling at CWPs by computing the minimum arrival time gap necessary for each pair of vehicles to ensure their safety throughout their flights within the corridor. The first approach considers the minimum separation distance required by the Near Mid-Air Collision (NMAC) avoidance rules, while the second is based on the Responsibility-Sensitive Safety (RSS) rules. We demonstrate that the NMAC-rule-based approach can effectively prevent collisions under normal circumstances, where the vehicles adhere to the speed limits of the corridor. However, this approach does not guarantee safety if vehicles exceed the speed limits. Conversely, while the RSS-rule-based approach ensures collision prevention during emergencies in which vehicles exceed the speed limits, it may require larger arrival time gaps under normal circumstances, which may reduce traffic flow. Our results are confirmed through numerical simulations.
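A minimum arrival time gap can be illustrated with a deliberately simplified one-dimensional kinematic model. This model is an assumption for illustration and is not the NMAC or RSS rule set used in the paper. Two vehicles pass the constrained waypoint g seconds apart and then fly a straight segment of length L. The leader flies at the slowest allowed speed and the follower at the fastest. Requiring separation of at least d until the leader leaves the segment gives g ≥ L/v_min − (L − d)/v_max, which the sketch checks numerically.

```python
import numpy as np

def min_gap(L, d, v_min, v_max):
    """Smallest arrival gap g (s) at the waypoint such that a follower at v_max stays at least
    d (m) behind a leader at v_min until the leader leaves a segment of length L (m)."""
    return L / v_min - (L - d) / v_max

def worst_case_separation(g, L, v_min, v_max, n=10_001):
    """Minimum leader-follower distance while both are inside the segment (worst-case speeds)."""
    tau = np.linspace(g, L / v_min, n)      # time since the leader passed the waypoint
    leader = v_min * tau
    follower = v_max * (tau - g)
    return float(np.min(leader - follower))

L, d, v_min, v_max = 2000.0, 150.0, 20.0, 30.0   # hypothetical corridor values (m, m/s)
g = min_gap(L, d, v_min, v_max)                  # about 38.3 s for these numbers
print(g, worst_case_separation(g, L, v_min, v_max))
```

With a single fixed speed (v_min = v_max = v), the gap collapses to d / v. Allowing faster followers forces larger gaps, which echoes the trade-off between safety margin and traffic flow discussed in the abstract.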

2605.31180 2026-06-01 eess.SP

Impact of Phase Errors on Distributed NTN Beam Focusing

Ahmad Nimr, Mohammad Parvini, Bitan Banerjee, Gerhard Fettweis

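AI summary: Studies distributed beam focusing for coordinated satellite constellations in future non-terrestrial networks (NTN). It analyzes the coherent combining gain under ideal synchronization and derives closed-form expressions for the average coherent gain under phase errors. The results show that synchronization and timing errors reduce the gain, while geometry-dependent effects govern the spatial focusing behavior.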

Comments
Submitted to WSA 2026
Abstract

This paper investigates distributed beam focusing for coordinated satellite constellations with phased arrays, motivated by future non-terrestrial network (NTN) systems. A geometric and channel model is developed by incorporating satellite positions, array orientations, antenna directivity, and polarization effects. Under ideal synchronization, the achievable coherent combining gain is analyzed for different constellation geometries, showing that maximum ratio transmission (MRT) enables quadratic scaling of the received power with the number of satellites. The impact of phase errors caused by residual synchronization, timing, mobility, and localization mismatches is then investigated. Closed-form expressions for the average coherent gain are derived for uniformly distributed timing offsets, demonstrating the transition from coherent to non-coherent combining. The results show that synchronization and timing mismatches reduce the coherent combining gain, while geometry-dependent effects govern the resulting spatial focusing behavior. Numerical results further show that linear and circular constellations provide different focusing characteristics and spatial separation capabilities. However, MRT-based focusing results in strong sidelobes and limited spatial division capability, motivating the need for joint analog beamforming and digital precoding optimization to improve spatial selectivity and robustness against mobility and localization errors.
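A textbook calculation reproduces the coherent-to-non-coherent transition. It assumes N equal-amplitude contributions whose residual phase errors are i.i.d. and uniform on [−a, a]. At a carrier f_c, a uniform timing offset of up to ±T maps to a = 2π f_c T. Then E|Σ e^{jφ_n}|² = N + N(N − 1)(sin a / a)². This equals N² under perfect synchronization (the quadratic MRT scaling) and N when a = π. The simplified expression ignores the directivity, polarization, and geometry terms modeled in the paper, so it is not the paper's closed form.

```python
import numpy as np

def mean_gain_uniform_phase(N, a):
    """E|sum_n exp(j phi_n)|^2 for N i.i.d. phases phi_n ~ U[-a, a]."""
    c = np.sinc(a / np.pi)        # np.sinc(x) = sin(pi x) / (pi x), so c = sin(a) / a
    return N + N * (N - 1) * c ** 2

def monte_carlo_gain(N, a, trials=100_000, seed=0):
    """Empirical average of |sum_n exp(j phi_n)|^2 for the same phase model."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(-a, a, size=(trials, N))
    return float(np.mean(np.abs(np.exp(1j * phi).sum(axis=1)) ** 2))

N = 8
for a in [0.0, 1.0, np.pi / 2, np.pi]:     # from perfect sync to fully random phase
    print(f"a = {a:.2f}: closed form {mean_gain_uniform_phase(N, a):.2f}, "
          f"simulated {monte_carlo_gain(N, a):.2f}")
```

The cross terms E[e^{j(φ_n − φ_m)}] factor into |E e^{jφ}|² because the errors are independent. This is why the N(N − 1) coherent part decays with the phase spread, while the N incoherent part does not.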

2605.31101 2026-06-01 eess.AS

On the Use of Dereverberation for Acoustic Feedback Cancellation

Basil Liekens, Arnout Roebben, Toon van Waterschoot, Marc Moonen

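AI summary: Shows that, under mild conditions, the acoustic feedback signal can be viewed as a reverberant version of the source signal. This reduces the joint dereverberation and acoustic feedback cancellation problem to a dereverberation-only problem, a result confirmed in simulations.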

Comments
Accepted for publication in proceedings of EUSIPCO 2026
Abstract

In public address systems and hearing aids, the maximum achievable amplification, or gain, is limited by acoustic feedback. Therefore, feedback cancellation methods are required in order to apply a higher gain. In addition, it is often also desirable to dereverberate a recorded signal, that is, to remove its late reverberation component, before playing it back. In this paper, it is shown that under two mild conditions, the acoustic feedback signal can be written as a reverberant version of the source signal. Therefore, the joint dereverberation and acoustic feedback cancellation problem can be treated as a dereverberation-only problem, meaning that dereverberation algorithms can be applied to the joint problem. Simulations corroborate this finding.
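A linear, time-invariant toy loop shows most easily why feedback can look like reverberation. Suppose the loudspeaker plays the microphone signal with gain g and the acoustic feedback path f has at least one sample of delay. Then m = h*s + g·(f*m), so m is simply s filtered by a fixed (IIR) impulse response. The sketch assumes exactly this loop, which may differ from the two conditions stated in the paper. It checks that simulating the loop gives the same result as convolving the source with the loop's impulse response.

```python
import numpy as np

def closed_loop(s, h, f, g):
    """Microphone signal of a toy feedback loop:
    m[n] = (h * s)[n] + g * sum_{k>=1} f[k] m[n - k]  (f[0] is ignored: the loop needs a delay).
    h: source-to-microphone room response, f: loudspeaker-to-microphone feedback path, g: gain."""
    x = np.convolve(s, h)[: len(s)]
    m = np.zeros(len(s))
    for n in range(len(s)):
        fb = sum(f[k] * m[n - k] for k in range(1, min(len(f), n + 1)))
        m[n] = x[n] + g * fb
    return m

rng = np.random.default_rng(0)
N = 400
s = rng.standard_normal(N)                       # source signal
h = np.array([1.0, 0.5, 0.25])                   # hypothetical room response
f = np.array([0.0, 0.0, 0.3, -0.2, 0.1])         # hypothetical feedback path (2-sample delay)
g = 0.8                                          # gain; g * sum|f| < 1 keeps the loop stable

impulse = np.zeros(N)
impulse[0] = 1.0
h_eff = closed_loop(impulse, h, f, g)            # source-to-microphone response including feedback
m = closed_loop(s, h, f, g)
m_as_filtered_source = np.convolve(s, h_eff)[:N]
```

The decaying tail of `h_eff` plays the role of a room impulse response. This is the intuition behind treating the joint problem with dereverberation tools.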

2605.31065 2026-06-01 eess.SP cs.AI

DRIFT: Joint Channel Estimation and Prediction Towards Pilotless 6G Non-Terrestrial Networks

Bruno De Filippo, Carla Amatetti, Alessandro Vanelli-Coralli

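AI summary: Targets the high pilot overhead and limited onboard computation of 6G LEO satellite networks. It proposes DRIFT, a lightweight joint channel estimation and prediction framework that sends pilots only in the initial slot and handles subsequent slots with data-driven processing. DRIFT achieves up to 12% spectral efficiency gain at low computational complexity.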

Comments
Submitted for publication
Abstract

Non-terrestrial networks (NTNs) are expected to play a pivotal role in sixth-generation (6G) systems by enabling ubiquitous connectivity and massive communication. In this context, channel prediction emerges as a key technique to improve the spectrum utilization efficiency by limiting the pilot overhead. However, many proposed predictors based on artificial intelligence (AI) are characterized by high inference complexity, posing challenges to onboard implementation. In this paper, we address the challenge of designing accurate yet computationally efficient channel prediction techniques tailored to low Earth orbit (LEO) NTNs, where strict power constraints limit model complexity, to enable spectral efficiency gains. We propose an iterative joint channel estimation and prediction framework in the context of 6G NTNs that significantly reduces pilot overhead by transmitting pilots only in the initial slot and relying on data-driven processing for subsequent slots. We introduce Data-driven Refinement and Iterative Forecast for wireless channel Tracking (DRIFT), a lightweight architecture that refines data-aided channel estimates and predicts future channel frequency responses with low computational cost and reduced error propagation. Two predictor variants based on convolutional and long short-term memory layers are investigated. End-to-end simulations of an uplink LEO NTN scenario show that the proposed approach achieves up to 12% spectral efficiency gain compared with conventional pilot-based systems, with robustness to training-test mismatches and consistent performance across different channel models. Moreover, DRIFT requires fewer than 200k multiply-accumulate operations, making it suitable for on-board satellite implementation under stringent power constraints.
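A budget such as 200k multiply-accumulate operations can be checked with standard per-layer counts. The sketch counts MACs for an LSTM-plus-dense predictor using the usual formula of four gate matrix-vector products per time step. The layer sizes and sequence length are hypothetical and are not DRIFT's configuration.

```python
def lstm_macs(input_size, hidden_size):
    """MACs for one LSTM time step: four gates, each a matrix-vector product mapping the
    concatenated (input, previous hidden) vector to `hidden_size` outputs. Biases and
    element-wise activations/products are not counted."""
    return 4 * hidden_size * (input_size + hidden_size)

def dense_macs(in_features, out_features):
    """MACs for a fully connected layer applied once."""
    return in_features * out_features

def predictor_macs(input_size, hidden_size, output_size, steps):
    """LSTM unrolled over `steps` inputs, followed by one dense read-out of the last hidden state."""
    return steps * lstm_macs(input_size, hidden_size) + dense_macs(hidden_size, output_size)

# Hypothetical sizes, e.g. real and imaginary parts of 24 subcarriers as features.
total = predictor_macs(input_size=48, hidden_size=64, output_size=48, steps=4)
print(total, total < 200_000)
```

The hidden-to-hidden term 4·hidden² usually dominates, so the hidden width is the main lever for staying within such a budget.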

2605.31059 2026-06-01 eess.SP cs.IT math.IT

CRB-Optimal Arrays and Waveforms in Active Sensing: Role of Redundancy and Spatial Covariance of Array Geometry

Ids van der Werf, Robin Rajamäki, Geert Leus

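AI summary: Analyzes the Cramér-Rao bound of linear and planar arrays under orthogonal and coherent waveforms. The analysis reveals that optimal array geometries are inherently redundant, and it derives the optimal transmit/receive allocation for a given total number of sensors.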

Comments
Accepted for publication in IEEE Transactions on Signal Processing
Abstract

This paper characterizes the performance limits of optimal array designs using orthogonal and coherent waveforms for both linear and planar arrays. For orthogonal waveforms, we show that the single-target Cramér-Rao Bound (CRB) depends on the sum of the so-called spatial variances of the transmit (Tx) and receive (Rx) arrays, or equivalently, the spatial variance of the sum co-array weighted by the multiplicities of the virtual sensors. This reveals that CRB-optimal geometries are inherently redundant, highlighting a fundamental trade-off between mean squared error (MSE) and identifiability in parameter estimation. Moreover, we derive optimal Tx-Rx sensor allocations given a total sensor budget and show that unequal allocation (favoring the Rx) is optimal even for nonredundant arrays, questioning conventional designs. We extend our results to planar arrays, providing a new general condition that the spatial covariances of the Tx and Rx arrays should satisfy for the optimal waveforms to direct power in the target direction. Additionally, we establish a connection between Diophantine equations and array geometries with equal CRB, along with a constructive method for designing such arrays. Our work provides new guidelines for and insights into optimal array and waveform design with relevance in emerging active sensing multiple-input multiple-output systems.
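The two equivalent formulations in this abstract can be checked numerically, assuming "spatial variance" means the population variance of sensor positions (the paper's normalization may differ). Suppose every Tx-Rx pair contributes a virtual sensor at t + r. Then the multiplicity-weighted variance of the sum co-array equals var(Tx) + var(Rx). This holds because the pair positions behave like a sum of independent, uniformly drawn Tx and Rx positions. The array geometries below are hypothetical examples.

```python
import numpy as np
from collections import Counter

def spatial_variance(positions):
    """Population variance of sensor positions (e.g. in half-wavelength units)."""
    p = np.asarray(positions, dtype=float)
    return float(np.mean((p - p.mean()) ** 2))

def sum_coarray(tx, rx):
    """Virtual sensor positions t + r for all Tx-Rx pairs, with their multiplicities."""
    return Counter(t + r for t in tx for r in rx)

def weighted_variance(coarray):
    """Variance of virtual sensor positions, each weighted by its multiplicity."""
    pos = np.array(list(coarray.keys()), dtype=float)
    w = np.array(list(coarray.values()), dtype=float)
    mu = np.average(pos, weights=w)
    return float(np.average((pos - mu) ** 2, weights=w))

# Hypothetical geometries: a nonredundant pair (every virtual position appears once)
# and a redundant pair (virtual positions repeat).
nonredundant = ([0, 1, 2], [0, 3, 6, 9])
redundant = ([0, 1, 2], [0, 1, 2, 3])
for tx, rx in (nonredundant, redundant):
    lhs = weighted_variance(sum_coarray(tx, rx))
    rhs = spatial_variance(tx) + spatial_variance(rx)
    print(tx, rx, lhs, rhs)
```

Because the weighting counts repeated virtual sensors, redundancy in the sum co-array enters the quantity that governs the CRB. This is consistent with the abstract's observation that CRB-optimal geometries are redundant.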

2605.31017 2026-06-01 eess.SP

Combining Cartesian and non-Cartesian acceleration techniques with SPARKLING for 1mm isotropic whole-brain MPRAGE in a minute

Chaithya Giliyar Radhakrishna, Aurélien Massire, Blanche Bapst, Alexandre Vignaud, Philippe Ciuciu

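AI summary: Proposes the GoLF-SPARKLING framework, which jointly exploits GRAPPA parallel imaging and variable-density compressed sensing to achieve high-resolution whole-brain imaging in about one minute, with better image quality than either acceleration technique alone.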

Abstract

Purpose: T1-weighted MPRAGE remains a cornerstone of clinical anatomical imaging, yet its long acquisition times constrain routine use. Established acceleration techniques, namely Parallel Imaging (PI) and Compressed Sensing (CS), tend to introduce substantial noise and blurring when pushed to high acceleration factors. Although they rely on fundamentally different redundancies, combining them synergistically remains an open challenge. Methods: The GoLF-SPARKLING framework was extended to jointly exploit two acceleration mechanisms: GRAPPA-based PI in the central k-space region and variable-density CS in the periphery, with independent acceleration factors in each zone. To preserve smooth signal evolution throughout the inversion-recovery period and avoid modulation artifacts, the acquisition trajectory was reordered accordingly. The resulting method was evaluated prospectively in vivo at 1mm isotropic resolution and benchmarked against Wave-CAIPI and Poisson-disk sampling. Results: The proposed hybrid approach produced sharper, less noisy, and more stable whole-brain images in approximately one minute than either acceleration strategy alone. Purely PI-based reconstructions were degraded by high g-factor noise, while purely CS-based reconstructions exhibited pronounced blurring. Furthermore, this method yielded lower average volumetric errors in downstream automated brain segmentation than state-of-the-art acceleration techniques, demonstrating its clinical utility. Conclusion: By jointly leveraging PI and CS, GoLF-SPARKLING achieves high acceleration factors that enable sub-minute, high-quality anatomical MRI. This translates into greater clinical throughput and more reliable imaging in patients who are challenging to scan.
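When k-space is split into zones with independent acceleration factors, the overall factor is a harmonic-type average weighted by the fraction of samples in each zone. The sketch assumes acceleration is counted in Nyquist-equivalent samples. This is Cartesian-style bookkeeping that may differ from how SPARKLING trajectories are specified. The zone fractions and factors are hypothetical.

```python
def effective_acceleration(center_fraction, r_center, r_periphery):
    """Overall acceleration when `center_fraction` of the Nyquist-equivalent samples lie in the
    central zone (undersampled by r_center) and the rest lie in the periphery (r_periphery)."""
    acquired_fraction = center_fraction / r_center + (1 - center_fraction) / r_periphery
    return 1.0 / acquired_fraction

def accelerated_time(full_time_s, r):
    """Scan time if acquisition time scales inversely with the acceleration factor."""
    return full_time_s / r

r = effective_acceleration(center_fraction=0.5, r_center=2, r_periphery=6)   # hypothetical zones
print(r, accelerated_time(360.0, r))   # hypothetical 6-minute fully sampled scan
```

A modest central factor (suited to GRAPPA-style PI) combined with an aggressive peripheral factor (suited to CS) can therefore reach an overall acceleration neither zone would tolerate alone across all of k-space.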

2605.30993 2026-06-01 eess.AS

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

Ruiqi Li, Yu Zhang, Changhao Pan, Ke Lei, Xiang Yin, Cheng Yang

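AI summary: Proposes SwanVoice, a zero-shot TTS model for long-form multi-speaker dialogue. It builds monologue and dialogue corpora, combines a VAE, raw-text conditioning, and a flow-matching DiT, and uses staged training followed by reinforcement-learning post-training. These components achieve expressive coherence and controllable speaker switching.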

Comments
Technical Report
Abstract

Zero-shot text-to-speech (TTS) has improved substantially for single-speaker synthesis, yet expressive long-form multi-speaker dialogue remains difficult. A common workaround is to synthesize each turn with a monologue TTS model and stitch the outputs together. This adds inference cost and often breaks acoustic consistency, conversational coherence, and affective continuity across turns. Recent dialogue TTS systems have begun to address this setting, but they still struggle to maintain expressive coherence, controllable speaker switching, and monologue quality at the same time. We present SwanData-Speech and SwanVoice. SwanData-Speech builds monologue and dialogue corpora from in-the-wild audio, using Swan Forced Aligner for pause-aware word-level alignment and RobustMegaTTS3 for pronunciation-hard cases. Built on these data, SwanVoice is a zero-shot TTS model for 1 to 4 speakers, combining a 25 Hz VAE, raw-text conditioning with pause-aware symbols and pinyin substitution, and a flow-matching DiT with speaker-turn conditioning. Training starts from monologue speech, moves through mixed and real dialogue data, and then uses DiffusionNFT post-training with phone-level and speaker-similarity rewards. On SwanBench-Speech, SwanVoice obtains higher richness and hierarchy scores than all evaluated open-source baselines in both monologue and dialogue settings, while content accuracy remains the main limitation. Audio demos are available at https://swanaigc.github.io//#swanvoice.
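A 25 Hz VAE yields 250 latent frames for 10 s of audio. A flow-matching DiT is trained to predict a velocity field that transports noise to those latents. The sketch shows the common linear-path (rectified-flow) form of flow matching and Euler sampling with an oracle velocity. SwanVoice's exact probability path, its text and speaker-turn conditioning, and its network are not specified here. The latent width of 64 is a hypothetical value.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at times t, and the target velocity."""
    tt = np.asarray(t, dtype=float)[:, None, None]    # broadcast over (frames, channels)
    xt = (1.0 - tt) * x0 + tt * x1
    v_target = x1 - x0                                # constant along a straight path
    return xt, v_target

def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to 1 with fixed Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
frames = int(10.0 * 25)                         # 10 s of audio at a 25 Hz latent frame rate
x1 = rng.standard_normal((2, frames, 64))      # stand-in "data" latents (width 64 is hypothetical)
x0 = rng.standard_normal((2, frames, 64))      # Gaussian noise
xt, v_target = flow_matching_pair(x0, x1, [0.25, 0.75])
# A trained DiT would approximate v_target from (xt, t, conditioning); here an oracle is used.
x_gen = euler_sample(lambda x, t: x1 - x0, x0, steps=10)
```

With the exact straight-path velocity, a few Euler steps land exactly on the data. A learned velocity is only approximate, so step count then trades quality against latency.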

2605.30988 2026-06-01 eess.SP

Distribution-Aware Constellation Learning for Image Transmission

Xufeng Zhang, Yinhuan Huang, Jingkai Ying, Huan Liu, Zhijin Qin

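AI summary: Proposes a distribution-aware learnable modulation method that bridges semantic features and discrete modulation through constellation learning. Using a learnable constellation module and a two-stage training strategy, it outperforms existing digital semantic communication schemes and performs comparably to analog methods.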

Abstract

Semantic communication has demonstrated significant potential for image transmission, especially in bandwidth-limited and low signal-to-noise ratio scenarios. However, most existing methods are based on analog transmission, which complicates compatibility with existing digital communication systems. Existing digital semantic communication methods commonly adopt conventional quadrature amplitude modulation constellations, which do not match the empirical distribution of the semantic features produced by the semantic encoder. This paper proposes a distribution-aware learnable modulation framework for semantic communication, which bridges semantic feature representations and discrete modulation through constellation learning. Specifically, a learnable constellation module, initialized with an amplitude phase shift keying geometric prior, is developed to refine the constellation geometry as a trainable codebook, enabling modulation symbols to better align with the distribution of semantic features. To enable end-to-end optimization, a two-stage training strategy is introduced that combines differentiable soft assignment with a straight-through estimator. Simulation results show that the proposed framework consistently outperforms existing digital semantic communication schemes and achieves performance comparable to advanced analog methods.
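The learnable-constellation idea can be sketched with an APSK codebook as the initial geometry and two ways of mapping a continuous feature onto it. The soft assignment is a softmax over negative squared distances and is differentiable. The hard assignment picks the nearest point. In a straight-through estimator, the forward pass uses the hard symbol while gradients flow through the soft one, typically written as soft + stop_gradient(hard − soft). The 4+12 ring sizes, radius ratio, and temperature are illustrative assumptions. The NumPy code shows only the forward computations, not the paper's training procedure.

```python
import numpy as np

def apsk(ring_sizes, radii):
    """APSK codebook: ring r holds ring_sizes[r] equally spaced phases at radius radii[r]."""
    rings = [rad * np.exp(2j * np.pi * np.arange(n) / n) for n, rad in zip(ring_sizes, radii)]
    c = np.concatenate(rings)
    return c / np.sqrt(np.mean(np.abs(c) ** 2))        # normalize to unit average power

def soft_hard_assign(z, codebook, tau):
    """Soft mapping (softmax over -|z - c|^2 / tau, differentiable) and hard nearest-point mapping."""
    d2 = np.abs(z[:, None] - codebook[None, :]) ** 2   # (num_features, num_points)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    soft = p @ codebook
    hard = codebook[np.argmin(d2, axis=1)]
    return soft, hard

codebook = apsk(ring_sizes=[4, 12], radii=[1.0, 2.7])     # illustrative 4+12 APSK initialization
rng = np.random.default_rng(0)
z = rng.standard_normal(6) + 1j * rng.standard_normal(6)   # stand-in semantic features
soft, hard = soft_hard_assign(z, codebook, tau=0.1)
# Straight-through estimator (in an autodiff framework): y = soft + stop_gradient(hard - soft),
# so the forward value equals `hard` while gradients follow `soft`.
```

Making `codebook` a trainable parameter in an autodiff framework lets the gradients through the soft path move the constellation points toward where the features actually fall.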

2605.30973 2026-06-01 eess.IV cs.GR eess.SP

SCALMU: Synthetically-trained Coupling of Adaptive Learned Multiplicative Updates for Hyperspectral-Multispectral Fusion

Xinxin Xu, Yann Gousseau, Christophe Kervazo, Saïd Ladjal

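AI summary: Proposes SCALMU, an unrolled neural network architecture that integrates adaptive learnable matrices into the CNMF multiplicative-update framework. Trained on a synthetic dataset, it achieves high-quality fusion of hyperspectral and multispectral images.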

Abstract

HyperSpectral-MultiSpectral Image (HSI-MSI) fusion enables high-resolution hyperspectral imaging by combining the rich spectral information of low-spatial-resolution hyperspectral images with the detailed spatial structure of multispectral images. Classical methods such as Coupled Nonnegative Matrix Factorization (CNMF) benefit from strong physical interpretability but yield inferior results compared with their deep-learning counterparts. To address this limitation, we propose SCALMU (Synthetically-trained Coupling of Adaptive Learned Multiplicative Updates), a novel unrolled neural network architecture that integrates adaptive learnable matrices within the classical framework of CNMF multiplicative updates, improving its performance. Due to its architectural proximity to CNMF, the resulting algorithm preserves physical interpretability and nonnegativity constraints. To overcome data scarcity for training, we additionally generate a synthetic HSI-MSI dataset via the dead leaves model, enabling synthetic supervision. SCALMU is then trained end-to-end on this dataset. Experiments demonstrate SCALMU's superiority over state-of-the-art methods on several datasets. The code is available at https://github.com/xinxinxu99/SCALMU.git
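CNMF builds on the classical multiplicative updates for nonnegative matrix factorization. These updates keep the factors nonnegative because each step multiplies by a ratio of nonnegative terms. The sketch shows only the standard Lee-Seung updates for X ≈ WH under the Frobenius loss. CNMF's coupling of the hyperspectral and multispectral factorizations, and SCALMU's learned matrices, are not included.

```python
import numpy as np

def nmf_multiplicative(X, rank, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for min ||X - W H||_F^2 subject to W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.uniform(0.1, 1.0, (m, rank))
    H = rng.uniform(0.1, 1.0, (rank, n))
    errors = []
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # ratio of nonnegative terms keeps H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # same for W
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors

rng = np.random.default_rng(1)
X = rng.uniform(size=(30, 3)) @ rng.uniform(size=(3, 40))   # synthetic nonnegative rank-3 data
W, H, errors = nmf_multiplicative(X, rank=3)
```

Unrolling a fixed number of these iterations as network layers, and inserting learnable matrices into them, is the general pattern the abstract describes. The specific placement of those matrices in SCALMU is not reproduced here.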

2605.30965 2026-06-01 eess.AS cs.AI cs.CL

ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment

Jun-Hak Yun, Seung-Bin Kim, Seong-Whan Lee

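AI summary: Proposes ImmersiveTTS, which uses a multimodal diffusion Transformer and domain-specific representation alignment to generate text-to-speech output that blends naturally with environmental audio.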

Comments
Accepted to ACL 2026 main conference. Code is available at https://github.com/jjunak-yun/ImmersiveTTS
Abstract

Recent advancements in text-guided audio generation have yielded promising results in diverse domains, including sound effects, speech, and music. However, jointly generating speech with environmental audio remains challenging due to the inherent disparities in their acoustic patterns and temporal dynamics. We propose ImmersiveTTS, an environment-aware text-to-speech (TTS) model that generates natural speech seamlessly integrated within environmental contexts by explicitly modeling cross-modal interactions. Our model builds on a multimodal diffusion transformer and fuses transcript-aligned speech latents with text-conditioned environmental context via joint attention. To enhance semantic consistency, we introduce a domain-specific representation alignment objective tailored to environment-aware TTS, leveraging complementary self-supervised representations from speech and audio encoders. Experimental results show that ImmersiveTTS achieves higher naturalness, intelligibility, and audio fidelity than existing approaches across objective metrics and human listening tests.
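Joint attention, as used in MM-DiT-style multimodal diffusion transformer blocks, gives each stream its own query/key/value projections but computes a single attention over the concatenated token sequence. Speech tokens can therefore attend to environment tokens and vice versa. The single-head sketch below uses random weights and omits normalization, timestep modulation, and the MLP. It illustrates the mechanism rather than ImmersiveTTS itself, and the token counts and widths are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(speech, env, params):
    """Single-head joint attention: per-stream Q/K/V projections, one attention over all tokens."""
    def qkv(x, p):
        return x @ p["q"], x @ p["k"], x @ p["v"]
    qs, ks, vs = qkv(speech, params["speech"])
    qe, ke, ve = qkv(env, params["env"])
    q = np.concatenate([qs, qe])
    k = np.concatenate([ks, ke])
    v = np.concatenate([vs, ve])
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T_speech + T_env) x (T_speech + T_env)
    out = attn @ v
    return out[: len(speech)], out[len(speech):], attn

rng = np.random.default_rng(0)
d_model, d_head = 16, 8
params = {name: {w: rng.standard_normal((d_model, d_head)) / np.sqrt(d_model) for w in "qkv"}
          for name in ("speech", "env")}
speech_tokens = rng.standard_normal((10, d_model))   # stand-in transcript-aligned speech latents
env_tokens = rng.standard_normal((6, d_model))       # stand-in environment-context tokens
speech_out, env_out, attn = joint_attention(speech_tokens, env_tokens, params)
```

The off-diagonal blocks of `attn` are where cross-modal interaction happens. A design with separate self-attention per stream would set those blocks to zero.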

2605.30940 2026-06-01 eess.AS cs.MM cs.SD

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

Ke Lei, Yu Zhang, Changhao Pan, Xueyi Pu, Wenxiang Guo, Ruiqi Li, Zhou Zhao

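AI summary: Proposes SwanSphere, a unified streaming framework that generates high-fidelity spatial audio from panoramic videos and text prompts. It uses a causal autoregressive diffusion Transformer, spatial video-audio contrastive learning, and multi-objective online direct preference optimization, and it develops an automated annotation pipeline to alleviate data scarcity.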

Comments
Accepted by ICML 2026
Abstract

Real-time and accurate spatial audio generation is pivotal for delivering an immersive experience. However, existing spatial audio synthesis technologies are often encumbered by a tradeoff between generation quality and inference latency, as well as by difficulty in capturing precise spatial information from multimodal inputs. To address these challenges, we propose SwanSphere, a unified streaming framework for high-fidelity spatial audio generation from panoramic videos and text prompts. SwanSphere mainly makes the following contributions: 1) We introduce a causal autoregressive diffusion transformer architecture that enables streaming high-quality spatial audio generation. 2) We design a Spatial Video-Audio Contrastive (SVAC) learning strategy to align the video encoder with the acoustic domain, and further employ a multi-objective online direct preference optimization (ODPO) scheme, resulting in strong spatial perception and robust multimodal spatial audio synthesis. 3) To alleviate the current scarcity of spatial audio datasets, we also develop an automated annotation pipeline for generating detailed spatial captions. Experimental results demonstrate that SwanSphere achieves superior performance in both video-to-spatial and text-to-spatial audio generation tasks. Demos can be found at: https://swanaigc.github.io.
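Contrastive alignment of a video encoder with the audio domain is commonly implemented with a symmetric InfoNCE (CLIP-style) loss over a batch of paired embeddings. The sketch assumes that generic form. The abstract does not describe how SVAC builds its spatial positives and negatives, so that part is not modeled, and the embedding sizes and temperature are hypothetical.

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_info_nce(video_emb, audio_emb, temperature=0.07):
    """CLIP-style loss: (video_i, audio_i) are positives, all other pairs in the batch are negatives."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / temperature                 # cosine similarities scaled by 1 / temperature
    idx = np.arange(len(v))
    video_to_audio = -log_softmax(logits, axis=1)[idx, idx].mean()
    audio_to_video = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((video_to_audio + audio_to_video) / 2)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 32))                        # stand-in video clip embeddings
audio = video + 0.05 * rng.standard_normal((8, 32))         # well-aligned audio embeddings
print(symmetric_info_nce(video, audio), symmetric_info_nce(video, np.roll(audio, 1, axis=0)))
```

Shifting the audio batch by one position breaks every true pairing. The loss rises sharply as a result, which is the signal that pulls matched video and audio embeddings together during training.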