arXivDaily: arXiv daily academic digest, updated Monday to Friday
EESS Electrical Engineering and Systems Science 186
2605.26063 2026-05-26 eess.SP

Blind SNR Estimation for FSO Communication Systems with Deep Fading

Humberto V. Q. Melo, Robson A. Colares, Darli A. A. Mello

AI summary: For FSO communication systems with deep fading, the M2M4 and EVM methods are evaluated for real-time SNR estimation. Experiments show that the M2M4 estimator reliably tracks the SNR profile and is suitable for triggering transceiver adaptation.

Abstract

We evaluate the M2M4 and EVM methods for real-time SNR estimation in FSO communication systems subject to deep fading. Using an experimental setup with controlled deep fading, we show that the M2M4 estimator reliably tracks the SNR profile, making it suitable for triggering transceiver adaptation.
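The abstract does not spell out the estimator, so the following is only a minimal sketch of the textbook M2M4 (second- and fourth-moment) SNR estimator. It assumes a constant-modulus constellation such as QPSK and circular complex Gaussian noise; the function names and the QPSK test signal are illustrative and are not taken from the paper.

```python
import math
import random


def m2m4_snr(samples):
    """Estimate the linear SNR of complex samples from their 2nd and 4th moments.

    Assumption: constant-modulus signal (e.g. QPSK) in complex Gaussian noise,
    for which M2 = S + N and M4 = S^2 + 4*S*N + 2*N^2, hence S^2 = 2*M2^2 - M4.
    """
    n = len(samples)
    m2 = sum(abs(y) ** 2 for y in samples) / n
    m4 = sum(abs(y) ** 4 for y in samples) / n
    s_squared = 2.0 * m2 * m2 - m4
    if s_squared <= 0.0:
        return 0.0  # moments are inconsistent with any signal power
    signal_power = math.sqrt(s_squared)
    noise_power = m2 - signal_power
    if noise_power <= 0.0:
        return math.inf
    return signal_power / noise_power


def simulate_qpsk(snr_db, n, seed=0):
    """Hypothetical test signal: unit-power QPSK plus complex Gaussian noise."""
    rng = random.Random(seed)
    noise_power = 10.0 ** (-snr_db / 10.0)
    sigma = math.sqrt(noise_power / 2.0)  # per real/imaginary component
    out = []
    for _ in range(n):
        symbol = complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2.0)
        out.append(symbol + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
    return out
```

Because the estimator needs neither pilot symbols nor symbol decisions, it is blind; an estimate in dB is obtained as `10 * math.log10(m2m4_snr(samples))`.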

2605.26043 2026-05-26 eess.SY cs.SY

Robust Tracking of Curvature-Constrained Paths for Uncertain Dubins Systems

Xingjian Xue, Sze Zheng Yong

AI summary: For vehicles and robots with uncertain Dubins dynamics, a robust tracking controller based on sliding mode control is proposed that guarantees convergence of the lateral and heading errors to zero under bounded disturbances.

Comments
6 pages, 3 figures. Accepted to IFAC World Congress 2026

Abstract

This paper presents a robust tracking controller for tracking curvature-constrained paths by vehicles/robots with uncertain Dubins dynamics. Although Dubins paths have been widely used in vehicular and robotic applications, robust and convergent tracking under model uncertainties remains understudied. To address this, we propose path tracking controllers based on sliding mode control, formulated in the transverse coordinate frame, which guarantee invariance and convergence of both lateral and heading errors to zero in the presence of bounded disturbances. Simulation results show that the proposed method reliably tracks paths despite disturbances and significantly outperforms existing methods based on sliding mode controllers.
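To make the idea of sliding mode path tracking concrete, here is a toy sketch and not the controller from the paper. It assumes a straight reference path (so the curvature constraint is not modeled), error dynamics `e' = v*sin(psi)` and `psi' = omega + d` with a bounded disturbance `|d| <= d_max`, and a hypothetical sliding surface `s = psi + atan(k*e)`. All gains and names are illustrative.

```python
import math


def simulate_sliding_mode(e0=2.0, psi0=0.5, v=1.0, k=1.0, eta=1.0,
                          d_max=0.3, dt=1e-3, t_end=20.0):
    """Euler simulation of lateral error e and heading error psi.

    The switching gain eta must exceed d_max so that s * s' < 0 off the surface.
    """
    e, psi, t = e0, psi0, 0.0
    for _ in range(int(t_end / dt)):
        s = psi + math.atan(k * e)            # sliding variable
        sign_s = (s > 0) - (s < 0)
        # Equivalent control cancels d/dt atan(k*e); the switching term rejects d.
        omega = -k * v * math.sin(psi) / (1.0 + (k * e) ** 2) - eta * sign_s
        d = d_max * math.sin(t)               # bounded disturbance, unknown to the controller
        e += dt * v * math.sin(psi)
        psi += dt * (omega + d)
        t += dt
    return e, psi
```

On the surface `s = 0` the heading error equals `-atan(k*e)`, so the lateral error decays to zero; the discontinuous term produces the usual small chattering around the surface.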

2605.26042 2026-05-26 eess.SP

Alt-CC-PINN: An Alternating Optimization Framework with Implicit Neural Representation for Microwave Inverse Scattering Imaging

Shilong Sun

AI summary: An alternating optimization framework based on a cross-correlated physics-informed neural network (Alt-CC-PINN) is proposed. By decoupling the evolution of the physical field from neural-network parameter inference and combining an analytical conjugate gradient method with a deep learning optimizer, it addresses the local-minima problem of microwave inverse scattering imaging in high-contrast, low-SNR environments.

Abstract

Microwave inverse scattering imaging (MISI) is a crucial computational technique in microwave nondestructive evaluation and near-field microwave sensing systems. However, quantitative reconstruction of high-contrast targets remains a formidable challenge due to severe multiple scattering effects and the inherent ill-posedness of electromagnetic inverse problems. To overcome this fundamental bottleneck in computational microwave imaging, this paper proposes an alternating optimization framework based on cross-correlated physics-informed neural network (Alt-CC-PINN). This architecture deeply decouples the evolution of the microwave physical field from the neural-network-based dielectric parameter inference, replacing traditional joint optimization with a hybrid alternating engine. Specifically, the method employs an analytical Polak-Ribière conjugate gradient (PR-CG) algorithm driven by a cross-correlated loss to optimally update the contrast sources, and deploys batched zero-padded 2D-FFT to ensure high computational efficiency. Subsequently, a deep learning optimizer is utilized to update the continuous neural representation. Extensive validations based on simulated and measured data demonstrate that Alt-CC-PINN effectively overcomes the local minima problem in high-contrast and low-signal-to-noise-ratio (SNR) environments. It exhibits superior reconstruction fidelity and robustness under the frequency-hopping probing strategy, providing a powerful and reliable computational electromagnetic solver for practical microwave imaging systems.
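The Polak-Ribière update mentioned above can be illustrated on a small problem. The sketch below applies generic PR conjugate gradient with an exact line search to a quadratic objective `0.5*x^T A x - b^T x`; it is not the paper's cross-correlated loss or its contrast-source update, and the function name is hypothetical.

```python
def pr_conjugate_gradient(a, b, x0, tol=1e-12, max_iter=50):
    """Minimise 0.5*x^T A x - b^T x for a symmetric positive definite matrix A."""

    def matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(a, x), b)]  # gradient A x - b
    d = [-gi for gi in g]                              # first direction: steepest descent
    for _ in range(max_iter):
        if dot(g, g) < tol:
            break
        ad = matvec(a, d)
        alpha = -dot(g, d) / dot(d, ad)                # exact line search for a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, ad)]
        # Polak-Ribiere coefficient: g_new . (g_new - g_old) / (g_old . g_old)
        diff = [gn - go for gn, go in zip(g_new, g)]
        beta = dot(g_new, diff) / dot(g, g)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x
```

For example, `pr_conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])` solves the 2x2 system `A x = b`.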

2605.25168 2026-05-26 eess.IV cs.AI cs.CV

Methodology for Creating a Clinically Verified Dermoscopic Image Dataset

Kozachok Elena Sergeevna

AI summary: A methodology combining a standard operating procedure for mobile dermoscopic image acquisition, a structured metadata information model, and multi-stage expert verification is proposed to build a clinically verified dermoscopic image dataset for medical informatics research.

Comments
22 pages, 5 figures, 5 tables

Abstract

This study presents a methodology for constructing a clinically verified dataset of dermatoscopic images for medical informatics research. The relevance of the work is driven by the fact that the performance of automated diagnostic support systems depends not only on the volume of images, but also on the reproducibility of the image acquisition procedure, the completeness of structured metadata, and the reliability of diagnostic labels. International collections were primarily created under conditions that differ substantially from routine Russian outpatient practice and mobile dermatoscopy. The proposed methodology integrates three interconnected components: (1) a standard operating procedure (SOP) for acquiring images via mobile dermatoscopy, (2) an information model comprising 16 structured metadata fields organized into six clinically oriented blocks in ISIC-compatible notation, and (3) a multi-stage expert verification of diagnostic labels (initial clinical annotation, consensus review by three specialists, and histological confirmation of all malignant neoplasms). Using this methodology, a dataset of 1,026 unique dermatoscopic images from 443 patients was collected between June 2025 and May 2026. From 1,044 initial records, 18 duplicates were excluded. The dataset includes nine nosological categories; all 39 malignant lesions (18 melanomas, 15 basal cell carcinomas, and 6 squamous cell carcinomas) were histologically verified. Patient age ranged from 2 to 90 years (median 38), with 279 females (63%) and 164 males (37%). Each image is accompanied by expert-annotated dermatoscopic structures and an explicit verification_stage field indicating the level of diagnostic confirmation. The resulting dataset serves as a pilot clinically verified resource suitable for independent model evaluation, domain shift analysis, interpretability studies, and further expansion.

2605.25940 2026-05-26 eess.IV cs.CV

How Accurate are Video Quality Models for Diffusion-Based Video Super-Resolution?

Benjamin Herb, Steve Göring, Alexander Raake, Rakesh Rao Ramachandra Rao

AI summary: Using a subjective test that compares six upscaling methods, this study evaluates how accurately existing full-reference and no-reference video quality models assess diffusion-based video super-resolution. CNN-based full-reference models correlate best, but none is accurate enough to replace subjective testing.

Comments
Accepted for the 18th International Conference on Quality of Multimedia Experience (QoMEX 2026)

Abstract

Recent video super-resolution (VSR) approaches use deep neural networks to enhance low-quality input videos and recover visual detail, with diffusion-based methods in particular showing promising results. In this paper, we investigate whether existing video quality models can be used to assess the performance of these diffusion-based VSR methods, by comparing model predictions with results from a subjective test. The study compares six upscaling methods (Lanczos, Rhea, SCST, DOVE, SeedVR2, Starlight Mini) applied to both compressed (AV1 and DCVC-RT) and uncompressed low-resolution videos, considering play-out on a UHD-1/4K screen. A range of full- and no-reference quality models is evaluated for applicability to this new type of quality degradation, focusing on within-sequence performance. The results highlight that CNN-based full-reference models, such as LPIPS, DISTS, and CVQA-FR, show significantly higher correlation coefficients than both the conventional full-reference models and the tested no-reference models. Most models overestimate the overly sharp results of SCST, with VMAF mainly failing due to spatial inconsistencies introduced by Starlight Mini. None of the tested video quality models reaches sufficient accuracy to replace complementary subjective testing. The reference, degraded, and upscaled videos, as well as the user ratings and model scores, are made available with the paper at https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-1-VSR as open data.

2605.25928 2026-05-26 cs.CL cs.SD eess.AS

Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization

Meshal Alamr, Hassan Alqaeri, Abdullah Aldahlawi

AI summary: For the low-resource Arabic speech diacritization task, the CATT-Whisper multimodal model is fine-tuned with regularization (R-Drop consistency regularization, Optuna-optimized hyperparameters, and Focal Loss), placing first in the KSAA-2026 shared task.

Comments
4 pages, 1 figure. Published in Proceedings of OSACT7 (LREC 2026). Winning system for KSAA-2026 Task 2 on Arabic Speech Diacritization

Abstract

We describe the winning system for Task 2 of the KSAA-2026 Shared Task on Arabic Speech Dictation with Automatic Diacritization. The task requires producing fully diacritized Arabic text from speech audio and undiacritized transcripts, with only 2,327 training samples available and no external data permitted. Our system fine-tunes CATT-Whisper, a character-level multimodal model combining a pretrained CATT text encoder with a frozen Whisper speech encoder. The key to our approach is training regularization: R-Drop consistency regularization, Optuna-optimized hyperparameters with high weight decay, and Focal Loss. At inference, we average 200 stochastic forward passes across four model checkpoints using Monte Carlo Dropout at the softmax probability level. The system achieves 23.26% WER on the primary leaderboard metric (with case endings, including no-diacritic positions), placing 1st among all participants.
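Two of the ingredients named above have simple standard forms, sketched here under stated assumptions. `focal_loss` is the usual per-example focal loss `-(1 - p_t)^gamma * log(p_t)`, and `average_softmax` averages class probabilities over several stochastic forward passes, as in Monte Carlo Dropout. The function names, the `gamma` default, and the list-based inputs are illustrative; the system's actual implementation is not described in the abstract.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example, given predicted class probabilities.

    With gamma = 0 this reduces to ordinary cross-entropy; larger gamma
    down-weights examples that are already classified confidently.
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def average_softmax(passes):
    """Average per-class probabilities across stochastic forward passes."""
    n_passes = len(passes)
    n_classes = len(passes[0])
    return [sum(p[c] for p in passes) / n_passes for c in range(n_classes)]
```

Averaging at the probability level, rather than averaging logits or voting on labels, keeps the result a valid probability distribution over diacritic classes.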

2605.25878 2026-05-26 eess.IV cs.CV

A Clinically Validated Foundation Model for Comprehensive Lung Pathology Interpretation

Zhengrui Guo, Zhengyu Zhang, Jiabo Ma, Yihui Wang, Fengtao Zhou, Yingxue Xu, Ling Liang, Chenglong Zhao, Qi Xie, Jinbang Li, Shujing Guo, Fangyi Han, Zhijian Cen, Ziyi Liu, Cheng Jin, Junlin Hou, Zhixuan Chen, Yu Cai, Lijuan Qu, Shifu Chen, Yueping Liu, Zhe Wang, Xiuming Zhang, Muyan Cai, Li Liang, Hao Chen

AI summary: PulmoFoundation is a lung pathology foundation model built on Virchow2 through subspecialty-specific pretraining on about 40,000 H&E-stained whole-slide images. Validated on 32 clinical tasks, in a prospective study, and in a randomized controlled trial, it improves diagnostic accuracy, efficiency, and inter-rater agreement.

Abstract

Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current CPath approaches rely on task-specific models for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a multi-center, prospectively validated, randomized controlled trial (RCT)-evaluated foundation model for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care. Built upon Virchow2 via subspecialty-specific pretraining using ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, our model achieves clinical-grade performance in core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, our model achieved an average AUC of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could reduce additional second-review burden for 68.8% of biopsies and 83.0% of frozen sections, and defer 44.5% of IHC stain orders, with PPVs of 1.0, 0.991, and 0.966. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% w/ AI vs. 83.8% w/o AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.

2605.25870 2026-05-26 eess.SP math.ST stat.AP stat.TH

The Symmetric Location Problem: a Song of Efficiency and Robustness

Stefano Fortunati

AI summary: This Lecture Note uses the semiparametric statistical framework to address the symmetric location problem: estimating a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter while reconciling statistical efficiency with distribution-free robustness.

Abstract

The aim of this Lecture Note is to introduce the Signal Processing (SP) community to a powerful yet still under-utilised tool: semiparametric statistics. In short, the semiparametric framework allows us to estimate or perform hypothesis testing on a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter (i.e., a function), such as the density of the noise. Clearly, this framework is general enough to include almost every SP application. Remarkably, as the title, which draws on George R. R. Martin's famous book series, suggests, the greatest advantage of semiparametric statistics over parametric and non-parametric ones is that it reconciles two seemingly dichotomous concepts: statistical efficiency and robustness. Here, robustness is understood in the sense of distribution-freeness, that is, the estimation performance must be robust to the lack of knowledge of the functional form of the data-generating distribution. To explain exactly what this means, in this Lecture Note we focus on the famous and fundamental symmetric location problem, which can be found (in various forms) in countless areas of SP: source localization, time synchronization, array signal processing, and distributed sensor networks, just to name a few. Furthermore, the methodology we develop for this specific problem can be extended to much more general semiparametric estimation problems, such as the estimation of the location vector and covariance matrix in elliptical data.

2605.25867 2026-05-26 eess.SY cs.SY

CINOC: Cardinality-Invariant Neural Operator Policies for Scalable PDE Control

Pietro Zanotta, Dibakar Roy Sarkar, Honghui Zheng, Somdatta Goswami, Ján Drgoňa

AI summary: Cardinality Invariant Neural Operator Control (CINOC) recasts PDE control as an operator learning problem, yielding policies that transfer zero-shot and scale when the sensor, actuator, or agent configuration changes.

Abstract

Controlling partial differential equations (PDEs) with learning-based policies remains fundamentally limited by fixed-dimensional representations: policies trained for a specific sensor, actuator, or agent configuration typically fail when the configuration changes. This limitation is particularly severe in multi-agent PDE control, where policies do not scale across population sizes without retraining. We address this challenge by introducing Cardinality Invariant Neural Operator Control (CINOC), reformulating PDE control as an operator learning problem that maps state fields to continuous control functions and trains them end-to-end through differentiable PDE solvers, yielding policies that naturally adapt to varying sensor and actuator configurations. Remarkably, CINOC policies trained on small swarms exhibit cardinality invariance, allowing for zero-shot transfer to significantly larger populations as well as robustness to partial agent failure. This scalability arises from agents sharing a common policy and coordinating through their physical environment, which produces an emergent self-normalization effect. To explain this phenomenon, we provide a theorem grounded in mean-field theory demonstrating that policy gradients computed from finite-agent systems converge to those of a continuous control limit. Empirically, we validate CINOC on tracking, stabilization, and density transport across linear, nonlinear, chaotic, and turbulent PDEs.

2605.25847 2026-05-26 eess.SY cs.SY

Optimal Dispatch of Connected and Autonomous Electric Vehicles to Enhance Short-Term Grid Flexibility in Smart Cities

Nikolas Sacchi, Giacomo Basile, Silvia Siri, Manuela Minetti, Andrea Bonfiglio, Antonella Ferrara

AI summary: A coordinated energy-mobility dispatch framework is proposed that dynamically dispatches a fleet of connected autonomous electric vehicles with virtual battery partitioning to provide grid support services in smart cities under time constraints, using model predictive control to meet mobility energy requirements and deadlines.

Abstract

This paper proposes a coordinated energy-mobility dispatch framework for grid support service provision in smart cities under time constraints. In particular, a scenario in which a distributed system operator requests a specified amount of energy within a given deadline is considered. A fleet of connected autonomous electric vehicles equipped with virtual battery partitioning is dynamically dispatched toward vehicle-to-grid stations. The routing problem is formulated as a periodically updated resource-constrained shortest path, accounting for time and energy constraints with congestion-dependent travel times derived from a dynamic traffic model. At the vehicle level, a model predictive control strategy regulates speed to satisfy mobility energy requirements while ensuring deadline compliance. The framework is validated through simulations on the urban network of Rapallo (Italy), demonstrating robustness against congestion-induced delays.
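The resource-constrained shortest path at the core of the routing step can be sketched with a generic label-setting search. The version below is a static illustration only: it assumes fixed, non-negative edge travel times and energy costs, whereas the paper updates the problem periodically with congestion-dependent times. The graph format and function name are hypothetical.

```python
import heapq


def constrained_shortest_path(graph, source, target, energy_budget):
    """Minimum-time path whose total energy use stays within energy_budget.

    graph maps node -> list of (neighbor, travel_time, energy) tuples.
    Returns (time, energy, path), or None if no feasible path exists.
    """
    heap = [(0.0, 0.0, source, (source,))]  # labels ordered by travel time
    expanded = {}                           # node -> (time, energy) labels already expanded
    while heap:
        time, energy, node, path = heapq.heappop(heap)
        if node == target:
            return time, energy, list(path)
        # Skip labels dominated by one that is no worse in both time and energy.
        if any(t <= time and en <= energy for t, en in expanded.get(node, [])):
            continue
        expanded.setdefault(node, []).append((time, energy))
        for neighbor, edge_time, edge_energy in graph.get(node, []):
            new_energy = energy + edge_energy
            if new_energy <= energy_budget:  # prune infeasible extensions
                heapq.heappush(
                    heap, (time + edge_time, new_energy, neighbor, path + (neighbor,))
                )
    return None
```

Tightening the energy budget can force a slower but cheaper route, which is the trade-off a dispatcher faces when part of the battery is reserved for the grid.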

2605.25807 2026-05-26 eess.SY cs.SY

Deterministic and Nonblocking Supervisory Control of Discrete Event Systems under Cyber Attacks

Feng Lin, Caisheng Wang, Jun Chen, Xiang Yin

AI summary: Using the ALTER model, this paper studies deterministic and nonblocking supervisory control of discrete event systems under cyber attacks, introduces the concepts of CA-D-controllability and CA-D-observability, and proves existence conditions for such supervisors.

Abstract

We investigate deterministic and nonblocking supervisory control of discrete event systems under cyber-attacks using the ALTER (Attack Language for Transition-basEd Replacement) model. While prior works consider supervisory control that achieves either the large (upper bound) language or small (lower bound) language separately, deterministic supervisory control achieves both large language and small language at the same time to ensure that the language generated by the supervised system is unique and deterministic. We introduce two new concepts of CA-D-controllability and CA-D-observability and prove that they are necessary and sufficient for the existence of a deterministic supervisor. For nonblocking supervisory control, the objective is to ensure that the supervised system can always reach marked states under any attack scenario. We prove that relative closure, CA-D-controllability, and CA-D-observability together are necessary and sufficient for the existence of a nonblocking supervisor. We further develop methods to verify CA-D-controllability and CA-D-observability. We also illustrate our results using a robotic system example.

2605.25688 2026-05-26 eess.SP

Robust Quantum-MUSIC for DoA Estimation Using Rydberg Atomic Receiver Arrays

Sourav Banerjee, Neel Kanth Kundu, Prajwalita Borah

AI summary: To address the sensitivity of the Quantum-MUSIC algorithm to outliers, a Robust Quantum-MUSIC framework replaces the ℓ2 norm with an ℓ1 norm and solves the phase retrieval step by iteratively reweighted least squares, achieving outlier robustness without added structural complexity.

Abstract

Quantum wireless sensing using Rydberg atomic receivers enables high-sensitivity signal acquisition and direction-of-arrival (DoA) estimation. However, it suffers from a fundamental limitation: only the magnitude of the received signal is observable. The recently proposed Quantum-MUSIC algorithm addresses this problem by recovering phase information through alternating minimization and subsequently applying the MUSIC algorithm for DoA estimation. However, the existing approach relies on an $\ell_2$-norm phase retrieval step, making it highly sensitive to outlier measurements produced by hardware faults, sensor saturation, or adversarial interference. In this letter, we propose a Robust Quantum-MUSIC (RobQMUSIC) framework that replaces the $\ell_2$-norm with an $\ell_1$-norm formulation. The resulting weighted phase-retrieval problem is solved efficiently via an Iteratively Reweighted Least Squares (IRLS) scheme embedded within the alternating minimization loop, requiring no increase in structural complexity relative to the baseline algorithm. Simulation results demonstrate that RobQMUSIC achieves near-identical DoA estimation accuracy to Quantum-MUSIC under ideal conditions, while maintaining robust performance over a wide range of outlier contamination levels at which Quantum-MUSIC fails entirely.
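Why an ℓ1 objective solved by IRLS resists outliers can be seen on a toy problem that has nothing to do with phase retrieval itself. The sketch below, under that simplifying assumption, estimates a scalar location by minimizing the sum of absolute residuals: each iteration solves a weighted least-squares problem with weights `1 / |residual|`, so large residuals are progressively down-weighted. The function name and the `eps` floor are illustrative.

```python
def irls_l1_location(values, iterations=100, eps=1e-9):
    """Minimise sum(|v - x|) over x by iteratively reweighted least squares."""
    x = sum(values) / len(values)  # least-squares (l2) starting point: the mean
    for _ in range(iterations):
        # Weights 1/|residual| turn the l1 problem into a weighted l2 problem.
        weights = [1.0 / max(abs(v - x), eps) for v in values]
        x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return x
```

With one gross outlier in the data, for instance `[1.0, 1.1, 0.9, 1.05, 100.0]`, the mean is pulled far from the bulk of the samples while the IRLS estimate settles near the median.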

2605.25669 2026-05-26 eess.AS

Ultra-Low-Bitrate Mel-Spectrogram-based Neural Speech Coding with Flow-Matching-based Refinement and Vocoding-driven Reconstruction

Hui-Peng Du, Yang Ai, Xiao-Hang Jiang, Yuan Tian, Zhen-Hua Ling

AI summary: FMelCodec is a three-stage coding-refinement-reconstruction framework in the mel-spectrogram domain. With 640x compression, online clustering, conditional flow matching refinement, and a HiFi-GAN vocoder, it achieves high-quality speech reconstruction and speaker similarity at ultra-low bitrates of 250 bps (16 kHz) and 750 bps (48 kHz).

Comments
Published at IEEE/ACM Transactions on Audio, Speech, and Language Processing

Abstract

Ultra-low-bitrate speech coding is pivotal for bandwidth-constrained communication and deep compression, yet maintaining naturalness and speaker identity at such extreme bit budgets remains challenging due to pronounced information loss and quantization instability. To this end, we propose FMelCodec, an ultra-low-bitrate neural speech codec in the mel-spectrogram domain, cast as a three-stage coding-refinement-reconstruction (CRR) framework that can operate at as low as 250 bps. In the CRR framework, the front-end mel-spectrogram coding stage employs a highly aggressive 640x compression/decompression encoder-decoder structure with a single 1024-entry VQ codebook, coupled with an online clustering strategy that reassigns underused codewords to prevent codebook collapse and preserve codebook diversity. The subsequent conditional flow matching (CFM)-based mel-spectrogram refinement stage leverages a lightweight velocity-field estimator and CFM-based solver to refine the codec-degraded mel-spectrogram produced by the preceding decoder, and adopts a self-consistency training scheme that supports fewer iterative inference steps for the purpose of reducing computational overhead. Finally, the vocoding-driven waveform reconstruction stage employs a HiFi-GAN vocoder to faithfully reconstruct waveform from the refined mel-spectrogram. Experiments conducted on two datasets spanning two sampling rates show that, under ultra-low-bitrate constraints of 250 bps for 16 kHz and 750 bps for 48 kHz, both objective and subjective evaluations consistently demonstrate that FMelCodec achieves higher speech reconstruction quality and speaker similarity, while incurring lower computational and model complexity.
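The stated bitrates follow from simple arithmetic if the 640x compression is read as one codebook index per 640 waveform samples. That reading is an assumption, though it reproduces both figures in the abstract. A minimal check, with a hypothetical function name:

```python
import math


def bitrate_bps(sample_rate_hz, samples_per_token, codebook_size):
    """Bits per second for a single-codebook VQ codec."""
    tokens_per_second = sample_rate_hz / samples_per_token
    bits_per_token = math.log2(codebook_size)  # 1024 entries -> 10 bits
    return tokens_per_second * bits_per_token
```

At 16 kHz this gives 25 tokens per second at 10 bits each, and at 48 kHz it gives 75 tokens per second, matching the 250 bps and 750 bps operating points.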

2605.25617 2026-05-26 eess.SY cs.SY

Justice-informed Planning of Intermodal Autonomous Mobility-on-Demand Systems under Operational Constraints

Giacomo Ganassoli, Francesco Mazzeo, Cecilia Pasquale, Silvia Siri, Mauro Salazar

AI summary: This paper proposes justice-informed optimization models for the operational planning of intermodal Autonomous Mobility-on-Demand (AMoD) systems that account for user budget and safety limitations and for infrastructure capacity constraints. A case study for Manhattan, New York shows that free public transit can approach the justice level of a completely free AMoD system without sacrificing efficiency.

Comments
Accepted for presentation at conference IEEE ITSC 2026. This is the preprint version

Abstract

To date, most of the research on transport planning has focused on optimizing revenues or utilitarian metrics such as average travel times, which often ends up penalizing the worst-off for the sake of profit or efficiency. At the same time, most of the research in transport justice has focused on assessing injustices, without being able to prescribe operational solutions. This paper contributes to bridging this gap and presents optimization models for justice-informed operational planning of intermodal mobility systems that explicitly account for the budget and safety limitations of users, and for infrastructural capacity constraints. Specifically, we first focus on an intermodal Autonomous Mobility-on-Demand (AMoD) system -- where self-driving robotaxis provide on-demand mobility jointly with public transit and active modes -- and characterize its operations from a mesoscopic planning perspective via network flow models. Second, we leverage these models to optimize system operations through both utilitarian efficiency and justice-informed objectives. We showcase our framework in a real-world case-study for Manhattan, New York. Our results show that monetary budgets significantly limit the social justice potential of AMoD systems if they are to be deployed as transportation network companies. At the same time, granting free public transit can result in sufficiency levels very close to a completely free intermodal AMoD system, where justice-informed operations can be achieved without compromising standard efficiency metrics, ultimately highlighting the strong potential of social policies.

2605.25605 2026-05-26 eess.AS cs.LG

Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets

Yuanming Zhang, Yayun Liang, Zhibin Lin, Jing Lu

AI summary: This study examines how unbalanced datasets affect stimulus reconstruction-based auditory attention decoding and proposes a leave-one-paired-envelope-out cross-validation protocol to prevent inflated decoding accuracy.

Abstract

In the past decade, numerous studies have applied deep neural networks (DNNs) to decode auditory attention (AAD) from Electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets - KUL, DTU, and NJU cEEGrid - are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.

2605.25593 2026-05-26 eess.SP

Time-Varying Parametric Channel Estimation With CP Decomposition Tensor Processing

Enrique T. R. Pinto, André L. F. de Almeida, Markku Juntti

AI summary: For time-varying frequency-selective channels, a fast parametric channel estimation algorithm based on CP decomposition with ESPRIT initialization is proposed. Its performance is close to multiple-start SAGE at about one order of magnitude lower computational cost.

Abstract

Integrated sensing and communications (ISAC) is a key use case for sixth-generation (6G) wireless systems, where parametric channel estimation (PCE) plays a central role in enabling sensing, localization, and channel equalization in high-mobility scenarios. However, PCE is typically more computationally demanding than conventional channel estimation, which motivates the development of lower-complexity solutions. In this letter, we propose a fast PCE algorithm for time-varying and frequency-selective (TVFS) channels based on canonical polyadic (CP) decomposition and tensor processing, combined with ESPRIT-based initialization, component refinement, and exact line-search alternating coordinate descent. Two variants are presented: one for fully digital and another for hybrid receiver architectures. Numerical results show that the proposed method clearly outperforms a related CP-based baseline while achieving estimation performance close to a multiple-start SAGE benchmark at a substantially lower computational cost, with about one order of magnitude shorter execution time.

2605.25533 2026-05-26 eess.SP cs.IT math.IT math.ST stat.TH

Projected multi-reference alignment

投影多参考对齐

Amnon Balanov, Josh Katz, Tamir Bendory, Dan Edidin

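AI summary: For the projected multi-reference alignment model, shows that the first three moments determine the dihedral orbit of a generic signal in the high-noise regime; finite-sample experiments are consistent with the predicted sample complexity $n \gtrsim σ^6$, i.e. the sixth power of the noise standard deviation.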

AI中文摘要

受结构生物学应用启发,我们研究了投影多参考对齐(MRA)模型,其中未知信号通过含噪样本观测,每个样本由随机循环移位后接固定投影生成。投影合并反射对称的索引对,从而丢弃方向信息。目标是恢复信号的二面体轨道。我们证明在高噪声条件下,投影观测的前三阶矩确定一个一般的二面体轨道。主要机制是在矩层面将投影MRA约简为二面体MRA的反射不变相位耦合结构。在适应投影的傅里叶-余弦坐标中,一阶矩确定均值分量,二阶矩确定傅里叶幅度,选定的三阶矩给出二面体双谱中出现的余弦相位耦合关系。这些关系导致从三阶矩出发的构造性恢复方案。我们通过有限样本实验补充总体理论,比较期望最大化(EM)、直接矩优化和直接傅里叶-余弦矩优化。结果表明,在高噪声条件下,EM和直接矩优化均与预测的三阶矩样本复杂度标度$n \gtrsim σ^6$一致,其中$n$是观测数,$σ^2$是噪声方差。

英文摘要

Motivated by structural biology applications, we study the projected multi-reference alignment (MRA) model, in which an unknown signal is observed through noisy samples, each generated by applying a random cyclic shift followed by a fixed projection. The projection merges reflection-symmetric index pairs, thereby discarding orientation information. The goal is to recover the dihedral orbit of the signal. We prove that in the high-noise regime, the first three moments of the projected observations determine a generic dihedral orbit. The main mechanism is a reduction, at the moment level, from projected MRA to the reflection-invariant phase-coupling structure of dihedral MRA. In Fourier-cosine coordinates adapted to the projection, the first moment determines the mean component, the second moment determines the Fourier magnitudes, and selected third moments yield the cosine phase-coupling relations appearing in the dihedral bispectrum. These relations lead to a constructive recovery scheme from moments up to order three. We complement the population theory with finite-sample experiments comparing expectation--maximization (EM), direct moment optimization, and direct Fourier-cosine moment optimization. The results show that, in the high-noise regime, both EM and direct moment optimization are consistent with the predicted third-moment sample-complexity scaling $n \gtrsim σ^6$, where $n$ is the number of observations and $σ^2$ is the noise variance.
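
The observation model in this abstract (random cyclic shift, then a fixed projection that merges reflection-symmetric index pairs) can be sketched as follows. The exact projection is not given in the abstract, so `project` below assumes that each pair of indices `i` and `-i mod L` is averaged, and that Gaussian noise is added after the projection; both choices are assumptions for illustration.

```python
import numpy as np

def reflect(x):
    """Reflection r(x)[i] = x[-i mod L]."""
    return np.roll(x[::-1], 1)

def project(x):
    """Assumed projection: average each reflection-symmetric index pair."""
    length = x.size
    idx = np.arange(length // 2 + 1)
    return 0.5 * (x[idx] + x[(-idx) % length])

def sample_observations(x, num_samples, sigma, rng):
    """Draw noisy projected observations of randomly shifted copies of x."""
    shifts = rng.integers(0, x.size, size=num_samples)
    clean = np.stack([project(np.roll(x, s)) for s in shifts])
    return clean + sigma * rng.standard_normal(clean.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(7)
observations = sample_observations(x, num_samples=5, sigma=1.0, rng=rng)
```

Under this assumed projection, shifting `x` by `s` and shifting `reflect(x)` by `-s` give the same projected vector, so a signal and its reflection produce identically distributed observations. That is one way to see why only the dihedral orbit can be recovered.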

2605.25512 2026-05-26 eess.AS

cSTMM: A Unified Complex Spherical Student's $t$ Mixture Model for Directional Statistics in Mask-Based Blind Speech Separation

cSTMM:一种用于基于掩码的盲语音分离中方向统计的统一复球面学生t混合模型

Nobutaka Ito

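AI summary: Proposes the complex spherical Student's t mixture model (cSTMM), which unifies cACGMM, cBMM, and cWMM through the degrees-of-freedom parameter ν, and derives a parameter estimation procedure based on generalized MM; in noise-free reverberant speech separation it achieves higher SDRi than cACGMM.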

AI中文摘要

基于掩码的盲语音分离(BSS)通过利用空间信息对多通道观测进行聚类来估计源特定的时频(TF)掩码。方向统计方法在复单位球面上对归一化的多通道观测进行聚类,无需基于平面波或球面波假设显式提取相位和电平差特征。然而,先前的研究大多比较少量单独定义的方向统计混合模型,而更广泛的分布族将能够更系统地研究密度轮廓如何影响分离性能。我们提出了复球面学生t混合模型(cSTMM),这是一种方向混合模型,通过自由度参数ν将复角中心高斯混合模型(cACGMM)、复宾厄姆混合模型(cBMM)和复沃森混合模型(cWMM)联系起来。我们还推导了一种基于广义最小化-最大化(MM)的参数估计过程。在无噪声LibriSpeech混合(使用实测房间冲激响应混响)上的无重启评估表明,在所有声学条件下,单个开发集选择的值ν*=1比cACGMM等效设置ν=M取得了更高的测试集平均信号失真比改善(SDRi),平均条件增益为0.25dB。实验还数值验证了所提出的公式数值上恢复了cACGMM、cBMM和cWMM的情况。

英文摘要

Mask-based blind speech separation (BSS) estimates source-wise time-frequency (TF) masks by clustering multichannel observations using spatial information. The directional statistical approach clusters normalized multichannel observations on the complex unit sphere, without explicitly extracting phase and level difference features based on the plane-wave or spherical-wave assumptions. However, prior studies have mostly compared a small number of separately defined directional statistical mixture models, whereas a broader distribution family would enable a more systematic study of how density profiles affect separation performance. We propose the complex spherical Student's t mixture model (cSTMM), a directional mixture model that connects the complex angular central Gaussian mixture model (cACGMM), complex Bingham mixture model (cBMM), and complex Watson mixture model (cWMM) through the degrees-of-freedom parameter $ν$. We also derive a generalized minorization-maximization (MM) based procedure for parameter estimation. A no-restart evaluation on noise-free LibriSpeech mixtures reverberated with measured room impulse responses shows that a single development-selected value $ν^\ast=1$ achieved higher test-set mean signal-to-distortion ratio improvements (SDRi) than the cACGMM-equivalent setting $ν=M$ in all acoustic conditions, with an average condition-wise gain of 0.25 dB. The experiments also verify numerically that the proposed formulation recovers the cACGMM, cBMM, and cWMM cases.

2605.25506 2026-05-26 eess.AS

WaveNeXt 2: ConvNeXt-Based Fast Neural Vocoders With Residual Denoising and Sub-Modeling for GAN and Diffusion Models

WaveNeXt 2:基于ConvNeXt的快速神经声码器,采用残差去噪和子建模用于GAN和扩散模型

Wangzixi Zhou, Takuma Okamoto, Yamato Ohtani, Sakriani Sakti, Hisashi Kawai

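AI summary: Proposes WaveNeXt 2, a unified framework in which residual denoising and sub-modeling make the ConvNeXt architecture usable for both GAN and diffusion vocoders, giving faster inference and competitive synthesis quality on a multi-speaker dataset.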

Journal ref
Proc. ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 17012-17016, 2026
Comments
ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
AI中文摘要

大多数神经声码器仅限于一种类型:要么是基于GAN,要么是基于扩散。虽然像Vocos和WaveNeXt这样的最先进模型使用了强大的基于ConvNeXt的生成器,但它们仅用于GAN框架,在多说话人设置中性能有限。此外,扩散模型尽管训练速度比GAN快,但CPU推理速度慢。在本文中,我们介绍了WaveNeXt 2,这是一个统一的基于ConvNeXt的框架,兼容GAN和扩散声码器。其核心创新是残差去噪和子建模,其中每个子模型逐步细化波形。在多说话人数据集上的实验结果证明了我们方法的有效性:(1)GAN-WaveNeXt 2比HiFi-GAN和WaveFit快得多,(2)Diff-WaveNeXt 2与4步FastDiff相比,也提供了更快的推理和具有竞争力的合成质量。Diff-WaveNeXt 2训练效率很高,仅需32小时训练,使其成为资源受限应用的理想选择。

英文摘要

Most neural vocoders are limited to one type: either GAN-based or diffusion-based. While state-of-the-art models like Vocos and WaveNeXt use powerful ConvNeXt-based generators, they have only been used in GAN frameworks and have limited performance in multi-speaker settings. Moreover, diffusion models, despite training faster than GANs, have slow CPU inference. In this paper, we introduce WaveNeXt 2, a unified ConvNeXt-based framework compatible with both GAN and diffusion vocoders. Its core innovation is residual denoising and sub-modeling, where each sub-model progressively refines the waveform. Experimental results on a multi-speaker dataset demonstrate the effectiveness of our approach: (1) GAN-WaveNeXt 2 is much faster than HiFi-GAN and WaveFit, and (2) Diff-WaveNeXt 2 also delivers much faster inference and competitive synthesis quality compared with FastDiff with 4 steps. Diff-WaveNeXt 2 is also training-efficient, requiring only 32 hours of training, which makes it well suited to resource-constrained applications.

2605.25504 2026-05-26 eess.AS

Toward Natural Emotional Text-To-Speech System with Fine-Grained Non-Verbal Expression Control

面向自然情感文本到语音系统:细粒度非语言表达控制

Wangzixi Zhou, Bagus Tris Atmaja, Sakriani Sakti

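AI summary: Proposes a fine-grained non-verbal expression synthesis method built on the EARS corpus, with a new annotation scheme that encodes NV type, frequency, and duration; in emotional TTS it significantly improves expressiveness (eMOS 4.20) and emotion recognition accuracy (78.8%).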

Journal ref
Proc. 2025 28th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), pp. 1-6, 2025
Comments
2025 28th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
AI中文摘要

虽然当前的情感文本到语音(TTS)模型已成功控制言语韵律,但它们常常忽略非语言发声(NV),而NV对于真实的人类情感至关重要。尽管最近出现了一些非语言数据集,但它们往往缺乏高质量、细粒度的标注,这限制了模型精确控制NV生成的能力。为解决这一局限,我们提出了一种细粒度非语言表达合成的新方法。我们从EARS语料库中整理并重新处理女性NV发声,开发了一种使用标签编码NV类型、频率和时长的新标注方案,并构建了一个情感TTS基准以证明其有效性。我们的评估表明,虽然我们的NV方法在感知自然度上带来轻微折衷,但它显著提升了表现力(eMOS 4.20)和情感识别准确率(78.8%)。情感特异性分析进一步揭示,NV线索对于高唤醒度情感如快乐(82.5%)和恐惧(82.7%)非常有效,并且几乎完美地传达悲伤(98.3%)。

英文摘要

While current emotional Text-to-Speech (TTS) models have successfully controlled verbal prosody, they often ignore non-verbal vocalizations (NVs), which are essential for authentic human emotion. Although some non-verbal datasets have recently emerged, they often lack high-quality, fine-grained annotations, which restricts a model's ability to precisely control NV generation. To address this limitation, we propose a novel approach for fine-grained non-verbal expression synthesis. We curate and reprocess female NV utterances from the EARS corpus, develop a new annotation scheme using tags to encode NV types, frequencies, and durations, and build an emotional TTS benchmark to demonstrate its effectiveness. Our evaluation shows that while our NV approach leads to minor trade-offs in perceived naturalness, it significantly improves expressiveness (eMOS 4.20) and emotional recognition accuracy (78.8%). Emotion-specific analysis further reveals that NV cues are highly effective for high-arousal emotions like happy (82.5%) and fear (82.7%), and almost perfectly convey sadness (98.3%).

2605.25498 2026-05-26 eess.AS

Subspace Track-before-Detect for Passive Multi-Target Tracking with Unknown Emitted Signals

子空间检测前跟踪用于未知发射信号的被动多目标跟踪

Nobutaka Ito, Yoshiaki Bando

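AI summary: For passive multi-target tracking with unknown emitted signals, proposes a subspace track-before-detect method based on a complex Bingham distribution likelihood that needs no explicit modeling or estimation of the unknown signals, and tracks two targets at an SNR of -10 dB.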

AI中文摘要

被动多目标跟踪旨在从噪声传感器数据中推断多个目标的运动状态,其中来自未知目标发射信号的贡献是叠加的。检测前跟踪方法通过直接处理原始传感器数据而不依赖前期的检测阶段,提高了对噪声的鲁棒性。然而,许多现有的检测前跟踪方法假设每个目标对传感器数据的贡献仅由其运动状态决定。这一假设限制了它们在被动多目标跟踪中的适用性,因为每个目标的贡献既取决于其运动状态,也取决于未知的发射信号。我们提出了子空间检测前跟踪,一种基于复Bingham分布推导的似然的被动多目标检测前跟踪方法,该方法不需要显式建模或估计未知发射信号。在粒子滤波框架中,每个多目标假设被映射到一个由假设目标状态对应的导向向量张成的低维子空间。然后利用该似然来评估归一化多通道传感器数据与该子空间的对齐程度。使用模拟声学测量和给定目标活动模式的初步实验表明,所提方法可以在信噪比为-10dB的情况下跟踪两个发射未知信号的移动目标,而传统的检测前跟踪基线则产生更大的跟踪误差。

英文摘要

Passive multi-target tracking (MTT) aims to infer the kinematic states of multiple targets from noisy sensor data in which contributions from unknown target-emitted signals are superposed. Track-before-detect (TBD) methods improve robustness to noise by operating directly on raw sensor data without relying on a preceding detection stage. However, many existing TBD methods assume that each target's contribution to the sensor data is determined solely by its kinematic state. This assumption limits their applicability to passive MTT, where each target's contribution depends on both its kinematic state and the unknown emitted signal. We propose subspace TBD, a passive multi-target TBD method based on a likelihood derived from the complex Bingham distribution that does not require explicit modeling or estimation of the unknown emitted signals. In a particle filter (PF) framework, each multi-target hypothesis is mapped to a low-dimensional subspace spanned by the steering vectors corresponding to the hypothesized target states. The likelihood is then used to evaluate the alignment of the normalized multichannel sensor data with this subspace. Preliminary experiments with simulated acoustic measurements and a given target activity pattern show that the proposed method can track two moving targets emitting unknown signals at a signal-to-noise ratio (SNR) of -10dB, whereas a conventional TBD baseline yields substantially larger tracking errors.
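
The alignment idea described above can be sketched numerically: map a hypothesis (a set of target directions) to the subspace spanned by the corresponding steering vectors, then measure how much of the normalized sensor vector lies in that subspace. The far-field uniform linear array, the half-wavelength spacing, and the choice of a complex Bingham parameter matrix equal to `kappa` times the subspace projector are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def steering_vector(angle_rad, num_sensors, spacing_wavelengths=0.5):
    """Far-field narrowband steering vector of a uniform linear array (assumed geometry)."""
    m = np.arange(num_sensors)
    return np.exp(-2j * np.pi * spacing_wavelengths * m * np.sin(angle_rad))

def subspace_alignment(z, angles_rad):
    """Fraction of the energy of z that lies in the span of the hypothesized steering vectors."""
    z = z / np.linalg.norm(z)
    steering = np.stack([steering_vector(a, z.size) for a in angles_rad], axis=1)
    basis, _ = np.linalg.qr(steering)  # orthonormal basis of the hypothesis subspace
    return float(np.linalg.norm(basis.conj().T @ z) ** 2)

def log_likelihood_unnormalized(z, angles_rad, kappa=10.0):
    """Complex-Bingham-style score z^H B z with the assumed choice B = kappa * projector."""
    return kappa * subspace_alignment(z, angles_rad)

# Two sources with arbitrary (unknown) complex amplitudes, 8 sensors.
true_angles = [0.3, -0.6]
z_obs = ((1 + 2j) * steering_vector(0.3, 8)
         + (0.5 - 1j) * steering_vector(-0.6, 8))
good = subspace_alignment(z_obs, true_angles)
bad = subspace_alignment(z_obs, [1.0, -1.2])
```

The score depends only on the subspace, not on the source amplitudes, which mirrors the abstract's point that the unknown emitted signals need not be modeled or estimated. In a particle filter, a score like this would weight each multi-target hypothesis.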

2605.25458 2026-05-26 eess.SP

Deep Machine Learning in MIMO Communication Systems

MIMO通信系统中的深度机器学习

Mohammad Reza Ghavidel Aghdam, Alireza Naghavi

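AI summary: Proposes an autoencoder-based end-to-end MIMO communication system that jointly optimizes the transmitter, receiver, and channel, and significantly reduces the bit error rate under Rayleigh fading and noise.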

AI中文摘要

本文提出了一种创新方法,用于增强基于机器学习的通信系统,特别关注使用自编码器的多输入多输出(MIMO)配置。我们在噪声和信道衰落条件下同时优化发射机、接收机和信道,旨在最小化误码率(BER)。通过将瑞利衰落信道(一种广泛认可的无线信道损伤模型)纳入自编码器框架,我们直接训练通信系统以处理真实世界条件。我们引入了一种针对基于深度学习的MIMO通信量身定制的新型优化过程,并深入分析了在不同信噪比(SNR)水平下得到的BER性能。我们的仿真结果表明,所提出的端到端无线通信系统与传统基于块的处理方法相比,实现了显著更低的BER,突显了其在实现更高效、更可靠的无线通信方面的潜力。

英文摘要

This paper presents an innovative approach to enhancing machine-learning-based communication systems, specifically focusing on multiple-input multiple-output (MIMO) configurations using autoencoders. We optimize the transmitter, receiver, and channel simultaneously under conditions of noise and channel fading, aiming to minimize the bit error rate (BER). By incorporating the Rayleigh fading channel, a widely recognized model for wireless channel impairments, into the autoencoder framework, we directly train the communication system to handle real-world conditions. We introduce a novel optimization process tailored for deep-learning-based MIMO communication, and thoroughly analyze the resulting BER performance across various signal-to-noise ratio (SNR) levels. Our simulation results reveal that the proposed end-to-end wireless communication system achieves significantly lower BER compared to conventional block-based processing methods, highlighting its potential for more efficient and reliable wireless communication.

2605.25456 2026-05-26 eess.SY cs.SY

Aircraft and Fleet Sizing for Regional Air Mobility: College Town Case Studies

面向区域空中交通的飞机与机队规模优化:大学城案例研究

Jung Ho Park, Changyeob Lee, Shangqing Cao, Raja Sengupta, Mark Hansen, Pavan Yedavalli

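AI summary: Using a joint supply-demand optimization framework, studies how aircraft seat configuration and fleet size affect the profitability and throughput of Regional Air Mobility, and finds that 4-seat and 6-seat configurations perform best under different market conditions.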

Comments
Submitted to International Workshop on ATM/CNS (IWAC)
AI中文摘要

我们通过应用一个联合供需优化框架,同时确定市场份额、票价和航班时刻表,研究了飞机座位配置如何与区域空中交通的日常运营相互作用。该框架将二元Logit离散选择模型整合到任务分配公式中,捕捉乘客在区域空中交通和驾车之间的跨时空起讫点对模式选择。我们评估了三个美国大学城走廊,采用4座、6座和8座配置,成本规模从0.4到1.0,机队规模从12到30架飞机。盈利能力和吞吐量作为主要性能指标,并分析定价能力、运营成本和收入以解释不同市场的性能差异。我们发现,较大的飞机配置和机队规模并不普遍提高盈利能力。在规模经济有利、需求充足且方向平衡的市场中,较大的飞机更受青睐。在这些案例研究中,最佳配置是需求不平衡市场中的4座配置,以及平衡或密集市场中的6座配置。

英文摘要

We examine how aircraft seat configuration interacts with daily operation in Regional Air Mobility by applying a joint supply-demand optimization framework that simultaneously determines market share, fare, and flight schedule. The framework integrates a binary logit discrete choice model into a task assignment formulation, capturing passengers' mode choice between Regional Air Mobility and driving across spatiotemporal origin-destination pairs. We evaluate three U.S. college town corridors under 4-, 6-, and 8-seat configurations across cost scales from 0.4 to 1.0 and fleet sizes from 12 to 30 aircraft. Profitability and throughput serve as primary performance metrics, and we analyze pricing power, operating cost, and revenue to explain performance variation across markets. We find that larger aircraft configurations and fleet sizes do not improve profitability universally. Larger aircraft are preferred where economies of scale are favorable and demand is sufficient and directionally balanced. The best configuration in these case studies is the 4-seat in imbalanced markets and the 6-seat in balanced or dense markets.
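
The binary logit model mentioned above gives the air-mode market share as a logistic function of the utility difference between flying and driving. The sketch below shows that relationship and how fare feeds into expected revenue. The linear utility form, the coefficients `beta_cost` and `beta_time`, and all numeric inputs are hypothetical and are not taken from the paper.

```python
import math

def utility(cost_usd, time_hours, beta_cost=-0.02, beta_time=-1.0, constant=0.0):
    """Linear-in-parameters utility with hypothetical coefficients."""
    return constant + beta_cost * cost_usd + beta_time * time_hours

def air_share(v_air, v_drive):
    """Binary logit probability of choosing air over driving."""
    return 1.0 / (1.0 + math.exp(-(v_air - v_drive)))

def expected_revenue(fare, demand, air_time, drive_cost, drive_time):
    """Fare times the expected number of air passengers for one origin-destination pair."""
    share = air_share(utility(fare, air_time), utility(drive_cost, drive_time))
    return fare * demand * share

# Hypothetical corridor: 1 h flight versus 3 h drive costing 40 USD.
share_low_fare = air_share(utility(100.0, 1.0), utility(40.0, 3.0))
share_high_fare = air_share(utility(200.0, 1.0), utility(40.0, 3.0))
revenue = expected_revenue(100.0, demand=50, air_time=1.0, drive_cost=40.0, drive_time=3.0)
```

Because the share falls as the fare rises, revenue is not monotone in fare, which is why fare and market share have to be determined together rather than one after the other.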

2605.25431 2026-05-26 cs.NI cs.MA eess.SP

Mode 0: A New 3GPP V2X Resource Allocation Category for Roadside Computing Unit-Assisted Safety Communication

模式0:面向路侧计算单元辅助安全通信的新型3GPP V2X资源分配类别

Dewei Jiang, Xiang Gu

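AI summary: Addresses the structural gap in the binary base-station/vehicle-UE taxonomy of the existing 3GPP V2X resource allocation framework by proposing Mode 0, a new mode centered on the Roadside Computing Unit (RCU); by integrating sensing, communication, and computation it targets low-latency safety communication in high-density traffic scenarios, and MAPPO simulations are used to validate its performance advantage.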

Comments
13 pages, 7 figures, 4 tables. Submitted to IEEE Transactions on Intelligent Transportation Systems
AI中文摘要

3GPP V2X资源分配框架定义了两类实体(基站和车辆UE),以及跨LTE和NR代的四种模式。我们证明这种二元分类在结构上是不完整的。基站主导的调度在高密度交通节点处饱和,产生延迟尾部故障,即使在平均数据包投递率接近服务等级目标时仍然存在。UE自主性在根本上无法对遮挡的交通参与者进行预警,且不足以应对大范围级联环境危害。我们提出模式0,一种新的3GPP V2X类别,其定义实体是路侧计算单元(RCU),即一种集成高架感知(Seeing)、侧链路通信(Speaking)和本地计算评估(Thinking)的基础设施组合,由交通管理部门拥有。模式0定义了一个子族频谱,从模式0a(全被动UE,保证最小值)到模式0c(全主动UE,最优目标)。来自中国国家标准(DB11/T 2329.1-2024、T/ITS 0224.1-2025)、中国联通RS-MEC基础设施以及欧美C-V2X项目的趋同部署证据证实,双方机构正在向路侧交通节点汇聚,但缺乏协调标准。十五次多智能体近端策略优化(MAPPO)仿真验证了该架构族:共享池基线中的模式0a处于分析对称纳什协调下限;具有需求分离的模式0c实现了两类交通的严格帕累托改进(在$ρ_{\rm pool} \leq 1$时,M0 PDR 0.999,M1 PDR 0.998),并将最差TTI投递率从接近零提升至0.601,这是唯一在结构上满足延迟安全要求的配置。我们呼吁在NR-V2X侧链路增强工作计划中设立关于模式0的3GPP研究项目。

英文摘要

The 3GPP V2X resource allocation framework defines two entity classes -- the base station and the vehicle UE -- and four modes across LTE and NR generations. We demonstrate that this binary taxonomy is structurally incomplete. Base station-led scheduling saturates at high-density traffic nodes, producing latency-tail failures that persist even when mean packet delivery ratios approach the service-class target. UE autonomy is categorically incapable of pre-emergence warning for occluded traffic participants and insufficient for large-scope cascading environmental hazards. We propose Mode 0, a new 3GPP V2X category whose defining entity is the Roadside Computing Unit (RCU) -- an infrastructure ensemble integrating elevated sensing (Seeing), sidelink communication (Speaking), and local computational evaluation (Thinking), owned by traffic management authorities. Mode 0 defines a subfamily spectrum from Mode 0a (all-passive UEs, the guaranteed minimum) through Mode 0c (all-active UEs, the optimal target). Convergent deployment evidence from Chinese national standards (DB11/T 2329.1-2024, T/ITS 0224.1-2025), China Unicom RS-MEC infrastructure, and European and US C-V2X programs confirms that both institutional sides are converging on the roadside traffic node without a coordination standard. A fifteen-run Multi-Agent Proximal Policy Optimization (MAPPO) simulation validates the architectural family: Mode 0a in shared-pool baseline sits at the analytical symmetric-Nash coordination floor; Mode 0c with demand separation achieves strict Pareto improvement for both traffic classes (M0 PDR 0.999, M1 PDR 0.998 at $ρ_{\rm pool} \leq 1$) and lifts the worst-TTI delivery ratio from near-zero to 0.601 -- the only configuration satisfying the latency safety requirement structurally. We call for a 3GPP study item on Mode 0 within the NR-V2X sidelink enhancement work programme.

2605.25422 2026-05-26 eess.SP cs.AI cs.IT math.IT

A Token/KV-Cache Communication Media Selection and Resource Allocation Strategy for Multi-Agent Collaboration

面向多智能体协作的令牌/KV缓存通信介质选择与资源分配策略

Lipeng Dai, Luping Xiang, Kun Yang

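AI summary: To address the end-to-end latency trade-off caused by heterogeneous interaction media in multi-agent collaboration, proposes jointly optimizing communication media selection and wireless resource allocation, and designs a low-complexity algorithm to minimize latency.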

AI中文摘要

大型语言模型(LLM)与6G网络的融合正在催生自主多智能体协作范式,这预计将大幅增加东西向流量。尽管潜在空间交互机制比符号自然语言(NL)交换能实现更高效的协作,但先前的工作通常忽略了实际无线约束下的相关通信开销。在具身多智能体场景中,异构交互介质会导致不同的推理和传输成本,从而产生固有的端到端(E2E)延迟权衡。为解决这一问题,我们提出了一种联合设计,将通信介质选择与无线资源分配相结合。通过分析表征和基于仿真的评估,我们表明基于令牌的传输和基于键值(KV)缓存的传输在运行状态下并非统一最优,因为性能关键取决于可用计算资源和信道条件等系统参数。因此,我们构建了一个联合优化问题,旨在最小化多智能体协作的E2E延迟,并开发了一种低复杂度的联合介质选择与资源分配(JMSRA)算法。数值结果进一步证实,通过自适应地协调异构链路上的交互介质和带宽分配,所提方案相对于传统的仅NL和仅KV缓存基线显著降低了E2E延迟,从而在未来无线网络中实现高效且鲁棒的多智能体协作。

英文摘要

The convergence of large language models (LLMs) with 6G networks is fostering a paradigm of autonomous multi-agent cooperation, which in turn is expected to substantially increase east-west traffic. Although latent-space interaction mechanisms can enable more efficient collaboration than symbolic natural-language (NL) exchanges, prior work often abstracts away the associated communication overhead under practical wireless constraints. In embodied multi-agent settings, heterogeneous interaction media incur disparate inference and transmission costs, thereby inducing an inherent end-to-end (E2E) latency trade-off. To address this, we propose a joint design that integrates communication-media selection with wireless resource allocation. Through analytical characterization and simulation-based evaluation, we show that neither token-based transmission nor key-value (KV) cache-based transmission is uniformly optimal across operating regimes, as performance depends critically on system parameters such as available computational resources and channel conditions. Accordingly, we formulate a joint optimization problem aimed at minimizing the E2E latency of multi-agent collaboration and develop a low-complexity joint media selection and resource allocation (JMSRA) algorithm. Numerical results further confirm that, by adaptively coordinating the interaction media and bandwidth allocation over heterogeneous links, the proposed scheme achieves markedly reduced E2E latency relative to conventional NL-only and KV-cache-only baselines, enabling efficient and robust multi-agent collaboration in future wireless networks.
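
The trade-off described above, in which neither token-based nor KV-cache-based transmission is uniformly better, can be illustrated with a toy latency model in which end-to-end latency is transmission time plus receiver-side compute time. This is not the paper's JMSRA algorithm; the `Medium` class, the payload sizes, and the compute costs are hypothetical numbers chosen only to show the crossover.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Medium:
    name: str
    payload_bits: float    # amount of data sent over the wireless link
    receiver_flops: float  # compute the receiver must spend before it can continue

def e2e_latency(medium, rate_bps, compute_flops_per_s):
    """Toy end-to-end latency: transmission time plus receiver compute time."""
    return medium.payload_bits / rate_bps + medium.receiver_flops / compute_flops_per_s

def select_medium(media, rate_bps, compute_flops_per_s):
    """Pick the medium with the lowest toy end-to-end latency."""
    return min(media, key=lambda m: e2e_latency(m, rate_bps, compute_flops_per_s))

# Hypothetical numbers: tokens are small to send but must be re-processed by the
# receiver; a KV cache is large to send but saves most of that computation.
tokens = Medium("token", payload_bits=1e4, receiver_flops=1e12)
kv_cache = Medium("kv_cache", payload_bits=1e9, receiver_flops=1e10)

slow_link_choice = select_medium([tokens, kv_cache], rate_bps=1e9, compute_flops_per_s=1e13)
fast_link_choice = select_medium([tokens, kv_cache], rate_bps=1e11, compute_flops_per_s=1e13)
```

With these assumed values the token medium is chosen on the slower link and the KV cache on the faster one. A joint scheme would additionally decide how bandwidth is split across links, which this sketch leaves out.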

2605.25404 2026-05-26 cs.CL eess.AS

Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems

主动应对不确定性:面向口语对话系统的因果感知错误诊断与交互式澄清

Yizhou Peng, Ziyang Ma, Changsong Liu, Yi-Wen Chao, Xie Chen, Eng Siong Chng

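AI summary: Proposes a cause-aware error recovery paradigm in which fine-grained detectors disentangle ASR errors into perception, comprehension, and deletion failures, enabling the LLM to run targeted multi-turn clarification strategies, which markedly lowers the word error rate and improves downstream task performance.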

AI中文摘要

级联自动语音识别-大语言模型(ASR-LLM)流水线在工业口语对话系统(SDS)中仍然流行,主要因为其解耦设计确保了感知可验证性。然而,级联系统存在错误传播问题,因为转录失败不可避免地级联到后续组件,从而降低最终交互质量。尽管ASR置信度分数为不可靠输入提供了简单过滤,但这种方法存在根本性局限,因为它通常无法检测删除错误,也无法区分声学(听不清)和语言(不理解)不匹配,而这两者都需要针对性的恢复策略。在本文中,我们提出了一种因果感知的错误恢复范式,从根本上重新思考SDS的鲁棒性。与传统的置信度过滤不同,我们引入了一组小型精度聚焦检测器,利用深度ASR潜在表示将词级错误解耦为感知、理解和删除失败。这种细粒度诊断智能使LLM能够编排针对性的多轮澄清策略,有效将模糊信号转化为无缝的用户交互。实验结果验证了我们方法的精度,与基线相比,在领域转移错误上的召回率提高了一倍以上(57.96% vs. 23.66%)。关键的是,这种诊断精度在不同口音、失真和领域下,使词错误率降低高达30%,下游任务性能提升17%。

英文摘要

Cascaded Automatic Speech Recognition -- Large Language Model (ASR-LLM) pipelines remain popular for industrial Spoken Dialogue Systems (SDS), primarily because their decoupled design ensures perceptual verifiability. However, cascaded systems suffer from error propagation, as transcription failures inevitably cascade to subsequent components, thereby degrading the final interaction quality. Although ASR confidence scores offer a simple filter for unreliable inputs, this approach is fundamentally limited because it typically fails to detect deletion errors or to distinguish between acoustic (inability to hear clearly) and linguistic (inability to understand) mismatches, both of which require targeted recovery strategies. In this paper, we propose a cause-aware error recovery paradigm that fundamentally rethinks robustness in SDS. Unlike traditional confidence filtering, we introduce a suite of small precision-focused detectors that exploit deep ASR latent representations to disentangle token-level errors into perception, comprehension, and deletion failures. This fine-grained diagnostic intelligence empowers the LLM to orchestrate targeted, multi-turn clarification strategies, effectively transforming ambiguous signals into seamless user interactions. Experimental results validate the precision of our approach, which more than doubles the recall on domain-shift errors (57.96% vs. 23.66%) compared to baselines. Crucially, this diagnostic precision yields up to a 30% reduction in WER and a 17% improvement on the downstream task across diverse accents, distortions, and domains.

2605.25391 2026-05-26 cs.LG eess.SP

A Context Augmented Multi-Play Multi-Armed Bandit Algorithm for Fast Channel Allocation in Opportunistic Spectrum Access

一种用于机会频谱接入中快速信道分配的上下文增强多玩多臂老虎机算法

Ruiyu Li, Guangxia Li, Xiao Lu, Jichao Liu, Yan Jin

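AI summary: For channel allocation in opportunistic spectrum access, proposes a context-augmented multi-play multi-armed bandit algorithm that models channel noise as a perturbation of the reward function and uses channel state information as context; two index policies are derived, for linear and nonlinear correlations respectively, achieving low regret and more reasonable selection of sub-optimal arms.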

Comments
Accepted by ISCC'24
AI中文摘要

我们研究了机会频谱接入(OSA)场景中用于信道分配的动态上下文多玩多臂老虎机(MP-MAB)问题。大多数现有的MP-MAB方法对于实际OSA系统不实用,因为它们假设了许多理想条件,计算成本高,最重要的是忽略了与服务质量直接相关的信道噪声的影响。在本研究中,我们通过将信道噪声建模为MP-MAB中臂奖励函数的扰动来体现这种影响。由于信道状态信息与信道噪声之间存在隐含的相关性,我们将前者作为MP-MAB的上下文来表示后者引起的扰动。我们研究了上下文与扰动之间的两种相关性——线性和非线性,并分别推导出两种索引策略。这些策略通过线性模型和神经网络学习相关性,并使用估计的噪声值调整上置信界。数值实验表明,所提出的策略能够实现更低的遗憾,并以更合理的方式选择次优臂。

英文摘要

We study the restless contextual multi-play multi-armed bandit (MP-MAB) problem for channel allocation in the opportunistic spectrum access (OSA) scenario. Most existing MP-MAB methods are impractical for real-world OSA systems because they assume many ideal conditions, incur a heavy computational cost, and, most importantly, ignore the impact of channel noise, which is directly related to the quality of service. In this study, we capture this impact by modeling channel noise as a perturbation of the arm's reward function in MP-MAB. As there is an implicit correlation between channel state information and channel noise, we take the former as a context for MP-MAB to represent the perturbation caused by the latter. We investigate two types of correlation between the context and the perturbation, linear and nonlinear, and derive an index policy for each. These policies learn the correlations through a linear model and a neural network, respectively, and use the estimated noise value to adjust the upper confidence bound. Numerical experiments demonstrate that the proposed policies achieve lower regret and select sub-optimal arms in a more reasonable way.
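
A minimal sketch of the linear variant of this idea: fit a linear map from context (channel state information) to noise, subtract the predicted noise from a standard UCB index, and play the top `k` channels. The abstract does not give the exact index formula, so the subtraction form, the `noise_weight` parameter, and the ridge regularizer are assumptions for illustration.

```python
import numpy as np

def fit_linear_noise_model(contexts, noises, ridge=1e-3):
    """Ridge-regression estimate of weights w such that noise is roughly context @ w."""
    dim = contexts.shape[1]
    gram = contexts.T @ contexts + ridge * np.eye(dim)
    return np.linalg.solve(gram, contexts.T @ noises)

def ucb_indices(means, counts, t, noise_estimates, noise_weight=1.0):
    """UCB1-style index penalized by the estimated channel noise (assumed form)."""
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    return means + bonus - noise_weight * noise_estimates

def select_channels(indices, k):
    """Multi-play step: choose the k channels with the largest index."""
    return np.argsort(indices)[::-1][:k]

# Hypothetical example: three channels with equal empirical means.
means = np.array([0.5, 0.5, 0.5])
counts = np.array([10, 10, 10])
noise = np.array([0.0, 0.3, 0.1])
chosen = select_channels(ucb_indices(means, counts, t=30, noise_estimates=noise), k=2)

# Recover a known linear context-to-noise map from synthetic data.
rng = np.random.default_rng(0)
ctx = rng.standard_normal((50, 2))
w_true = np.array([0.2, -0.1])
w_hat = fit_linear_noise_model(ctx, ctx @ w_true)
```

When the empirical means are tied, the noise penalty decides which channels are played, which is one reading of the abstract's claim about selecting sub-optimal arms more reasonably.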

2605.25348 2026-05-26 eess.IV cs.AI cs.CV cs.LG cs.SC

Parameter-Efficient CT Reconstruction via Deep Graph Laplacian Regularization

基于深度图拉普拉斯正则化的参数高效CT重建

Veera Varuni Radhakrishnan, Chinthaka Dinesh, Qurat-ul-Ain Azim

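AI summary: Proposes Deep Graph Laplacian Regularization (Deep GLR), which integrates quadratic graph regularization into a proximal forward-backward splitting optimization framework and suppresses noise in low-dose CT reconstruction with few parameters and little training data; it is reported as markedly more parameter-efficient and data-efficient than benchmark methods.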

Comments
7 pages, 3 figures, conference
AI中文摘要

低剂量计算机断层扫描(LDCT)重建面临重建质量与资源需求之间的关键权衡。虽然最近的深度学习方法达到了最先进的性能,但它们通常依赖超过50万个参数,并在超过35,000次扫描的大规模数据集上训练。本文研究在严格资源约束下,基于图的正则化是否能提供有意义的噪声抑制。我们提出了深度图拉普拉斯正则化(Deep GLR),将二次图正则化集成到近端前向-后向分裂优化框架中,并包含三个轻量级CNN模块。在LoDoPaB-CT基准上评估,Deep GLR达到了30.70 dB的PSNR,相比滤波反投影提高了6.33 dB,同时仅使用了91,848个参数,在1000个样本上训练(标准训练集的2.8%)。与基准方法相比,这代表了每dB改进5.8倍的参数效率和30倍的数据效率。学习到的图带宽参数(ε=1.25)收敛到可解释的值,表明该方法捕捉了有意义的图像先验而非过拟合。尽管与最先进方法相比仍有13 dB的差距,但结果表明基于图的正则化为资源受限的医学成像场景提供了有利的效率-质量权衡。

英文摘要

Low-dose computed tomography (LDCT) reconstruction faces a critical tradeoff between reconstruction quality and resource requirements. While recent deep learning methods achieve state-of-the-art performance, they typically rely on over 500,000 parameters trained on large-scale datasets exceeding 35,000 scans. This work investigates whether graph-based regularization can provide meaningful noise reduction under strict resource constraints. We propose Deep Graph Laplacian Regularization (Deep GLR), integrating quadratic graph regularization into a Proximal Forward-Backward Splitting optimization framework with three lightweight CNN modules. Evaluated on the LoDoPaB-CT benchmark, Deep GLR achieves 30.70 dB PSNR, representing a 6.33 dB improvement over filtered backprojection, while using only 91,848 parameters trained on 1,000 samples (2.8% of the standard training set). Compared to benchmark methods, this represents 5.8 times better parameter efficiency and 30 times better data efficiency per dB improvement. The learned graph bandwidth parameter ($ε = 1.25$) converges to an interpretable value, suggesting the method captures meaningful image priors rather than overfitting. While a 13 dB gap remains versus state-of-the-art methods, results demonstrate that graph-based regularization provides a favorable efficiency-quality tradeoff for resource-constrained medical imaging scenarios.
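
The quadratic graph regularizer at the core of this approach has a closed-form solution in the pure denoising case: minimizing $\|y - x\|^2 + \mu\, x^\top L x$ gives $x = (I + \mu L)^{-1} y$. The sketch below shows this on a one-dimensional signal with Gaussian-kernel edge weights of bandwidth `epsilon`. It uses raw sample values as graph features and an identity forward operator, so it is a simplified stand-in, not the paper's CNN-based pipeline or its CT forward model.

```python
import numpy as np

def chain_laplacian(features, epsilon):
    """Laplacian of a chain graph with Gaussian-kernel weights between neighbors."""
    n = features.size
    diffs = features[1:] - features[:-1]
    weights = np.exp(-diffs ** 2 / (2.0 * epsilon ** 2))
    adjacency = np.zeros((n, n))
    i = np.arange(n - 1)
    adjacency[i, i + 1] = weights
    adjacency[i + 1, i] = weights
    return np.diag(adjacency.sum(axis=1)) - adjacency

def glr_denoise(y, laplacian, mu):
    """Closed-form minimizer of ||y - x||^2 + mu * x^T L x."""
    return np.linalg.solve(np.eye(y.size) + mu * laplacian, y)

# Hypothetical noisy step signal.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(10), np.ones(10)]) + 0.1 * rng.standard_normal(20)
L = chain_laplacian(y, epsilon=1.25)
x = glr_denoise(y, L, mu=2.0)
```

The bandwidth controls how strongly differing neighbors are decoupled: a small `epsilon` preserves edges, while a large one approaches uniform smoothing.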

2605.25346 2026-05-26 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC

Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers

用于学习和规划的并行可微可达性:带认证的神经动力学与控制器

Keyi Shen, Glen Chou

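AI summary: Proposes a parallel, differentiable reachability framework in JAX that combines Taylor-model flowpipe construction with CROWN linear bound propagation, supports GPU batching and automatic differentiation, and is used for certified training and reachability-aware MPC; on non-prehensile manipulation and quadrotor tasks it enables online planning with certified reachable-set over-approximations under bounded uncertainty.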

Comments
Robotics: Science and Systems XXII (RSS 2026)
AI中文摘要

神经网络动力学模型和控制策略在机器人领域取得了强大性能,但在不确定性下提供可靠保证仍然困难,尤其是对于闭环神经网络系统。现有的可达性工具提供了形式化的过近似,但通常不可微、过于保守或对于现代学习和在线规划流程来说太慢。为了解决这个问题,我们提出了一个在JAX中可并行化、可微的可达性框架,适用于连续和离散时间系统,具有解析和基于神经网络的动力学和控制器。我们的框架通过统一表示结合了泰勒模型流形构建和CROWN风格的线性界传播,该表示在支持GPU批处理计算和自动微分的同时保留了仿射依赖。基于这个可达性基元,我们开发了(i)一种认证训练方法,鼓励生成对可达性友好的动力学模型和控制器,以及(ii)一种具有基于梯度细化的可达性感知采样MPC方案。在非抓取操作和四旋翼任务上的实验,包括硬件和更高维度的评估(高达72维),展示了在实际在线规划中保持有界不确定性下认证可达集过近似的可行性。

英文摘要

Neural network (NN) dynamics models and control policies achieve strong performance in robotics, but providing sound guarantees under uncertainty remains difficult, especially for closed-loop NN systems. Existing reachability tools provide formal over-approximations, yet are often non-differentiable, overly conservative, or too slow for modern learning and online planning pipelines. To address this, we present a parallelizable, differentiable reachability framework in JAX for continuous- and discrete-time systems with analytical and NN-based dynamics and controllers. Our framework combines Taylor-model flowpipe construction with CROWN-style linear bound propagation through a unified representation that preserves affine dependencies while supporting GPU-batched computation and automatic differentiation. Building on this reachability primitive, we develop (i) a certified training method that encourages reachability-friendly dynamics models and controllers, and (ii) a reachability-aware sampling-based MPC scheme with gradient-based refinement. Experiments on non-prehensile manipulation and quadrotor tasks, including hardware and higher-dimensional evaluations (up to 72D), demonstrate practical online planning while maintaining certified reachable-set over-approximations under bounded uncertainty.
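
To make the notion of a sound over-approximation concrete, the sketch below uses interval bound propagation (IBP) through a small ReLU network. IBP is a simpler and looser technique than the CROWN-style linear bounds and Taylor models named in the abstract, and it is written here in NumPy rather than JAX; it is shown only to illustrate the guarantee, and the network weights are random placeholders.

```python
import numpy as np

def affine_bounds(lo, hi, weight, bias):
    """Exact interval image of the box [lo, hi] under x -> weight @ x + bias."""
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    new_center = weight @ center + bias
    new_radius = np.abs(weight) @ radius
    return new_center - new_radius, new_center + new_radius

def relu_bounds(lo, hi):
    """Interval image under the elementwise ReLU."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def reach_step(lo, hi, layers):
    """Propagate a box through affine layers with ReLU between them."""
    for k, (weight, bias) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, weight, bias)
        if k < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

def forward(x, layers):
    """Concrete forward pass of the same network."""
    for k, (weight, bias) in enumerate(layers):
        x = weight @ x + bias
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]
in_lo, in_hi = -0.1 * np.ones(3), 0.1 * np.ones(3)
out_lo, out_hi = reach_step(in_lo, in_hi, layers)
```

Every concrete output for an input inside the box lies inside `[out_lo, out_hi]`. Tighter methods shrink that box by tracking dependencies between variables instead of treating each coordinate independently.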

2605.25332 2026-05-26 cs.NI cs.CR cs.DC cs.SY eess.SY

TIP: A Decentralized Intent-Based Protocol for Declarative IoT Interoperability and Sandboxed Schema Adaptation

TIP:一种用于声明式物联网互操作性和沙盒模式适配的去中心化意图协议

Yeison David Mejia Mosquera

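AI summary: Proposes TIP, a decentralized declarative protocol that addresses interoperability among heterogeneous IoT systems through intent-driven requests, hybrid discovery, multi-criteria scoring, and dynamic adaptation in WASM sandboxes.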

Comments
12 pages, 3 figures
AI中文摘要

异构物联网系统在硬件架构、网络协议栈和数据序列化格式上存在碎片化问题。现有标准(如MQTT、COAP和DDS)依赖于地址绑定的命令式路由模型,需要硬编码配置且无法支持运行时模式转换。本文提出TIP(意图协议),一种去中心化的声明式网络协议。节点不指定具体物理端点,而是提交抽象意图,描述所需能力、模式和服务质量约束。TIP引擎使用结合本地多播DNS(mDNS)和Kademlia分布式哈希表(DHT)的混合发现机制来解析匹配节点。通过包含网络延迟、历史信誉和合约合规性的多准则评分算法优化选择。不匹配的数据表示在隔离的WebAssembly(WASM)沙盒中动态编译自TOML规范进行即时协调。安全性通过Ed25519签名、X25519密钥交换和ChaCha20-Poly1305有效载荷加密来保证。我们在Rust和C++中的参考实现评估显示,翻译开销低于毫秒级,并在工业条件下具有稳健的弹性。

英文摘要

Heterogeneous Internet of Things (IoT) systems suffer from fragmentation across hardware architectures, networking stacks, and data serialization formats. Existing standards (such as MQTT, CoAP, and DDS) rely on address-bound, imperative routing models that require hardcoded configurations and leave no flexibility for runtime schema translation. This paper presents TIP (The Intent Protocol), a decentralized, declarative network protocol. Instead of addressing specific physical endpoints, nodes submit abstract intents specifying desired capabilities, schemas, and Quality of Service (QoS) constraints. The TIP Engine resolves matching nodes using a hybrid discovery mechanism combining local multicast DNS (mDNS) with Kademlia Distributed Hash Tables (DHT). Selection is optimized via a multi-criteria scoring algorithm incorporating network latency, historical reputation, and contract compliance. Mismatched data representations are reconciled on-the-fly inside isolated WebAssembly (WASM) sandboxes compiled dynamically from TOML specifications. Security is enforced through Ed25519 signatures, X25519 key exchanges, and ChaCha20-Poly1305 payload encryption. Evaluation of our reference implementation in Rust and C++ shows sub-millisecond translation overhead and robust resilience under industrial conditions.
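
The multi-criteria selection step can be pictured as a weighted score over the three criteria the abstract names: network latency, historical reputation, and contract compliance. The abstract does not specify the scoring function, so the weighted-sum form, the weights, the latency normalization, and the `Candidate` fields below are hypothetical. The sketch is in Python for brevity, whereas the reference implementation is stated to be in Rust and C++.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    node_id: str
    latency_ms: float
    reputation: float  # assumed to lie in [0, 1]
    compliance: float  # assumed to lie in [0, 1]

def score(candidate, w_latency=0.4, w_reputation=0.3, w_compliance=0.3,
          latency_scale_ms=50.0):
    """Hypothetical weighted-sum score; a higher value is better."""
    # Map latency to (0, 1] so that lower latency gives a larger term.
    latency_term = 1.0 / (1.0 + candidate.latency_ms / latency_scale_ms)
    return (w_latency * latency_term
            + w_reputation * candidate.reputation
            + w_compliance * candidate.compliance)

def select_node(candidates):
    """Return the candidate with the highest score."""
    return max(candidates, key=score)

nodes = [
    Candidate("sensor-a", latency_ms=5.0, reputation=0.9, compliance=1.0),
    Candidate("sensor-b", latency_ms=1.0, reputation=0.2, compliance=0.5),
]
best = select_node(nodes)
```

In this example the slightly slower node wins because its reputation and compliance outweigh the latency gap, which is the kind of trade-off a multi-criteria score is meant to expose.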