arXivDaily: arXiv daily academic digest, updated Monday through Friday
EESS (Electrical Engineering and Systems Science) 245
2606.02562 2026-06-02 cs.RO cs.AI cs.LG cs.SY eess.SY

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics

Haimin Hu

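AI summary: To address safety problems caused by human-induced uncertainty in interactive robotics, the paper proposes a conformal-prediction-based verification method for belief-space safety filters that certifies high-probability safety while accounting for inference reliability, and that reduces conservativeness.
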
Comments
Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR 2026)
Abstract

Autonomous robots that interact with people must make safe and efficient decisions under human-induced uncertainty, such as their preferences, goals, competency, and willingness to cooperate. Safety filters are a popular approach for ensuring safety in interactive robotics, since their modular design separates safety from performance, allowing robots to operate safely around people with minimal impact on task efficiency. While traditional safety filters typically operate only in the physical space, neglecting the robot's ability to learn and adapt online, the recently proposed belief-space safety filter (BeliefSF) reasons about robot safety in closed-loop with runtime inference that actively reduces the robot's uncertainty online, thereby reducing conservativeness in filtering. However, providing formal safety guarantees for robots deploying BeliefSF remains a significant challenge due to errors in runtime inference and neural approximation of safety filters required to handle the high dimensionality of belief spaces. In this paper, we propose an algorithmic approach to certify high-probability safety of BeliefSF using conformal prediction, while explicitly accounting for the reliability of the robot's runtime inference module. Our method leverages the structure of belief-space safety filtering by focusing verification on a region where inference is expected to be reliable. It preserves the simplicity and sample complexity of standard conformal prediction, yet can certify a substantially less conservative safety filter. Through a simulated human-vehicle interaction benchmark, we show that our approach verifies a significantly more permissive belief-space safety filter than a standard conformal prediction baseline.
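
As a rough illustration of the conformal step only, the sketch below computes a standard split-conformal threshold from calibration scores and uses it to tighten a learned safety margin. It is not the paper's algorithm: the score definition, the function names `conformal_quantile` and `certified_safe`, and all numbers are hypothetical, and the abstract's restriction of verification to an inference-reliable region is not modeled.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return sorted(scores)[k - 1]

def certified_safe(predicted_margin, q):
    """Accept a state only if its learned margin stays positive after subtracting q."""
    return predicted_margin - q > 0

# Hypothetical scores: how much a learned safety margin over-estimated the
# true margin on each of n = 7 calibration rollouts.
calibration_scores = [0.02, 0.10, 0.05, 0.01, 0.07, 0.03, 0.08]
q = conformal_quantile(calibration_scores, alpha=0.25)
print(q, certified_safe(0.5, q), certified_safe(0.05, q))
```

Under exchangeability of calibration and test samples, a fresh score falls at or below `q` with probability at least `1 - alpha`. With too few samples the threshold is infinite and nothing can be certified.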

2606.02529 2026-06-02 math.OC cs.GT cs.MA cs.SY eess.SY

A No-Regret Framework for Adaptive Incentive Design

Georgios Vasileiou, Lantian Zhang, Silun Zhang

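AI summary: For games with continuous action spaces and private costs, the paper proposes a no-regret adaptive incentive design framework that achieves parameter estimation and regret minimization through a switching incentive policy.
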
Comments
21 pages, 5 figures
Abstract

Incentive design studies how a central authority can influence strategic agents through payments, subsidies, or taxes, so that individual objectives align with collective welfare. This paper introduces a No-Regret Adaptive Incentive Design (RAID) framework for nonlinear games with continuous action spaces and private agent costs. In this framework, the authority (planner) designs incentives that regulate the Nash equilibrium toward a socially optimal action profile, while simultaneously learning agents' unknown preferences from repeated strategic responses. We formulate the RAID problem and construct a least-squares estimator whose strong consistency requires only diminishing excitation. Leveraging this weak excitation requirement, we propose a switching incentive policy that alternates between probing (exploration) and estimate-based (exploitation) incentives. The resulting policy achieves an $O(t^{-0.5})$ parameter estimation rate and accumulates $O(t^{0.5}\log t)$ squared social-cost regret, almost surely. We further extend the framework to an endogenous-noise response model, where standard least-squares estimation is biased due to an error-in-variables correlation between the noise and agent responses. We utilize a repeated-sampling estimator and corresponding switching policy that retain the same almost-sure convergence and regret rates. Numerical experiments validate the effectiveness and predicted convergence rates of the method.

2606.02448 2026-06-02 eess.SP cs.SD

Diffusion-Based Heart Sound Generation: Evaluation with Physiological Signal Metrics, Classifiers, and Expert Listening

Xinqi Bao, Jia Bi, Xin Chen, Ernest Nlandu Kamavuako, Saikat Chatterjee

AI summary: Proposes a class-conditional diffusion model in the log-mel domain for generating phonocardiograms, assesses synthetic fidelity with physiological metrics, downstream classification accuracy, and expert listening, and analyzes remaining challenges such as retaining abnormal acoustic cues and reducing reconstruction artefacts.

Abstract

Publicly available phonocardiogram (PCG) datasets remain limited in size and pathological diversity, constraining both auscultation training and the generalisation of automated heart-sound classifiers. A class-conditional diffusion model for PCG generation is developed in the log-mel domain, and synthetic fidelity is assessed using complementary (i) physiology-inspired plausibility metrics, (ii) downstream label-consistency evaluation, and (iii) expert listening. Experiments use the PhysioNet/Computing in Cardiology Challenge 2016 dataset (3240 recordings) with recording-level splits. After preprocessing and quality control, 16,749 non-overlapping 4 s clips are mapped to a normalised 1 × 128 × 128 log-mel representation to train a conditional 2D U-Net denoiser with classifier-free guidance. Signal-level plausibility is quantified on reconstructed waveforms using three lightweight metrics: an envelope-autocorrelation rhythm score, an amplitude-based explosion score, and the dominant cycle lag. Synthetic clips preserve similar dominant cycle durations but exhibit reduced envelope periodicity and increased transient burstiness relative to real clips. For downstream evaluation, a ResNet-50 classifier achieves 92.24% accuracy on the held-out real test set and 82.8% accuracy on class-balanced synthetic batches, indicating that generated signals retain discriminative structure relevant to normal/abnormal classification. In a pilot expert listening study (60 clips, two clinicians), most synthetic clips are judged as heart-sound-like, while abnormality sensitivity is low for both real and synthetic 4 s excerpts. Overall, the results provide a practical baseline for diffusion-based PCG generation while highlighting remaining challenges in retaining abnormal acoustic cues and reducing reconstruction-induced artefacts.
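
The abstract does not define its signal-level metrics, so the following is only a guess at what an envelope-autocorrelation rhythm score and a dominant cycle lag could look like. The moving-average envelope, the lag search range, and every name here (`envelope`, `autocorr`, `dominant_cycle_lag`) are assumptions made for illustration.

```python
def envelope(signal, win):
    """Crude amplitude envelope: trailing moving average of |x|."""
    mags = [abs(x) for x in signal]
    return [sum(mags[max(0, i - win + 1): i + 1]) / win for i in range(len(mags))]

def autocorr(x, lag):
    """Mean-removed autocorrelation at one lag, normalised by the lag-0 energy."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    denom = sum(v * v for v in c)
    num = sum(c[i] * c[i + lag] for i in range(len(c) - lag))
    return num / denom if denom else 0.0

def dominant_cycle_lag(signal, min_lag, max_lag, win=2):
    """Return the lag with the highest envelope autocorrelation and that peak value."""
    env = envelope(signal, win)
    scores = {lag: autocorr(env, lag) for lag in range(min_lag, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy "heart sound": one unit pulse every 10 samples, 80 samples in total.
toy = [1.0 if i % 10 == 0 else 0.0 for i in range(80)]
lag, rhythm_score = dominant_cycle_lag(toy, min_lag=5, max_lag=15)
print(lag, rhythm_score)
```

On real audio the lag would be converted to seconds with the sampling rate, and a lower peak value would indicate weaker envelope periodicity, which is the effect the abstract reports for synthetic clips.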

2606.02369 2026-06-02 eess.SP cs.IT math.IT

Lossy Microwave Linear Analog Computer (MiLAC) for Future MIMO: Learning-based Architecture Designs for Spectral and Energy Efficiency Maximization

Binggui Zhou, Bruno Clerckx

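AI summary: To handle the tradeoff between interference suppression and hardware losses/power consumption in lossy MiLAC-aided MIMO systems, the paper proposes a learning-based joint architecture and performance optimization framework (LJAPOF) that maximizes spectral efficiency and energy efficiency.
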
Comments
15 pages, 7 figures, 1 table. This paper has been submitted to IEEE journal for possible publication
Abstract

Microwave linear analog computers (MiLACs) offer a transformative paradigm for future multiple-input multiple-output (MIMO) systems by shifting complex signal processing into the analog domain, thereby significantly reducing computational complexity, radio-frequency chains, and analog-digital converters, while speeding up computation. However, the practical deployment of MiLACs is severely constrained by the inherent hardware losses of the tunable admittance components (TACs) interconnecting MiLAC ports, which introduce severe inter-stream interference and fundamentally limit the spectral efficiency (SE) of the system. In addition, while denser architectures offer greater spatial degrees of freedom to mitigate inter-stream interference, the cumulative hardware losses and power consumption of massive TACs severely degrade the system's energy efficiency (EE). Consequently, designing architectures for lossy MiLACs emerges as a critical yet unresolved challenge, as it necessitates striking a delicate tradeoff between interference suppression and cumulative hardware losses/power consumption. To address this challenge, this paper investigates the joint MiLAC architecture design and performance (SE/EE) maximization in lossy MiLAC-aided MIMO systems. We propose a novel learning-based joint architecture and performance optimization framework (LJAPOF) that unifies the design of MiLAC architectures and analog beamforming configurations for lossy MiLACs under both SE- and EE-oriented objectives. Numerical results demonstrate that by intelligently navigating the fundamental tradeoff between interference suppression and hardware/power consumption, the proposed LJAPOF can design optimal MiLAC architectures that consistently outperform stem-connected and fully-connected MiLACs in maximizing the system's SE and EE.

2606.02368 2026-06-02 cs.NI cs.SY eess.SY

Certified Closed-Loop Control for Packet Networks: A Compositional Certification Framework

Muhammad Bilal, Jon Crowcroft, Xiaolong Xu, Huaming Wu

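AI summary: Proposes a compositional certification framework that inserts a certified operator between the proposer and the dataplane. The operator either projects the candidate action to an executable action that satisfies the certificate, or reports infeasibility and executes a fallback with quantified slack, thereby ensuring safety, stability, and compositionality of packet networks under delays, partial state information, and similar conditions.
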
Comments
29 pages, 11 figures, 3 tables
Abstract

Packet networks are controlled dynamical systems with discontinuities, delayed observations, and partial state information. Adaptive or learning-driven proposers can improve performance, but an unsafe proposal may still cause starvation, tail-delay spikes, or unstable queue behaviour. This paper treats packet-network control as an executed-action certification problem. A certified operator sits between any proposer and the dataplane. At each control tick, the proposer emits an arbitrary candidate action $\tilde u(t)$. The operator either projects it to an executable action $u(t)$ that satisfies a configuration-compiled certificate, or reports INFEASIBLE and executes an always-defined fallback with quantified slack. The certificate also exports an auditable envelope $\bar z(t)$ for downstream composition. The guarantees are conditional and explicit. They apply on ticks where the operator reports CERTIFIED, the declared arrival envelope and backlog bound are valid, and the platform realises the assumed service lower bound. Under these conditions, one mechanism covers backlog caps, service floors, mitigation caps, Foster--Lyapunov drift constraints, and compositional envelope contracts. We prove operator-level safety, feed-forward compositional safety and stability using exported envelopes, and a cyclic closure result under a small-gain condition. We also define breach and infeasibility semantics, discuss calibration of the service-tracking factor that links certified targets to realised scheduler behaviour, and evaluate the design under delayed telemetry, delayed actuation, weak proposers, envelope mismatch, overload, and millisecond-scale certification. The present evaluation validates the certified execution boundary in a byte-level closed-loop backend; deployment-level scheduler tracking is left to future Linux or hardware experiments.
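
The project-or-fallback behaviour of the operator can be pictured with a deliberately tiny sketch. It assumes a scalar action (for example a service rate) and a certificate that reduces to an interval `[floor, cap]`. The `Decision` record, the `certify` function, and the slack definition are hypothetical, and the paper's certificates are richer than this.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    status: str    # "CERTIFIED" or "INFEASIBLE"
    action: float  # the action actually executed, u(t)
    slack: float   # how far the constraints are from being satisfiable (0 if certified)

def certify(candidate, floor, cap, fallback):
    """Project a candidate action into [floor, cap], or fall back if the set is empty."""
    if floor > cap:
        # No action satisfies both bounds: report it and run the always-defined fallback.
        return Decision("INFEASIBLE", fallback, floor - cap)
    projected = min(max(candidate, floor), cap)
    return Decision("CERTIFIED", projected, 0.0)

print(certify(candidate=12.0, floor=2.0, cap=10.0, fallback=5.0))
print(certify(candidate=3.0, floor=8.0, cap=6.0, fallback=6.0))
```

The proposer can emit anything, since only the projected or fallback action reaches the dataplane. This separation is why the guarantees in the abstract attach to ticks reported as CERTIFIED rather than to the proposer itself.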

2606.02327 2026-06-02 eess.AS

Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets

Matthew Maciejewski, Samuele Cornell

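AI summary: Proposes a method that uses the noise estimate to cancel residual noise in the speech estimate, enables domain adaptation through joint training on artificial and naturally noisy mixtures, and validates its effectiveness on WHAM!- and CHiME-3-based benchmarks.
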
Comments
Submitted to IWAENC 2026
Abstract

Speech denoising is an often necessary step not only for human listening, but also for downstream processing by systems lacking robustness to noisy, real-world acoustic conditions. Unfortunately, denoising is a problem where conventional in-domain supervised training is not trivial, as the training targets cannot be annotated by humans: producing a clean version of a naturally-noisy speech recording is itself the task to solve. Supervised training is typically performed through the artificial addition of noise to clean speech recordings, which can only be sourced from controlled domains, a significant limitation due to the poor out-of-domain generalization of neural networks. An alternative is noisy target training (NyTT), which simply replaces the clean speech with in-domain noisy recordings, with the hope that learning to remove the artificial noise will extend to the natural. Though having shown promising results, NyTT's training objective is not minimized by clean speech estimates. We show that by estimating the artificial noise in addition to the naturally-noisy speech, the undesirable optimum can actually be exploited: the residual noise in the speech estimate can be canceled by the noise estimate via simple subtraction. Crucially, the optimum is fully compatible with conventional artificial mixtures, enabling joint training using both types of data with consistent optimization targets, opening the door to improved domain adaptability. The effectiveness of our approach is demonstrated through WHAM! and CHiME-3-based benchmarks.

2606.02281 2026-06-02 eess.SP

Distributed MoE-based Uplink Detection for Cell-Free Communication Systems

Le Zhao, Xuesong Pan, Xinyi Wang, Zhong Zheng, Zesong Fei

AI summary: Proposes the distributed mixture-of-experts detection network (DMoE-DetNet), in which each AP acts as a local expert using a CNN for non-linear feature extraction, an attention-based encoder at the CPU fuses global spatio-temporal dependencies, a gating network dynamically weights each AP's contribution, and a final linear detector outputs symbol probabilities. The network significantly outperforms conventional linear processing methods.

Abstract

Cell-free massive multiple-input multiple-output (MIMO) is recognized as a key technology for beyond-5G networks, where distributed access points (APs) jointly serve user equipments (UEs) to address the inter-cell interference inherent in cellular systems. While conventional distributed signal detection methods offer a practical balance between performance and fronthaul load, they are fundamentally limited by linear processing constraints. In this paper, we propose a novel deep-learning-based uplink detection framework by introducing the distributed mixture-of-experts detection network (DMoE-DetNet). In this architecture, each AP acts as a local expert employing convolutional neural networks (CNNs) for non-linear feature extraction, and transmits the local minimum mean square error (MMSE) detection results and statistical channel information to the central processing unit (CPU). In the CPU, an attention-based encoder module captures complex spatio-temporal dependencies among users for global feature fusion, with a gating network at the central processor dynamically weighting the contributions from different APs. Finally, a linear detector outputs the symbol probabilities. Simulation results demonstrate that the proposed DMoE-DetNet significantly outperforms conventional linear-processing-based cell-free signal detection methods in terms of symbol error rate, showcasing the potential of artificial intelligence-enabled communication systems.
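
The gating idea, in which a central node weights per-AP contributions, can be sketched with a plain softmax over gate scores. In the paper those scores come from a learned network operating on features. Here they are fixed numbers, and `softmax`, `fuse`, and the inputs are hypothetical names chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(local_estimates, gate_scores):
    """Weight each AP's local estimate by its gate weight and sum the results."""
    weights = softmax(gate_scores)
    fused = sum(w * x for w, x in zip(weights, local_estimates))
    return fused, weights

# Two APs report local symbol estimates; equal gate scores give a plain average.
fused_equal, weights_equal = fuse([1.0, 3.0], [0.0, 0.0])
# A much larger gate score for the second AP lets its estimate dominate.
fused_skewed, _ = fuse([1.0, 3.0], [0.0, 50.0])
print(fused_equal, weights_equal, fused_skewed)
```

The same function works with complex-valued estimates, since only multiplication by real weights and addition are used.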

2606.02278 2026-06-02 eess.SY cs.LG cs.SY

Physics-Guided Recurrent State-Space Neural Networks for Multi-Step Prediction

Ruiyuan Li, Ajay Seth, Manon Kok

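AI summary: Proposes PG-RSSNN, a state-space neural network that combines physical knowledge with recurrent structures and improves multi-step prediction under limited data and partially known physical models by mitigating vanishing gradients and the risk of numerical divergence.
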
Comments
6 pages, 3 figures. Accepted at IFAC World Congress 2026
Abstract

State-space models are traditionally based on physical knowledge, but multi-step predictions from these physical models can be poor due to model inaccuracy. Black-box deep learning has shown promise as an alternative. However, these methods rely on the availability of large datasets and potentially available physical knowledge is neglected. We propose the PG-RSSNN, a physics-guided recurrent state-space neural network that incorporates recurrent structures to enable the use of non-saturating activation functions in multi-step prediction. It mitigates the vanishing gradients and eliminates the risk of numerical divergence in training seen in existing structures that feed back state estimates. Results across multiple systems with various physical model imperfections, from linear state-space models with Gaussian noise to a robotic arm and a cascaded water tank system, show that the proposed PG-RSSNN maintains stable training behavior, and improves multi-step predictions, as compared with black-box neural networks and physics-only models, even with limited training data and when physical models are only partially known.
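
To make the setting concrete, the sketch below shows a generic multi-step rollout in which each predicted state is fed back through a partially known physics step plus a correction term. This is the kind of feedback structure the abstract contrasts PG-RSSNN with, not the PG-RSSNN architecture itself. The dynamics, the fixed linear `correction` standing in for a trained network, and all names are assumptions.

```python
def physics_step(x, u, dt=0.1):
    """Partially known physics: first-order relaxation of the state toward the input."""
    return x + dt * (-x + u)

def correction(x, u):
    """Placeholder for a learned residual; a real model would be a trained network."""
    return 0.01 * x

def rollout(x0, inputs):
    """Predict len(inputs) steps ahead, feeding each prediction back as the next state."""
    x, trajectory = x0, []
    for u in inputs:
        x = physics_step(x, u) + correction(x, u)
        trajectory.append(x)
    return trajectory

trajectory = rollout(1.0, [0.0, 0.0])
print(trajectory)
```

Because every step reuses the previous prediction, modelling errors compound over the horizon, which is why multi-step accuracy and training stability are the quantities the abstract focuses on.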

2606.02272 2026-06-02 eess.SP

Scattering Environment Aware Joint Multi-BS Channel Estimation and Localization with Clock Asynchronism

Yani Chi

AI summary: To address the loss of scatterer localization accuracy caused by clock asynchronism between base stations and users, the paper proposes a joint channel estimation and localization scheme that exploits scatterer information shared among multiple BSs and iteratively refines the user location, timing offsets, and scatterer parameters within an expectation-maximization framework.

Abstract

Clock asynchronism between base stations (BSs) and users significantly degrades scatterer localization accuracy. To address this issue, this paper proposes a multi-BS joint channel estimation and localization scheme that exploits shared scatterer information among multiple BSs. First, channel modeling in the location domain is performed by leveraging the joint sparsity of multi-BS channels. Subsequently, a multi-BS scatterer association algorithm is developed based solely on Angle of Arrival (AoA) estimates. By utilizing the shared scatterers and the geometric relationships among the scatterers, BSs, and the user equipment (UE), coarse estimates of the UE location and timing offsets are obtained. Based on these coarse estimates of scatterer locations, UE location, and timing offsets, an expectation-maximization (EM) framework is employed. Specifically, the UE location and timing offsets are iteratively refined while jointly enabling high-precision estimation of scatterer locations and channel coefficients. Simulation results demonstrate that the proposed scheme achieves significant improvements in both channel estimation and localization accuracy compared with baseline methods.

2606.02251 2026-06-02 cs.RO cs.AI eess.SP

FW-NKF: Frequency-Weighted Neural Kalman Filters

Adnan Harun Dogan, Berken Utku Demirel, Christian Holz

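AI summary: Proposes the Frequency-Weighted Neural Kalman Filter (FW-NKF), which suppresses band-limited noise by embedding a causal spectral-shaping operator into the Kalman measurement residual and jointly learning observation and transition networks, reducing localization error by up to 10% on tasks such as chaotic systems and inertial pose estimation.
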
Comments
Published at ICRA 2026
Abstract

Robust state estimation is central to robotic autonomy, yet classical Kalman filters struggle with frequency-dependent disturbances and model mismatch such as sensor vibrations, electromagnetic interference, and periodic noise. Although Deep Kalman Filter (DKF) variants extend the Extended Kalman Filtering (EKF) framework by learning latent transitions, they lack explicit mechanisms to suppress band-limited noise components that typically corrupt sensor measurements in real-world scenarios. We introduce the Frequency-Weighted Neural Kalman Filter (FW-NKF), a unified hybrid approach that embeds a causal spectral-shaping operator into the Kalman measurement residual and jointly learns observation and transition networks. By adapting both the filter spectrum and the latent state representation, FW-NKF attenuates the noise-dominated frequency bands while capturing complex residual structures. We conduct extensive experiments on four heterogeneous benchmarks, including chaotic systems such as multi-dimensional Lorenz systems and full-body inertial pose estimation, and find a reduction in localization error of up to 10% as well as marked improvements in orientation accuracy. Our ablation studies confirm that frequency weighting and deep latent-state modeling contribute to overall performance.

2606.02185 2026-06-02 eess.AS

Breaking the Pair: Evaluating Dyadic Interaction via Speaker Switching

Nishchay Nilabh, Neeraj Kumar Sharma

AI summary: Proposes the speaker-switch test and the Dyadic Distance Matrix (DDM) to evaluate whether conversation representations capture genuine interaction structure. Experiments show that real DDMs are distinguishable from switched DDMs.

Abstract

Speakers in dialogue continuously adapt their communicative behavior across acoustic, lexical, and semantic dimensions, a phenomenon known as conversational entrainment. Modeling this process requires representations that capture the global structure of interaction, yet prior approaches fail to disentangle dyad-specific patterns from speaker-specific traits, limiting their ability to capture true conversational adaptation. We address this with the Dyadic Distance Matrix (DDM), which encodes all pairwise similarities between the turns of two speakers over an entire conversation, capturing long-range cross-speaker dependencies. This raises a key question: does the DDM represent genuine interaction, or merely reflect individual speaker characteristics? We propose the speaker-switch test, a principled control in which one speaker's turns are replaced with those from an unrelated speaker drawn from a different conversation. This preserves turn-level statistics while disrupting the original dyadic coadaptation. The ability to distinguish real from switched DDMs thus directly evaluates whether the representation encodes interaction-specific structure. Across four embedding types and classifiers including ResNet-50 on the CANDOR corpus, real DDMs are consistently distinguishable from their switched counterparts. Comparisons with LibriSpeech show higher discriminability in read speech, highlighting the role of prosodic variability in naturalistic conversations. GradCAM analysis further reveals distinct structural signatures driving classification. These results establish the speaker-switch test as a robust diagnostic for validating representations of dyadic conversational interaction.
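
A minimal sketch of the two constructions described above, assuming each turn is already a fixed-length embedding and that dissimilarity is measured with cosine distance. The abstract does not state the distance measure, so that choice, the toy embeddings, and the function names are all assumptions.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dyadic_distance_matrix(turns_a, turns_b):
    """Entry [i][j] compares turn i of speaker A with turn j of speaker B."""
    return [[cosine_distance(a, b) for b in turns_b] for a in turns_a]

# Toy 2-D turn embeddings for a real dyad (A, B) and an unrelated speaker.
turns_a = [[1.0, 0.0], [0.0, 1.0]]
turns_b = [[1.0, 0.0], [1.0, 1.0]]
unrelated = [[0.0, 1.0], [0.0, 1.0]]

real_ddm = dyadic_distance_matrix(turns_a, turns_b)
# Speaker-switch control: keep A's turns, swap in turns from another conversation.
switched_ddm = dyadic_distance_matrix(turns_a, unrelated)
print(real_ddm, switched_ddm)
```

A classifier would then be trained to tell matrices like `real_ddm` from matrices like `switched_ddm`. Accuracy above chance suggests the matrix carries dyad-specific structure rather than only per-speaker traits.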

2606.02156 2026-06-02 eess.IV cs.AI cs.CV cs.IR cs.LG

Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel

Zahra Tabatabaei, Jon Sporring, Mark Bremholm Ellebæk, Alaa El-Hussuna

AI summary: Proposes an AI-driven system based on preoperative CT imaging that quantifies anastomotic leak risk by analyzing vascular and tissue features and supports clinical decisions with content-based image retrieval.

Abstract

Anastomotic leak remains one of the most serious complications following colorectal cancer surgery, substantially affecting patient outcomes, recovery trajectories, and healthcare costs. Despite advances in imaging technology, current preoperative evaluation relies only on clinical assessment, a process that is subjective, error-prone, and highly dependent on individual expertise. To date, no validated CT-based method exists to predict anastomotic leak risk prior to surgery. This protocol paper outlines a comprehensive framework for developing and validating an AI-driven system for preoperative risk assessment using pre- and post-contrast CT imaging. The study describes the stages of data collection, ethical handling and GDPR-compliant preprocessing of patient data, image preprocessing, and the exploration of deep learning architectures designed to generate clinically interpretable outputs. Two integrated tools constitute the main deliverables of this workflow: 1) a risk assessment module, which quantifies the likelihood of leakage by analyzing vascular and tissue features in CT scans, and 2) a Content-Based Medical Image Retrieval (CBMIR) module, which identifies and displays similar historical cases to support evidence-based surgical decision making. The protocol requires close collaboration between hospitals and universities, and it demonstrates that such a system is technically feasible and clinically implementable within existing healthcare infrastructures. By following the proposed methodological stages and regulatory principles, other institutions can reproduce this workflow to develop analogous decision-support tools. Ultimately, this interdisciplinary framework aims to enhance surgical planning, reduce leak incidence, and contribute to a broader paradigm shift toward explainable, data-driven precision surgery.

2606.02151 2026-06-02 cs.AI cs.SY eess.SY

S3TS: Stochastic Scenario-Structured Tree Search for Advanced Planning Under Uncertainty

Fabio Pavirani, Bert Claessens, Pierre Pinson, Chris Develder

AI summary: Proposes the Stochastic Scenario-Structured Tree Search (S3TS) algorithm, which explicitly represents uncertainty through scenario trees and integrates non-linear models. On a demand response signal publication problem it achieves near-optimal performance, with costs within 14% of the optimal solution, and in non-linear scenarios it reduces costs by up to 51% and 5.4% relative to a myopic algorithm and deterministic MCTS, respectively.

Abstract

Effective scheduling in the energy sector is essential to ensure the reliable operation of electrical grids and their connected assets by, for instance, optimizing the dispatch of generation units and storage systems. An effective planning strategy must (a) accommodate advanced and potentially non-linear system models, exploiting the increasing data availability of modern grids, and (b) explicitly handle uncertainties arising, for instance, from the integration of renewable energy sources. While existing approaches can address either non-linearity (e.g., Monte Carlo Tree Search) or uncertainty (e.g., stochastic mathematical optimization), there is a lack of planning techniques capable of addressing both challenges simultaneously. To bridge this gap, we propose a Stochastic Scenario-Structured Tree Search (S3TS) algorithm that explicitly represents uncertainty through scenario trees while enabling the integration of advanced non-linear models. We evaluate S3TS on a simulated demand response signal publication problem, largely mimicking the imbalance settlement mechanism in Belgium. The results demonstrate near-optimal performance in linear, analytically tractable settings, with costs within 14% of the mathematically optimal solution conditioned on the scenario trees. In highly non-linear scenarios, S3TS significantly outperforms baseline methods, achieving cost reductions of up to 51% and 5.4% compared to a myopic algorithm and deterministic MCTS, respectively.
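
The sketch below illustrates only why weighting scenarios matters when the cost is non-linear. It covers a single decision stage with two scenarios and performs no tree search, so it is far simpler than S3TS. The asymmetric cost, the probabilities, and the function names are hypothetical.

```python
def cost(action, demand):
    """Asymmetric, non-linear cost: shortage is penalised three times more than surplus."""
    return 3.0 * (demand - action) if demand > action else float(action - demand)

def expected_cost(action, scenarios):
    """Probability-weighted cost over (probability, demand) scenario leaves."""
    return sum(p * cost(action, demand) for p, demand in scenarios)

def best_action(actions, scenarios):
    """Pick the action with the lowest expected cost over all scenarios."""
    return min(actions, key=lambda a: expected_cost(a, scenarios))

scenarios = [(0.5, 2.0), (0.5, 6.0)]  # two equally likely demand outcomes
actions = range(0, 7)

stochastic_choice = best_action(actions, scenarios)
# A deterministic planner that only sees the mean demand (4.0) would pick 4.
deterministic_choice = best_action(actions, [(1.0, 4.0)])
print(stochastic_choice, expected_cost(stochastic_choice, scenarios))
print(deterministic_choice, expected_cost(deterministic_choice, scenarios))
```

In this toy case the scenario-aware choice halves the expected cost relative to planning for the mean. A multi-stage method would repeat this kind of weighting at every node of the scenario tree.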

2606.02127 2026-06-02 eess.AS cs.SD

Localizing broadband noise sources using the Loève spectrum and a 2.5D approach

Christian H. Kasess, Wolfgang Kreuzer, Holger Waubke

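AI summary: For localizing moving broadband stochastic sound sources, the paper proposes an inverse localization method based on a 2.5D setting and the Loève spectrum, derives the relation between the power spectral density of the moving source and the Loève spectrum at static receivers, and localizes sources using multi-taper estimates.
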
Comments
31 pages, 13 figures
Abstract

The localization of moving sound sources using a microphone array is typically based on modifying the signal to compensate for the Doppler effect. In the time domain, this compensation is done on a sample-by-sample basis. In the frequency domain, short time segments need to be used in which the Doppler effect is assumed to be approximately constant, and a discrete Fourier transform is applied to each segment. In contrast, the authors developed an inverse 2.5D localization method for uniformly moving single-frequency sources that works in the spectral domain and allows for the use of longer windows. This was achieved by modifying the 2.5D forward model to directly compute the effect of the motion at the static observer position. The method requires neither modification of the measured signal nor quasi-stationarity of the measurements within the window used. Unfortunately, this approach is not directly suitable for broad-band stochastic sources, and the present work investigates how the statistical properties of a uniformly moving stochastic source change when observed by a static observer. Using a 2.5D setting, the relation between the power spectral density of the moving source and the Loève spectrum, which is a generalization of the cross-spectral density at the static receivers, was derived. Based on simulated data with speeds up to $100\,\mathrm{m\,s^{-1}}$, the work presented here provides a proof of concept for a method based on multi-taper estimates of the Loève spectrum to localize moving broad-band stochastic sources. Currently, the method requires a stationary source signal and a spectral density that is flat within a certain range around the frequency of interest. Also, correlations between sources are currently not considered.
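
For orientation, the Loève (dual-frequency) spectrum can be written in generic textbook notation. The notation is not taken from the paper: the symbols $Z_m$, $\gamma_{mn}$, and $S_{mn}$ for receivers $m$ and $n$ are assumptions introduced here.

```latex
% Spectral representation of the signal at receiver m (harmonizable process)
x_m(t) = \int_{-\infty}^{\infty} e^{\,\mathrm{i} 2\pi f t}\, \mathrm{d}Z_m(f)

% Loève spectrum between receivers m and n at the frequency pair (f_1, f_2)
\gamma_{mn}(f_1, f_2)\, \mathrm{d}f_1\, \mathrm{d}f_2
  = \mathbb{E}\!\left[ \mathrm{d}Z_m(f_1)\, \overline{\mathrm{d}Z_n(f_2)} \right]

% Stationary special case: all mass lies on the diagonal f_1 = f_2,
% where S_{mn} is the ordinary cross-spectral density
\gamma_{mn}(f_1, f_2) = S_{mn}(f_1)\, \delta(f_1 - f_2)
```

A source moving past a static receiver produces a time-varying Doppler shift, so the received signal is generally non-stationary and some of its spectral correlation lies off the diagonal. This is presumably why the dual-frequency description is needed here.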

2606.02114 2026-06-02 math.OC cs.SY eess.SY

Switched Event-Triggered Adaptive Control of Reaction-Diffusion PDE-ODE with Neural Operator Implementation

反应扩散PDE-ODE的切换事件触发自适应控制及神经算子实现

Hongpeng Yuan, Ji Wang, Mamadou Diagne

AI总结 针对反应扩散PDE-ODE级联系统的不确定性,提出基于切换事件触发机制的自适应边界控制,利用深度神经算子逼近反推核函数,实现全局指数稳定和L2全局渐近调节。

详情
AI中文摘要

本文针对一类反应扩散PDE-ODE级联系统开发了切换事件触发自适应边界控制,其中ODE中的系统和输入矩阵以及PDE中的空间变化反应系数是不确定的。构建了两步反推变换以导出连续时间控制律。然后基于切换事件触发机制,提出了一种新颖的PDE-ODE级联动态事件触发控制策略,确保闭环系统的全局指数稳定性,取代了基于反推的经典动态ETC通常实现的指数收敛,同时固有地排除Zeno行为。为了解决PDE-ODE级联中的不确定性,开发了自适应更新律,导致时变增益核,通过事件触发控制机制自适应调度。此外,为了促进高效的实时实现,采用深度神经算子(DeepONet)将反推核近似为从估计参数到核函数的映射,从而消除了在线重复求解核PDE的需要。通过结合事件触发机制、参数自适应和核逼近误差影响的Lyapunov分析,我们证明了所得闭环系统的L2全局渐近调节。总之,本文的主要贡献有三方面:(i)为反应扩散PDE-ODE级联系统开发了基于自适应DeepONet的框架;(ii)将现有的反应扩散PDE自适应事件触发控制设计扩展到具有更复杂不确定性的情况;(iii)将具有全局指数稳定性的切换动态ETC推广到PDE-ODE级联。通过数值仿真证明了所提出方法的有效性。

英文摘要

This paper develops a switched event-triggered adaptive boundary control for a class of reaction-diffusion PDE-ODE cascade systems, where the system and input matrices in the ODE as well as the spatially-varying reaction coefficient in the PDE are uncertain. A two-step backstepping transformation is constructed to derive the continuous-time control law. Then a novel dynamic event-triggered control strategy for the PDE-ODE cascade is proposed based on a switched event-triggering mechanism, ensuring global exponential stability of the closed-loop system in place of the exponential convergence commonly achieved with backstepping-based classical dynamic ETC, while inherently excluding Zeno behavior. To address the uncertainties in the PDE-ODE cascade, adaptive update laws are developed, leading to time-varying gain kernels that are adaptively scheduled through the event-triggered control mechanism. Furthermore, to facilitate efficient real-time implementation, deep neural operators (DeepONet) are employed to approximate the backstepping kernels as mappings from the estimated parameters to kernel functions, thereby eliminating the need to repeatedly solve kernel PDEs online. Through a Lyapunov analysis that incorporates the effects of the event-triggering mechanism, parameter adaptation, and kernel approximation errors, we prove the $L^2$ global asymptotic regulation of the resulting closed-loop system. In summary, the key contributions of the paper are threefold: (i) developing an adaptive DeepONet-based framework for reaction-diffusion PDE-ODE cascade systems; (ii) extending the existing adaptive event-triggered control design for reaction-diffusion PDEs to the case with more complex uncertainties; and (iii) generalizing switched dynamic ETC with global exponential stability to PDE-ODE cascades. The effectiveness of the proposed approach is demonstrated through numerical simulations.

2606.02102 2026-06-02 eess.SP

Spectrum Anomaly Detection in OFDMA Systems: Simulation Framework and Benchmark Dataset

OFDMA系统中的频谱异常检测:仿真框架与基准数据集

Anton Schösser, Mohammadhadi Salehi, Sinuo Ma, Philipp Schulz, Gerhard Fettweis

AI总结 针对OFDMA系统中频谱异常检测缺乏公开数据集的问题,提出一个包含五种干扰类型的基准数据集和可扩展的开源仿真框架,并给出监督与无监督学习方法的基线评估。

详情
Comments
12 pages, 9 figures, submitted for publication to IEEE Open Journal of the Communications Society
AI中文摘要

无线连接支撑着现代社会和工业,使得关键应用如工业自动化的5G超可靠低延迟通信(URLLC)成为可能。然而,无线介质的开放性使其暴露于频谱异常,包括无意干扰和恶意干扰,这威胁着5G和新兴6G网络中的通信与感知功能。尽管频谱异常检测至关重要,但缺乏反映真实场景的公开数据集阻碍了研究进展。为此,我们提出了一个用于正交频分多址接入(OFDMA)系统(5G及未来的核心技术)中频谱异常检测的基准数据集。该数据集包含分布在分布式传感单元网络中的频谱图,覆盖五种不同的干扰类型,从简单噪声到高级导频感知攻击。这些异常是在工业工厂环境中使用一个多功能开源框架模拟的,该框架作为本工作的一部分开发并发布,支持扩展到新场景和干扰类型。我们提供了监督和无监督学习方法的基线评估,展示了不同干扰机带来的挑战,并突出了进一步研究的领域。该数据集和框架支持可重复研究,并作为推进频谱异常检测的基础,其应用可扩展到网络数字孪生。通过弥补公开数据集可用性的差距,本工作使研究社区能够验证和比较用于弹性下一代无线系统的高级检测方法。

英文摘要

Wireless connectivity underpins modern society and industry, enabling critical applications such as 5G ultra-reliable low-latency communication (URLLC) for industrial automation. However, the openness of the wireless medium exposes it to spectrum anomalies, including unintentional interference and malicious jamming, which threaten communication and sensing functionalities in 5G and emerging 6G networks. Despite its importance, spectrum anomaly detection research is hindered by a lack of publicly available datasets reflecting real-world scenarios. To address this, we present a benchmark dataset for spectrum anomaly detection in orthogonal frequency-division multiple access (OFDMA) systems, a core technology for 5G and beyond. The dataset includes spectrograms generated across a distributed network of sensing units, covering five distinct jammer types, from simple noise to advanced pilot-aware attacks. These anomalies are simulated in an industrial factory environment using a versatile open-source framework developed and published as part of this work, enabling extensibility to new scenarios and interference types. We provide baseline evaluations for supervised and unsupervised learning methods, demonstrating the challenges posed by different jammers and highlighting areas for further research. The dataset and framework support reproducible studies and serve as a foundation for advancing spectrum anomaly detection, with applications extending to network digital twins. By bridging the gap in open dataset availability, this work empowers the research community to validate and compare advanced detection methods for resilient next-generation wireless systems.

2606.02092 2026-06-02 eess.IV cs.AI cs.CV

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

LALE:用于土地覆盖估计的轻量级Transformer架构

Ümit Mert Çağlar, Alptekin Temizel

AI总结 提出LALE架构,通过分辨率分支编码器(轻量级ConvMixer处理高分辨率局部特征,Transformer处理低分辨率全局上下文)和全MLP多尺度解码器,在遥感图像分割中实现高效性能与计算成本的平衡。

详情
AI中文摘要

遥感图像的语义分割需要模型在严格的计算预算下同时捕捉全局上下文和局部细节。先前的工作通常针对这些轴之一进行优化:注意力用于全局上下文,卷积用于局部细节,或紧凑性用于效率。虽然混合方法旨在同时捕捉两者,但它们需要架构更改和带有计算开销的编码器骨干,限制了效率和性能。我们提出了LALE(用于土地覆盖估计的轻量级Transformer架构),一种端到端的遥感图像分割架构,它通过分辨率分支编码器:轻量级ConvMixer阶段处理高分辨率局部特征,而Transformer阶段处理低分辨率全局上下文,将自注意力的二次成本限制在深层、下采样的特征图上。全MLP多尺度解码器,以及贯穿始终的RMSNorm和StarReLU,进一步减少了计算量和参数数量。在大型ARAS400k遥感分割基准上,LALE相对于CNN、Transformer和混合基线建立了强大的效率-性能权衡。我们最小的变体(仅1.6M参数)在F1分数上达到最佳基线(UPerNet)的2.6分以内,同时使用4.5倍更少的参数、7倍更少的存储、17倍更少的GMACs,并提供1.8倍更高的吞吐量。

英文摘要

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.
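
The benefit of confining self-attention to downsampled feature maps can be illustrated with a back-of-the-envelope calculation. This is only a sketch under assumed numbers: the 512x512 input size and the strides 4 and 32 are illustrative choices, not values reported in the abstract. Here $H$ and $W$ are the input height and width, $s$ is the downsampling stride, $N$ is the number of tokens, and $d$ is the token width.

```latex
% Self-attention over a feature map with N tokens of width d
\[
  \text{cost}_{\text{attn}} = \mathcal{O}(N^{2} d),
  \qquad
  N = \frac{H}{s} \cdot \frac{W}{s}
\]
% Assumed example: H = W = 512.
% Stride s = 4 gives N = 128^2 = 16384 tokens; stride s = 32 gives N = 16^2 = 256 tokens.
\[
  \frac{N_{s=4}^{2}}{N_{s=32}^{2}}
  = \left( \frac{16384}{256} \right)^{2}
  = 64^{2}
  = 4096
\]
```

Under these assumptions, the quadratic term of attention at stride 32 is roughly 4096 times smaller than at stride 4, which is why running attention only in the deep stages is attractive.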

2606.02077 2026-06-02 eess.SP cs.SY eess.SY

Integrated Sensing and Covert Communication In Low-Altitude Networks: A Smart Radio Environment Perspective

低空网络中的集成感知与隐蔽通信:智能无线电环境视角

Jianyu Wei, Haichao Wang, Laixian Peng, Jiangchun Gu, Ziqi Liu, Lifeng Chen, Guoru Ding

AI总结 针对低空网络中通信安全问题,本文引入智能无线电环境技术,通过联合优化飞行轨迹、干扰功率、可移动天线位置等参数,实现集成感知与隐蔽通信的吞吐量最大化。

详情
AI中文摘要

低空经济和6G的兴起正在推动低空网络(LAN)的发展,使通信安全成为紧迫问题。与传统安全方法不同,隐蔽通信通过隐藏传输行为本身提供更强的保护。集成感知与通信(ISAC)作为6G的关键技术,通过硬件集成高效支持感知和通信任务,从而为隐蔽通信带来显著增益。然而,城市环境的复杂性和动态性带来了关键挑战。借鉴智能无线电环境(SRE)技术的最新进展,本文将其引入集成感知与隐蔽通信(ISACC)中,以抑制隐蔽信道衰落并抵消LAN中的感知精度损失。我们首先调查了ISACC在LAN中的应用和最新研究成果,强调了关键的实际挑战。随后,我们介绍了SRE的核心概念,并从四个维度详细阐述了其使能技术。为了提供更多见解,我们探索了将SRE集成到ISACC中的潜在途径。为了最大化隐蔽吞吐量,我们通过联合优化飞行轨迹、干扰功率、可移动天线位置、带宽分配和波束成形向量,进行了基于强化学习的案例研究。仿真结果表明,所提方案相比基准方案实现了优越的性能。最后,讨论了一些开放挑战和潜在方向。

英文摘要

The rise of low-altitude economies and 6G is driving the evolution of low-altitude networks (LANs), making communication security a pressing concern. Unlike traditional security approaches, covert communication offers enhanced protection by hiding the transmission behavior itself. Integrated sensing and communication (ISAC), a key technology of 6G, efficiently supports both sensing and communication tasks through hardware integration, thereby promising significant gains for covert communication. Nevertheless, the complexity and dynamics of urban environments pose critical challenges. Drawing on the latest advances in smart radio environment (SRE) technologies, this paper introduces them into integrated sensing and covert communication (ISACC) to suppress covert channel fading and counteract sensing precision loss in LANs. We first survey the applications and state-of-the-art findings of ISACC in LANs, highlighting key practical challenges. Subsequently, we introduce the core concept of SRE and elaborate on its enabling techniques across four dimensions. To deliver more insights, we explore potential pathways for integrating SRE into ISACC. To maximize covert throughput, a reinforcement learning-based case study is conducted by jointly optimizing flight trajectory, jamming power, movable antenna position, bandwidth allocation, and beamforming vectors. Simulation results show that the proposed scheme achieves superior performance compared to the benchmark. Finally, some open challenges and potential directions are discussed.

2606.02075 2026-06-02 eess.SY cs.SY

Detecting Cyber Attacks in Power System AGC Using a Drifted Ornstein-Uhlenbeck Process

利用漂移Ornstein-Uhlenbeck过程检测电力系统AGC中的网络攻击

Mingqiu Du, Xiaozhe Wang, Qinglai Guo

AI总结 提出一种基于漂移多元Ornstein-Uhlenbeck过程的最大似然估计方法,用于快速准确检测自动发电控制(AGC)系统中的虚假数据注入攻击,优于传统未知输入观测器和LSTM自编码器方法。

详情
AI中文摘要

自动发电控制(AGC)系统依赖于通信网络上的实时测量,容易受到隐蔽的虚假数据注入攻击(FDIAs),导致设备损坏和经济损失。我们提出了一种鲁棒的FDIA检测方法,该方法使用漂移多元Ornstein-Uhlenbeck(OU)过程的最大似然估计(MLE)。独立于负载可观测性,在各种网络攻击场景下,所提出的FDIA检测方法能够准确快速地检测复杂的FDIAs,优于传统未知输入观测器(UIO)方法(漏检)和长短期记忆自编码器(LSTM-AE)方法(检测时间过长)。

英文摘要

The Automatic Generation Control (AGC) system, reliant on real-time measurements over communication networks, is susceptible to stealthy false data injection attacks (FDIAs), risking equipment damage and economic losses. We propose a robust FDIA detection method using maximum likelihood estimation (MLE) of a drifted multivariate Ornstein-Uhlenbeck (OU) process. Independent of load observability, in various cyberattack scenarios, the proposed FDIA detection method delivers accurate and rapid detection of sophisticated FDIAs, outperforming traditional unknown input observer (UIO) methods, which miss detections, and Long Short-Term Memory Autoencoder (LSTM-AE) approaches, which suffer from prolonged detection times.

2606.02000 2026-06-02 cs.CV cs.AI eess.IV

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

迈向3D感知视频扩散模型:基于网格标记化的无渲染人体运动控制

Jingyun Liang, Min Wei, Shikai Li, Yizeng Han, Hangjie Yuan, Lei Sun, Weihua Chen, Fan Wang

AI总结 提出一种无渲染框架,通过压缩的3D人体网格标记直接条件化视频生成,实现精确的人体运动控制,减少2D引导伪影并提升3D结构建模能力。

详情
Comments
Project page: https://jingyunliang.github.io/MeshToken/
AI中文摘要

扩散模型在视频生成方面取得了显著成功。然而,这类模型是否真正感知视觉观察背后的3D结构,而不仅仅是生成合理的2D投影,仍是一个开放问题。本文通过人体运动控制这一任务来探究该问题,该任务需要对人体3D几何、运动、相机视角和场景上下文进行精确建模。与依赖渲染的2D运动引导视频的先前方法不同,我们提出了一种无渲染框架,直接基于压缩的3D人体网格标记条件化视频生成。该表示保留了完整的3D几何信息,同时实现了统一的基于标记的生成流程,在DiT架构中联合处理视频标记和运动标记。这种设计要求模型在视频生成过程中联合推理外观、3D结构和相机视角。实验结果表明,该方法在人体运动控制基准上表现强劲,同时减少了由视角依赖的2D引导和编辑过程中轨迹-姿态不匹配引起的伪影。这些发现表明,配备网格标记化的视频扩散模型能够更好地捕捉复杂的3D人体结构及其与周围环境的交互。

英文摘要

Diffusion models have shown remarkable success in video generation. However, whether such models are truly aware of the 3D structure underlying visual observations, rather than simply reproducing plausible 2D projections, remains an open question. In this work, we investigate this question through human motion control, a task that requires precise modelling of 3D human geometry, motion, camera viewpoint, and scene context. Unlike prior methods that rely on rendered 2D motion guidance videos, we propose a render-free framework that conditions video generation directly on compressed 3D human mesh tokens. This representation preserves full 3D geometric information while enabling a unified token-based generation pipeline that processes video tokens jointly with motion tokens in a DiT-based architecture. This design requires the model to reason jointly about appearance, 3D structure, and camera viewpoint during video generation. Experimental results demonstrate strong performance on human motion control benchmarks, while reducing artifacts induced by view-dependent 2D guidance and trajectory-pose mismatches during editing. These findings suggest that video diffusion models, when equipped with mesh tokenization, can better capture complex 3D human structures and their interactions with the surrounding environment.

2606.01972 2026-06-02 eess.SY cs.SY

AI-Based KPI Prediction Methods in Future 6G Networks: A Survey

基于AI的未来6G网络KPI预测方法:综述

Niloofar Mehrnia, Gourav Prateek Sharma, Samie Mostafavi, Andreas Johnsson, Sinem Coleri, Carlo Fischione, James Gross

AI总结 本文系统综述了面向未来6G网络的数据驱动KPI预测方法,提出了多维分类法,分析了从经典统计模型到深度学习和强化学习的代表性方法,并讨论了部署挑战和未来研究方向。

详情
Comments
Submitted to IEEE Communications Surveys and Tutorials, 30 pages
AI中文摘要

从5G到5G-Advanced的演进以及6G的愿景要求前所未有的网络性能水平,其中满足严格的网络关键性能指标(KPI),包括容量、延迟、覆盖和可靠性,对于支持自动驾驶、工业自动化和沉浸式通信等新兴应用至关重要。在这种背景下,传统的反应式网络管理已不足够,推动了对预测性、数据驱动方法的需求。机器学习(ML)已成为关键使能技术,能够从多种数据源预测KPI趋势,从而在移动网络中实现主动的、AI原生的自动化。本综述首次对面向未来6G网络的数据驱动KPI预测方法进行了全面系统的回顾。我们引入了一个多维分类法,根据KPI类型、数据源、预测KPI的网络协议栈层次、预测时间范围、模型家族和预测目标对预测方法进行分类。利用该分类法,我们分析了各种KPI的最新研究进展,重点介绍了从经典统计模型到深度学习和强化学习的代表性方法。我们进一步讨论了使能系统方面,包括数据收集和学习架构,并考察了部署挑战,包括数据可用性、可扩展性、隐私和可持续性。最后,我们概述了开放的研究方向,涵盖新的KPI定义、概率性和可解释性预测。本综述旨在为研究人员和从业者提供对KPI预测领域的结构化理解,以及未来6G系统中预测性网络自动化的路线图。

英文摘要

The evolution from 5G to 5G-Advanced and the vision of 6G demand unprecedented levels of network performance, in which meeting stringent network Key Performance Indicators (KPIs), including capacity, latency, coverage, and reliability, is critical to supporting emerging applications such as autonomous driving, industrial automation, and immersive communications. Traditional reactive network management is insufficient in this context, driving the need for predictive, data-driven approaches. Machine Learning (ML) has emerged as a key enabler, allowing KPI trends to be forecast from diverse data sources and thereby supporting proactive, AI-native automation in mobile networks. This survey provides the first comprehensive and systematic review of data-driven KPI prediction methods for future 6G networks. We introduce a multi-dimensional taxonomy that classifies prediction approaches by KPI type, data source, the layer of the network protocol stack at which the KPI is predicted, prediction horizon, model family, and prediction objective. Using this taxonomy, we analyze the state of the art across various KPIs, highlighting representative methods ranging from classical statistical models to deep learning and reinforcement learning. We further discuss enabling system aspects, including data collection and learning architectures, and examine deployment challenges, including data availability, scalability, privacy, and sustainability. Finally, we outline open research directions spanning new KPI definitions and probabilistic and explainable predictions. This survey aims to provide researchers and practitioners with a structured understanding of the KPI prediction landscape and a roadmap toward predictive network automation in future 6G systems.

2606.01970 2026-06-02 cs.RO cs.MA cs.SY eess.SY

Market-Based Replanning for Safety-Critical UAV Swarms in Search and Rescue Missions

基于市场重规划的搜救任务中安全关键无人机群

Luiz Giacomossi, Andrea Haglund, Claire Namatovu, Emily Zainali, Esaias Målqvist, Yonatan M. Beyene, Ivan Tomasic, Baran Çürüklü, Håkan Forsberg

AI总结 提出一种分布式协调架构IRDS,通过反向拍卖市场机制和几何共识协议,在无人机故障下自主重分配任务,在25%退化下保持93%任务成功率。

详情
Comments
6 pages, 4 figures, accepted at MIPRO 2026
AI中文摘要

搜救任务中可靠自主无人机群需要能够容忍代理退化并维持操作的容错协调。本文介绍了智能重规划无人机群(IRDS),一种为资源受限环境设计的分布式协调架构。所提出的框架采用反向拍卖市场机制,其中代理基于距离加权成本函数竞标服务搜索区域,并结合几何共识协议进行目标验证。我们通过物理仿真(N=8个代理,8x8网格)评估该方法,并施加随机故障注入。结果表明,无人机群能够以相对于总任务持续时间较低的延迟自主重新分配来自故障代理的任务,在25%劳动力退化下保持93%的任务成功率。所提出的框架展示了一种稳健的、经过实证测试的空中机器人自愈协调方法。

英文摘要

Reliable autonomous UAV swarms in Search and Rescue (SAR) missions require fault-tolerant coordination capable of sustaining operations despite agent degradation. This paper introduces the Intelligent Replanning Drone Swarm (IRDS), a distributed coordination architecture designed for resource-constrained environments. The proposed framework employs a Reverse-Auction market mechanism where agents bid to service search sectors based on a distance-weighted cost function, coupled with a geometric consensus protocol for target verification. We evaluate the approach through physics-based simulations (N=8 agents, 8x8 grid) subjected to stochastic fault injection. Results indicate that the swarm autonomously reallocates tasks from failed agents with low latency relative to the total mission duration, maintaining a mission success rate of 93% under 25% workforce degradation. The proposed framework demonstrates a robust, empirically tested method for self-healing aerial robotic coordination.
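
The reverse-auction reallocation described above can be sketched in a few lines. This is a minimal illustration, not the IRDS implementation: the cost function (Euclidean distance scaled by the bidder's current load), the `load_weight` parameter, and the function names `bid`, `auction`, and `reallocate` are all assumptions made for the example, and the geometric consensus protocol is not modeled.

```python
import math


def bid(agent_pos, sector_pos, load, load_weight=0.5):
    """Hypothetical distance-weighted cost: distance scaled by current load."""
    return math.dist(agent_pos, sector_pos) * (1.0 + load_weight * load)


def auction(sectors, agents, assignment=None):
    """Give each sector to the lowest bidder (reverse auction).

    sectors:    iterable of (x, y) sector centres to be auctioned
    agents:     dict mapping agent id -> (x, y) position
    assignment: optional existing dict mapping sector -> agent id
    """
    assignment = dict(assignment or {})
    # Current load of each bidder, counted from sectors it already owns.
    load = {a: 0 for a in agents}
    for owner in assignment.values():
        if owner in load:
            load[owner] += 1
    for s in sectors:
        # Lowest cost wins; the agent id breaks ties deterministically.
        winner = min(agents, key=lambda a: (bid(agents[a], s, load[a]), a))
        assignment[s] = winner
        load[winner] += 1
    return assignment


def reallocate(assignment, agents, failed):
    """Re-auction the sectors of failed agents among the survivors."""
    survivors = {a: p for a, p in agents.items() if a not in failed}
    orphaned = [s for s, a in assignment.items() if a in failed]
    kept = {s: a for s, a in assignment.items() if a not in failed}
    return auction(orphaned, survivors, kept)


if __name__ == "__main__":
    # 8x8 grid and 8 agents, mirroring the scale quoted in the abstract.
    sectors = [(x + 0.5, y + 0.5) for x in range(8) for y in range(8)]
    agents = {i: (float(i), 0.0) for i in range(8)}
    plan = auction(sectors, agents)
    plan = reallocate(plan, agents, failed={2, 5})  # 25% of the swarm lost
    print(sorted(set(plan.values())))
```

In this sketch only the orphaned sectors are re-auctioned, so the assignments of healthy agents are left untouched and the replanning cost stays proportional to the failed agents' share of the work.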

2606.01965 2026-06-02 eess.SP cs.NA math.NA

High-order synchrosqueezed wavelet-chirplet transform for instantaneous frequency and chirprate estimation

高阶同步压缩小波-切普变换用于瞬时频率和切普率估计

Shuixin Li, Jiecheng Chen, Qingtang Jiang, Gang Yu

AI总结 提出高阶同步压缩小波-切普变换(HSWCT),通过放松线性切普假设并推导高阶IF和切普率重分配算子,实现对交叉瞬时频率多分量信号的准确估计,并首次给出任意阶重分配算子的近似误差理论分析。

详情
AI中文摘要

具有交叉瞬时频率(IF)曲线的多分量信号的分离仍然是时频分析中的一个基本挑战。尽管同步压缩小波-切普变换(SWCT)通过引入切普率变量增强了时频可读性,但其有效性受限于局部线性切普的基本假设。因此,该方法在分析具有强频率调制的信号时表现不佳。本文通过放松线性切普假设扩展了SWCT框架。我们将信号分量建模为在短区间内具有多项式相位行为,并推导了高阶IF和切普率重分配算子的紧凑表达式。所提出的高阶同步压缩小波-切普变换(HSWCT)能够准确估计IF和切普率,并支持即使在IF曲线相交的情况下也能稳健地提取模态。另一个关键贡献是对用于IF和切普率估计的任意阶重分配算子的近似误差进行了严格的数学分析。当切普率为零时,HSWCT简化为传统的高阶同步压缩小波变换。据我们所知,文献中尚无关于任意阶SST IF重分配算子对IF的近似理论分析。作为本工作的副产品,我们建立的定理提供了这样的分析,从而填补了高阶SST理论框架中的空白。

英文摘要

The separation of multicomponent signals with crossing instantaneous frequency (IF) curves remains a fundamental challenge in time-frequency analysis. Although the synchrosqueezed wavelet-chirplet transform (SWCT) enhances time-frequency readability by introducing a chirprate variable, its effectiveness is constrained by the underlying assumption of local linear chirp. Consequently, this method does not perform well when analyzing signals characterized by strong frequency modulation. This paper extends the SWCT framework by relaxing the linear chirp assumption. We model signal components as having polynomial phase behavior over short intervals and derive compact expressions for high-order IF and chirprate reassignment operators. The proposed high-order synchrosqueezed wavelet-chirplet transform (HSWCT) enables accurate estimation of both IF and chirprate, and supports robust mode retrieval even with intersecting IF curves. Another key contribution is a rigorous mathematical analysis of the approximation errors of arbitrary-order reassignment operators for IF and chirprate estimation. When the chirprate vanishes, HSWCT simplifies to the traditional high-order synchrosqueezed wavelet transform. To the best of our knowledge, no theoretical analysis exists in the literature on the approximation of arbitrary-order SST IF reassignment operators to the IF. As a by-product of this work, our established theorem provides such an analysis, thereby filling a gap in the theoretical framework of high-order SSTs.

2606.01959 2026-06-02 eess.SY cs.SY

Anti-Windup in PID Control: Review, Analysis, and New Tuning Directions

PID控制中的抗饱和:综述、分析与新整定方向

Malena Caparroz, Kristina Soltesz, Tore Hägglund, José Luis Guzmán

AI总结 针对PID控制中执行器饱和导致的积分饱和问题,本文综述并比较了多种抗饱和技术,提出了一种结合条件积分与动态反算的新型混合策略,并开发了针对负载扰动抑制的反算方案跟踪时间常数系统整定规则。

详情
AI中文摘要

执行器饱和是一种基本非线性特性,通过引发积分饱和显著降低PID控制系统的性能,导致超调、恢复缓慢甚至不稳定。尽管已提出众多抗饱和策略,但在许多工业场景中,其实用整定仍主要依赖启发式方法且次优。本文对PI控制的一阶加纯滞后(FOPDT)过程在广泛操作条件下的经典和先进抗饱和技术进行了全面比较研究。分析包括动态和瞬时反算、条件积分以及改进方案。此外,提出了一种新型混合抗饱和策略,将条件积分与动态反算相结合,以改善饱和期间的响应性,同时保持平滑的恢复动态。本文的一个关键贡献是为反算方案中的跟踪时间常数开发了系统整定规则,特别针对负载扰动抑制进行了优化。这些规则源于一项广泛的优化研究,考虑了饱和比、控制器激进度和扰动特性。所得准则提供了简单而有效的公式,无需复杂计算即可实现接近最优的性能。仿真结果表明,所提方法显著优于常用的启发式规则,特别是在扰动抑制场景中,并为工业应用中抗饱和策略的选择和整定提供了清晰实用的建议。

英文摘要

Actuator saturation is a fundamental nonlinearity that significantly degrades the performance of PID-controlled systems by inducing integrator windup, leading to overshoot, slow recovery, and even instability. Although numerous anti-windup strategies have been proposed, their practical tuning remains largely heuristic and suboptimal in many industrial scenarios. This paper presents a comprehensive comparative study of classical and advanced anti-windup techniques for PI-controlled first-order-plus-dead-time (FOPDT) processes under a wide range of operating conditions. The analysis includes dynamic and instantaneous back-calculation, conditional integration, and adapted schemes. In addition, a novel hybrid anti-windup strategy is proposed, combining conditional integration with dynamic back-calculation to improve responsiveness during saturation, whilst preserving smooth recovery dynamics. Moreover, a key contribution of this work is the development of systematic tuning rules for the tracking time constant in back-calculation schemes, specifically optimised for load-disturbance rejection. These rules are derived from an extensive optimisation study that considers the saturation ratio, controller aggressiveness, and disturbance characteristics. The resulting guidelines provide simple yet effective formulas that achieve near-optimal performance without requiring complex computations. Simulation results demonstrate that the proposed methods significantly outperform commonly used heuristic rules, particularly in disturbance rejection scenarios, and provide clear, practical recommendations for selecting and tuning anti-windup strategies in industrial applications.
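
The back-calculation scheme discussed above can be illustrated with a small simulation. This is a minimal sketch of the textbook form of back-calculation for a PI-controlled FOPDT process, not the paper's hybrid strategy or its tuning rules. All numerical values (controller gains, saturation limits, process parameters, and the tracking time constant of 0.5) are arbitrary assumptions chosen for the example, and the function name `simulate_pi` is hypothetical.

```python
from collections import deque


def simulate_pi(tracking_time=None, kp=2.0, ki=2.0, u_max=1.2, u_min=-1.2,
                gain=1.0, tau=1.0, dead_time=0.2, setpoint=1.0,
                dt=0.01, t_end=20.0):
    """Simulate a PI loop on a first-order-plus-dead-time process (Euler steps).

    tracking_time: tracking time constant Tt of the back-calculation term;
                   None disables anti-windup entirely.
    Returns the output trajectory and the integrator-state trajectory.
    """
    n_delay = max(1, round(dead_time / dt))
    delay_line = deque([0.0] * n_delay)  # models the process dead time
    y = 0.0
    integ = 0.0
    ys, integs = [], []
    for _ in range(round(t_end / dt)):
        e = setpoint - y
        u_unsat = kp * e + integ                 # ideal PI output
        u = min(max(u_unsat, u_min), u_max)      # actuator saturation
        d_integ = ki * e
        if tracking_time is not None:
            # Back-calculation: feed the saturation error (u - u_unsat)
            # back into the integrator with time constant Tt.
            d_integ += (u - u_unsat) / tracking_time
        integ += dt * d_integ
        delay_line.append(u)
        u_delayed = delay_line.popleft()
        y += dt * (-y + gain * u_delayed) / tau  # first-order lag
        ys.append(y)
        integs.append(integ)
    return ys, integs


if __name__ == "__main__":
    ys_no, integ_no = simulate_pi(tracking_time=None)
    ys_aw, integ_aw = simulate_pi(tracking_time=0.5)
    print("peak integrator state, no anti-windup:  ", max(integ_no))
    print("peak integrator state, back-calculation:", max(integ_aw))
    print("peak output, no anti-windup:  ", max(ys_no))
    print("peak output, back-calculation:", max(ys_aw))
```

While the actuator is unsaturated, `u` equals `u_unsat`, so the back-calculation term vanishes and the controller behaves as a plain PI. The tracking time constant only shapes how quickly the integrator is pulled back during saturation, which is the parameter the tuning rules in the abstract target.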

2606.01909 2026-06-02 cs.SD cs.AI eess.AS

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Echo: 一种用于共享潜在空间中说话人日志和语音识别的联合嵌入预测架构

Louis Mouchon

AI总结 提出Echo系统,基于单个25M参数ViT编码器,通过JEPA预训练和分阶段特化,在512维潜在空间中联合实现说话人日志、语音分离和语音识别,无需部署时微调。

详情
Comments
18 pages, 17 tables, 1 figure. Proof-of-concept, independent research
AI中文摘要

我们提出Echo,一个围绕单个25M参数ViT编码器构建的概念验证音频系统。该编码器使用JEPA目标进行预训练,然后分阶段特化,以在同一个512维潜在空间中承载说话人身份、语音内容和动态源路由,部署时无需针对每个任务进行微调。轻量级头部处理说话人日志(ArcFace + VBx)和动态源分离(空目标K集预测)。在未知K的合成VoxCeleb2混合数据上,标准堆栈达到15.00%的盲DER、97.80%的PIT分离准确率,潜在SI-SDR提升+9.52 dB,以及在留出k-NN探针上说话人/内容因子化差距为+53.50分。Echo的意义不在于任何单一任务上的新SOTA,而在于三个任务在一个编码器上以这种规模共同共存。我们逐阶段记录了设计,报告了死胡同,并识别了通过VQ瓶颈进行端到端ASR的结构性障碍,该瓶颈仍然限制了PoC。

英文摘要

We present Echo, a proof-of-concept audio system built around a single 25 M-parameter ViT encoder. The encoder is pretrained with a JEPA objective and then specialised by stages to carry speaker identity, phonetic content, and dynamic source routing in the same 512-dimensional latent space, with no per-task fine-tuning at deployment. Light heads handle diarization (ArcFace + VBx) and dynamic source separation (null-target K-set prediction). On synthetic VoxCeleb2 mixtures with unknown K, the canonical stack reaches 15.00% blind DER, 97.80% PIT separation accuracy with +9.52 dB latent SI-SDR, and a +53.50-point speaker/content factorisation gap on a held-out k-NN probe. The point of Echo is not a new SOTA on any single task but the joint coexistence of three tasks on one encoder at this footprint. We document the design stage by stage, report the dead-ends, and identify the structural wall on end-to-end ASR through the VQ bottleneck that still bounds the PoC.

2606.01905 2026-06-02 eess.AS cs.SD

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning

通过语音-文本表示学习推进电喉语音增强

Ding Ma, Jinyi Mi, Fengji Li, Lester Phillip Violeta, Jiajun He, Wenchin Huang, Kazuhiro Kobayashi, Tomoki Toda

AI总结 提出一种融合语音和文本表示的学习框架,通过序列到序列语音转换模型改进电喉语音到正常语音的映射与重建质量,实验证明优于仅依赖语音表示的方法。

详情
Journal ref
IEEE Transactions on Biomedical Engineering, Early Access, 2026
Comments
15 pages, 7 figures. Accepted to IEEE TBME
AI中文摘要

目的:喉切除者依赖机电设备产生电喉(EL)语音。与正常语音相比,EL语音存在严重失真、有限的语音变化、不自然的韵律和时间偏移,降低了自然度和可懂度。尽管基于序列到序列(seq2seq)语音转换(VC)的EL语音到正常语音转换(EL2SP)很有前景,但EL与正常语音之间的显著不匹配不可避免地导致累积映射误差,限制了性能。为解决这一问题,我们描述了一种新颖的表示学习框架,该框架整合语音和文本表示,以改善seq2seq VC模型内的映射和重建质量。方法:我们的方法包括两个主要阶段:1)表示整合与学习,以及2)重建训练。首先构建一个能够融入辅助文本信息的网络,使用预训练模块学习基于语音-文本的整合表示。然后,采用自编码器风格的重建策略完成EL2SP模型,以继承这些表示而不增加模型复杂度。我们引入了三种融合策略,包括中级、输入级和混合级融合策略,逐步增强学习。此外,除了标准的seq2seq VC目标外,还引入了对整合表示的额外重建损失,以细化表示迁移。结果:在不同EL2SP数据集上的实验一致表明,我们的方法结合数据增强,优于仅依赖语音表示的基线方法。此外,随着系统设计深度的逐步改进验证了我们方法的有效性。意义:所提出的方法为EL语音增强和辅助通信技术提供了一种可扩展且实用的方法。

英文摘要

Objective: laryngectomees depend on an electromechanical device to generate electrolaryngeal (EL) speech. Compared with normal speech, EL speech suffers from severe distortion, limited phonetic variation, unnatural prosody, and temporal shifts, degrading naturalness and intelligibility. Although sequence-to-sequence (seq2seq) voice conversion (VC) based EL-speech-to-normal-speech conversion (EL2SP) is promising, substantial mismatches between EL and normal speech inevitably cause cumulative mapping errors that limit performance. To address this, we describe a novel representation learning framework integrating speech and text representations to improve mapping and reconstruction quality within a seq2seq VC model. Methods: our methodology comprises two main stages: 1) representation integration and learning, and 2) reconstruction training. A network capable of incorporating auxiliary text information is first constructed with pretrained modules to learn speech-text-based integrated representations. Then, an autoencoder-style reconstruction strategy finalizes the EL2SP model so that it inherits these representations without increasing model complexity. We introduce three fusion strategies (middle-, input-, and hybrid-level fusion) that progressively enhance learning. Moreover, besides standard seq2seq VC objectives, an additional reconstruction loss on the integrated representation is introduced to refine representation transfer. Results: experiments on different EL2SP datasets consistently demonstrate that our methods, combined with data augmentations, outperform baselines relying solely on speech representations. Furthermore, progressive improvements with system design depth validate the effectiveness of our methods. Significance: the proposed methods provide an extensible and practical methodology for EL speech enhancement and assistive communication technologies.

2606.01902 2026-06-02 eess.SP

LO-Free Joint Communication and Sensing via Inter-Antenna Cross-Correlation and Graph-Based Spatial Phase Inference

无需本振的联合通信与感知:基于天线间互相关和图结构空间相位推断

Hasan Atalay Günel, Mohaned Chraiti, Ali Görçin

AI总结 提出一种无需本振的联合通信与感知接收机架构,利用天线间互相关处理抑制载波分量并合成相对相位观测,实现数据通信和到达角估计,通过图推断恢复相位并推导克拉美-罗下界。

详情
Comments
This paper has been accepted to PIMRC 2026
AI中文摘要

联合通信与感知通常依赖相干下变频来恢复阵列处理所需的相位关系。同时,本振是毫米波和亚太赫兹接收机中成本、功耗和实现复杂性的主要来源。现有的无本振接收机设计通常基于包络检测或相关的非相干操作,这些操作不保留支路间相位信息,限制了其在JCAS中的应用。本文提出一种无本振的JCAS接收机架构,利用成对支路间互相关处理来抑制公共载波分量,并合成天线阵列上的相对相位可观测量,从而实现数据通信和到达角估计。发射符号被设计为诱导不同的相位差模式,使得相关相位同时包含数据依赖分量和DoA依赖分量。我们将恢复问题建模为在相关图上的推断,其中支路为节点,成对相关为边,并表明由此产生的循环一致性冗余能够在噪声和扰动下实现鲁棒的相对相位恢复。我们进一步在局部解缠绕近似下推导了用于DoA估计的拓扑感知克拉美-罗下界。数值结果证实,增加图连接性可同时改善误码率和DoA精度,且感知性能接近所推导的下界。

英文摘要

Joint communication and sensing (JCAS) typically relies on coherent downconversion to recover the phase relationships required for array processing. Meanwhile, Local Oscillators (LOs) are a major source of cost, power consumption, and implementation complexity in millimeter-wave (mmWave) and sub-THz receivers. Existing LO-free receiver designs are typically based on envelope detection or related non-coherent operations that do not preserve inter-branch phase information, which limits their applicability to JCAS. This work proposes an LO-free JCAS receiver architecture that leverages pairwise inter-branch correlation processing to suppress the common carrier component and to synthesize relative-phase observables across the antenna array, enabling both data communication and Direction-of-Arrival (DoA) estimation. The transmitted symbols are designed to induce distinct phase-difference patterns, such that the resulting correlation phases contain both a data-dependent component and a DoA-dependent component. We formulate recovery as inference over a correlation graph, where branches are nodes and pairwise correlations are edges, and show that the resulting cycle-consistent redundancy enables robust relative-phase recovery under noise and perturbations. We further derive a topology-aware Cramér-Rao lower bound for DoA estimation under a locally unwrapped approximation. Numerical results confirm that increasing graph connectivity improves both bit-error rate and DoA accuracy, with sensing performance approaching the derived bound.

2606.01899 2026-06-02 eess.SP cs.AI

RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models

RA-LWLM:基于检索增强的上下文无线定位基础模型

Guangjin Pan, Hui Chen, Hei Victor Cheng, Henk Wymeersch

AI总结 提出RA-LWLM框架,通过将场景特定信息外化到指纹数据库,实现无需训练的跨场景无线定位,利用冻结的无线基础模型编码器、检索模块和基于Transformer的上下文学习模块预测用户位置。

详情
Comments
13 pages, 9 figures. This work has been submitted to the IEEE for possible publication
AI中文摘要

无线定位是第六代(6G)网络的基本能力。传统的基于模型的方法需要对传播环境进行精确建模,在复杂的多径和非视距场景中性能下降,而基于学习的方法将模型参数紧密耦合到训练场景中,每当基站(BS)配置或传播环境变化时需要昂贵的重新训练。在本文中,我们提出RA-LWLM,一种检索增强的上下文定位框架,通过将场景特定信息外化到每个场景的指纹数据库(而非编码在模型权重中)来实现无需训练的跨场景适应。该框架由三个组件组成:一个冻结的无线基础模型(FM)编码器,将原始信道状态信息映射为场景无关的表示;一个检索模块,通过表示空间中的相似性搜索从每个场景的数据库中选择最具信息量的参考;以及一个基于Transformer的上下文学习(ICL)模块,将查询与检索到的参考融合以预测用户设备(UE)位置。为了适应不同查询的检索质量和传播复杂性,ICL模块采用混合专家设计,其中专家专注于不同的上下文大小,并由可学习的选择器软组合。跨不同BS配置的异构场景的广泛基于射线追踪的实验表明,RA-LWLM在未见和已见场景上实现了几乎相同的精度,无需任何每个场景的重新训练,显著优于端到端和基于FM的基线。这些结果验证了所提出的检索增强上下文范式作为6G网络中跨场景定位的可扩展解决方案。

英文摘要

Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.

2606.01842 2026-06-02 eess.SP

Channel Estimation and Reconstruction in Fluid Antenna Multiple Access: Myths, Misconceptions and Critical Questions

Taissir Y. Elganimi, Pablo Ramírez-Espinosa, David Morales-Jiménez, F. Javier López-Martínez

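AI summary: This paper questions channel estimation approaches for fluid antenna multiple access (FAMA) that rely on global error metrics, argues that they lead to excessive training overhead, and poses four critical questions that must be answered to enable FAMA deployment.

Details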
Comments
7 pages and 6 figures. This work has been submitted to the IEEE for publication
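AI Chinese abstract

Fluid antenna systems (FAS) represent a paradigm shift in which antenna elements (ports) emulate the illusion of motion or flow within a spatial aperture to optimize performance. One key use case of FAS is open-loop fluid antenna multiple access (FAMA), which achieves multiplexing gains through spatial interference nulling without requiring channel state information (CSI) at the transmitter. However, this requires precise channel reconstruction at the receiver to identify the best port. Current research maps this sensing task to a conventional MIMO-style estimation problem that focuses on minimizing global reconstruction errors such as the normalized mean-squared error (NMSE). This paper argues that, because FAS is inherently selection-based, NMSE-like approaches often lead to excessive training overhead and reduced net throughput. We revisit channel estimation and reconstruction in FAS and challenge several prevalent myths concerning: (i) the adequacy of global error metrics; (ii) the convenience of reconstructing channels or aggregate interference; (iii) the need for spatial oversampling; and (iv) the impact of port selection accuracy. We also identify four critical questions that must be answered for FAMA to be deployed successfully: (i) the definition of a selection-optimal sampling law; (ii) the identification of appropriate reconstruction methods; (iii) the inherent trade-off between multi-port sensing and selection gain; and (iv) the challenges introduced when moving toward electronically reconfigurable FAS.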

English abstract

Fluid antenna systems (FAS) represent a paradigm shift in which antenna elements (ports) emulate the illusion of motion or fluidity within a spatial aperture to optimize performance. One of FAS's key use cases is the provision of open-loop fluid antenna multiple access (FAMA), enabling multiplexing gains through spatial interference nulling without requiring channel state information (CSI) at the transmitter side. However, this comes at the price of requiring a precise channel reconstruction at the receiver to successfully identify the optimal port. Current research efforts map this sensing task to a legacy MIMO-style estimation problem focused on minimizing global reconstruction errors such as normalized mean-squared error (NMSE). In this work, we argue that because FAS is inherently selection-based, NMSE-like approaches often lead to excessive training overhead and reduced net throughput. We revisit the problem of channel estimation and reconstruction in FAS, challenging some prevalent myths related to (i) the adequacy of global error metrics; (ii) the convenience of reconstructing channels or aggregate interference; (iii) the need for spatial oversampling; and (iv) the impact of port selection accuracy. We also identify four critical questions that must be answered for successfully enabling FAMA deployments: (i) the definition of a selection-optimal sampling law; (ii) the identification of proper reconstruction methodologies; (iii) the inherent trade-offs between multi-port sensing and selection gain; and (iv) the challenges introduced when moving towards electronically reconfigurable FAS.
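
The abstract's point that a global error metric and a selection task can disagree is easy to see on a toy example. The sketch below is an assumption-laden illustration, not the paper's model: it uses four ports with made-up real-valued gains, and it picks the port with the largest estimated magnitude, whereas a real FAMA receiver would select on a signal-to-interference criterion. The names `nmse` and `best_port` are hypothetical.

```python
import numpy as np

def nmse(h_true, h_est):
    """Normalized mean-squared error between a true and an estimated channel."""
    h_true = np.asarray(h_true)
    h_est = np.asarray(h_est)
    return float(np.sum(np.abs(h_est - h_true) ** 2) / np.sum(np.abs(h_true) ** 2))

def best_port(h):
    """Index of the port with the largest magnitude (simplified selection rule)."""
    return int(np.argmax(np.abs(h)))

# Made-up channel over four ports; ports 0 and 1 are close in gain.
h_true = np.array([1.0, 1.1, 0.2, 0.3])

# Estimate A: small errors everywhere, but they swap the order of ports 0 and 1.
est_a = np.array([1.15, 1.05, 0.2, 0.3])

# Estimate B: large errors, yet the strongest port is still ranked first.
est_b = np.array([0.5, 1.0, 0.6, 0.7])

for name, est in (("A", est_a), ("B", est_b)):
    print(name, "NMSE:", round(nmse(h_true, est), 4),
          "selected port:", best_port(est),
          "true best port:", best_port(h_true))
```

Estimate A has the lower NMSE but selects the wrong port, while estimate B has a much higher NMSE and still selects the right one, which is the kind of mismatch that motivates a selection-aware criterion.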

2606.01836 2026-06-02 eess.IV

Face Liveness Detection Using RGB and Thermal Image Fusion

Merve Erşan, Melike Girgin, Tayfun Akgül

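AI summary: Proposes a method that fuses edge information from RGB images with thermal images and, using the custom ARISTOF dataset and the YOLOv8-Face model, effectively improves the robustness of face liveness detection.

Details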
Comments
Published in ELECO 2026
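AI Chinese abstract

Face detection with visible-spectrum cameras can capture facial features, but it often struggles to distinguish live subjects from spoof sources such as photographs, masks, or statues. Earlier methods based on texture, motion, or physiological cues are sensitive to illumination changes and offer limited robustness against spoofing attacks. Thermal imaging helps overcome these limitations because it detects heat emissions and so naturally excludes spoof faces. This study proposes a hybrid approach that fuses the edge information of RGB images with the corresponding thermal images, using a custom ARISTOF dataset that contains live and spoof faces. The fused images are first evaluated with the YOLOv8-Face model to compare face detection performance across the RGB, thermal, and fused modalities. The results show that the proposed method improves the face detection accuracy of thermal images. The fused images are then used to train a YOLOv8-Face model for live and spoof classification, demonstrating that the proposed multimodal fusion effectively supports robust face liveness detection.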

English abstract

Face detection with visible-spectrum cameras can capture facial features, but it often fails to distinguish live subjects from spoof sources such as photographs, masks, or statues. Previous approaches based on texture, motion, or physiological cues are sensitive to illumination changes and show limited robustness against spoofing attacks. Thermal imaging helps overcome these limitations by detecting heat emissions, naturally excluding spoof faces. This study proposes a hybrid approach that fuses the edge information of RGB images with corresponding thermal images using a custom ARISTOF dataset containing live and spoof faces. The fused images are first evaluated using the YOLOv8-Face model to compare face detection performance across RGB, thermal, and fused modalities. The results show that the proposed method enhances the face detection accuracy of thermal images. The fused images are subsequently used to train a YOLOv8-Face model for live and spoof classification, demonstrating that the proposed multimodal fusion effectively supports robust face liveness detection.
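
One way to picture "fusing the edge information of RGB images with thermal images" is the minimal sketch below. The abstract does not specify the edge detector or the fusion rule, so everything here is an assumption: a Sobel gradient magnitude serves as the edge map, fusion is a clipped weighted sum, and the two 8-bit images are assumed to be already registered and of equal size. The names `sobel_edges`, `fuse`, and `alpha` are hypothetical.

```python
import numpy as np

def sobel_edges(gray):
    """Sobel gradient magnitude of a 2-D float image, scaled to [0, 1]."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(gray, 1, mode="edge")   # replicate borders to keep the size
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def fuse(rgb, thermal, alpha=0.5):
    """Overlay RGB edge structure on a thermal image (assumed fusion rule).

    rgb: (H, W, 3) uint8 image; thermal: (H, W) uint8 image, pixel-aligned.
    Returns a float image in [0, 1].
    """
    gray = rgb.astype(float).mean(axis=2) / 255.0
    edges = sobel_edges(gray)
    heat = thermal.astype(float) / 255.0
    return np.clip(heat + alpha * edges, 0.0, 1.0)
```

The fused array could then be saved as an image and passed to a detector in place of the raw thermal frame; `alpha` controls how strongly the RGB contours show through.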