arXivDaily: arXiv daily academic digest, updated Monday through Friday
EESS (Electrical Engineering and Systems Science) 114
2605.12467 2026-05-13 eess.SY cs.SY

Towards Closed-loop Stability of Nonlinear Receding Horizon Games

Sophie Hall, Florian Dörfler, Timm Faulwasser

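AI summary: This paper studies the closed-loop stability of nonlinear receding horizon games without terminal ingredients. By invoking the turnpike phenomenon, the authors show that recursive feasibility holds under mild assumptions and give sufficient conditions for practical asymptotic convergence of the closed-loop trajectories. Numerical examples further show that the region of attraction around the steady-state generalized Nash equilibrium (GNE) shrinks exponentially as the prediction horizon grows, a behavior previously known only for model predictive control. In addition, a linear end penalty suppresses the leaving arc and ensures asymptotic convergence to the steady-state GNE.

Details
English abstract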

We analyze Receding Horizon Games without any MPC-like terminal ingredients. We show that recursive feasibility can be inferred from the turnpike phenomenon under mild assumptions. Moreover, we prove sufficient conditions for practical asymptotic convergence of the closed-loop trajectories, and we discuss how the gap towards practical asymptotic stability may be closed. We use numerical examples to show that the closed-loop region of attraction around the steady-state GNE shrinks exponentially with the horizon length, a behavior previously known only for model predictive control. Further, we apply a linear end penalty and demonstrate in numerical simulations that it suppresses the leaving arc and ensures asymptotic convergence to the steady-state GNE.

2605.12455 2026-05-13 cs.IT cs.NI eess.SP math.IT quant-ph

Simultaneously Minimizing Storage and Bandwidth Under Exact Repair With Quantum Entanglement

Lei Hu, Mohamed Nomeir, Alptug Aytekin, Sennur Ulukus

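AI summary: This paper studies exact-repair regenerating codes for entanglement-assisted distributed storage systems, aiming to minimize storage overhead and repair bandwidth simultaneously. The authors propose a construction based on the classical product-matrix framework and the CSS stabilizer formalism: when a node fails, the surviving nodes use a shared entangled state to carry out exact repair, so that the newcomer recovers exactly the same content as the failed node. When the number of helper nodes satisfies $d \geq 2k-2$, the construction achieves the same optimal storage-bandwidth point as under functional repair, providing theoretical support and a practical scheme for quantum-enhanced distributed storage.

Details
English abstract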

We study exact-regenerating codes for entanglement-assisted distributed storage systems. Consider an $(n,k,d,α,β_{\mathsf{q}},B)$ distributed system that stores a file of $B$ classical symbols across $n$ nodes with each node storing $α$ symbols. A data collector can recover the file by accessing any $k$ nodes. When a node fails, any $d$ surviving nodes share an entangled state, and each of them transmits a quantum system of $β_{\mathsf{q}}$ qudits to a newcomer. The newcomer then performs a measurement on the received quantum systems to generate its storage. Recent work [1] showed that, under functional repair where the regenerated content may differ from that of the failed node, there exists a unique optimal regenerating point that \emph{simultaneously minimizes both storage $α$ and repair bandwidth $d β_{\mathsf{q}}$} when $d \geq 2k-2$. In this paper, we show that, under \emph{exact repair}, where the newcomer reproduces exactly the same content as the failed node, this optimal point remains achievable. Our construction builds on the classical product-matrix framework and the Calderbank-Shor-Steane (CSS)-based stabilizer formalism.

2605.12453 2026-05-13 eess.SP cs.AI cs.DB cs.LG cs.NI

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Mannam Veera Narayana, Rohit Singh, Deepa M. R, Radha Krishna Ganti

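AI summary: To address long handover (HO) interruption times and high measurement-report overhead for 5G user equipment (UE) in high-speed mobility scenarios, this work presents a dataset collected from a commercially deployed network, covering UE mobility across pedestrian, bike, car, bus, and train modes at multiple speeds. The dataset emphasizes timing advance (TA) measurements captured during handover at key signaling events such as RACH trigger, MAC CE, and PDCCH grant, which are typically missing from existing work. It can support training and evaluation of AI/ML models for handover management, beam management, and TA prediction, providing a foundation for AI-native mobility research in 6G.

Details
English abstract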

To address the issues of high interruption time and measurement report overhead under user equipment (UE) mobility, especially in high-speed 5G use cases, the use of AI/ML techniques (AI/ML beam management and mobility procedures) has been proposed. These techniques rely heavily on data that are most often simulated for various scenarios and do not accurately reflect real deployment behavior or user traffic patterns. Therefore, there is a pressing need for realistic datasets under various conditions. This work presents a dataset collected from a commercially deployed network across various modes of mobility (pedestrian, bike, car, bus, and train) and at multiple speeds to depict real-time UE mobility. When collecting the dataset, we focused primarily on handover (HO) scenarios, with the aim of reducing the HO interruption time and maintaining continuous throughput during and immediately after HO execution. To support this research, the dataset includes timing advance (TA) measurements at various signaling events such as RACH trigger, MAC CE, and PDCCH grant, which are typically missing in existing works. We give a detailed description of the creation of the dataset: experimental setup, data acquisition, and extraction. We also present an exploratory analysis of the data, with a primary focus on mobility, beam management, and TA. We discuss multiple use cases in which the proposed dataset can help in understanding AI/ML model inference. One such use case is to train and evaluate various AI/ML models for TA prediction.

2605.12443 2026-05-13 eess.SY astro-ph.IM cs.MS cs.SE cs.SY

Basilisk and Docker for Reproducible GN&C Simulation: A Workflow Reference

Anubhav Gupta

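AI summary: This paper presents a Docker-based containerized workflow that addresses inconsistent configuration of the Basilisk spacecraft guidance, navigation, and control (GN&C) simulation framework across development environments. The approach packages the complete build environment, dependencies, and simulation infrastructure in a portable container image, making simulations reproducible and portable. The workflow is demonstrated through simulation scenarios of increasing complexity, and the paper describes the BSKSim class hierarchy, dynamics model architecture, and scenario execution patterns in detail, giving GN&C engineers and researchers a reference for reproducible simulation environments.

Details
Comments
21 pages, 8 figures
English abstract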

Basilisk is an open-source astrodynamics simulation framework widely used for spacecraft guidance, navigation, and control (GN&C) research and development. Despite its flexibility and computational capabilities, configuring Basilisk consistently across heterogeneous development environments presents practical challenges due to dependency management, operating system compatibility, and software configuration requirements. This paper presents a Docker-based containerization workflow for Basilisk that encapsulates the complete build environment, dependencies, and simulation infrastructure within a portable container image. The workflow is demonstrated through a progression of simulation scenarios of increasing complexity, from standalone orbital dynamics scripts to BSKSim-based attitude dynamics and control simulations with Monte Carlo analysis. The BSKSim class hierarchy, dynamics model architecture, flight software implementation, and scenario execution patterns are described in detail. The presented workflow provides a self-contained implementation reference for GN&C engineers and researchers seeking reproducible and portable Basilisk simulation environments. This work expands upon a workshop presentation delivered at the 46th Rocky Mountain AAS GN&C Conference, February 2024, available at https://doi.org/10.5281/zenodo.15008785.

2605.12434 2026-05-13 eess.SP

Massive MIMO CSI Feedback with Spiking Neural Networks

Yanzhen Liu, Geoffrey Ye Li

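AI summary: This paper studies channel state information (CSI) feedback for massive MIMO systems using spiking neural networks (SNNs) and proposes SpikingCSINet, in which both the feedback and the network computations are carried out through spikes. To overcome the information bottleneck of binary spikes in high-dimensional reconstruction, the authors design a progressive residual architecture that exploits the temporal dimension of SNNs to make the encoded information more compact. Experiments show a more favorable performance-efficiency tradeoff than lightweight convolutional baselines, and performance competitive with Transformer-based feedback while cutting energy consumption by over 93%.

Details
English abstract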

Deep learning-based channel state information (CSI) feedback has achieved empirical success in massive multiple-input multiple-output (MIMO) systems. However, existing approaches largely rely on dense artificial neural networks (ANNs), whose computational overhead limits their practical applications. In this article, we exploit bio-inspired spiking neural networks (SNNs) for massive MIMO CSI feedback, referred to as SpikingCSINet, where both the feedback and the main network computations are implemented through spikes. To overcome the information bottleneck of binary spikes in high-dimensional reconstruction, we develop a progressive residual (PR) architecture that exploits the natural temporal dimension of SNNs, encoding successive residuals across time steps to enhance information compactness. Experiments on the COST 2100 benchmark show that SpikingCSINet attains a more favorable performance-efficiency tradeoff than lightweight convolutional baselines. Moreover, it achieves performance competitive with Transformer-based feedback while reducing energy consumption by over $93\%$.

2605.12297 2026-05-13 cs.CV cs.RO eess.IV

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

Luming Wang, Hao Shi, Jiajun Zhai, Kailun Yang, Kaiwei Wang

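AI summary: This paper proposes EgoEV-HandPose, an end-to-end framework based on stereo event cameras for egocentric 3D bimanual hand pose estimation and gesture recognition. Its core module, KeypointBEV, lifts features into a canonical bird's-eye-view space and applies an iterative reprojection-guided refinement loop to resolve depth uncertainty and enforce kinematic consistency. The work also introduces EgoEVHands, the first large-scale real-world stereo event-camera dataset for egocentric hand perception, and reports improved performance in low-light and bimanual occlusion scenarios, setting a new benchmark for event-based egocentric perception.

Details
Comments
Extended version of SMC 2025 paper arXiv:2503.12419. The established dataset and source code will be publicly released at https://github.com/ZJUWang01/EgoEV-HandPose
English abstract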

Egocentric 3D hand pose estimation and gesture recognition are essential for immersive augmented/virtual reality, human-computer interaction, and robotics. However, conventional frame-based cameras suffer from motion blur and limited dynamic range, while existing event-based methods are hindered by ego-motion interference, monocular depth ambiguity, and the lack of large-scale real-world stereo datasets. To overcome these limitations, we propose EgoEV-HandPose, an end-to-end framework for joint 3D bimanual pose estimation and gesture recognition from stereo event streams. Central to our approach is KeypointBEV, a flexible stereo fusion module that lifts features into a canonical bird's-eye-view space and employs an iterative reprojection-guided refinement loop to progressively resolve depth uncertainty and enforce kinematic consistency. In addition, we introduce EgoEVHands, the first large-scale real-world stereo event-camera dataset for egocentric hand perception, containing 5,419 annotated sequences with dense 3D/2D keypoints across 38 gesture classes under varying illumination. Extensive experiments demonstrate that EgoEV-HandPose achieves state-of-the-art performance with an MPJPE of 30.54mm and 86.87% Top-1 gesture recognition accuracy, significantly outperforming RGB-based stereo and prior event-camera methods, particularly in low-light and bimanual occlusion scenarios, thereby setting a new benchmark for event-based egocentric perception. The established dataset and source code will be publicly released at https://github.com/ZJUWang01/EgoEV-HandPose.

2605.12287 2026-05-13 eess.AS cs.SD

The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

Jaehoon Ahn, Tae Gum Hwang, Moon-Ryul Jung

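AI summary: Deep-neural-network beat tracking models perform very well on mainstream percussive datasets, yet they consistently score poorly on the SMC dataset. This paper analyzes how state-of-the-art models fail on SMC and identifies three failure modes: octave errors, continuity errors, and complete tracking failure, noting that the models tend to produce "confident-but-wrong" activations. It also shows that the standard DBN's default minimum tempo of 55 BPM prevents correct tempo inference for 21% of SMC tracks, and it points to concrete directions for improving beat and downbeat detection.

Details
Comments
6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026
English abstract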

Over the past two decades, the task of musical beat tracking has transitioned from heuristic onset detection algorithms to highly capable deep neural networks (DNN). Although DNN-based beat tracking models achieve near-perfect performance on mainstream, percussive datasets, the SMC dataset has stubbornly yielded low F-measure scores. By testing how well state-of-the-art models detect beats on individual tracks in the SMC dataset, we identify three distinct failure modes: octave errors, continuity errors, and complete tracking failure where all metrics fall below 0.3. We reveal that state-of-the-art models tend to generate "confident-but-wrong" activations. Furthermore, we show that the standard DBN's default minimum tempo of 55 BPM prevents it from inferring the correct tempo for 21\% of SMC tracks, forcing double-tempo predictions on slow music. By exposing such fundamental oversights, we provide concrete directions for improving beat and downbeat detection, specifically emphasizing training data diversification and multi-hypothesis tempo estimation.
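
The minimum-tempo effect described in this abstract can be illustrated with a toy calculation. The sketch below is not the DBN itself: it assumes a hypothetical range-limited tracker that, when the true tempo lies below its minimum, settles on the nearest doubled tempo inside the allowed range. The 55 BPM default comes from the abstract; the 215 BPM upper bound and the function name `constrained_tempo` are assumptions made for illustration.

```python
def constrained_tempo(true_bpm, min_bpm=55.0, max_bpm=215.0):
    """Return the tempo a range-limited tracker would settle on.

    Hypothetical helper: a too-slow tempo is doubled until it enters the
    allowed range, mimicking a double-tempo (octave) error.
    """
    if true_bpm <= 0:
        raise ValueError("tempo must be positive")
    bpm = true_bpm
    while bpm < min_bpm:
        bpm *= 2.0  # each doubling is one metrical level up
    if bpm > max_bpm:
        raise ValueError("no metrical multiple fits the allowed range")
    return bpm


# A 48 BPM track falls below the minimum and is reported at double tempo,
# while a 60 BPM track is inside the range and is left unchanged.
for bpm in (48.0, 60.0):
    print(bpm, "->", constrained_tempo(bpm))
```

Lowering `min_bpm` below the true tempo removes the forced doubling in this toy model, which is the intuition behind the tempo-range issue the abstract reports.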

2605.12241 2026-05-13 eess.SP cs.AI cs.LG

Pretraining Strategies and Scaling for ECG Foundation Models: A Systematic Study

M A Al-Masud, Nils Strodthoff

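AI summary: This paper systematically studies pretraining strategies and scaling for electrocardiography (ECG) foundation models, evaluating five self-supervised learning objectives and analyzing how performance changes with pretraining data size on up to 11 million publicly available samples. Contrastive predictive coding (CPC) yields the most transferable representations across diverse clinical tasks, and most objectives continue to improve as the data grows. The study also finds that structured state space models outperform Transformer and CNN models for ECG representation learning, and hypothesizes that their strong inductive biases are the primary driver.

Details
Comments
59 pages, 16 figures, 59 Tables. Code available at https://anonymous.4open.science/r/ecg-pretraining-strategies-4DE3
English abstract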

Specialized foundation models are beginning to emerge in various medical subdomains, but pretraining methodologies and parametric scaling with the size of the pretraining dataset are rarely assessed systematically and in a like-for-like manner. This work focuses on foundation models for electrocardiography (ECG) data, one of the most widely captured physiological time series world-wide. We present a comprehensive assessment of pretraining methodologies, covering five different contrastive and non-contrastive self-supervised learning objectives for ECG foundation models, and investigate their scaling behavior with pretraining dataset sizes up to 11M input samples, exclusively from publicly available sources. Pretraining strategy has a meaningful and consistent impact on downstream performance, with contrastive predictive coding (slightly ahead of JEPA) yielding the most transferable representations across diverse clinical tasks. Scaling pretraining data continues to yield meaningful improvements up to 11M samples for most objectives. We also compare model architectures across all pretraining methodologies and find evidence for a clear superiority of structured state space models compared to transformers and CNN models. We hypothesize that the strong inductive biases of structured state space models, rather than pretraining scale alone, are the primary driver of effective ECG representation learning, with important implications for future foundation model development in this and potentially other physiological signal domains.

2605.12230 2026-05-13 eess.SY cs.SY

Neural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimation

Hendrik Schäfke, Daniel O. M. Weber, Askar Vagapov, Christoph Schweers, Thomas Seel, Simon F. G. Ehlers

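AI summary: This work proposes a neural-network-based virtual wheel-speed sensor to improve vehicle state estimation at low velocities. By fusing wheel-speed and motor-speed signals, it reduces both the quantization and latency errors of conventional sensors and the distortion that drivetrain torsion introduces into motor-speed signals in electric vehicles. On real-world driving data, the real-time capable model reduces error by up to 85% relative to the production sensor and up to 47% relative to an optimized zero-phase filter, and it generalizes well across maneuvers.

Details
Comments
Accepted for publication in the Proceedings of the 22nd IFAC World Congress, Busan, Republic of Korea, 2026
English abstract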

Accurate wheel speed information is crucial for vehicle control and state estimation. Conventional sensors suffer from quantization and latency, especially at low velocities, while motor-speed signals in electric vehicles are distorted by drivetrain torsion. This work presents a neural-network-based virtual wheel-speed sensor that fuses wheel-speed and motor-speed signals to reduce errors from both sources. Validated on real-world Volkswagen ID.7 data, the real-time capable model achieves an error reduction of up to 85% compared to the production sensor and 47% compared to an optimized zero-phase filter, providing a smooth signal for driver-assistance functions. The results demonstrate robust generalization across diverse real-world maneuvers within the vehicle platform.

2605.12203 2026-05-13 eess.SY cs.SY

Efficient Learning of Affine and Rational Dependency LPV Models With Linear Fractional Representation

Roel Drenth, Jan H. Hoekstra, Maarten Schoukens, Roland Tóth

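AI summary: This paper studies efficient learning of linear parameter-varying (LPV) models with affine and rational scheduling dependency. The authors propose an LPV model structure based on the Linear Fractional Representation (LFR) that can describe complex nonlinear systems with fewer scheduling variables than affine models. A direct parameterization guarantees well-posedness of the model, and the plant and scheduling map are estimated jointly from input-output data alone; two simulation examples demonstrate the accuracy of the approach.

Details
Comments
Accepted for IFAC WC 2026
English abstract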

Identifying control-friendly models of nonlinear systems remains one of the major challenges at the intersection of system identification and control. The Linear Parameter-Varying (LPV) framework offers a promising solution, but existing identification methods often rely on model structures with affine scheduling dependency. Instead, this work proposes the use of LPV models with Linear Fractional Representation (LFR) admitting a rational scheduling dependency, capable of modelling complex nonlinear systems with fewer scheduling variables compared to affine models. This work introduces a direct parameterization that ensures well-posedness of rational LPV-LFR models and, by jointly estimating an LPV plant and a scheduling map from input-output data alone, is capable of modelling complex nonlinear systems. The accuracy of the proposed approach is demonstrated on two simulation examples.

2605.12193 2026-05-13 eess.SP

BFLA: Block-Filtered Long-Context Attention Mechanism

Chong Wu, Zhenan Feng, Renjie Xu, Houwang Zhang, Jiawang Cao, Maolin Che, Wenbo Zhu, Hong Yan

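AI summary: This paper proposes BFLA, a block-filtered long-context attention mechanism that makes long-context inference more efficient without retraining or modifying the model. BFLA uses a two-stage design: it first builds an input-dependent block importance mask through coarse block compression and softmax mass estimation, then expands the mask to the Triton attention-tile grid and applies several rescue strategies to limit information loss, enabling efficient sparse prefill. Experiments on several mainstream large language models show substantially faster long-context prefilling with minimal accuracy degradation.

Details
Comments
14 pages, 5 tables, 1 figure
English abstract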

This paper proposes Block-Filtered Long-Context Attention (BFLA), a training-free sparse prefill attention mechanism for long-context inference. BFLA adopts a two-stage design. In Stage 1, query and key sequences are compressed into coarse blocks, and lightweight block-level softmax mass estimation is performed to construct an input-dependent block importance mask. In Stage 2, the coarse mask is expanded to the Triton attention-tile grid. Several tile-level rescue strategies are applied to reduce information loss, where a fused sparse prefill kernel skips unimportant KV tiles while preserving exact token-level attention inside every retained tile. BFLA requires no retraining, calibration, preprocessing, or model modification and can be plugged into existing vLLM-style paged-attention workloads. Experiments on Gemma 4, Llama 3.1, Qwen 3.5, and Qwen 3.6 series models show that BFLA substantially accelerates long-context prefilling with minimal accuracy degradation compared to dense Triton FlashAttention. Project website: https://github.com/Alicewithrabbit/BFLA.
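
To make the Stage 1 idea more concrete, here is a minimal pure-Python sketch of a block importance mask. It is not the BFLA kernel and does not reproduce the paper's exact procedure: mean pooling as the block compression, the cumulative softmax-mass threshold `mass`, the absence of causal masking, and all function names are assumptions chosen for illustration.

```python
import math


def mean_pool(vectors, block):
    """Average consecutive groups of `block` vectors (last group may be shorter)."""
    pooled = []
    for start in range(0, len(vectors), block):
        chunk = vectors[start:start + block]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_mask(queries, keys, block=2, mass=0.9):
    """For each query block, keep the fewest key blocks covering `mass` of the softmax."""
    q_blocks = mean_pool(queries, block)
    k_blocks = mean_pool(keys, block)
    scale = 1.0 / math.sqrt(len(queries[0]))
    mask = []
    for q in q_blocks:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blocks]
        probs = softmax(scores)
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        keep, covered = set(), 0.0
        for idx in order:
            keep.add(idx)
            covered += probs[idx]
            if covered >= mass:
                break
        mask.append([i in keep for i in range(len(k_blocks))])
    return mask


# Two query blocks that each align with a different key block.
tokens = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
print(block_mask(tokens, tokens))
```

In a scheme like the one the abstract describes, a mask of this kind would then be expanded to the attention-tile grid so that tiles marked `False` can be skipped, while exact token-level attention is still computed inside the retained tiles.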

2605.12192 2026-05-13 eess.SP

Slow Movable Antenna System Design Based on Cell-Specific Long-Term Angular Power Spectrum

Ge Yan, Lipeng Zhu, Wenyan Ma, Rui Zhang

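AI summary: This paper proposes a movable antenna (MA) system design based on the cell-level long-term angular power spectrum, aiming to reduce the channel estimation overhead and complexity of conventional designs that rely on short-term channel state information. Using a cell-specific statistical channel model, the authors derive the covariance-eigenvalues-balancing antenna positions (CEBAP) design, which improves system utilities such as weighted sum rate and minimum signal-to-interference-plus-noise ratio over long timescales. A low-complexity log-barrier penalized optimization (LOBPO) method is proposed to solve for the CEBAP, and simulations confirm consistent gains over fixed-position antenna systems across different utility functions.

Details
Comments
16 pages, 27 figures
English abstract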

Movable antenna (MA) has recently emerged as a promising paradigm for enhancing wireless communication performance by exploiting spatial degrees of freedom through flexible antenna repositioning. However, most existing designs rely on short-term user-specific instantaneous/statistical channel state information (CSI), which incurs excessive channel estimation overhead and complexity due to frequent antenna movement. To address this issue, this paper proposes a new design framework for antenna position optimization over a much longer timescale based on the cell-level statistical channel information acquired at the base station (BS). To this end, a cell-specific statistical channel model is developed for MA-aided multiuser communication systems, based on which the antenna position optimization framework for maximizing the ergodic system utility is formulated. Then, the covariance-eigenvalues-balancing antenna positions (CEBAP) design is derived to asymptotically approximate optimal solutions by statistically reducing users' channel correlation. Notably, the CEBAP solution solely depends on the BS-side angular power spectrum (APS) of wireless channels for mobile users across the cell, which significantly alleviates the overhead of channel acquisition and antenna movement, and yet remains effective for improving various system utilities over long timescales, such as weighted sum rate and minimum signal-to-interference-plus-noise ratio. Moreover, a low-complexity log-barrier penalized optimization (LOBPO) method is proposed to numerically solve the CEBAP. Simulation results based on realistic urban layouts and ray-tracing channels demonstrate consistent performance gains of the proposed CEBAP over fixed-position antenna systems across different utility functions, which closely approaches the upper bound achieved by instantaneous CSI-based MA optimization for moderately large antenna regions.

2605.12166 2026-05-13 physics.flu-dyn cs.SY eess.SY

Structured input-output analysis of oblique turbulent bands in Waleffe flow

Jino George, Chang Liu

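AI summary: This study applies structured input-output analysis (SIOA) to oblique turbulent bands in Waleffe flow. By introducing structured uncertainty, the method captures the componentwise structure of the nonlinear term in the Navier-Stokes equations and quantifies the flow response using structured singular values. The analysis identifies the wavelength and inclination angle of the oblique turbulent bands observed in large-domain direct numerical simulations, and finds that the response scales with Reynolds number as approximately $Re^{1.7}$.

Details
Comments
2 pages, 3 figures, accepted to LSU Symposium on Control, Learning, and Intelligent Systems 2026
English abstract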

This work employs structured input-output analysis (SIOA) to study Waleffe flow. The SIOA framework employs structured uncertainty to include the componentwise structure of nonlinearity in Navier-Stokes equations, and SIOA quantifies the flow response using structured singular values. The structured input-output analysis identifies the wavelength and inclination angle of oblique turbulent bands observed in large-domain direct numerical simulations. The structured input-output response scales over Reynolds number as $\sim Re^{1.7}$.

2605.12164 2026-05-13 eess.IV physics.med-ph

A Comparative Analysis of CT Degradation for LDCT Nodule Classification using Radiomics

Jiaying Liu, Anna Corti, Valentina D. A. Corino, Luca Mainardi

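AI summary: This study examines how degrading standard-dose CT (SDCT) images to generate synthetic low-dose CT (LDCT) samples can improve lung nodule classification models. It compares three degradation methods and finds that the CycleGAN-based method performs best in both distributional alignment and classification. Training classifiers on the generated LDCT samples improves generalization to real LDCT data, offering a viable data augmentation strategy for low-dose CT lung nodule screening.

Details
English abstract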

Low-dose computed tomography (LDCT) is the standard modality for lung cancer screening, known for its low radiation dose but high noise levels. While existing literature focuses on denoising LDCT images, comparative research on simulating LDCT characteristics to directly use these images for model development is lacking. This study shifts the focus from denoising images to degrading available standard-dose CT (SDCT) data, generating synthetic images for data augmentation to train classifiers for screening-detected nodules. We compare three degradation methods: (1) sinogram-domain statistical noise insertion; (2) replication of a validated physics-based simulation using Pix2Pix; and (3) an unpaired CycleGAN. The generated images were used to simulate an LDCT screening scenario by replacing 695 SDCT cases from the LIDC-IDRI dataset, from which radiomic features were extracted to train machine learning models for lung nodule classification. Regarding image quality, CycleGAN achieved the best Fréchet inception distance (0.1734) and kernel inception distance (0.0813; 0.1002) scores, indicating distributional alignment with the target low-dose domain. In the nodule classification task, results confirmed the necessity of domain adaptation, since a baseline model trained on non-degraded SDCT data failed to generalize to the real LDCT set (AUC 0.789) and had low sensitivity (0.571). Degraded images generated using the CycleGAN approach led to the most balanced performance on the classification task using the Adam Booster classifier, achieving an AUC of 0.861, sensitivity of 0.743, and specificity of 0.858 on the independent test set. Our findings confirm that generating synthetic LDCT data from standard-dose scans is a viable strategy for training robust classifiers for screening-detected nodules.

2605.12155 2026-05-13 quant-ph cs.SY eess.SY

Optimal State Preparation for Impulse Estimation in Gaussian Quantum Systems

Kaspar Schmerling, Andreas Kugi, Andreas Deutschmann-Olek

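AI summary: This paper studies how non-equilibrium states can improve the estimation of impulse-like disturbances in continuously monitored linear classical and quantum systems. The authors propose an optimal-control-based method that dynamically shapes the estimation covariances through time-dependent parametric modulation, maximizing information gain at a known impulse time. Applied to nanomechanical resonators and levitated nanoparticles, the method reduces estimation variance by up to a factor of two relative to steady-state operation, whereas conventional periodic-modulation squeezing protocols degrade the inference of such disturbances.

Details
Comments
Accepted for presentation at IFAC World Congress 2026
English abstract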

We present an optimal control-based strategy to enhance the estimation of impulse-like disturbances in continuously monitored linear classical and quantum systems by exploiting non-equilibrium states. Using optimal estimation techniques for linear Gaussian systems to collect information from the temporal vicinity of the disturbance, we cast the minimization of disturbance estimation uncertainty as a nonlinear optimal control problem over time-dependent system parameters. The resulting method dynamically shapes the estimation covariances through parametric modulation, maximizing information gain at a known impulse time. This differs fundamentally from conventional squeezing protocols using periodic modulation that effectively degrade inference of impulse-like disturbances. Applied to nanomechanical resonators and levitated nanoparticles, optimal parametric driving reduces estimation variance by up to a factor of two relative to steady-state operation.

2605.12135 2026-05-13 cs.SD cs.LG eess.AS

STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts

Joshua Opria

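AI summary: This paper presents STRUM, an end-to-end system that converts raw audio into playable rhythm-game charts (for Clone Hero and YARG) without any oracle metadata, covering drums, guitar, bass, vocals, and keys. STRUM is a multi-stage hybrid that combines a convolutional recurrent neural network (CRNN) for drum onset detection, neural onset detectors with monophonic pitch tracking for guitar and bass, word-aligned speech recognition for vocals, and spectral analysis for keyboard notes. Experiments on a 30-song benchmark screened by an audio-quality criterion report onset F1 scores ranging from 0.539 (vocals) to 0.838 (drums), along with a complete ablation of the drum-pipeline components.

Details
Comments
9 pages, 4 figures, 3 tables. Code and models: https://github.com/<your-github-username>/autocharter
English abstract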

We present STRUM (Spectral Transcription and Rhythm Understanding Model), an audio-to-chart pipeline that converts raw recordings into playable Clone Hero / YARG charts for drums, guitar, bass, vocals, and keys without any oracle metadata. STRUM is a multi-stage hybrid: a two-stage CRNN onset detector and a six-model ensemble classifier for drums; neural onset detectors with monophonic pitch tracking for guitar and bass; word-aligned ASR for vocals; and spectral keyboard detection for keys. We evaluate on a 30-song in-envelope benchmark constructed by screening candidate songs on a single audio-quality criterion -- the median 1-second drum-stem RMS after htdemucs_6s source separation. On this benchmark STRUM achieves drums onset F1 = 0.838, bass F1 = 0.694, guitar F1 = 0.651, and vocals F1 = 0.539 at a +/- 100 ms tolerance with per-song global offset search. We report a complete ablation of seven drum-pipeline components with paired per-song Wilcoxon tests, an analysis of ground-truth-to-audio timing distributions in community Clone Hero charts, and a per-class confusion matrix for the drum classifier. Code, model weights, and the full benchmark manifest are released.
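
The evaluation protocol mentioned in this abstract (onset F1 at a ±100 ms tolerance with a per-song global offset search) can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the one-to-one two-pointer matching, the candidate offset grid, and the function names are assumptions.

```python
def onset_f1(reference, estimated, tolerance=0.100):
    """F1 for onset times in seconds, matching each onset at most once."""
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early for every remaining reference onset
        else:
            i += 1  # reference onset has no estimate within the tolerance
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_global_offset(reference, estimated, offsets, tolerance=0.100):
    """Shift all estimates by each candidate offset and keep the best (f1, offset)."""
    scored = [
        (onset_f1(reference, [t + o for t in estimated], tolerance), o)
        for o in offsets
    ]
    return max(scored)


reference = [1.0, 2.0, 3.0]
estimated = [1.05, 2.30, 3.0]
print(onset_f1(reference, estimated))  # two of three onsets match
print(best_global_offset([1.0, 2.0], [0.7, 1.7], [-0.3, 0.0, 0.3]))
```

A global offset search of this kind compensates for a constant timing shift between a chart and its audio, so that the F1 score reflects detection quality rather than alignment.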

2605.12107 2026-05-13 eess.AS

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

Danilo de Oliveira, Tal Peer, Timo Gerkmann

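AI summary: This paper examines how well modern automatic speech recognition (ASR) systems serve for evaluating speech enhancement (SE). A listening experiment shows that modern ASR systems with large-scale noisy training and embedded language models yield word error rates (WER) that correlate better with human recognition, with a transducer model providing the most reliable transcriptions. However, these models' robustness to noise and use of context can be uninformative for an acoustics-focused evaluation of enhancement performance.

Details
English abstract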

Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. The use of automatic speech recognition (ASR) systems to evaluate SE performance is common in literature, usually in terms of word error rate (WER). However, WER scores depend heavily on the choice of ASR system and text normalization pipeline. In this paper, we investigate how modern ASR models correlate with human recognition of enhanced speech. A listening experiment reveals that modern ASR models with large-scale noisy training and embedded language models correlate more with human WER than simpler ones, with a transducer model providing the most reliable transcriptions. Nevertheless, we also show that these models' robustness to noise and use of context can be uninformative to an acoustics-focused evaluation of enhancement performance.
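
Since this abstract notes that WER depends heavily on the text normalization pipeline, a short sketch may help. It computes WER as the word-level edit distance divided by the reference length, which is the standard definition; the `normalize` helper (lowercasing and dropping punctuation) is a deliberately simple, assumed normalization and not the pipeline used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def normalize(text):
    """Assumed minimal normalization: lowercase and strip punctuation."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


ref = "The cat sat."
hyp = "the cat sat"
print(word_error_rate(ref, hyp))                        # raw strings
print(word_error_rate(normalize(ref), normalize(hyp)))  # after normalization
```

Without normalization, two of the three words count as substitutions purely because of casing and punctuation; after normalization the transcripts are identical. This is why WER figures are only comparable when the normalization is held fixed.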

2605.12086 2026-05-13 eess.SP cs.IT math.IT

Low-Complexity Blind SNR Estimator for mmWave Multi-Antenna Communications

Hanyoung Park, Homin Jang, Ji-Woong Choi

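AI summary: This paper proposes a low-complexity blind estimator of average noise power, average signal power, and signal-to-noise ratio (SNR) for millimeter-wave massive multi-antenna uplink systems. The method needs only a single received signal sample and no pilot signals, iterative optimization, or prior knowledge of the signal. Exploiting the sparsity of mmWave channels in the beamspace domain, it identifies noise-dominant components with a sorting step and a finite-difference criterion, and separates signal from noise using the order statistics of Gaussian noise. The algorithm has low computational complexity and suits real-time implementation; an FPGA implementation demonstrates low latency and efficient use of hardware resources. Simulations show higher estimation accuracy than existing single-sample methods.

Details
Comments
Submitted to a journal
English abstract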

In this paper, we propose a low-complexity blind estimator for the average noise power, average signal power, and signal-to-noise ratio (SNR) in millimeter-wave (mmWave) massive multi-antenna uplink systems. In particular, the proposed method is designed to operate using only a single received signal sample, without relying on pilot signals, iterative optimization, or multiple observations, and without requiring prior knowledge of the transmitted signal. By exploiting the inherent sparsity of mmWave channels in the beamspace domain, the estimator identifies noise-dominant components through a sorting-based procedure combined with a finite-difference criterion. This separation is further supported by the order statistics of noise power under Gaussian assumptions, enabling statistically grounded discrimination between signal and noise elements. The average noise power is estimated from the identified noise-only components, and the signal power and SNR are subsequently obtained through simple arithmetic operations. The proposed algorithm achieves low computational complexity and is well-suited for real-time implementation. To demonstrate its practical feasibility, a hardware-efficient very large-scale integration (VLSI) architecture is developed and implemented on an AMD-Xilinx Kintex UltraScale+ KCU116 Evaluation Kit, with corresponding field-programmable gate array (FPGA) results provided. The implementation exhibits low latency and sublinear scaling of hardware resource utilization with respect to the number of antennas, and enables parameter estimation within a duration shorter than a single symbol of conventional wireless systems. Simulation results verify that the proposed estimator achieves high estimation accuracy compared to existing single-sample-based methods.
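
The sorting and finite-difference idea in this abstract can be sketched in a few lines. This is a naive illustration under stated assumptions, not the proposed estimator: it takes per-beam powers from one beamspace snapshot, splits them at the single largest jump between sorted neighbours, and omits the order-statistics reasoning the abstract mentions. The function name `blind_snr` and the definition of average signal power as total average power minus noise power are assumptions.

```python
def blind_snr(beam_powers):
    """Estimate (noise_power, signal_power, snr) from one beamspace snapshot.

    Assumed rule: after sorting, the largest jump between neighbouring
    powers separates noise-dominant beams from signal-bearing beams.
    """
    if len(beam_powers) < 3:
        raise ValueError("need at least three beamspace components")
    ordered = sorted(beam_powers)
    # First-order finite differences of the sorted powers.
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    split = max(range(len(gaps)), key=lambda i: gaps[i]) + 1
    noise = sum(ordered[:split]) / split
    if noise <= 0:
        raise ValueError("estimated noise power must be positive")
    total = sum(ordered) / len(ordered)
    signal = max(total - noise, 0.0)
    return noise, signal, signal / noise


# Six noise-dominant beams near unit power and two strong signal beams.
powers = [1.0, 0.9, 1.1, 1.0, 1.0, 1.0, 40.0, 41.0]
noise, signal, snr = blind_snr(powers)
print(noise, signal, snr)
```

The single-largest-gap rule is fragile when signal beams differ widely in power, since the biggest jump may then fall between two signal beams. A statistically grounded threshold is one way to handle that case, which is where the order statistics mentioned in the abstract would come in.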

2605.12084 2026-05-13 cs.RO cs.AI cs.IT cs.LG cs.SY eess.SY math.IT

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

Youwei Yu, Jionghao Wang, Zhengming Yu, Wenping Wang, Lantao Liu

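AI summary: This paper studies how to design learnable information-theoretic objectives for robot exploration that reduce uncertainty in model parameters more effectively. The authors propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED analyzes the eigenspace of the Fisher information matrix to identify observable parameter directions and suppresses nuisance effects from non-critical parameters, thereby improving the exploration strategy. Experiments on navigation and manipulation tasks show clear gains in exploration efficiency and policy performance.

Details
English abstract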

Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric learnability is difficult in high-dimensional robotic systems. Many parameter directions are weakly observable or unidentifiable, and even when identifiable directions are selected, omitted directions can still influence exploration and distort information measures. To address this challenge, we propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED (i) performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions, and (ii) modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters. We evaluate QOED on simulated and real-world navigation and manipulation tasks, where identifiable-direction selection and nuisance suppression yield performance improvements of 35.23% and 21.98%, respectively. When integrated as an exploration objective in model-based policy optimization, QOED further improves policy performance over established RL baselines.

2605.12071 2026-05-13 cs.RO cs.SY eess.SY

Control of Fully Actuated Aerial Vehicles: A Comparison of Model-based and Sensor-based Dynamic Inversion

Ali Sidar Yilmaz, Buday Turan, Lukas Pries, Markus Ryll

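AI summary: This paper compares a model-based geometric nonlinear dynamic inversion controller (geometric NDI) with a sensor-based incremental dynamic inversion controller (INDI) on a fixed-tilt hexarotor. Several experiments evaluate both controllers under parameter mismatch, wind disturbances, sensor degradation, and other conditions. INDI shows clear advantages under parameter mismatch and sensor degradation, while geometric NDI tracks attitude better at reduced control frequencies. The work presents what the authors describe as the first experimental validation of a full pose tracking INDI controller with decoupled translational and rotational dynamics, and it highlights the trade-off between measurement-based and model-based dynamic inversion for robust control and rapid deployment.

Details
English abstract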

Fully actuated multirotor platforms decouple translational force generation from vehicle attitude, enabling independent control of position and orientation and shifting performance limitations from attitude authority to actuator dynamics and control effectiveness. This paper compares a model-based nonlinear dynamic inversion controller (geometric NDI) with a sensor-based incremental dynamic inversion controller (INDI) on a fixed-tilt fully actuated hexarotor. Both controllers share an identical outer-loop structure and are both executed at 500 Hz; therefore, performance differences can be attributed primarily to the inversion strategy. Controller performance is evaluated in five experiments covering attitude step tracking under nominal conditions and under a 50% mismatch in the rotor force coefficient, hover disturbance rejection under an external lateral load, waypoint tracking in the presence of wind gust disturbances, reduced control frequency, and injected sensor degradation. The results show that INDI offers clear advantages under parameter mismatch, gust disturbances, and sensor degradation, and maintains lower position errors across the controller-frequency sweep. However, its advantages are not universal: geometric NDI yields better attitude tracking at reduced control frequencies. To the authors' best knowledge, this work presents the first experimental validation of a full pose tracking INDI controller with decoupled translational and rotational dynamics. These findings highlight the trade-off between measurement-based and model-based inversion for robust control and rapid deployment of fully actuated UAVs.

2605.12036 2026-05-13 eess.AS

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

Guojian Li, Zhixian Zhao, Zhennan Lin, Jingbin Hu, Qirui Zhan, Yuang Cao, Pengyuan Xie, Chuan Xie, Jie Liu, Qiang Zhang, Zhonghua Fu, Lei Xie

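AI summary: Current speech large language models perform well on basic speech recognition but fall short in fine-grained, multi-dimensional speech perception, struggling to parse complex features such as micro-acoustic cues, acoustic scenes, and paralinguistic signals. To address this, the paper proposes a robust data processing pipeline, builds the FMSU-Bench benchmark covering 14 speech attribute dimensions, and designs the FM-Speech model, based on decoupled attribute modeling and a progressive curriculum fine-tuning framework, which significantly improves understanding of fine-grained, multi-dimensional speech features.

Details
English abstract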

While speech Large Language Models (LLMs) excel at conventional tasks like basic speech recognition, they lack fine-grained, multi-dimensional perception. This deficiency is evident in their struggle to disentangle complex features like micro-acoustic cues, acoustic scenes, and paralinguistic signals. The resulting incomplete comprehension of real-world speech fundamentally bottlenecks the development of perceptive and empathetic next-generation speech systems. This perceptual limitation stems primarily from three interacting factors: scarce high-quality expressive data, absent fine-grained modeling for multi-dimensional attributes, and reliance on restricted-coverage, coarse-grained benchmarks. We address these challenges through three pillars. First, our robust data curation pipeline resolves complex acoustic environments and long-audio timestamp alignment challenges to extract a high-quality spontaneous speech corpus from audiovisual sources. Second, we construct FMSU-Bench, a pioneering benchmark covering 14 speech attribute dimensions to rigorously assess the fine-grained, multi-dimensional speech understanding capabilities of current models. Third, empowered by our curated corpus, we introduce FM-Speech. Driven by a decoupled attribute modeling and progressive curriculum fine-tuning framework, it substantially elevates fine-grained, multi-dimensional acoustic perception. Extensive evaluations on FMSU-Bench reveal that current speech LLMs still require significant improvement in multi-dimensional, fine-grained understanding. In contrast, FM-Speech substantially outperforms current open-source models, establishing a robust paradigm for real-world speech understanding.

2605.12026 2026-05-13 cs.CV cs.AI eess.SP

Spectral Vision Transformer for Efficient Tokenization with Limited Data

Alexandra G. Roberts, Maneesh John, Jinwei Zhang, Dominick Romano, Mert Sisman, Ki Sueng Choi, Heejong Kim, Mert R. Sabuncu, Thanh D. Nguyen, Alexey V. Dimov, Pascal Spincemaille, Brian H. Kopell, Yi Wang

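AI summary: This paper proposes a new spectral vision transformer architecture for efficient image tokenization when data is limited, with a particular focus on medical imaging. The choice of spectral basis functions brings theoretical advantages such as spatial invariance and optimal signal-to-noise ratio, and the spectral projection reduces model complexity. Experiments show that, compared with a range of mainstream models, the method achieves comparable or better performance with fewer parameters and applies to several types of datasets.

Details
English abstract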

We propose a novel spectral vision transformer architecture for efficient tokenization with limited data, with an emphasis on medical imaging. We outline convenient theoretical properties arising from the choice of basis, including spatial invariance and optimal signal-to-noise ratio. We show reduced complexity arising from the spectral projection compared to spatial vision transformers. We show comparable or superior performance with a reduced number of parameters as compared to a variety of models including compact and standard vision transformers, convolutional neural networks with attention, shifted window transformers, multi-layer perceptrons, and logistic regression. We include simulated, public, and clinical data in our analysis and release our code at: github.com/agr78/spectralViT.

2605.11992 2026-05-13 cs.LO cs.FL cs.SY eess.SY

sweap: Reactive Synthesis for Infinite-State Integer Problems

Shaun Azzopardi, Luca Di Stefano, Nir Piterman

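AI summary: This paper presents sweap, a tool for synthesizing infinite-state integer reactive systems from specifications. The tool uses a CEGAR approach, relies on state-of-the-art finite-state synthesis tools to solve abstract synthesis problems, and supports several input formalisms, such as Temporal Stream Logic Modulo Theories and Reactive Program Games. sweap introduces new features including a dual abstraction approach and support for nondeterministic and unbounded updates, performs well at proving unrealisability and at optimisation, and experiments show it outperforms its only competitor in this domain.

Details
Comments
To be published in the proceedings of CAV 2026
English abstract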

Recent years have seen a significant increase in the interest in reactive synthesis from specifications that relate to infinite state spaces. We present sweap, a tool for synthesis of infinite-state Linear Integer Arithmetic reactive systems. sweap implements a CEGAR approach, relying on state-of-the-art finite-state synthesis tools as black boxes to solve abstract synthesis problems. sweap supports most common input formalisms for infinite-state reactive-synthesis problems: Temporal Stream Logic Modulo Theories, Reactive Program Games, the bespoke input of the ISSY tool, and our own bespoke input. We present a mature version of sweap with novel features: a dual abstraction approach that improves its capabilities in proving unrealisability, support for nondeterministic and unbounded updates, more general initialization of variables, and equirealisable reductions for optimisation. Experimental evaluation shows that sweap outperforms its only competitor in this domain.

2605.11972 2026-05-13 cs.RO cs.AI cs.ET cs.SY eess.SY

Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation

Mohammad Khoshkdahan, John Pravin Arockiasamy, Andy Flores Comeca, Alexey Vinel

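AI summary: This work targets collisions at non-line-of-sight intersections and proposes a traffic moderation system that combines collective perception with a cooperative robot. The system fuses perception from a dual-camera unit and V2X to monitor the road in real time, and when a potential collision risk is detected, the cooperative robot issues a stop gesture to keep vehicles from merging unsafely. Experiments show that the approach effectively improves traffic safety under non-line-of-sight conditions and fills the perception and intervention gap that existing V2X technology leaves for unconnected vehicles.

Details
Comments
Accepted for publication in the Proceedings of the 2026 IEEE Vehicular Technology Conference (VTC2026-Spring)
English abstract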

Collisions at non-line-of-sight (NLOS) intersections remain a major safety concern because drivers have limited visibility of approaching traffic. V2X-based warnings can reduce these risks, yet many vehicles are not equipped with V2X and drivers may ignore in-vehicle alerts. Collective perception (CP) can compensate for low V2X penetration by extending the awareness of connected vehicles, but it cannot influence unconnected vehicles. To fill this gap, our work introduces a complementary concept that adds a cooperative humanoid robot as an active traffic moderator capable of physically stopping a vehicle that attempts to merge into an unseen traffic stream. The system operates on two parallel perception pathways. A dual-camera infrastructure unit detects the position, speed, and motion of approaching vehicles and transmits this information to the robot as a collective perception message (CPM). The robot also receives cooperative awareness messages (CAM) from connected vehicles through its onboard V2X unit and can act as a relay for decentralized environmental notification messages (DENM) when safety events originate elsewhere along the road. A fusion module combines these streams to maintain a robust real-time view of the main road. A Zone of Danger (ZoD) is defined and used to predict whether an approaching vehicle creates a collision risk for a merging road user. When such a risk is detected, the robot issues a human-like STOP gesture and blocks the merging path until the hazard disappears. The full system was deployed at the Future Mobility Park (FMP) in Rotterdam. Experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably, and prevent unsafe merges in real-world NLOS conditions.

2605.11957 2026-05-13 eess.SY cs.SY

KIND: A Kalman-Inspired Adaptive Estimator for SRF Cavity Detuning

Andrei Maalberg, Axel Neumann, Pablo Echevarria, Andriy Ushakov, Jens Knobloch

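AI summary: This paper proposes KIND, a Kalman-filter-inspired adaptive estimator for the detuning of superconducting radio frequency cavities. The method combines a Dynamic Mode Decomposition model that captures stationary modal behavior with a Transformer-based predictor that handles transient dynamics, and it also outputs learned uncertainty signals to support anomaly detection. Using operational cavity data, KIND is compared against a classical Kalman filtering baseline, and the authors discuss it as a foundation for future uncertainty-aware, forecast-based control.

Details
Comments
Accepted for publication at the 2026 IFAC World Congress (IFAC 2026). 6 pages
English abstract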

Superconducting radio frequency cavities with a high quality factor enable energy-efficient accelerator operation but are very sensitive to mechanical disturbances that detune their resonance. Accurate detuning estimation is therefore essential for efficient resonance control and stable beam conditions. This paper introduces Kalman-Inspired Neural Decomposition (KIND), a data-driven estimator that fuses a Dynamic Mode Decomposition model for stationary modal behavior with a Transformer-based predictor for transient dynamics. KIND further outputs learned uncertainty signals that indicate regime changes, enabling anomaly detection. Using operational cavity data, we compare KIND with a classical Kalman filtering baseline and discuss its potential as a foundation for future uncertainty-aware, forecast-based control.

2605.11937 2026-05-13 eess.SP

Adaptive RSMA-OMA for Resilient MIMO Networks Under Imperfect CSI and SIC

Sayanti Ghosh, Indrakshi Dey, Nicola Marchetti

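AI summary: This paper studies power control for Rate-Splitting Multiple Access (RSMA) in downlink Multi-Input Multi-Output (MIMO) networks under practical impairments such as spatial correlation, imperfect Channel State Information (CSI), and residual Successive Interference Cancellation (SIC) errors. It proposes a novel degeneracy-aware framework that adaptively adjusts the power allocation between the common and private streams to ensure optimal performance under CSI uncertainty and imperfect SIC, and introduces a dynamic switching mechanism between RSMA and Orthogonal Multiple Access (OMA) to preserve system feasibility and robustness. Results show that the method significantly improves power efficiency, lowers outage probability, and strengthens overall robustness, making RSMA a viable and efficient option for modern wireless networks under realistic CSI and SIC conditions.

Details
Comments
13 pages, 10 figures, submitted to IEEE Transactions on Cognitive Communications and Networking, 2026
English abstract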

This paper addresses the challenge of power control in Rate-Splitting Multiple Access (RSMA) systems for downlink Multi-Input Multi-Output (MIMO) networks under practical impairments such as spatial correlation, imperfect Channel State Information (CSI), and residual Successive Interference Cancellation (SIC) errors. We propose a novel degeneracy-aware framework that adaptively adjusts the power allocation between the common and private streams, ensuring optimal performance despite CSI uncertainty and imperfect SIC. Our approach incorporates a dynamic switching mechanism between RSMA and Orthogonal Multiple Access (OMA) to maintain system feasibility and resilience in the face of these impairments. Extensive analytical and simulation results demonstrate that the proposed framework significantly enhances power efficiency, reduces outage probability, and improves overall system robustness, making RSMA a viable and efficient solution for modern wireless networks with realistic CSI and SIC conditions.

2605.11917 2026-05-13 eess.SP

Spatial Power Estimation via Riemannian Covariance Matching

Or Cohen, Alon Amar, Ronen Talmon

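AI summary: This paper proposes a new Riemannian-geometry-based method for spatial power spectrum estimation in array signal processing. Exploiting the Riemannian manifold structure of Hermitian positive definite matrices, it introduces a covariance matching algorithm called SERCOM that uses the Jensen-Bregman LogDet (JBLD) divergence as its measure, avoiding the neglect of the covariance manifold structure found in conventional methods. Experiments show that SERCOM outperforms existing methods in direction-of-arrival and power estimation in challenging scenarios with low SNR, few snapshots, and correlated sources.

Details
English abstract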

We propose a new method for spatial power spectrum estimation in array processing that leverages the Riemannian geometry of Hermitian positive definite (HPD) matrices. We show that conventional approaches minimize variants of the Euclidean distance between the sample covariance matrix and a model covariance matrix, without considering the fact that covariance matrices lie on the Riemannian manifold of HPD matrices. By exploiting this manifold, we present a Riemannian-aware covariance matching algorithm, termed SERCOM, using the Jensen-Bregman LogDet (JBLD) divergence, which, unlike other Riemannian distances, can be evaluated efficiently without eigen-decomposition. We theoretically compare the JBLD divergence to other Euclidean- and Riemannian-based distances, demonstrating robustness to spectral distortions. Experimental results demonstrate that SERCOM consistently outperforms existing methods in direction-of-arrival (DOA) and power estimation, particularly in challenging scenarios with low SNR, limited number of snapshots, and correlated sources.
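
As a rough illustration of why the JBLD divergence needs no eigen-decomposition, the sketch below evaluates the standard definition, JBLD(X, Y) = log det((X + Y)/2) - (1/2) log det(XY), using only Cholesky-based log-determinants. It is not the SERCOM algorithm: the model covariance and the matching procedure from the paper are not reproduced here, and the helper names `cholesky_logdet` and `jbld` are hypothetical.

```python
import math


def cholesky_logdet(a):
    """Log-determinant of a Hermitian positive definite matrix (list of rows).

    Uses a Cholesky factorization A = L L^H, so det(A) is the product of the
    squared diagonal entries of L. No eigen-decomposition is involved.
    """
    n = len(a)
    low = [[0j] * n for _ in range(n)]
    logdet = 0.0
    for j in range(n):
        # Squared diagonal entry of L: real and positive when A is HPD.
        d = a[j][j].real - sum(abs(low[j][k]) ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = complex(math.sqrt(d))
        logdet += math.log(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k].conjugate() for k in range(j))
            low[i][j] = s / low[j][j]
    return logdet


def jbld(x, y):
    """Jensen-Bregman LogDet divergence between two HPD matrices."""
    n = len(x)
    mid = [[(x[i][j] + y[i][j]) / 2 for j in range(n)] for i in range(n)]
    return cholesky_logdet(mid) - 0.5 * (cholesky_logdet(x) + cholesky_logdet(y))


# Toy 2x2 example (illustrative values, not data from the paper).
sample_cov = [[2, 1j], [-1j, 2]]
model_cov = [[1, 0], [0, 1]]
divergence = jbld(sample_cov, model_cov)
```

For the toy matrices above, the determinants are 3, 1, and 2 for the sample, model, and midpoint matrices, so the divergence equals log(2/√3) ≈ 0.1438. The divergence is symmetric in its arguments and vanishes when the two matrices coincide.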

2605.11915 2026-05-13 eess.SP

ISAC for AI: A Trade-off Framework Across Data Acquisition and Transfer in Federated Learning

Lai Jiang, Kaitao Meng, Murat Temiz, Christos Masouros

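AI summary: This paper proposes a resource allocation framework for federated learning (FL) in integrated sensing and communication (ISAC) systems that accounts for both the reliability of model transfer through communication and the quality of data acquisition through sensing. Unlike earlier approaches that assume pre-collected training data or impose only a fixed SNR threshold, it explicitly characterizes the relationship among sensing SNR, dataset size, and upload reliability, and jointly allocates sensing and communication resources under a shared energy budget. By deriving a convergence upper bound and designing a two-layer optimization algorithm, it provides a systematic way to improve FL performance under energy constraints.

Details
English abstract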

In this paper, we propose a resource allocation framework for federated learning (FL) in integrated sensing and communication (ISAC) systems, where we consider not only the reliability of model transfer through communication, but also the quality of data acquisition through sensing in the first place. Unlike existing works that assume training data is pre-collected or only impose a fixed sensing signal-to-noise ratio (SNR) threshold to reflect data quality, we explicitly characterize the relationship between sensing data quality (measured by sensing SNR), dataset size, and the upload reliability in FL training, and exploit this relationship to allocate resources between sensing and communication under a shared energy budget. This is non-trivial due to the intricate coupling among sensing data quality, transmission reliability, and communication resource allocation; nevertheless, it enables a principled joint optimization framework that directly enhances learning performance. Specifically, we derive a closed-form convergence upper bound that quantifies the joint impact of these factors on the FL optimality gap. Utilizing this upper bound, the original intractable optimization problem can be reformulated into a tractable resource allocation problem that jointly optimizes the sensing transmit power, number of sensing snapshots, and communication transmit power at each device subject to individual energy budget constraints. To solve the reformulated problem, we propose a two-layer optimization algorithm with linear complexity, where the outer layer employs golden section search and the inner layer solves per-device subproblems with closed-form solutions.
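
The outer layer of the algorithm is described as a golden section search. A minimal sketch of that generic technique follows, assuming a scalar outer variable and a unimodal objective on a bounded interval. The quadratic `toy_objective` is a placeholder standing in for the paper's convergence bound, which is not reproduced here, and the function names are hypothetical.

```python
import math

# Inverse golden ratio, about 0.618; it satisfies INV_PHI**2 == 1 - INV_PHI,
# which lets each iteration reuse one of the two interior evaluations.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_min(f, lo, hi, tol=1e-8):
    """Approximate the minimizer of a unimodal function f on [lo, hi]."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimizer lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Minimizer lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


def toy_objective(t):
    """Placeholder unimodal objective with its minimum at t = 0.3."""
    return (t - 0.3) ** 2


best_t = golden_section_min(toy_objective, 0.0, 1.0)
```

Each iteration costs one new function evaluation and shrinks the bracket by a constant factor. In a two-layer scheme, that evaluation would be the place to call the inner per-device closed-form solutions.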

2605.11875 2026-05-13 eess.SP cs.AI

Modulation Consistency-based Contrastive Learning for Self-Supervised Automatic Modulation Classification

Chenxu Wang, Shuang Wang, Lirong Han, Xinyu Hu, Hanlin Mo, Hantong Xing, Licheng Jiao

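AI summary: Self-supervised methods for automatic modulation classification (AMC) often rely on task-agnostic pretext objectives, which leaves their representations entangled with nuisance factors. To address this, the paper proposes Mod-CL, a contrastive learning framework based on modulation consistency. It exploits the fact that different temporal segments of the same signal share a modulation type but differ in waveform, and builds positive pairs from them to learn shared modulation information while suppressing nuisance factors. Experiments show that Mod-CL significantly outperforms existing methods on several RadioML datasets, especially when labels are scarce.

Details
English abstract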

Deep learning-based AMC methods have achieved remarkable performance, but their practical deployment remains constrained by the high cost of labeled data. Although self-supervised learning (SSL) reduces the reliance on labels, existing SSL-based AMC methods often rely on task-agnostic pretext objectives misaligned with modulation classification, leading to representations entangled with nuisance factors such as symbol, channel, and noise. In this paper, we identify intra-instance modulation consistency as a task-aware structural prior, whereby different temporal segments of the same signal may differ in waveform while preserving the same modulation type, thus providing a principled cue for task-aligned self-supervision. Based on this prior, we propose Mod-CL, a Modulation consistency-based Contrastive Learning framework that constructs positive pairs from different temporal segments of the same signal instance, to encourage the model to learn shared modulation information while suppressing nuisance variations. We further develop a contrastive objective tailored to Mod-CL, which jointly exploits temporal segmentation and data augmentation to pull together views sharing the same modulation semantics while avoiding supervisory conflicts within each signal instance. Extensive experiments on RadioML datasets show that Mod-CL consistently outperforms strong baselines, especially in low-label regimes, achieving substantial improvements in linear probing accuracy.
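
The positive-pair idea can be sketched as follows: two different temporal segments of one signal instance are paired because they share a modulation type. This is only an illustration under assumptions not stated in the abstract, namely equal-length non-overlapping segments and uniform random selection. The function names are hypothetical, and the paper's augmentations and contrastive objective are not shown.

```python
import random


def split_segments(signal, segment_len):
    """Cut a signal into consecutive non-overlapping segments of equal length."""
    count = len(signal) // segment_len
    return [signal[i * segment_len:(i + 1) * segment_len] for i in range(count)]


def positive_pair(signal, segment_len, rng):
    """Pick two distinct temporal segments of the same signal as a positive pair."""
    segments = split_segments(signal, segment_len)
    if len(segments) < 2:
        raise ValueError("need at least two segments to form a positive pair")
    first, second = rng.sample(range(len(segments)), 2)
    return segments[first], segments[second]


# Toy complex baseband signal (illustrative values, not RadioML data).
toy_signal = [complex(i, -i) for i in range(1024)]
view_a, view_b = positive_pair(toy_signal, 128, random.Random(0))
```

In a training loop, each view would typically be augmented and encoded before the contrastive loss pulls the two embeddings together. Segments drawn from other signal instances would serve as candidates for negatives.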

2605.11863 2026-05-13 cs.CV eess.IV

GATA2Floor: Graph attention for floor counting in street-view facades

Ngoc Tan Le, Tzoulio Chamiti, Eirini Papagiannopoulou, Nikos Deligiannis

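AI summary: This paper studies how to automatically determine the number of floors of a building from street-view facade images and proposes GATA2Floor, a model based on graph attention. The method models a facade as a graph of windows and doors, uses a multi-head graph attention network to predict the floor count, and assigns elements to latent floor slots through learnable cross-attention queries, yielding interpretable and robust results. To address the shortage of labeled data, the authors also propose a lightweight label-free proposal mechanism that uses self-supervised features and vision-language scoring so the approach can be applied without annotations, demonstrating the value of graph-attention relational reasoning for facade understanding.

Details
Comments
Accepted at IEEE ICIP 2026; 6 pages, 5 figures, 3 tables
English abstract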

Automated analysis of building facades from street-level imagery has great potential for urban analytics, energy assessment, and emergency planning. However, it requires reasoning over spatially arranged elements rather than solely isolated detections. In this work, we model each facade as a graph over window/door detections with a vertical prior on edges. Additionally, we introduce GATA2Floor, a multi-head Graph Attention v2 (GATv2) based model that predicts the global floor count of a building and, via learnable cross-attention queries, softly assigns elements to latent floor slots, yielding interpretable outputs and robustness to irregular designs. To mitigate the lack of labeled datasets, we demonstrate that the proposed graph-based reasoning can be applied without annotations by leveraging a lightweight label-free proposal mechanism based on self-supervised features and vision-language scoring. Our approach demonstrates the value of graph-attention-based relational reasoning for facade understanding.