arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.11030 2026-03-12 eess.SP

Exploiting Spatial Modulation for Strong Phase Noise Mitigation in mmWave Massive MIMO

Oshin Daoud, Haifa Fares, Amor Nafkha, Yahia Medjahdi, Laurent Clavier

Abstract

This letter investigates phase noise (PN) mitigation in generalized receiver spatial modulation (GRSM) massive MIMO systems at mmWave under a common local oscillator (CLO). Under CLO, the received energy remains invariant relative to the no-PN scenario, enabling reliable energy-based spatial detection using the no-PN threshold. PN-sensitivity and geometry-based metrics are introduced to design compact, PN-resilient MQAM symbol pools with low detection complexity. PN robustness is further improved through an enhanced PN-aware GRSM-MQAM system that exploits spatial modulation (SM) to recover part of the MQAM bits and strategically maps spatial-pattern Hamming weights to reduce the effective PN impact. In addition, a practical single-stage PN estimation/compensation architecture is proposed, while a benchmark double-stage compensation is adopted to quantify the upper bound achievable via separate Tx/Rx PN mitigation. Results show that under PN, the overall BER is mainly dominated by MQAM symbol detection errors, especially for denser constellations, whereas spatial detection remains robust. The proposed single-stage compensation improves PN resilience, while the benchmark double-stage compensation approaches near PN-free performance.

2603.10958 2026-03-12 eess.SP

Distortion Is Not Noise: On the Limits of the Kappa Model for Monostatic ISAC

Haofan Dong, Ozgur B. Akan

Abstract

Monostatic ISAC sensing differs from communication because the transmitter can monitor its distorted transmit waveform. Thus, the aggregate $κ$ distortion model, which treats impairments as unknown noise, is appropriate for communication but pessimistic for monostatic sensing. We derive PA-aware sensing Cramér-Rao bounds (CRBs) and a PN-aware CRB that reveals an irreducible velocity-error floor, and quantify when $κ$-based bounds overestimate sensing degradation. Simulations validate the analysis and show robustness to practical DPD template errors (less than 1 dB overhead at a typical $-25$ dB NMSE).

2603.10947 2026-03-12 eess.IV

Regularizing INR with diffusion prior self-supervised 3D reconstruction of neutron computed tomography data

Maliha Hossain, Haley Duba-Sullivan, Amirkoushyar Ziabari

Abstract

Recently, generative diffusion priors have made huge strides as inverse problem solvers, including the ability to be adapted for inference on out-of-distribution data. Concurrently, implicit neural representations (INRs) have emerged as fast and lightweight inverse imaging solvers that are amenable to hybrid approaches that combine learned priors with traditional inverse problem formulations. In this paper, we present a diffusive computed tomography (CT) inversion framework for regularizing INRs called Diffusive INR (DINR), designed to enable high-quality reconstruction from sparse-view neutron CT. Pretrained purely on synthetic data, DINR is evaluated on simulated and experimentally obtained observations of concrete microstructures, where traditional reconstruction methods suffer substantial degradation when the number of views is reduced. Our approach delivers superior performance, reduces reconstruction artifacts, and achieves gains in PSNR and SSIM, enabling accurate micro-structural characterization even under extreme data limitations compared to state-of-the-art sparse-view reconstruction techniques.

2602.17929 2026-03-12 cs.CV cs.LG eess.IV

ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging

Athanasios Angelakis

Comments 24 pages, 15 figures, 5 tables. Code and models available at https://github.com/Bluesman79/ZACH-ViT

Abstract

Vision Transformers rely on positional embeddings and class tokens encoding fixed spatial priors. While effective for natural images, these priors may be suboptimal when spatial layout is weakly informative, a frequent condition in medical imaging. We introduce ZACH-ViT (Zero-token Adaptive Compact Hierarchical Vision Transformer), a compact Vision Transformer that removes positional embeddings and the [CLS] token, achieving permutation-invariant patch processing via global average pooling. Zero-token denotes removal of the dedicated aggregation token and positional encodings. Patch tokens remain unchanged. Adaptive residual projections preserve training stability under strict parameter constraints. We evaluate ZACH-ViT across seven MedMNIST datasets under a strict few-shot protocol (50 samples/class, fixed hyperparameters, five seeds). Results reveal regime-dependent behavior: ZACH-ViT (0.25M parameters, trained from scratch) achieves strongest advantage on BloodMNIST and remains competitive on PathMNIST, while relative advantage decreases on datasets with stronger anatomical priors (OCTMNIST, OrganAMNIST), consistent with our hypothesis. Component and pooling ablations show positional support becomes mildly beneficial as spatial structure increases, whereas reintroducing a [CLS] token is consistently unfavorable. These findings support that architectural alignment with data structure can outweigh universal benchmark dominance. Despite minimal size and no pretraining, ZACH-ViT achieves competitive performance under data-scarce conditions, relevant for compact medical imaging and low-resource settings. Code: https://github.com/Bluesman79/ZACH-ViT
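The permutation-invariance claim in the abstract above can be checked on a toy example. The following is a minimal sketch, not the ZACH-ViT code: it assumes a single self-attention layer with identity query/key/value projections and no positional terms, followed by global average pooling. The function names and the random patch tokens are hypothetical.

```python
import math
import random


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (no positional terms)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z for j in range(d)])
    return out


def global_average_pool(tokens):
    """Average the token vectors; this replaces a dedicated [CLS] token."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def encode(tokens):
    return global_average_pool(self_attention(tokens))


random.seed(0)
patches = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # toy patch tokens
shuffled = patches[:]
random.shuffle(shuffled)

pooled_a = encode(patches)
pooled_b = encode(shuffled)
# Largest coordinate-wise difference between the two pooled vectors.
max_gap = max(abs(a - b) for a, b in zip(pooled_a, pooled_b))
```

Self-attention without positional terms only reorders its outputs when the inputs are reordered, and averaging discards the order, so `max_gap` stays at floating-point rounding level.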

2601.03410 2026-03-12 cs.LG cs.CV eess.IV

Inferring Clinically Relevant Molecular Subtypes of Pancreatic Cancer from Routine Histopathology Using Deep Learning

Abdul Rehman Akbar, Alejandro Levya, Ashwini Esnakula, Elshad Hasanov, Anne Noonan, Lingbin Meng, Susan Tsai, Vaibhav Sahai, Midhun Malla, Sarbajit Mukherjee, Upender Manne, Anil Parwani, Wei Chen, Ashish Manne, Muhammad Khalid Khan Niazi

Abstract

Molecular subtyping of PDAC into basal-like and classical has established prognostic and predictive value. However, its use in clinical practice is limited by cost, turnaround time, and tissue requirements, thereby restricting its application in the management of PDAC. We introduce PanSubNet, an interpretable deep learning framework that predicts therapy-relevant molecular subtypes directly from standard H&E-stained WSIs. PanSubNet was developed using data from 1,055 patients across two multi-institutional cohorts (PANCAN, n=846; TCGA, n=209) with paired histology and RNA-seq data. Ground-truth labels were derived using the validated Moffitt 50-gene signature refined by GATA6 expression. The model employs a dual-scale architecture that fuses cellular-level morphology with tissue-level architecture, leveraging attention mechanisms for multi-scale representation learning and transparent feature attribution. On internal validation within PANCAN using five-fold cross-validation, PanSubNet achieved a mean AUC of 88.5% with balanced sensitivity and specificity. External validation on the independent TCGA cohort without fine-tuning demonstrated robust generalizability (AUC 84.0%). PanSubNet preserved and, in metastatic disease, strengthened prognostic stratification compared to RNA-seq based labels. Prediction uncertainty was linked to intermediate transcriptional states, not classification noise. Model predictions are aligned with established transcriptomic programs, differentiation markers, and DNA damage repair signatures. By enabling rapid, cost-effective molecular stratification from routine H&E-stained slides, PanSubNet offers a clinically deployable and interpretable tool for genetic subtyping. We are gathering data from two institutions to validate and assess real-world performance, supporting integration into digital pathology workflows and advancing precision oncology for PDAC.

2510.00676 2026-03-12 eess.SY cs.SY

Formation Control via Rotation Symmetry Constraints

Zamir Martinez, Daniel Zelazo

Abstract

This work introduces a distributed formation control strategy for multi-agent systems based solely on rotation symmetry constraints. We propose a potential function that enforces inter-agent rotational symmetries, whose gradient defines a control law that drives the agents toward a desired planar symmetric configuration. We show that only $n-1$ edges (the minimal connectivity requirement) are sufficient to implement the strategy, where $n$ is the number of agents. We further augment the design to address the maneuvering problem, enabling the formation to undergo coordinated translations, rotations, and scaling along a predefined virtual trajectory. Simulation examples are provided to validate the effectiveness of the proposed method.

2503.11627 2026-03-12 cs.SD cs.LG eess.AS

Are Deep Speech Denoising Models Robust to Adversarial Noise?

Will Schwarzer, Neel Chaudhari, Philip S. Thomas, Andrea Fanelli, Xiaoyu Liu

Comments 22 pages, 14 figures. Related conference version accepted to ICLR 2026: see https://openreview.net/forum?id=WtH2JxKJKf

Abstract

Deep noise suppression (DNS) models enjoy widespread use throughout a variety of high-stakes speech applications. However, we show that four recent DNS models can each be reduced to outputting unintelligible gibberish through the addition of psychoacoustically hidden adversarial noise, even in low-background-noise and simulated over-the-air settings. For three of the models, a small transcription study with audio and multimedia experts confirms unintelligibility of the attacked audio; simultaneously, an ABX study shows that the adversarial noise is generally imperceptible, with some variance between participants and samples. While we also establish several negative results around targeted attacks and model transfer, our results nevertheless highlight the need for practical countermeasures before open-source DNS systems can be used in safety-critical applications.

2603.10909 2026-03-12 eess.SP

Level Crossing Rate Analysis for Optimal Single-user RIS Systems

Amy S. Inwood, Peter J. Smith, Philippa A. Martin, Graeme K. Woodward

Journal ref Proc. GLOBECOM 2024 - 2024 IEEE Global Communications Conference
Abstract

We analyse the level crossing rate (LCR) of an uplink single-user (SU) reconfigurable intelligent surface (RIS) aided system. It is assumed that the RIS to base station (RIS-BS) channel is deployed as line-of-sight (LoS), and the user (UE)-RIS and UE-BS channels are correlated Rayleigh. For the optimal RIS reflection matrix, we derive a novel and exact analytical LCR expression for when the direct (UE-BS) channel is blocked, i.e. the RIS-only channel. Also, the existing exact expression for the direct-only channel (equivalent to classical maximal-ratio-combining (MRC)) suffers from extreme numerical precision problems when the BS has many elements. Therefore, we propose a new stable and accurate approximation to the LCR of the direct channel. The approximation is based on replacing any small similar eigenvalues of the channel correlation matrix by their average. We show that increasing the number of elements at the RIS or BS and decreasing channel correlation makes the LCR drop more rapidly for thresholds away from the mean SNR. Crucially, we find that RIS systems do not significantly amplify temporal variations in the channel. This is particularly beneficial for RIS systems considering the difficulty in acquiring channel state information (CSI).
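The eigenvalue-averaging idea mentioned in the abstract above can be illustrated as follows. This is only a sketch under stated assumptions: the abstract does not say how "small" and "similar" are decided, so the `small_threshold` and `rel_tol` parameters and the grouping rule are hypothetical, and the eigenvalues are assumed to be positive.

```python
def average_small_similar(eigenvalues, small_threshold, rel_tol):
    """Replace runs of small, mutually similar eigenvalues by their run average.

    Hypothetical rule: an eigenvalue is "small" if it is below small_threshold,
    and it joins the current run if it lies within rel_tol (relative) of the
    first value in that run. Eigenvalues are assumed positive.
    """
    ordered = sorted(eigenvalues)
    result, run = [], []

    def flush():
        # Emit the current run with every member replaced by the run mean.
        if run:
            mean = sum(run) / len(run)
            result.extend([mean] * len(run))
            run.clear()

    for lam in ordered:
        is_small = lam < small_threshold
        if is_small and run and lam - run[0] <= rel_tol * run[0]:
            run.append(lam)
        else:
            flush()
            if is_small:
                run.append(lam)
            else:
                result.append(lam)
    flush()
    return result


eigs = [0.010, 0.011, 0.0105, 0.2, 1.5, 3.0]
smoothed = average_small_similar(eigs, small_threshold=0.1, rel_tol=0.2)
```

Averaging within a run keeps the sum of the eigenvalues, and therefore the trace of the correlation matrix, unchanged while removing the near-equal values.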

2603.10906 2026-03-12 eess.SY cs.SY

Towards Polynomial Immersion of Port-Hamiltonian Systems

Mohammad Itani, Manuel Schaller, Karl Worthmann, Timm Faulwasser

Abstract

Port-Hamiltonian (pH) systems offer a highly structured and energy-based modular framework for control systems. Many pH systems exhibit non-polynomial non-linearities. We consider the problem of immersing such systems into a higher-dimensional polynomial representation. We prove that, along system trajectories, important features of the non-polynomial pH system are preserved such as the internal interconnection geometry, the energy balance relation with passivity supply rate, as well as energy dissipation. We illustrate how the lifted system enables the design of stabilizing feedback laws by combining sum-of-squares optimization with concepts from passivity-based control. We draw upon several examples to illustrate our findings.

2603.10901 2026-03-12 eess.SP

Phase Selection and Analysis for Multi-frequency Multi-user RIS Systems Employing Subsurfaces

Amy S. Inwood, Peter J. Smith, Philippa A. Martin, Graeme K. Woodward

Journal ref Proc. 2023 IEEE Wireless Communications and Networking Conference (WCNC)
Abstract

In this paper, we analyse the performance of a reconfigurable intelligent surface (RIS) aided system where the RIS is divided into subsurfaces. Each subsurface is designed specifically for one user, who is served on their own frequency band. The other subsurfaces (those not designed for this user) provide additional uncontrolled scattering. A new subsurface RIS design is developed based on the optimal single-user design for a pure line-of-sight (LoS) base station (BS) to RIS channel. This is also extended to arbitrary BS-RIS channels. For our method, exact closed form solutions for the mean SNR and a mean rate upper bound are derived for the BS-RIS LoS scenario. For each user, the designed subsurface performs optimally in LoS conditions and is remarkably robust to non-LoS conditions. The system design drives down complexity to extremely low levels, reducing RIS design and receiver processing complexity and reducing the channel estimation requirements. We also quantify the complexity-performance trade-off for the new design relative to multi-user approaches.

2603.10890 2026-03-12 cs.RO cs.SY eess.SY

A gripper for flap separation and opening of sealed bags

Sergi Foix, Jaume Oriol, Carme Torras, Júlia Borràs

Comments 8 pages, Accepted at the 2026 IEEE International Conference on Robotics & Automation (ICRA2026)

Abstract

Separating thin, flexible layers that must be individually grasped is a common but challenging manipulation primitive for most off-the-shelf grippers. A prominent example arises in clinical settings: the opening of sterile flat pouches for the preparation of the operating room, where the first step is to separate and grasp the flaps. We present a novel gripper design and opening strategy that enables reliable flap separation and robust seal opening. This capability addresses a high-volume repetitive hospital procedure in which nurses manually open up to 240 bags per shift, a physically demanding task linked to musculoskeletal injuries. Our design combines an active dented-roller fingertip with compliant fingers that exploit environmental constraints to robustly grasp thin flexible flaps. Experiments demonstrate that the proposed gripper reliably grasps and separates sealed bag flaps and other thin-layered materials from the hospital, the most sensitive variable affecting performance being the normal force applied. When two copies of the gripper grasp both flaps, the system withstands the forces needed to open the seals robustly. To our knowledge, this is one of the first demonstrations of robotic assistance to automate this repetitive, low-value, but critical hospital task.

2603.10812 2026-03-12 math.OC cs.SY eess.SY

Distributed Stability Certification and Control from Local Data

Surya Malladi, Nima Monshizadeh

Abstract

Most data-driven analysis and control methods rely on centralized access to system measurements. In contrast, we consider a setting in which the measurements are distributed across multiple agents and raw data are not shared. Each agent has access only to locally held samples, possibly as little as a single measurement, and agents exchange only locally computed signals. Consequently, no individual agent possesses sufficient information to identify the entire system or synthesize a controller independently. To address this limitation, we develop distributed dynamical algorithms that enable the agents to collectively compute global system certificates from local data. Two problems are addressed. First, for stable linear time-invariant (LTI) systems, the agents compute a Lyapunov certificate by solving the Lyapunov equation in a fully distributed manner. Second, for general LTI systems, they compute the stabilizing solution of the algebraic Riccati equation and hence the optimal linear-quadratic regulator (LQR). An initially proposed scheme guarantees practical convergence, while a subsequent augmented PI-type algorithm achieves exact convergence to the desired solution. We further establish robustness of the resulting LQR controller to uncertainty and measurement noise. The approach is illustrated through distributed Lyapunov certification of a quadruple-tank process and distributed LQR design for helicopter dynamics.
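For readers unfamiliar with the Lyapunov certificate mentioned in the abstract above, here is a minimal centralized sketch. It is not the distributed algorithm of the paper: it assumes full knowledge of the system matrix, and it assumes a discrete-time model purely for illustration (the abstract does not state the time domain). The matrices `A` and `Q` are made-up examples.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def solve_discrete_lyapunov(a, q, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration P <- A^T P A + Q, which converges when A is Schur stable."""
    at = transpose(a)
    p = [row[:] for row in q]
    for _ in range(max_iter):
        apa = matmul(matmul(at, p), a)
        p_next = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(apa, q)]
        gap = max(abs(x - y) for r1, r2 in zip(p_next, p) for x, y in zip(r1, r2))
        p = p_next
        if gap < tol:
            return p
    raise RuntimeError("no convergence; A may not be Schur stable")


A = [[0.5, 0.1], [0.0, 0.8]]  # example stable matrix (eigenvalues 0.5 and 0.8)
Q = [[1.0, 0.0], [0.0, 1.0]]
P = solve_discrete_lyapunov(A, Q)

# Residual of the discrete Lyapunov equation A^T P A - P + Q = 0.
APA = matmul(matmul(transpose(A), P), A)
residual = max(abs(APA[i][j] - P[i][j] + Q[i][j]) for i in range(2) for j in range(2))
```

A symmetric positive definite `P` with a vanishing residual certifies stability through the Lyapunov function $V(x) = x^\top P x$.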

2603.10802 2026-03-12 cs.NI cs.AI cs.LG cs.SY eess.SY

Towards Intelligent Spectrum Management: Spectrum Demand Estimation Using Graph Neural Networks

Mohamad Alkadamani, Amir Ghasemi, Halim Yanikomeroglu

Comments 13 pages, 10 figures. Submitted to IEEE Transactions on Machine Learning in Communications and Networking

Abstract

The growing demand for wireless connectivity, combined with limited spectrum resources, calls for more efficient spectrum management. Spectrum sharing is a promising approach; however, regulators need accurate methods to characterize demand dynamics and guide allocation decisions. This paper builds and validates a spectrum demand proxy from public deployment records and uses a graph attention network in a hierarchical, multi-resolution setup (HR-GAT) to estimate spectrum demand at fine spatial scales. The model captures both neighborhood effects and cross-scale patterns, reducing spatial autocorrelation and improving generalization. Evaluated across five Canadian cities and against eight competitive baselines, HR-GAT reduces median RMSE by roughly 21% relative to the best alternative and lowers residual spatial bias. The resulting demand maps are regulator-accessible and support spectrum sharing and spectrum allocation in wireless networks.

2603.10800 2026-03-12 cs.LG cs.AI cs.SY eess.SY

AI-Enhanced Spatial Cellular Traffic Demand Prediction with Contextual Clustering and Error Correction for 5G/6G Planning

Mohamad Alkadamani, Colin Brown, Halim Yanikomeroglu

Comments 5 pages, 8 figures. Submitted to IEEE Wireless Communications Letters

Abstract

Accurate spatial prediction of cellular traffic demand is essential for 5G NR capacity planning, network densification, and data-driven 6G planning. Although machine learning can fuse heterogeneous geospatial and socio-economic layers to estimate fine-grained demand maps, spatial autocorrelation can cause neighborhood leakage under naive train/test splits, inflating accuracy and weakening planning reliability. This paper presents an AI-driven framework that reduces leakage and improves spatial generalization via a context-aware two-stage splitting strategy with residual spatial error correction. Experiments using crowdsourced usage indicators across five major Canadian cities show consistent mean absolute error (MAE) reductions relative to location-only clustering, supporting more reliable bandwidth provisioning and evidence-based spectrum planning and sharing assessments.

2603.10791 2026-03-12 eess.IV

Semantic Satellite Communications for Synchronized Audiovisual Reconstruction

Fangyu Liu, Peiwen Jiang, Wenjin Wang, Chao-Kai Wen, Xiao Li, Shi Jin

Abstract

Satellite communications face severe bottlenecks in supporting high-fidelity synchronized audiovisual services, as conventional schemes struggle with cross-modal coherence under fluctuating channel conditions, limited bandwidth, and long propagation delays. To address these limitations, this paper proposes an adaptive multimodal semantic transmission system tailored for satellite scenarios, aiming for high-quality synchronized audiovisual reconstruction under bandwidth constraints. Unlike static schemes with fixed modal priorities, our framework features a dual-stream generative architecture that flexibly switches between video-driven audio generation and audio-driven video generation. This allows the system to dynamically decouple semantics, transmitting only the most important modality while employing cross-modal generation to recover the other. To balance reconstruction quality and transmission overhead, a dynamic keyframe update mechanism adaptively maintains the shared knowledge base according to wireless scenarios and user requirements. Furthermore, a large language model based decision module is introduced to enhance system adaptability. By integrating satellite-specific knowledge, this module jointly considers task requirements and channel factors such as weather-induced fading to proactively adjust transmission paths and generation workflows. Simulation results demonstrate that the proposed system significantly reduces bandwidth consumption while achieving high-fidelity audiovisual synchronization, improving transmission efficiency and robustness in challenging satellite scenarios.

2603.10763 2026-03-12 cs.LG cs.IT eess.SP math.IT

Prioritizing Gradient Sign Over Modulus: An Importance-Aware Framework for Wireless Federated Learning

Yiyang Yue, Jiacheng Yao, Wei Xu, Zhaohui Yang, George K. Karagiannidis, Dusit Niyato

Abstract

Wireless federated learning (FL) facilitates collaborative training of artificial intelligence (AI) models to support ubiquitous intelligent applications at the wireless edge. However, the inherent constraints of limited wireless resources inevitably lead to unreliable communication, which poses a significant challenge to wireless FL. To overcome this challenge, we propose Sign-Prioritized FL (SP-FL), a novel framework that improves wireless FL by prioritizing the transmission of important gradient information through uneven resource allocation. Specifically, recognizing the importance of descent direction in model updating, we transmit gradient signs in individual packets and allow their reuse for gradient descent if the remaining gradient modulus cannot be correctly recovered. To further improve the reliability of transmission of important information, we formulate a hierarchical resource allocation problem based on the importance disparity at both the packet and device levels, optimizing bandwidth allocation across multiple devices and power allocation between sign and modulus packets. To make the problem tractable, the one-step convergence behavior of SP-FL, which characterizes data importance at both levels in an explicit form, is analyzed. We then propose an alternating optimization algorithm to solve this problem using the Newton-Raphson method and successive convex approximation (SCA). Simulation results confirm the superiority of SP-FL, especially in resource-constrained scenarios, demonstrating up to 9.96% higher testing accuracy on the CIFAR-10 dataset compared to existing methods.
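The sign-reuse idea described in the abstract above can be sketched roughly as follows. This is an illustrative guess at the receiver-side logic, not the SP-FL implementation: the function names and the `fallback_modulus` parameter are hypothetical, and the abstract does not say which magnitude is used when the modulus packet is lost.

```python
def recover_gradient(signs, moduli, modulus_received, fallback_modulus):
    """Rebuild a gradient from separately transmitted sign and modulus packets.

    If the modulus packet was decoded, combine signs and moduli exactly.
    Otherwise reuse the signs alone with a common magnitude (hypothetical
    fallback), so the update still follows the descent direction per coordinate.
    """
    if modulus_received:
        return [s * m for s, m in zip(signs, moduli)]
    return [s * fallback_modulus for s in signs]


def sgd_step(weights, gradient, learning_rate):
    """Plain gradient-descent update."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


signs = [1, -1, 1]
moduli = [0.4, 0.1, 0.25]

full = recover_gradient(signs, moduli, modulus_received=True, fallback_modulus=0.2)
sign_only = recover_gradient(signs, moduli, modulus_received=False, fallback_modulus=0.2)
updated = sgd_step([1.0, 1.0, 1.0], sign_only, learning_rate=0.5)
```

In the sign-only case each coordinate still moves in the same direction as the full gradient would have moved it, which is the property the abstract describes as more important than the exact magnitude.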

2603.10671 2026-03-12 cs.AR cs.CV eess.IV

An FPGA Implementation of Displacement Vector Search for Intra Pattern Copy in JPEG XS

Qiyue Chen, Yao Li, Jie Tao, Song Chen, Li Li, Dong Liu

Abstract

Recently, progress has been made on the Intra Pattern Copy (IPC) tool for JPEG XS, an image compression standard designed for low-latency and low-complexity coding. IPC performs wavelet-domain intra compensation predictions to reduce spatial redundancy in screen content. A key module of IPC is the displacement vector (DV) search, which aims to find the optimal prediction reference offset. However, the DV search process is computationally intensive, posing challenges for practical hardware deployment. In this paper, we propose an efficient pipelined FPGA architecture design for the DV search module to promote the practical deployment of IPC. Optimized memory organization, which leverages the IPC computational characteristics and inherent data reuse patterns, is further introduced to enhance the performance. Experimental results show that our proposed architecture achieves a throughput of 38.3 Mpixels/s with a power consumption of 277 mW, demonstrating its feasibility for practical hardware implementation in IPC and other predictive coding tools, and providing a promising foundation for ASIC deployment.

2603.10670 2026-03-12 cs.RO cs.SY eess.SY

Dynamic Modeling and Attitude Control of a Reaction-Wheel-Based Low-Gravity Bipedal Hopper

Shriram Hari, M Venkata Sai Nikhil, R Prasanth Kumar

Comments Preprint. Under review

Abstract

Planetary bodies characterized by low gravitational acceleration, such as the Moon and near-Earth asteroids, impose unique locomotion constraints due to diminished contact forces and extended airborne intervals. Among traversal strategies, hopping locomotion offers high energy efficiency but is prone to mid-flight attitude instability caused by asymmetric thrust generation and uneven terrain interactions. This paper presents an underactuated bipedal hopping robot that employs an internal reaction wheel to regulate body posture during the ballistic flight phase. The system is modeled as a gyrostat, enabling analysis of the dynamic coupling between torso rotation and reaction wheel momentum. The locomotion cycle comprises three phases: a leg-driven propulsive jump, mid-air attitude stabilization via an active momentum exchange controller, and a shock-absorbing landing. A reduced-order model is developed to capture the critical coupling between torso rotation and reaction wheel dynamics. The proposed framework is evaluated in MuJoCo-based simulations under lunar gravity conditions (g = 1.625 m/s^2). Results demonstrate that activation of the reaction wheel controller reduces peak mid-air angular deviation by more than 65% and constrains landing attitude error to within 3.5 degrees at touchdown. Additionally, actuator saturation per hop cycle is reduced, ensuring sufficient control authority. Overall, the approach significantly mitigates in-flight attitude excursions and enables consistent upright landings, providing a practical and control-efficient solution for locomotion on irregular extraterrestrial terrains.
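The "extended airborne intervals" mentioned in the abstract above follow from basic ballistics. The sketch below is a back-of-the-envelope illustration, not the paper's model: it assumes a purely vertical hop that lands at takeoff height. The takeoff speed and the Earth gravity value are assumptions; only the lunar value of 1.625 m/s^2 comes from the abstract.

```python
LUNAR_G = 1.625  # m/s^2, value quoted in the abstract
EARTH_G = 9.81   # m/s^2, common approximation (assumption, not from the abstract)


def flight_time(takeoff_speed, gravity):
    """Airborne time of a vertical hop landing at takeoff height: t = 2 v / g."""
    return 2.0 * takeoff_speed / gravity


def apex_height(takeoff_speed, gravity):
    """Peak height of the same hop: h = v^2 / (2 g)."""
    return takeoff_speed ** 2 / (2.0 * gravity)


v0 = 1.0  # m/s, hypothetical takeoff speed
lunar_t = flight_time(v0, LUNAR_G)
earth_t = flight_time(v0, EARTH_G)
lunar_h = apex_height(v0, LUNAR_G)
```

For the same takeoff speed, the airborne time scales with the inverse of gravity, so a lunar hop lasts about six times longer than on Earth, leaving more time for attitude errors to accumulate in flight.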

2603.10656 2026-03-12 eess.SY cs.SY

Distributed State Estimation of Discrete-Time LTI Systems via Jordan Canonical Representation

Giulio Fattore, Maria Elena Valcher, Rui Gao, Guang-Hong Yang

Comments Extended version of the conference paper accepted for presentation at the 24th European Control Conference (ECC) in Reykjavík, Iceland

Abstract

In this paper, we address the problem of distributed state estimation for a discrete-time, linear time-invariant system. Building on the framework proposed in [2], we exploit the Jordan canonical form of the system matrix to develop a distributed estimation scheme that ensures the asymptotic convergence of the local state estimates to the true system state. The proposed approach relies on the idea that each node reconstructs the components of the system state that are detectable for it through a local Luenberger observer, while employing a consensus-based strategy to estimate the undetectable components. Necessary and sufficient conditions for the existence of a distributed observer that guarantees asymptotic estimation accuracy are derived. Compared with the previous work [2], the proposed design offers greater flexibility in the selection of the coupling gains and leads to a less restrictive set of conditions for solvability.

2603.10635 2026-03-12 eess.SP cs.SY eess.SY

Propagation and Rate-Aware Cell Switching Optimization in HAPS-Assisted Wireless Networks

Mehmet Eren Uluçınar, Özgün Ersoy, Berk Ciloglu, Metin Ozturk, Ali Gorcin

Abstract

Cell switching is a promising approach for improving energy efficiency in wireless networks; however, existing studies largely rely on simplified models and energy-centric formulations that overlook key performance-limiting factors. This paper revisits the cell switching concept by redefining its modeling assumptions and mathematical formulation, explicitly incorporating realistic propagation effects such as building entry loss (BEL) and atmospheric losses relevant to non-terrestrial networks (NTN), particularly high-altitude platform station (HAPS). Beyond proposing a new cell switching strategy, the conventional energy-focused problem is reformulated as a multi-objective optimization framework that jointly minimizes power consumption, unconnected users, and data rate degradation. Through this reformulation, the proposed methods ensure that energy-efficient operation is achieved without compromising user connectivity and data rate performance, thereby inherently supporting sustainability objectives for sixth-generation (6G) networks. To solve this reformulated problem, two complementary approaches are employed: the weighted sum method (WSM), which enables a flexible and adaptive weighting mechanism, and the ε-constraint-inspired method (εCM), which converts connectivity and rate-related objectives into constraints within the conventional energy-focused problem. Moreover, unlike prior work relying only on simulations, this study combines system-level simulations with Sionna-OpenAirInterface (OAI) based emulation on a smaller network to validate the proposed cell switching concept under realistic conditions. The results show that, compared to the conventional approach, WSM reduces rate degradation by up to 70% for high-loss indoor users and eliminates the 44% drop for low-loss indoor users.

2603.10629 2026-03-12 eess.SP

Flexible Multi-Target Angular Emulation for Over-the-Air Testing of Large-Scale ISAC Base Stations: Principle and Experimental Verification

Chunhui Li, Hao Sun, Wei Fan

Abstract

Over-the-air (OTA) emulation of diverse sensing target characteristics in a controlled laboratory environment is pivotal for advancing integrated sensing and communication (ISAC) technology, as it facilitates the non-invasive performance evaluation of ISAC base stations (BSs) across complex scenarios. In this work, a flexible multi-target OTA emulation framework based on a wireless cable method is proposed to evaluate the sensing performance of large-scale ISAC BSs. The core concept leverages an amplitude and phase modulation (APM) network to simultaneously establish wireless cables and simulate target spatial characteristics without consuming additional resources on costly radar target emulators. For the wireless cable method, the condition number increases as the number of antennas scales up, which affects the performance of the wireless cable. Although the wireless cable concept has been established for devices-under-test (DUTs) with a limited number of antenna ports, establishing wireless cables for large-scale DUTs remains an open question in the community. We address this problem by optimizing the OTA probe array configuration based on the theoretical properties of strictly diagonally dominant matrices. Experimental results validate the proposed framework, demonstrating high-isolation wireless cables for a 32-element DUT and an extremely low condition number for a 128-element synthetic array. Furthermore, the OTA emulation of a dynamic dual-drone scenario confirms the method's effectiveness and practicality in reproducing complex sensing environments.

2603.10623 2026-03-12 eess.AS cs.LG cs.SD

Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context

Yuanbo Hou, Yanru Wu, Qiaoqiao Ren, Shengchen Li, Stephen Roberts, Dick Botteldooren

Abstract

Environmental sound understanding in computational auditory scene analysis (CASA) is often formulated as an audio-only recognition problem. This formulation leaves a persistent drawback in multi-label audio tagging (AT): acoustic similarity can make certain events difficult to separate from waveforms alone. In such cases, disambiguating cues often lie outside the waveform. Geospatial semantic context (GSC), derived from geographic information system data, e.g., points of interest (POI), provides location-tied environmental priors that can help reduce this ambiguity. A systematic study of this direction is enabled through the proposed geospatial audio tagging (Geo-AT) task, which conditions multi-label sound event tagging on GSC alongside audio. To benchmark Geo-AT, Geo-ATBench is introduced as a polyphonic audio benchmark with geographical annotations, containing 10.71 hours of audio across 28 event categories; each clip is paired with a GSC representation from 11 semantic context categories. GeoFusion-AT is proposed as a unified geo-audio fusion framework that evaluates feature-, representation-, and decision-level fusion on representative audio backbones, with audio- and GSC-only baselines. Results show that incorporating GSC improves AT performance, especially on acoustically confounded labels, indicating geospatial semantics provide effective priors beyond audio alone. A crowdsourced listening study with 10 participants on 579 samples shows that there is no significant difference in performance between models on Geo-ATBench labels and aggregated human labels, supporting Geo-ATBench as a human-aligned benchmark. The Geo-AT task, benchmark Geo-ATBench, and reproducible geo-audio fusion framework GeoFusion-AT provide a foundation for studying AT with geospatial semantic context within the CASA community. The dataset, code, and models are available on the project homepage (https://github.com/WuYanru2002/Geo-ATBench).

2603.10585 2026-03-12 eess.SP

Path Planning for Sound Speed Profile Estimation

Ludvig Lindström, Tadas Paskevicius, Andreas Jakobsson, Isaac Skog

Comments Submitted to FUSION 2026, Trondheim; 6 pages, 7 figures

Abstract

Accurate estimation of the sound speed profile (SSP) is essential for underwater acoustic communication, sonar performance, and navigation, as the acoustic wave propagation depends strongly on the SSP. This work considers SSP estimation in a region of interest using an autonomous underwater vehicle (AUV) equipped with a conductivity-temperature-depth (CTD) sensor and an acoustic receiver measuring transmission loss (TL) from a sonar transmitter. The SSP is modeled using a linear basis-function expansion and is sequentially estimated with an unscented Kalman filter that fuses local CTD measurements with TL measurements. A receding-horizon path planning scheme is also employed to select future AUV positions by minimizing the predicted total sound speed variance. Simulations using the Bellhop acoustic wave propagation solver show that CTD measurements provide accurate local SSP estimates, whereas TL measurements are seen to capture the global characteristics of the SSP, with their joint use improving the reconstruction of both local variations and large-scale SSP behavior. The results also indicate that the proposed path planning strategy reduces the estimation uncertainty compared to constant-velocity motion, thereby enabling improved environmental characterization for underwater acoustic systems.
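
The linear basis-function expansion mentioned above can be sketched as follows. The abstract does not say which basis the authors use, so the Gaussian radial basis functions, the 1500 m/s mean speed, and the names `rbf_basis` and `sound_speed` are assumptions made purely for illustration. The point is that the profile is linear in the weights, which is what a Kalman-type filter would estimate.

```python
import math


def rbf_basis(depth_m, centers_m, width_m):
    """Evaluate Gaussian basis functions (an assumed basis) at one depth."""
    return [math.exp(-0.5 * ((depth_m - c) / width_m) ** 2) for c in centers_m]


def sound_speed(depth_m, weights, centers_m, width_m, mean_speed=1500.0):
    """Sound speed c(z) = mean_speed + sum_k w_k * phi_k(z), linear in the weights."""
    phi = rbf_basis(depth_m, centers_m, width_m)
    return mean_speed + sum(w * p for w, p in zip(weights, phi))


# Hypothetical weights (m/s) for basis functions centred at 0, 50 and 100 m depth.
centers = [0.0, 50.0, 100.0]
weights = [12.0, -4.0, -9.0]
profile = [sound_speed(z, weights, centers, width_m=25.0) for z in range(0, 101, 10)]
```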

2603.10549 2026-03-12 cs.CV cs.AI eess.SP

Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

Mohammed Salah, Eman Ouda, Giuseppe Dell'Avvocato, Fabrizio Sarasini, Ester D'Accardi, Jorge Dias, Davor Svetinovic, Stefano Sfarra, Yusra Abdulrahman

Abstract

Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high-performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires the creation of time-consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require developing training datasets for extensive training of defect detectors; instead, it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding of thermographic patterns and automatic detection and localization of subsurface defects. Given the domain gap between thermographic data and the natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs: GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM Adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.

2603.10527 2026-03-12 cs.LG cs.SY eess.SY

World Model for Battery Degradation Prediction Under Non-Stationary Aging

Kai Chin Lim, Khay Wai See

Comments 18 pages, 3 figures

Abstract

Degradation prognosis for lithium-ion cells requires forecasting the state-of-health (SOH) trajectory over future cycles. Existing data-driven approaches can produce trajectory outputs through direct regression, but lack a mechanism to propagate degradation dynamics forward in time. This paper formulates battery degradation prognosis as a world model problem, encoding raw voltage, current, and temperature time series from each cycle into a latent state and propagating it forward via a learned dynamics transition to produce a future trajectory spanning 80 cycles. To investigate whether electrochemical knowledge improves the learned dynamics, a Single Particle Model (SPM) constraint is incorporated into the training loss. Three configurations are evaluated on the Severson LiFePO4 (LFP) dataset of 138 cells. Iterative rollout halves the trajectory forecast error compared to direct regression from the same encoder. The SPM constraint improves prediction at the degradation knee, where the resistance-to-SOH relationship is most applicable, without changing aggregate accuracy.
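
The difference between iterative rollout and direct regression comes down to applying one transition repeatedly instead of predicting all future values at once. The sketch below shows only that control flow, assuming a scalar latent state and toy stand-ins for the learned components; `rollout`, `toy_transition`, and `toy_decoder` are hypothetical names, and the 0.999 decay factor is invented for illustration rather than taken from the paper.

```python
def rollout(z0, transition, decoder, horizon=80):
    """Propagate a latent state forward and decode one SOH value per future cycle."""
    z = z0
    trajectory = []
    for _ in range(horizon):
        z = transition(z)              # one learned-dynamics step (toy stand-in here)
        trajectory.append(decoder(z))  # map the latent state to a predicted SOH
    return trajectory


def toy_transition(z):
    """Hypothetical dynamics: the latent state decays by 0.1% per cycle."""
    return 0.999 * z


def toy_decoder(z):
    """Hypothetical decoder: the latent state is read out directly as SOH."""
    return z


# Forecast 80 cycles ahead from an initial latent state of 1.0 (100% SOH).
soh_forecast = rollout(1.0, toy_transition, toy_decoder)
```

In a real model the transition and decoder would be trained networks and the latent state a vector; the loop structure stays the same.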

2603.10515 2026-03-12 eess.SP

A Harmony Composition-Inspired Tensor Modalization Method for Near-Field IRS Channel Estimation

Wenzhou Cao, Yashuai Cao, Tiejun Lv, Jie Zeng

Comments This work has been accepted for publication in IEEE Transactions on Vehicular Technology

Abstract

Intelligent reflecting surfaces (IRSs) are poised to revolutionize next-generation wireless communication systems by enhancing channel quality and spectrum efficiency through advanced wave manipulation. However, extremely large-scale IRS (XL-IRS) deployments face significant challenges in channel estimation due to multiplicative path loss and near-field (NF) effects, where spherical wavefronts couple distance and angle parameters. Existing polar-domain codebook-based compressive sensing methods for NF channel estimation suffer from low accuracy and high complexity, caused by the need for high-resolution grids of both distance and angle parameters. To address this, we propose a harmonic processing-inspired channel estimation framework for NF XL-IRS systems by leveraging tensor modalization to decouple channel parameters. Drawing an analogy to musical harmonic analysis, our approach decomposes the high-dimensional NF channel tensor into independent factor matrices, modeled as "chords," representing distance and angle parameters. Through harmonic analysis-inspired distance parameter decoupling, we design a compact, distance-dependent codebook that enables high-resolution NF channel parameter estimation. This approach significantly reduces the codebook size compared to polar-domain methods. Then, we derive the Cramér-Rao lower bound (CRLB) to evaluate the estimators. Finally, simulation results show an 8.5 dB improvement in normalized mean square error (NMSE) compared to conventional methods, underscoring its low complexity and high accuracy.

2603.10443 2026-03-12 eess.SP

3D Spectrum Awareness for Radio Dynamic Zones Using Kriging and Matrix Completion

Mushfiqur Rahman, Sung Joon Maeng, Ismail Guvenc, Chau-Wai Wong

Comments Published in IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2024

Journal ref 2024 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2024, pp. 439-446
Abstract

Radio Dynamic Zones (RDZs) are geographically defined areas specifically allocated for testing new wireless technologies. It is essential to safeguard the regular spectrum users outside the zones from the interference caused by the equipment deployed within them. Previous works have utilized sparse reference signal received power (RSRP) measurements collected by unmanned aerial vehicles (UAVs) to construct a dense 3D radio map through ordinary Kriging. In this work, we illustrate that matrix completion can outperform ordinary Kriging. We partition a 2D area of interest into small square grid cells, where each cell corresponds to a single entry of a matrix. The matrix completion algorithm learns the global structure of the radio environment map by leveraging the low-rank property of propagation maps. Additionally, we illustrate that simple Kriging and trans-Gaussian Kriging yield better results when the density of known measurements is lower. Earlier works on RSRP prediction involved a training dataset at a single altitude. In this work, we also show that performance can be improved by utilizing a combined dataset from multiple altitudes.
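
The gridding step and the low-rank idea can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it averages samples per square cell and then fills empty cells with a rank-1 alternating least-squares fit, whereas the abstract only states that a low-rank property is exploited. The function names, the rank-1 restriction, and the toy data are assumptions.

```python
def grid_measurements(samples, cell_size, n_rows, n_cols):
    """Average (x, y, value) samples per square cell; None marks cells with no data."""
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y, value in samples:
        r, c = int(y // cell_size), int(x // cell_size)
        if 0 <= r < n_rows and 0 <= c < n_cols:
            sums[r][c] += value
            counts[r][c] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(n_cols)] for r in range(n_rows)]


def rank1_complete(grid, iters=100):
    """Fill None entries with u[r] * v[c], fitted to the observed entries only."""
    n_rows, n_cols = len(grid), len(grid[0])
    u, v = [1.0] * n_rows, [1.0] * n_cols
    for _ in range(iters):
        for r in range(n_rows):  # least-squares update of each row factor
            obs = [c for c in range(n_cols) if grid[r][c] is not None]
            den = sum(v[c] ** 2 for c in obs)
            if den > 0:
                u[r] = sum(grid[r][c] * v[c] for c in obs) / den
        for c in range(n_cols):  # least-squares update of each column factor
            obs = [r for r in range(n_rows) if grid[r][c] is not None]
            den = sum(u[r] ** 2 for r in obs)
            if den > 0:
                v[c] = sum(grid[r][c] * u[r] for r in obs) / den
    return [[grid[r][c] if grid[r][c] is not None else u[r] * v[c]
             for c in range(n_cols)] for r in range(n_rows)]


# Toy map that is exactly rank 1, with two cells left unmeasured.
sparse_map = [[1.0, 2.0, None],
              [2.0, 4.0, 8.0],
              [None, 6.0, 12.0]]
dense_map = rank1_complete(sparse_map)
```

Real propagation maps are only approximately low rank, so a practical completion method would use a higher rank and some regularization; the rank-1 case is chosen here because it is easy to verify by hand.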

2603.10426 2026-03-12 cs.IT eess.SP math.IT

3-D Trajectory Optimization for Robust Direction Sensing in Movable Antenna Systems

Wenyan Ma, Lipeng Zhu, Xiaodan Shao, Rui Zhang

Abstract

This paper presents a novel wireless sensing system where a movable antenna (MA) continuously moves and receives sensing signals within a three-dimensional (3-D) region to enhance sensing performance compared with conventional fixed-position antenna (FPA)-based sensing. We show that the performance of direction vector estimation for a target is fundamentally related to the 3-D MA trajectory in terms of the mean square angular error lower bound (MSAEB), which is adopted as a coordinate-invariant performance metric. In particular, the closed-form expression of the MSAEB is derived as a function of the trajectory covariance matrix. Theoretical analysis shows that two-dimensional (2-D) antenna movement suffers from performance divergence for target directions close to the endfire direction of the 2-D MA plane, whereas 3-D movement can achieve isotropic sensing performance over the entire angular region. To achieve robust sensing performance, we formulate a min-max optimization problem to minimize the maximum (worst-case) MSAEB over a given continuous angular region wherein the target is located. An efficient successive convex approximation (SCA) algorithm is developed to optimize the 3-D MA trajectory and obtain a locally optimal solution. Numerical results demonstrate that the proposed 3-D MA sensing scheme significantly reduces the worst-case mean square angular error (MSAE) compared with conventional arrays with FPAs and MA systems with 2-D movement only, thus achieving more accurate and robust direction estimation over the entire angular region.

2603.10421 2026-03-12 eess.SP cs.NI

Spyglass: Directional Spectrum Sensing with Single-shot AoA Estimation and Virtual Arrays

Raghav Subbaraman, Akshit Agarwal, Wenhao Chen, Dinesh Bharadia

Abstract

In this paper, we introduce Spyglass, a spectrum sensor designed to address the challenges of effective spectrum usage in dense wireless environments. Spyglass is capable of observing a frequency band and accurately estimating the Angle of Arrival (AoA) of any signal during a single transmission. Each estimate is accompanied by additional signal context such as center frequency, bandwidth, and I/Q samples. We overcome challenges such as the clutter of fleeting transmissions in common bands, the high cost of array processing for AoA estimation, and the difficulty of detecting and estimating channels for unknown signals. Our first contribution is the development of Searchlite, a protocol-agnostic signal detection and separation algorithm. We use a switched array to reduce cost and processing complexity, and we develop SSFP, a signal processing technique using Fourier transforms that is synchronized to switching boundaries. Spyglass performs multi-channel blind AoA estimation synchronized with the array. Implemented using commercially available hardware, Spyglass demonstrates a median AoA accuracy of 1.4$^\circ$ and the ability to separate simultaneous signals from multiple devices in an unconstrained RF environment, providing valuable tools for large-scale RF data collection and analysis.

2603.10420 2026-03-12 eess.AS cs.SD

FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System

Kaituo Xu, Yan Jia, Kai Huang, Junjie Chen, Wenpeng Li, Kun Liu, Feng-Long Xie, Xu Tang, Yao Hu

Abstract

We present FireRedASR2S, a state-of-the-art industrial-grade all-in-one automatic speech recognition (ASR) system. It integrates four modules in a unified pipeline: ASR, Voice Activity Detection (VAD), Spoken Language Identification (LID), and Punctuation Prediction (Punc). All modules achieve SOTA performance on the evaluated benchmarks:
FireRedASR2: An ASR module with two variants, FireRedASR2-LLM (8B+ parameters) and FireRedASR2-AED (1B+ parameters), supporting speech and singing transcription for Mandarin, Chinese dialects and accents, English, and code-switching. Compared to FireRedASR, FireRedASR2 delivers improved recognition accuracy and broader dialect and accent coverage. FireRedASR2-LLM achieves 2.89% average CER on 4 public Mandarin benchmarks and 11.55% on 19 public Chinese dialects and accents benchmarks, outperforming competitive baselines including Doubao-ASR, Qwen3-ASR, and Fun-ASR.
FireRedVAD: An ultra-lightweight module (0.6M parameters) based on the Deep Feedforward Sequential Memory Network (DFSMN), supporting streaming VAD, non-streaming VAD, and multi-label VAD (mVAD). On the FLEURS-VAD-102 benchmark, it achieves 97.57% frame-level F1 and 99.60% AUC-ROC, outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.
FireRedLID: An Encoder-Decoder LID module supporting 100+ languages and 20+ Chinese dialects and accents. On FLEURS (82 languages), it achieves 97.18% utterance-level accuracy, outperforming Whisper and SpeechBrain.
FireRedPunc: A BERT-style punctuation prediction module for Chinese and English. On multi-domain benchmarks, it achieves 78.90% average F1, outperforming FunASR-Punc (62.77%).
To advance research in speech processing, we release model weights and code at https://github.com/FireRedTeam/FireRedASR2S.