arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.21163 2026-02-25 eess.IV eess.SP

A Light Fixture Color Temperature and Color Rendering Index Measuring Device

Gianluca Hiss Garbim, Luis Carlos Mathias, André Massami Assakawa, Taufik Abrão

Comments 11 pages, 12 figures, full paper

The correlated color temperature (CCT) and color rendering index (CRI) of artificial light sources are important because they have implications for human biology and professional applications. Although CCT information is generally available for commercial lamps, CRI is commonly not reported. In addition, devices that measure these parameters are difficult to access because they require a spectrophotometer, which is typically expensive. In this context, the present work details the design and construction of a meter, covering the structural part of the equipment, the sensor interface, the calculations, and the implementation of the compensation algorithms. The aim is to provide the dedicated functionalities of a spectrophotometer in a device designed without optical lenses. In addition to simplifying the device, this approach allows measurements free from the dispersion caused by the chromatic aberrations typical of optical lenses. The resulting prototype proved effective, capturing the spectral power distributions of various light sources and calculating their CCT and CRI.
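As a rough illustration of the final calculation step, CCT can be approximated from CIE 1931 (x, y) chromaticity coordinates with McCamy's cubic formula. The abstract does not say which method the authors use, so this is a swapped-in approximation, and the function name `mccamy_cct` is hypothetical.

```python
def mccamy_cct(x: float, y: float) -> float:
    """Approximate CCT in kelvin from CIE 1931 (x, y) chromaticity.

    Uses McCamy's cubic approximation, which is only reasonable for
    chromaticities close to the Planckian locus. This is an assumption
    for illustration, not the method described in the paper.
    """
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33


# Chromaticity of the D65 daylight illuminant; the result should be near 6500 K.
d65_cct = mccamy_cct(0.3127, 0.3290)
print(f"Approximate CCT for D65: {d65_cct:.0f} K")
```

In a full meter, the (x, y) pair would first be derived from the measured spectral power distribution by integrating against the CIE color matching functions, which this sketch leaves out.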

2602.21128 2026-02-25 eess.IV

Vision-Inspired Image Quality Assessment for Radar-Based Human Activity Representations

Huy Trinh, Davis Liu, Munia Humaira, Peter Lee, Zhou Wang

Radar-based human activity recognition has gained attention as a privacy-preserving alternative to vision and wearable sensors, especially in sensitive environments like long-term care facilities. Micro-Doppler spectrograms derived from FMCW radar signals are central to recognizing dynamic activities, but their effectiveness is limited by noise and clutter. In this work, we use a benchmark radar dataset to reimplement and assess three recent denoising and preprocessing techniques: adaptive preprocessing, adaptive thresholding, and entropy-based denoising. To illustrate the shortcomings of conventional metrics in low-SNR regimes, we evaluate performance using both perceptual image quality measures and standard error-based metrics. We additionally propose a novel framework for static activity recognition using range-angle feature maps to expand HAR beyond dynamic activities. We present two important contributions: a temporal tracking algorithm to enforce consistency and a no-reference quality scoring algorithm to assess RA-map fidelity. According to experimental findings, our suggested techniques enhance classification performance and interpretability for both dynamic and static activities, opening the door for more reliable radar-based HAR systems.

2602.21116 2026-02-25 eess.SP cs.AI

Attention-Based SINR Estimation in User-Centric Non-Terrestrial Networks

Bruno De Filippo, Alessandro Guidotti, Alessandro Vanelli-Coralli

Comments Paper accepted for presentation at IEEE International Conference on Machine Learning in Communications and Networking (ICMLCN) 2026

The signal-to-interference-plus-noise ratio (SINR) is central to performance optimization in user-centric beamforming for satellite-based non-terrestrial networks (NTNs). Its assessment either requires the transmission of dedicated pilots or relies on computing the beamforming matrix through minimum mean squared error (MMSE)-based formulations beforehand, a process that introduces significant computational overhead. In this paper, we propose a low-complexity SINR estimation framework that leverages multi-head self-attention (MHSA) to extract inter-user interference features directly from either channel state information or user location reports. The proposed dual MHSA (DMHSA) models evaluate the SINR of a scheduled user group without requiring explicit MMSE calculations. The architecture achieves a computational complexity reduction by a factor of three in the CSI-based setting and by two orders of magnitude in the location-based configuration, the latter benefiting from the lower dimensionality of user reports. We show that both DMHSA models maintain high estimation accuracy, with the root mean squared error typically below 1 dB with priority-queuing-based scheduled users. These results enable the integration of DMHSA-based estimators into scheduling procedures, allowing the evaluation of multiple candidate user groups and the selection of those offering the highest average SINR and capacity.
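For reference, the quantity being estimated can be written down directly once a beamforming matrix is available. The sketch below computes the textbook per-user SINR from a channel matrix and a set of beamforming vectors. It is not the DMHSA model, and the beamformers in the example are arbitrary placeholders rather than MMSE solutions. The function name and the row/column conventions are assumptions.

```python
def sinr_per_user(channels, beamformers, noise_power):
    """Per-user SINR for a scheduled group of users.

    channels[k]    : complex channel coefficients from each antenna to user k
    beamformers[j] : complex beamforming weights of the stream serving user j
    The effective gain is the plain product sum h_k . w_j (assumed convention).
    """
    def gain(h, w):
        return abs(sum(hi * wi for hi, wi in zip(h, w))) ** 2

    sinrs = []
    for k, h in enumerate(channels):
        signal = gain(h, beamformers[k])
        interference = sum(
            gain(h, w) for j, w in enumerate(beamformers) if j != k
        )
        sinrs.append(signal / (interference + noise_power))
    return sinrs


# Two users, two antennas, one antenna dedicated to each user (placeholder weights).
H = [[1.0 + 0.0j, 0.5j], [0.2 + 0.0j, 1.0 + 0.0j]]
W = [[1.0, 0.0], [0.0, 1.0]]
sinr_values = sinr_per_user(H, W, noise_power=0.1)
print(sinr_values)
```

Computing this for many candidate user groups requires a beamforming matrix per group, which is the overhead the abstract says a learned estimator is meant to avoid.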

2602.21113 2026-02-25 eess.SY cs.SY

A Survey of Recent Developments in SYCL Compiler Implementations

Huy Trinh

This survey discusses recent advancements in SYCL compiler implementations, a crucial aspect of compiler construction for heterogeneous computing systems. We explore the transition from the traditional Single-Source Multiple Compiler Passes (SMCP) approach to the more advanced Single-Source Single Compiler Pass (SSCP) approach. The survey analyzes multiple papers on SSCP-based SYCL implementations and their approaches to enhancing performance and addressing distinct challenges.

2602.21107 2026-02-25 eess.SP

Resilient Cell-Free Massive MIMO Networks

Junbin Yu, Tianyu Lu, Mohammadali Mohammadi, Michail Matthaiou

Comments ICC 2026, Accepted

This paper proposes a novel optimization framework for enhancing the security resilience of cell-free massive multiple-input multiple-output (CF-mMIMO) networks with multi-antenna access points (APs) and protective partial zero-forcing (PPZF) under active eavesdropping. Based on the main principles of absorption, adaptation, and recovery, we formulate a security-aware resilience metric to quantify the system performance during and after a security outage. A multi-user service priority-aware power allocation problem is formulated to minimize the mean squared error (MSE) between real-time and desired security efficiency, thereby enabling a trade-off between the target user's secrecy performance and multi-user quality of service (QoS). To solve this non-convex problem, a security-aware iterative algorithm based on the successive convex approximation (SCA) is employed. The proposed algorithm determines the optimal power allocation strategy by balancing solution quality against recovery time. At each iteration, it evaluates the overall resilience score and selects the strategy that achieves the highest value. Simulation results confirm that the proposed framework significantly improves the resilience of CF-mMIMO networks, allowing flexible adaptation between rapid recovery and high-quality recovery, depending on system requirements.

2602.21102 2026-02-25 eess.SP

BRISC: A Dataset of Channel Measurements at 5 GHz With a Reflective Intelligent Surface

Mattia Piana, Giovanni Angelo Alghisi, Anna Valeria Guglielmi, Giovanni Perin, Francesco Gringoli, Stefano Tomasin

We introduce the broadband reconfigurable intelligent surface (RIS) channel (BRISC) dataset. The dataset comprises measurements of channel state information (CSI) collected at 5.53 GHz using a 256-element RIS with binary states. In the measurement campaign, the transmitter and receiver are two software-defined radios (SDRs), phase-synchronized via an OctoClock, where the transmitter (receiver) is equipped with one (two) antenna(s). To manage complexity, the RIS elements are grouped into blocks of different sizes, where all elements within a block share the same state. CSI has been captured for multiple a) transmitter positions (with a fixed receiver location), b) pilot block sizes, and c) state configurations. Furthermore, we calibrated the parameters of state-of-the-art RIS channel models to fit the measured CSI. With approximately 10,000 configurations explored per transmitting position, BRISC serves as a robust benchmark in communication applications. We also show an example of its use for physical-layer authentication.

2602.21096 2026-02-25 eess.SP

Distributed Continuous Aperture Arrays for Multiuser SWIPT

Muhammad Zeeshan Mumtaz, Mohammadali Mohammadi, Hien Quoc Ngo, Michail Matthaiou

Comments The paper has been accepted for presentation at IEEE International Conference on Communications (ICC), 24 to 28 May 2026

This paper proposes a distributed continuous aperture array (D-CAPA) to support simultaneous wireless information and power transfer (SWIPT) to multiple information users (IUs) and energy users (EUs). Each metasurface supports continuous surface currents that radiate electromagnetic (EM) waves for information and energy transmission to the users. These waves propagate through continuous EM channels characterized by the dyadic Green function. We formulate a system power consumption (PC) minimization problem subject to spectral efficiency and energy harvesting quality of service (QoS) requirements, where the QoS requirements are derived under the equal power allocation (EPA) scheme. An efficient two-layer optimization algorithm is developed to solve this problem by optimizing the power allocation subject to the QoS violation penalties using an augmented Lagrangian transformation. Our numerical results show that well-optimized current distributions over each metasurface in the proposed D-CAPA achieve up to 65% and 61% reductions in overall system PC compared to the EPA and colocated CAPA (C-CAPA) cases, respectively, while maintaining the same total aperture size and transmission power.

2602.21090 2026-02-25 math.OC cs.SY eess.SY

Robustness certificates in data-driven non-convex optimization with additively-uncertain constraints

Alexander J Gallo, Massimiliano Zoggia, Alessandro Falsone, Maria Prandini, Simone Garatti

Comments 11 pages, 8 figures. The manuscript has been submitted to the IEEE Transactions on Automatic Control for possible publication

We consider decision-making problems that are formulated as non-convex optimization programs where uncertainty enters the constraints through an additive term, independent of the decision variables, and robustness is imposed using a finite data-set, according to the scenario robust optimization paradigm. By exploiting the structure of the constraints, we show that both a priori and a posteriori distribution-free probabilistic robustness certificates for a possibly sub-optimal solution to the resulting data-driven optimization problem can be obtained with minimal computational effort. Building on these results, we also discuss a one-shot and an incremental procedure to determine the size of the data-set so as to guarantee a user-chosen robustness level. Notably, neither the a posteriori robustness assessment nor the incremental data-set sizing requires solving the non-convex scenario program. A comparative analysis performed on the unit commitment problem using real data reveals a limited increase in conservativeness together with a significant computational saving with respect to the application of scenario theory results for general, not necessarily structured, non-convex problems.

2602.21081 2026-02-25 cs.LG eess.SP

Scaling Vision Transformers: Evaluating DeepSpeed for Image-Centric Workloads

Huy Trinh, Rebecca Ma, Zeqi Yu, Tahsin Reza

Vision Transformers (ViTs) have demonstrated remarkable potential in image processing tasks by utilizing self-attention mechanisms to capture global relationships within data. However, their scalability is hindered by significant computational and memory demands, especially for large-scale models with many parameters. This study aims to leverage DeepSpeed, a highly efficient distributed training framework that is commonly used for language models, to enhance the scalability and performance of ViTs. We evaluate intra- and inter-node training efficiency across multiple GPU configurations on various datasets like CIFAR-10 and CIFAR-100, exploring the impact of distributed data parallelism on training speed, communication overhead, and overall scalability (strong and weak scaling). By systematically varying software parameters, such as batch size and gradient accumulation, we identify key factors influencing performance of distributed training. The experiments in this study provide a foundational basis for applying DeepSpeed to image-related tasks. Future work will extend these investigations to deepen our understanding of DeepSpeed's limitations and explore strategies for optimizing distributed training pipelines for Vision Transformers.
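The strong and weak scaling mentioned in the abstract are usually summarized as efficiencies relative to a single-GPU run. The sketch below uses the standard definitions. The timing numbers are made up for illustration and are not results from the paper.

```python
def strong_scaling_efficiency(t_single: float, t_parallel: float, workers: int) -> float:
    """Fixed total problem size: ideal time on `workers` GPUs is t_single / workers."""
    return t_single / (workers * t_parallel)


def weak_scaling_efficiency(t_single: float, t_parallel: float) -> float:
    """Problem size grows with the worker count: ideal time stays at t_single."""
    return t_single / t_parallel


# Hypothetical epoch times in seconds (not measurements from the study).
strong_eff = strong_scaling_efficiency(t_single=100.0, t_parallel=30.0, workers=4)
weak_eff = weak_scaling_efficiency(t_single=100.0, t_parallel=125.0)
print(f"strong: {strong_eff:.2%}, weak: {weak_eff:.2%}")
```

An efficiency below 1 in either case reflects overheads such as gradient communication, which is why batch size and gradient accumulation steps tend to matter in such studies.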

2602.21027 2026-02-25 eess.SP cs.IT math.IT

Optimal QAM Constellation for Over-the-Air Computation in the Presence of Heavy-Tailed Channel Noise

Saeed Razavikia, Deniz Gündüz, Carlo Fischione

Over-the-air computation (OAC) enables low-latency aggregation over multiple-access channels (MACs) by exploiting the superposition property of the wireless medium to compute functions efficiently in distributed networks. A critical but often overlooked challenge is that electromagnetic interference in practical radio channels frequently exhibits heavy-tailed behavior, causing strong impulsive noise that severely degrades computation performance. This work studies digital OAC with QAM-based signaling under heavy-tailed interference modeled by a Cauchy distribution (lacking a finite second moment). We seek QAM-like constellations that minimize the mean-squared error (MSE) of sum aggregation subject to an average-power constraint. The problem is formulated as a constrained optimization, whose solution yields unique optimality conditions. Numerical results confirm the effectiveness of the proposed design. Notably, the framework extends naturally to nomographic functions, broader constellation families, and alternative noise models.

2602.20994 2026-02-25 eess.IV cs.AI cs.CL cs.CV cs.LG

Multimodal MRI Report Findings Supervised Brain Lesion Segmentation with Substructures

Yubin Ge, Yongsong Huang, Xiaofeng Liu

Comments IEEE International Symposium on Biomedical Imaging (ISBI) 2026

Report-supervised (RSuper) learning seeks to alleviate the need for dense tumor voxel labels with constraints derived from radiology reports (e.g., volumes, counts, sizes, locations). MRI studies of brain tumors, however, often involve multi-parametric scans and substructures. Here, fine-grained modality/parameter-wise reports are usually provided along with global findings and are correlated with different substructures. Moreover, the reports often describe only the largest lesion and provide qualitative or uncertain cues ("mild," "possible"). Classical RSuper losses (e.g., sum volume consistency) can over-constrain or hallucinate unreported findings under such incompleteness, and are unable to utilize these hierarchical findings or exploit the priors of varied lesion types in a merged dataset. We explicitly parse the global quantitative and modality-wise qualitative findings and introduce a unified, one-sided, uncertainty-aware formulation (MS-RSuper) that: (i) aligns modality-specific qualitative cues (e.g., T1c enhancement, FLAIR edema) with their corresponding substructures using existence and absence losses; (ii) enforces one-sided lower bounds for partial quantitative cues (e.g., largest lesion size, minimal multiplicity); and (iii) adds extra- vs. intra-axial anatomical priors to respect cohort differences. Certainty tokens scale penalties; missing cues are down-weighted. On 1238 report-labeled BraTS-MET/MEN scans, our MS-RSuper largely outperforms both a sparsely-supervised baseline and a naive RSuper method.

2602.20983 2026-02-25 eess.SP cs.IT math.IT

Cell-Free Massive MIMO-Assisted SWIPT Using Stacked Intelligent Metasurfaces

Thien Duc Hua, Mohammadali Mohammadi, Hien Quoc Ngo, Michail Matthaiou

Comments Accepted in IEEE TWC, Feb. 2026

This study explores a next-generation multiple access (NGMA) framework for cell-free massive MIMO (CF-mMIMO) systems enhanced by stacked intelligent metasurfaces (SIMs), aiming to improve simultaneous wireless information and power transfer (SWIPT) performance. A fundamental challenge lies in optimally selecting the operating modes of access points (APs) to jointly maximize the received energy and satisfy spectral efficiency (SE) quality-of-service constraints. Practical system impairments, including a non-linear harvested energy model, pilot contamination (PC), channel estimation errors, and reliance on long-term statistical channel state information (CSI), are considered. We derive closed-form expressions for both the achievable SE and the average sum harvested energy (sum-HE). A mixed-integer non-convex optimization problem is formulated to jointly optimize the SIM phase shifts, AP mode selection, and power allocation to maximize the average sum-HE under SE and average harvested energy constraints. To solve this problem, we propose a centralized training, decentralized execution (CTDE) framework based on deep reinforcement learning (DRL), which efficiently handles high-dimensional decision spaces. A Markovian environment and a normalized joint reward function are introduced to enhance the training stability across on-policy and off-policy DRL algorithms. Additionally, we provide a two-phase convex-based solution as a theoretically robust performance benchmark. Numerical results demonstrate that the proposed DRL-based CTDE framework achieves SWIPT performance comparable to the convexification-based solution, while significantly outperforming baselines.

2602.20953 2026-02-25 eess.SP

Timing Recovery and Sequence Detection for Integrate-and-Fire Time Encoding Receivers

Neil Irwin Bernardo

Comments 6 pages, 3 figures, accepted in 2026 IEEE Wireless Communications and Networking Conference (WCNC 2026)

Recent advances in neuromorphic signal processing have introduced time encoding machines as a promising alternative to conventional uniform sampling for low-power communication receivers. In this paradigm, analog signals are converted into event timings by an integrate-and-fire circuit, allowing information to be represented through spike times rather than amplitude samples. While event-driven sampling eliminates the need for a fixed-rate clock, receivers equipped with integrate-and-fire time encoding machines, called time encoding receivers, often assume perfect symbol synchronization, leaving the problem of symbol timing recovery unresolved. This paper presents a joint timing recovery and data detection framework for integrate-and-fire time encoding receivers. The log-likelihood function is derived to capture the dependence between firing times, symbol timing offset, and transmitted sequence, leading to a maximum likelihood formulation for joint timing estimation and sequence detection. A practical two-stage receiver is developed, consisting of a timing recovery algorithm followed by a zero-forcing detector. Simulation results demonstrate accurate symbol timing offset estimation and improved symbol error rate performance compared to existing time encoding receivers.
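The integrate-and-fire encoding described above can be sketched as a discrete-time simulation: the input plus a bias is integrated, and each time the integral reaches a threshold the firing time is recorded and the integrator is reduced by the threshold. The bias term, the reset-by-subtraction rule, and all parameter values are assumptions for illustration. The abstract does not specify the circuit details.

```python
def integrate_and_fire(samples, dt, bias, threshold):
    """Return the firing times of a simple integrate-and-fire time encoder.

    Assumes (sample + bias) * dt stays below `threshold`, so at most one
    firing can occur per input sample.
    """
    firing_times = []
    integral = 0.0
    for index, sample in enumerate(samples):
        integral += (sample + bias) * dt
        if integral >= threshold:
            firing_times.append((index + 1) * dt)
            integral -= threshold  # reset by subtraction (assumed)
    return firing_times


dt = 1.0 / 1024.0
quiet = integrate_and_fire([0.0] * 1024, dt, bias=1.0, threshold=0.125)
loud = integrate_and_fire([1.0] * 1024, dt, bias=1.0, threshold=0.125)
print(len(quiet), len(loud))
```

A larger input makes the integrator reach the threshold sooner, so amplitude is carried by the spacing of the firing times. A receiver therefore has to relate those times to symbol boundaries, which is the timing-recovery problem the paper addresses.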

2602.20932 2026-02-25 cs.LG cs.HC eess.SP

Hierarchic-EEG2Text: Assessing EEG-To-Text Decoding across Hierarchical Abstraction Levels

Anupam Sharma, Harish Katti, Prajwal Singh, Shanmuganathan Raman, Krishna Miyapuram

An electroencephalogram (EEG) records the spatially averaged electrical activity of neurons in the brain, measured from the human scalp. Prior studies have explored EEG-based classification of objects or concepts, often for passive viewing of briefly presented image or video stimuli, with limited classes. Because EEG exhibits a low signal-to-noise ratio, recognizing fine-grained representations across a large number of classes remains challenging; however, abstract-level object representations may exist. In this work, we investigate whether EEG captures object representations across multiple hierarchical levels, and propose episodic analysis, in which a Machine Learning (ML) model is evaluated across various, yet related, classification tasks (episodes). Unlike prior episodic EEG studies that rely on fixed or randomly sampled classes of equal cardinality, we adopt hierarchy-aware episode sampling using WordNet to generate episodes with variable classes of diverse hierarchy. We also present the largest episodic framework in the EEG domain for detecting observed text from EEG signals in the PEERS dataset, comprising 931,538 EEG samples under 1,610 object labels, acquired from 264 human participants (subjects) performing controlled cognitive tasks, enabling the study of neural dynamics underlying perception, decision-making, and performance monitoring. We examine how the semantic abstraction level affects classification performance across multiple learning techniques and architectures, providing a comprehensive analysis. The models tend to improve performance when the classification categories are drawn from higher levels of the hierarchy, suggesting sensitivity to abstraction. Our work highlights abstraction depth as an underexplored dimension of EEG decoding and motivates future research in this direction.

2602.20874 2026-02-25 eess.SP cs.IT math.IT

Symbol-Aware Precoder Design for Physical-Layer Anonymous Communications

Yu Li, Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang

Comments This paper has been submitted for possible journal publication

Physical-layer characteristics, such as channel state information (CSI) and transmitter noise induced by hardware impairments, are often uniquely associated with a transmitter. This paper investigates transmitter anonymity at the physical layer from a signal design perspective. We consider an anonymous communication problem where the receiver should reliably decode the signal from the transmitter but should not make use of the signal to infer the transmitter's identity. Transmitter anonymity is quantified using a Kullback-Leibler divergence (KLD)-based metric, which enables the formulation of explicit anonymity constraints in the precoder design. We then propose an anonymous symbol-level precoding strategy that preserves reliable communication under spatial multiplexing while preventing transmitter identification. The proposed framework employs a partitioned equal-gain combining (P-EGC) scheme that leverages receiver diversity without requiring transmitter-specific CSI. Simulation results demonstrate anonymity-reliability tradeoffs across different signal-to-noise ratios (SNRs) and numbers of data streams. Moreover, the results reveal opposite trends of anonymity with respect to transmitter-dependent noise variations in the low-SNR and high-SNR regimes.

2511.22144 2026-02-25 eess.SP

Bistatic Passive Sensing via CSI Power

Zhongqin Wang, J. Andrew Zhang, Kai Wu, Kuangda Chen, Min Xu, Y. Jay Guo

Passive object sensing with communication signals is a key enabler of perceptive mobile networks and integrated sensing and communication. In practical bistatic deployments, transmitter-receiver asynchrony and hardware impairments introduce time-varying random phase offsets in Channel State Information (CSI). Together with limited bandwidth and small antenna arrays, these effects degrade sensing accuracy. This work proposes a lightweight bistatic passive tracking and sensing framework that operates in the CSI-power domain. CSI power suppresses these offsets without explicit phase calibration, while preserving target-induced sensing cues. We show that physically admissible constraints in the spatial-frequency domain induced by transmitter-receiver geometry can resolve the mirror ambiguity inherent to real-valued CSI power. Building on these properties, we develop a real-time 3D Fourier-domain processing pipeline that jointly recovers spectral (delay), spatial (angle), and temporal (Doppler) signatures. The resulting features are integrated into an online framework with adaptive motion detection, outlier suppression, and extended Kalman filter tracking with deterministic initialization, followed by position-refined micro-Doppler feature extraction for micro-motion sensing. Extensive experiments, including simulations, a real-world prototype using 3.1 GHz LTE signals, and an open-source gait recognition dataset, demonstrate the effectiveness of the proposed CSI-power-based framework for bistatic passive tracking and sensing.

2510.24332 2026-02-25 cs.SD cs.CV eess.AS eess.IV

Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes

Jonas Hein, Lazaros Vlachopoulos, Maurits Geert Laurent Olthof, Bastian Sigrist, Philipp Fürnstahl, Matthias Seibold

Purpose: Surgical scene understanding is key to advancing computer-aided and intelligent surgical systems. Current approaches predominantly rely on visual data or end-to-end learning, which limits fine-grained contextual modeling. This work aims to enhance surgical scene representations by integrating 3D acoustic information, enabling temporally and spatially aware multimodal understanding of surgical environments. Methods: We propose a novel framework for generating 4D audio-visual representations of surgical scenes by projecting acoustic localization information from a phased microphone array onto dynamic point clouds from an RGB-D camera. A transformer-based acoustic event detection module identifies relevant temporal segments containing tool-tissue interactions which are spatially localized in the audio-visual scene representation. The system was experimentally evaluated in a realistic operating room setup during simulated surgical procedures performed by experts. Results: The proposed method successfully localizes surgical acoustic events in 3D space and associates them with visual scene elements. Experimental evaluation demonstrates accurate spatial sound localization and robust fusion of multimodal data, providing a comprehensive, dynamic representation of surgical activity. Conclusion: This work introduces the first approach for spatial sound localization in dynamic surgical scenes, marking a significant advancement toward multimodal surgical scene representations. By integrating acoustic and visual data, the proposed framework enables richer contextual understanding and provides a foundation for future intelligent and autonomous surgical systems.

2509.12261 2026-02-25 cs.SD eess.AS

An Adaptive CMSA for Solving the Longest Filled Common Subsequence Problem with an Application in Audio Querying

Marko Djukanovic, Christian Blum, Aleksandar Kartelj, Ana Nikolikj, Guenther Raidl

This paper addresses the Longest Filled Common Subsequence (LFCS) problem, a challenging NP-hard problem with applications in bioinformatics, including gene mutation prediction and genomic data reconstruction. Existing approaches, including exact, metaheuristic, and approximation algorithms, have primarily been evaluated on small-sized instances, which offer limited insights into their scalability. In this work, we introduce a new benchmark dataset with significantly larger instances and demonstrate that existing datasets lack the discriminative power needed to meaningfully assess algorithm performance at scale. To solve large instances efficiently, we utilize an adaptive Construct, Merge, Solve, Adapt (CMSA) framework that iteratively generates promising subproblems via component-based construction and refines them using feedback from prior iterations. Subproblems are solved using an external black-box solver. Extensive experiments on both standard and newly introduced benchmarks show that the proposed adaptive CMSA achieves state-of-the-art performance, outperforming five leading methods. Notably, on 1,510 problem instances with known optimal solutions, our approach solves 1,486 of them, achieving over 99.9% optimal solution quality and demonstrating exceptional scalability. We additionally propose a novel application of LFCS for song identification from degraded audio excerpts as an engineering contribution, using real-world energy-profile instances from popular music. Finally, we conduct an empirical explainability analysis to identify the critical feature combinations influencing algorithm performance, revealing the key problem features that contribute to the success or failure of the approaches across different instance types.

2507.19369 2026-02-25 eess.AS cs.SD

Binaural Target Speaker Extraction using Individualized HRTF

Yoav Ellinson, Sharon Gannot

In this work, we address the problem of binaural target-speaker extraction in the presence of multiple simultaneous talkers. We propose a novel approach that leverages the individual listener's Head-Related Transfer Function (HRTF) to isolate the target speaker. The proposed method is speaker-independent, as it does not rely on speaker embeddings. We employ a fully complex-valued neural network that operates directly on the complex-valued Short-Time Fourier Transform (STFT) of the mixed audio signals, and compare it to a Real-Imaginary (RI)-based neural network, demonstrating the advantages of the former. We first evaluate the method in an anechoic, noise-free scenario, achieving excellent extraction performance while preserving the binaural cues of the target signal. We then extend the evaluation to reverberant conditions. Our method proves robust, maintaining speech clarity and source directionality while simultaneously reducing reverberation. A comparative analysis with existing binaural Target Speaker Extraction (TSE) methods shows that the proposed approach achieves performance comparable to state-of-the-art techniques in terms of noise reduction and perceptual quality, while providing a clear advantage in preserving binaural cues. Demo-page: https://bi-ctse-hrtf.github.io

2507.04002 2026-02-25 cs.CV cs.RO eess.IV

NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

Siyu Li, Fei Teng, Yihong Cao, Kailun Yang, Zhiyong Li, Yaonan Wang

Comments Accepted to IEEE Transactions on Image Processing (TIP). The source code will be made publicly available at https://github.com/lynn-yu/NRSeg

Bird's-Eye View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, which are pivotal for real-world applications, underperform due to the homogeneous distribution of the labeled data. In this work, we explore the potential of synthetic data from driving world models to enhance the diversity of labeled data for robustifying BEV segmentation. Yet, our preliminary findings reveal that generation noise in synthetic data compromises efficient BEV model learning. To fully harness the potential of synthetic data from world models, this paper proposes NRSeg, a noise-resilient learning framework for BEV semantic segmentation. Specifically, a Perspective-Geometry Consistency Metric (PGCM) is proposed to quantitatively evaluate the guidance capability of generated data for model learning. This metric originates from the alignment measure between the perspective road mask of generated data and the mask projected from the BEV labels. Moreover, a Bi-Distribution Parallel Prediction (BiDPP) is designed to enhance the inherent robustness of the model, where the learning process is constrained through parallel prediction of multinomial and Dirichlet distributions. The former efficiently predicts semantic probabilities, whereas the latter adopts evidential deep learning to realize uncertainty quantification. Furthermore, a Hierarchical Local Semantic Exclusion (HLSE) module is designed to address the non-mutual exclusivity inherent in BEV semantic segmentation tasks. Experimental results demonstrate that NRSeg achieves state-of-the-art performance, yielding the highest improvements in mIoU of 13.8% and 11.4% in unsupervised and semi-supervised BEV segmentation tasks, respectively. The source code will be made publicly available at https://github.com/lynn-yu/NRSeg.
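The mIoU figure quoted above is the mean of per-class intersection-over-union scores. Because the abstract notes that BEV classes are not mutually exclusive, the sketch below takes one binary mask per class instead of a single label map. The flattened-list mask layout and the choice to skip classes with an empty union are assumptions, not the paper's evaluation protocol.

```python
def mean_iou(pred_masks, target_masks):
    """Mean intersection-over-union across classes.

    pred_masks[c] and target_masks[c] are flat lists of 0/1 values for class c.
    Classes absent from both prediction and target are skipped (assumed).
    """
    ious = []
    for pred, target in zip(pred_masks, target_masks):
        intersection = sum(1 for p, t in zip(pred, target) if p and t)
        union = sum(1 for p, t in zip(pred, target) if p or t)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Two toy classes over four BEV cells; cell 1 is predicted as both classes.
pred = [[1, 1, 0, 0], [0, 1, 1, 1]]
target = [[1, 0, 0, 0], [0, 1, 1, 0]]
miou = mean_iou(pred, target)
print(f"mIoU = {miou:.4f}")
```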

2507.03854 2026-02-25 cs.LG cs.SD cs.SY eess.AS eess.SY nlin.AO stat.ML

Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters

Kanad Sarkar, Austin Lu, Manan Mittal, Yongjie Zhuang, Ryan Corey, Andrew Singer

Comments 8 pages, Submitted to Forum Acusticum Euronoise 2025

Journal ref 10.61782/fa.2025.0565
English abstract

Filtered-X LMS (FxLMS) is commonly used for active noise control (ANC), wherein the soundfield is minimized at a desired location. Given prior knowledge of the spatial region of the noise or control sources, we could improve FxLMS by adapting along the low-dimensional manifold of possible adaptive filter weights. We train an auto-encoder on the filter coefficients of the steady-state adaptive filter for each primary source location sampled from a given spatial region and constrain the weights of the adaptive filter to be the output of the decoder for a given state of latent variables. Then, we perform updates in the latent space and use the decoder to generate the cancellation filter. We evaluate how various neural network constraints and normalization techniques impact the convergence speed and steady-state mean squared error. Under certain conditions, our Latent FxLMS model converges in fewer steps with comparable steady-state error to the standard FxLMS.
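For readers unfamiliar with the baseline, the following is a minimal single-channel sketch of standard FxLMS, not of the Latent FxLMS method itself. It assumes the error is the sum of the disturbance and the control signal after the secondary path, and that an estimate of the secondary path is available. The names `fxlms`, `s_true`, and `s_hat` are illustrative and do not come from the paper.

```python
import numpy as np

def fxlms(x, d, s_true, s_hat, num_taps, mu):
    """Standard single-channel FxLMS (illustrative baseline only).

    x: reference signal, d: disturbance at the error microphone,
    s_true: secondary-path impulse response, s_hat: its estimate.
    Returns the final filter weights and the error signal.
    """
    s_true = np.asarray(s_true, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    w = np.zeros(num_taps)            # adaptive control filter
    x_buf = np.zeros(num_taps)        # recent reference samples, newest first
    xs_buf = np.zeros(len(s_hat))     # reference samples to be filtered by s_hat
    fx_buf = np.zeros(num_taps)       # recent filtered-reference samples
    y_buf = np.zeros(len(s_true))     # recent control outputs
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        xs_buf = np.roll(xs_buf, 1)
        xs_buf[0] = x[n]
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = w @ x_buf                  # control output
        e[n] = d[n] + s_true @ y_buf          # residual at the error microphone
        fx_buf = np.roll(fx_buf, 1)
        fx_buf[0] = s_hat @ xs_buf            # filtered reference sample
        w -= mu * e[n] * fx_buf               # gradient step on the squared error
    return w, e
```

In this formulation every tap of `w` is updated directly. The abstract instead describes updating latent variables and decoding them into the filter weights.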

2506.14571 2026-02-25 eess.SP cs.LG eess.AS

The Perception of Phase Intercept Distortion and its Application in Data Augmentation

Venkatakrishnan Vaidyanathapuram Krishnan, Nathaniel Condit-Schultz

Comments Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025. Camera-ready version

English abstract

Phase distortion refers to the alteration of the phase relationships between frequencies in a signal, which can be perceptible. In this paper, we discuss a special case of phase distortion known as phase-intercept distortion, which is created by a frequency-independent phase shift. We hypothesize that, though this form of distortion changes a signal's waveform significantly, the distortion is imperceptible. Human-subject experiment results are reported which are consistent with this hypothesis. Furthermore, we discuss how the imperceptibility of phase-intercept distortion can be useful for machine learning, specifically for data augmentation. We conducted multiple experiments using phase-intercept distortion as a novel approach to data augmentation, and obtained improved results for audio machine learning tasks.
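A frequency-independent phase shift can be sketched by rotating every positive-frequency bin of a real signal by the same angle. The snippet below is an illustration under the assumption that the DC bin (and the Nyquist bin for even lengths) is left untouched so the output stays real. The function name `phase_intercept_shift` is hypothetical, and this is not necessarily how the authors implement the augmentation.

```python
import numpy as np

def phase_intercept_shift(x, theta):
    """Apply the same phase shift theta (radians) to all positive-frequency bins."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    rotated = spectrum * np.exp(1j * theta)
    # Keep DC (and Nyquist for even n) unchanged: these bins must be real.
    rotated[0] = spectrum[0]
    if n % 2 == 0:
        rotated[-1] = spectrum[-1]
    return np.fft.irfft(rotated, n=n)
```

For data augmentation one could draw `theta` at random for each training example. The magnitude spectrum is preserved while the waveform changes.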

2506.10167 2026-02-25 cs.LG cs.SY eess.SY

Wasserstein Barycenter Soft Actor-Critic

Zahra Shahrooei, Ali Baheri

English abstract

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous control tasks.
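If both policies are diagonal Gaussians, the 2-Wasserstein barycenter has a closed form: the means and the standard deviations are each interpolated linearly, dimension by dimension. The sketch below illustrates that special case. Whether WBSAC uses diagonal Gaussian policies, and how it schedules the weight, are assumptions here; the names `gaussian_w2_barycenter` and `weight_o` are hypothetical.

```python
def gaussian_w2_barycenter(mean_p, std_p, mean_o, std_o, weight_o):
    """2-Wasserstein barycenter of two diagonal Gaussians.

    (mean_p, std_p): pessimistic policy; (mean_o, std_o): optimistic policy.
    weight_o in [0, 1] is the weight on the optimistic policy.
    Returns the per-dimension mean and standard deviation of the barycenter.
    """
    if not 0.0 <= weight_o <= 1.0:
        raise ValueError("weight_o must lie in [0, 1]")
    w = weight_o
    # For diagonal (commuting) covariances, both the means and the standard
    # deviations interpolate linearly.
    mean = [(1.0 - w) * mp + w * mo for mp, mo in zip(mean_p, mean_o)]
    std = [(1.0 - w) * sp + w * so for sp, so in zip(std_p, std_o)]
    return mean, std
```

Setting `weight_o` to 0 recovers the pessimistic policy and 1 recovers the optimistic one, so varying it over training is one way to adjust the degree of exploration.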

2505.16404 2026-02-25 eess.AS cs.SD

UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

Kishan Gupta, Srikanth Korse, Andreas Brendel, Nicola Pia, Guillaume Fuchs

English abstract

In practical applications of speech codecs, a multitude of factors such as the quality of the radio connection, limiting hardware or required user experience necessitate trade-offs between achievable perceptual quality, engendered bitrate and computational complexity. Most conventional and neural speech codecs operate on wideband (WB) speech signals to achieve this compromise. To further enhance the perceptual quality of coded speech, bandwidth extension (BWE) of the transmitted speech is an attractive and popular technique in conventional speech coding. In contrast, neural speech codecs are typically trained end-to-end to a specific set of requirements and are often not easily adaptable. In particular, they are typically trained to operate at a single fixed sampling rate. With the Universal Bandwidth Extension Generative Adversarial Network (UBGAN), we propose a modular and lightweight GAN-based solution that increases the operational flexibility of a wide range of conventional and neural codecs. Our model operates in the subband domain and extends the bandwidth of WB signals from 8 kHz to 16 kHz, resulting in super-wideband (SWB) signals. We further introduce two variants, guided-UBGAN and blind-UBGAN: the guided version transmits a quantized learned representation as side information at a very low bitrate in addition to the bitrate of the codec, while the blind version operates without such side information. Our subjective assessments demonstrate the advantage of UBGAN applied to WB codecs and highlight the generalization capacity of our proposed method across multiple codecs and bitrates.

2503.17791 2026-02-25 eess.SP

Single-Satellite-Based Geolocation of Broadcast GNSS Spoofers from Low Earth Orbit

Zachary L. Clements, Patrick B. Ellis, Iain Goodridge, Matthew J. Murrian, Mark L. Psiaki, Todd E. Humphreys

English abstract

This paper presents an analysis and experimental demonstration of single-satellite single-pass geolocation of a terrestrial broadcast Global Navigation Satellite System (GNSS) spoofer from Low Earth Orbit (LEO). The proliferation of LEO-based GNSS receivers offers the prospect of unprecedented spectrum awareness, enabling persistent GNSS interference detection and geolocation. Accurate LEO-based single-receiver emitter geolocation is possible when a range-rate time history can be extracted for the emitter. This paper presents a technique crafted specifically for indiscriminate broadcast-type GNSS spoofing signals. Furthermore, it explores how unmodeled oscillator instability and worst-case spoofer-introduced signal variations degrade the geolocation estimate. The proposed geolocation technique is validated by a controlled experiment, in partnership with Spire Global, in which a LEO-based receiver captures broadcast GNSS spoofing signals transmitted from a known ground station on a non-GNSS frequency band.

2502.17457 2026-02-25 eess.SP cs.AI cs.LG

MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition

Mehran Shabanpour, Kasra Rad, Sadaf Khademi, Arash Mohammadi

English abstract

High-Density surface Electromyography (HD-sEMG) has emerged as a pivotal resource for Human-Computer Interaction (HCI), offering direct insights into muscle activities and motion intentions. However, a significant challenge in practical implementations of HD-sEMG-based models is the low accuracy of inter-session and inter-subject classification. Variability between sessions can reach up to 40% due to the inherent temporal variability of HD-sEMG signals. Targeting this challenge, the paper introduces the MoEMba framework, a novel approach leveraging Selective State-Space Models (SSMs) to enhance HD-sEMG-based gesture recognition. The MoEMba framework captures temporal dependencies and cross-channel interactions through channel attention techniques. Furthermore, wavelet feature modulation is integrated to capture multi-scale temporal and spatial relations, improving signal representation. Experimental results on the CapgMyo HD-sEMG dataset demonstrate that MoEMba achieves a balanced accuracy of 56.9%, outperforming its state-of-the-art counterparts. The proposed framework's robustness to session-to-session variability and its efficient handling of high-dimensional multivariate time series data highlight its potential for advancing HD-sEMG-powered HCI systems.

2407.00575 2026-02-25 cs.MA cs.LG cs.SY eess.SY

Learning to Control Unknown Strongly Monotone Games

Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

Comments Accepted for publication at IEEE Transactions on Control of Network Systems (TCNS)

English abstract

Consider a strongly monotone game where the players' utility functions include a reward function and a linear term for each dimension, with coefficients that are controlled by the manager. Gradient play converges to a unique Nash equilibrium (NE) that does not optimize the global objective. The global performance at NE can be improved by imposing linear constraints on the NE, which yields a generalized Nash equilibrium (GNE). We therefore want the manager to control the coefficients such that they impose the desired constraint on the NE. However, this requires knowing the players' rewards and action sets. Obtaining this game information is infeasible in a large-scale network and violates user privacy. To overcome this, we propose a simple algorithm that learns to shift the NE to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm only requires the violation of the linear constraints as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm converges with probability 1 to the set of GNE given by coupled linear constraints. We then prove an L2 convergence rate of near-$O(t^{-1/4})$.

2602.20857 2026-02-25 eess.SP cs.LG

Functional Continuous Decomposition

Teymur Aghayev

Comments 16 pages, 9 figures, 6 tables

English abstract

The analysis of non-stationary time-series data requires insight into its local and global patterns with physical interpretability. However, traditional smoothing algorithms, such as B-splines, Savitzky-Golay filtering, and Empirical Mode Decomposition (EMD), lack the ability to perform parametric optimization with guaranteed continuity. In this paper, we propose Functional Continuous Decomposition (FCD), a JAX-accelerated framework that performs parametric, continuous optimization on a wide range of mathematical functions. By using Levenberg-Marquardt optimization to achieve up to $C^1$ continuous fitting, FCD transforms raw time-series data into $M$ modes that capture different temporal patterns from short-term to long-term trends. Applications of FCD include physics, medicine, financial analysis, and machine learning, where it can be used for the analysis of signal temporal patterns, optimized parameters, derivatives, and integrals of decomposition. Furthermore, FCD can be applied for physical analysis and feature extraction with an average SRMSE of 0.735 per segment and a speed of 0.47s on full decomposition of 1,000 points. Finally, we demonstrate that a Convolutional Neural Network (CNN) enhanced with FCD features, such as optimized function values, parameters, and derivatives, achieved 16.8% faster convergence and 2.5% higher accuracy over a standard CNN.

2602.20842 2026-02-25 eess.SY cs.SY

Fast-Response Balancing Capacity of Alkaline Electrolyzers

Marvin Dorn, Julian Hoffmann, André Weber, Veit Hagenmeyer

English abstract

The energy transition requires flexible technologies to maintain grid stability, and electrolyzers are playing an increasingly important role in meeting this need. While previous studies often question the dynamic capabilities of large-scale alkaline electrolyzer systems, we assess their potential to provide balancing services using real manufacturer data. Unlike common approaches, we propose decoupling the total electrolyzer power from the smaller fraction of power actually offered on balancing markets. Adapting an existing methodology, we analyze alkaline electrolyzer systems and extend the assessment to Germany and Europe. Our results show that large-scale electrolyzers are technically capable of delivering fast-response balancing services, with significantly lower dynamic requirements than previously assumed. The planned electrolyzers in Germany could cover the entire balancing capacity market, potentially saving around 13 % of their electricity costs, excluding energy balancing revenues. The decoupling also resolves part of the trade-off for electrolyzer manufacturers, enabling the design of less dynamic but more stable systems.

2602.20823 2026-02-25 cs.SD eess.AS

Geometric Analysis of Speech Representation Spaces: Topological Disentanglement and Confound Detection

Bipasha Kashyap, Pubudu N. Pathirana

Comments Submitted to INTERSPEECH 2026

English abstract

Speech-based clinical tools are increasingly deployed in multilingual settings, yet whether pathological speech markers remain geometrically separable from accent variation remains unclear. Systems may misclassify healthy non-native speakers or miss pathology in multilingual patients. We propose a four-metric clustering framework to evaluate geometric disentanglement of emotional, linguistic, and pathological speech features across six corpora and eight dataset combinations. A consistent hierarchy emerges: emotional features form the tightest clusters (Silhouette 0.250), followed by pathological (0.141) and linguistic (0.077). Confound analysis shows pathological-linguistic overlap remains below 0.21, which is above the permutation null but bounded for clinical deployment. Trustworthiness analysis confirms embedding fidelity and robustness of the geometric conclusions. Our framework provides actionable guidelines for equitable and reliable speech health systems across diverse populations.