arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2601.23201 2026-02-02 eess.IV cs.CV cs.LG

Scale-Cascaded Diffusion Models for Super-Resolution in Medical Imaging

Darshan Thaker, Mahmoud Mostapha, Radu Miron, Shihan Qiu, Mariappan Nadar

Comments Accepted at IEEE International Symposium for Biomedical Imaging (ISBI) 2026

Abstract

Diffusion models have been increasingly used as strong generative priors for solving inverse problems such as super-resolution in medical imaging. However, these approaches typically utilize a diffusion prior trained at a single scale, ignoring the hierarchical scale structure of image data. In this work, we propose to decompose images into Laplacian pyramid scales and train separate diffusion priors for each frequency band. We then develop an algorithm to perform super-resolution that utilizes these priors to progressively refine reconstructions across different scales. Evaluated on brain, knee, and prostate MRI data, our approach both improves perceptual quality over baselines and reduces inference time through smaller coarse-scale networks. Our framework unifies multiscale reconstruction and diffusion priors for medical image super-resolution.
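
The Laplacian pyramid decomposition mentioned in the abstract can be illustrated on a 1D signal. This is a minimal sketch, not the authors' implementation: the pair-averaging low-pass filter, the nearest-neighbour upsampling, and the function names are all assumptions chosen for brevity. It shows only how a signal splits into band-pass residuals plus a coarse low-pass band and how summing them back recovers the input.

```python
def downsample(x):
    """Halve the length by averaging adjacent pairs (a crude low-pass filter)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x, n):
    """Nearest-neighbour upsampling of x back to length n."""
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]


def laplacian_pyramid(x, levels):
    """Return [finest band, ..., coarsest band, low-pass residual]."""
    bands = []
    current = list(x)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse, len(current))
        # The band stores the detail that the coarse scale cannot represent.
        bands.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    bands.append(current)
    return bands


def reconstruct(bands):
    """Invert the decomposition, working from the coarsest scale upward."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        up = upsample(current, len(band))
        current = [u + b for u, b in zip(up, band)]
    return current


signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
bands = laplacian_pyramid(signal, levels=2)
restored = reconstruct(bands)
```

In the scheme the abstract describes, each band would get its own diffusion prior; here the bands are simply added back, which is enough to see that the coarse band is much smaller than the input.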

2601.23196 2026-02-02 eess.AS

Beyond Omnidirectional: Neural Ambisonics Encoding for Arbitrary Microphone Directivity Patterns using Cross-Attention

Mikko Heikkinen, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

Comments Accepted to ICASSP 2026

Abstract

We present a deep neural network approach for encoding microphone array signals into Ambisonics that generalizes to arbitrary microphone array configurations with fixed microphone count but varying locations and frequency-dependent directional characteristics. Unlike previous methods that rely only on array geometry as metadata, our approach uses directional array transfer functions, enabling accurate characterization of real-world arrays. The proposed architecture employs separate encoders for audio and directional responses, combining them through cross-attention mechanisms to generate array-independent spatial audio representations. We evaluate the method on simulated data in two settings: a mobile phone with complex body scattering, and a free-field condition, both with varying numbers of sound sources in reverberant environments. Evaluations demonstrate that our approach outperforms both conventional digital signal processing-based methods and existing deep neural network solutions. Furthermore, using array transfer functions instead of geometry as metadata input improves accuracy on realistic arrays.

2601.23160 2026-02-02 eess.SY cs.SY math.OC

Robust Control of Constrained Linear Systems using Online Convex Optimization and a Reference Governor

Marko Nonhoff, Mohammad Taher Al Torshan, Matthias A. Müller

Comments Presented at 2024 IEEE 63rd Conference on Decision and Control (CDC)

Journal ref 2024 IEEE 63rd Conference on Decision and Control (CDC), 2024, pp. 6553-6559

Abstract

This article develops a control method for linear time-invariant systems subject to time-varying and a priori unknown cost functions, that satisfies state and input constraints, and is robust to exogenous disturbances. To this end, we combine the online convex optimization framework with a reference governor and a constraint tightening approach. The proposed framework guarantees recursive feasibility and robust constraint satisfaction. Its closed-loop performance is studied in terms of its dynamic regret, which is bounded linearly by the variation of the cost functions and the magnitude of the disturbances. The proposed method is illustrated by a numerical case study of a tracking control problem.

2601.23148 2026-02-02 eess.IV cs.LG

Compressed BC-LISTA via Low-Rank Convolutional Decomposition

Han Wang, Yhonatan Kvich, Eduardo Pérez, Florian Römer, Yonina C. Eldar

Comments Inverse Problems, Model Compression, Compressed Sensing, Deep Unrolling, Computational Imaging

Abstract

We study Sparse Signal Recovery (SSR) methods for multichannel imaging with compressed forward and backward operators that preserve reconstruction accuracy. We propose a Compressed Block-Convolutional (C-BC) measurement model based on a low-rank Convolutional Neural Network (CNN) decomposition that is analytically initialized from a low-rank factorization of physics-derived forward/backward operators in time delay-based measurements. We use Orthogonal Matching Pursuit (OMP) to select a compact set of basis filters from the analytic model and compute linear mixing coefficients to approximate the full model. We consider the Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) network as a representative example for which the C-BC-LISTA extension is presented. In simulated multichannel ultrasound imaging across multiple Signal-to-Noise Ratios (SNRs), C-BC-LISTA requires substantially fewer parameters and smaller model size than other state-of-the-art (SOTA) methods while improving reconstruction accuracy. In ablations over OMP, Singular Value Decomposition (SVD)-based, and random initializations, OMP-initialized structured compression performs best, yielding the most efficient training and the best performance.
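
The OMP step in the abstract, selecting a few basis elements and fitting linear mixing coefficients, can be sketched generically. The version below is an assumption-laden illustration: atoms are treated as flat vectors rather than convolutional filters, the toy dictionary is invented, and the helper names (`dot`, `solve`, `omp`) are hypothetical. It is plain-Python OMP, not the paper's C-BC initialization.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def omp(atoms, target, n_select):
    """Greedily pick n_select atoms and least-squares coefficients for target."""
    selected, coeffs = [], []
    residual = list(target)
    for _ in range(n_select):
        # Score unused atoms by normalized correlation with the residual.
        scores = [-1.0 if i in selected else abs(dot(a, residual)) / dot(a, a) ** 0.5
                  for i, a in enumerate(atoms)]
        selected.append(max(range(len(atoms)), key=scores.__getitem__))
        # Jointly re-fit all selected coefficients (the "orthogonal" step).
        chosen = [atoms[i] for i in selected]
        gram = [[dot(u, v) for v in chosen] for u in chosen]
        coeffs = solve(gram, [dot(u, target) for u in chosen])
        approx = [sum(c * a[k] for c, a in zip(coeffs, chosen))
                  for k in range(len(target))]
        residual = [t - p for t, p in zip(target, approx)]
    return selected, coeffs, residual


# Toy dictionary (invented): the target is 2 * atoms[0] + 3 * atoms[2].
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
target = [2.0, 0.0, 3.0]
selected, coeffs, residual = omp(atoms, target, n_select=2)
```

The joint re-fit after each selection is what distinguishes OMP from plain matching pursuit, which would only update the newest coefficient.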

2601.23119 2026-02-02 eess.SP cs.SY eess.SY

Interpolation Techniques for Fast Channel Estimation in Ray Tracing

Ruibin Chen, Jayadev Joy, Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Sundeep Rangan

Comments This is the authors' accepted version of a paper published in the Proceedings of the 2024 58th Asilomar Conference on Signals, Systems, and Computers

Journal ref Proc. IEEE 58th Asilomar Conference on Signals, Systems, and Computers, 2024, pp. 1383-1388

Abstract

Ray tracing is increasingly utilized in wireless system simulations to estimate channel paths. In large-scale simulations with complex environments, ray tracing at high resolution can be computationally demanding. To reduce the computation, this paper presents a novel method for conducting ray tracing at a coarse set of reference points and interpolating the channels at other locations. The key insight is to interpolate the images of reflected points. In addition to the computational savings, the method directly captures the spherical nature of each wavefront enabling fast and accurate computation of channels using line-of-sight MIMO and other wide aperture techniques. Through empirical validation and comparison with exhaustive ray tracing, we demonstrate the efficacy and practicality of our approach in achieving high-fidelity channel predictions with reduced computational resources.
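
The role of reflection images can be illustrated with the classical method of images for a single specular bounce. This is a sketch under simplifying assumptions (one infinite planar wall, made-up coordinates, hypothetical function names) and does not reproduce the paper's interpolation scheme. The point is that once the mirrored source is known, the reflected path length to any nearby receiver is just a point-to-point distance, which preserves the spherical wavefront.

```python
import math


def mirror_point(p, plane_point, unit_normal):
    """Reflect point p across the plane through plane_point with a unit normal."""
    signed_dist = sum((a - b) * n for a, b, n in zip(p, plane_point, unit_normal))
    return tuple(a - 2.0 * signed_dist * n for a, n in zip(p, unit_normal))


def reflected_path_length(tx, rx, plane_point, unit_normal):
    """Length of the tx -> wall -> rx specular path via the image of tx."""
    image = mirror_point(tx, plane_point, unit_normal)
    return math.dist(image, rx)


# Made-up geometry: a wall in the plane y = 5, transmitter at the origin height 2.
tx = (0.0, 0.0, 2.0)
wall_point = (0.0, 5.0, 0.0)
wall_normal = (0.0, 1.0, 0.0)

image = mirror_point(tx, wall_point, wall_normal)
length = reflected_path_length(tx, (0.0, 2.0, 2.0), wall_point, wall_normal)
```

For this geometry the ray travels 5 m to the wall and 3 m back to the receiver, so the image-based distance should come out as 8 m.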

2601.23108 2026-02-02 eess.SY cs.SY

Energy Management Strategies for Electric Aircraft Charging Leveraging Active Landside Vehicle-to-Grid

Finn Vehlhaber, Mauro Salazar

Abstract

The deployment of medium-range battery electric aircraft is a promising pathway to improve the environmental footprint of air mobility. Yet such a deployment would be accompanied by significant electric power requirements at airports due to aircraft charging. Given the growing prevalence of electric vehicles and their bi-directional charging capabilities, so-called vehicle-to-grid (V2G), we study the energy buffer capabilities of parked electric vehicles to alleviate pressure on grid connections. To this end, we present energy management strategies for airports providing cost-optimal apron and landside V2G charge scheduling. Specifically, we first formulate the optimal energy management problem of joint aircraft charging and landside V2G coordination as a linear program, whereby we use partial differential equations to model the aggregated charging dynamics of the electric vehicle fleet. Second, we consider a shuttle flight network with a single hub of a large Dutch airline, real-world grid prices, and synthetic parking garage occupancy data to test our framework. Our results show that V2G at even a single airport can indeed reduce energy costs to charge the aircraft fleet: Compared to a baseline scenario without V2G, the proposed concept yields cost savings of up to 32%, depending on the schedule and number of participating vehicles, and has other potential beneficial effects on the local power grid, e.g., the reduction of potential power peaks.

2601.23103 2026-02-02 eess.IV cs.CV

Vision-Language Controlled Deep Unfolding for Joint Medical Image Restoration and Segmentation

Ping Chen, Zicheng Huang, Xiangming Wang, Yungeng Liu, Bingyu Liang, Haijin Zeng, Yongyong Chen

Comments 18 pages, medical image

Abstract

We propose VL-DUN, a principled framework for joint All-in-One Medical Image Restoration and Segmentation (AiOMIRS) that bridges the gap between low-level signal recovery and high-level semantic understanding. While standard pipelines treat these tasks in isolation, our core insight is that they are fundamentally synergistic: restoration provides clean anatomical structures to improve segmentation, while semantic priors regularize the restoration process. VL-DUN resolves the sub-optimality of sequential processing through two primary innovations. (1) We formulate AiOMIRS as a unified optimization problem, deriving an interpretable joint unfolding mechanism where restoration and segmentation are mathematically coupled for mutual refinement. (2) We introduce a frequency-aware Mamba mechanism to capture long-range dependencies for global segmentation while preserving the high-frequency textures necessary for restoration. This allows for efficient global context modeling with linear complexity, effectively mitigating the spectral bias of standard architectures. As a pioneering work in the AiOMIRS task, VL-DUN establishes a new state-of-the-art across multi-modal benchmarks, improving PSNR by 0.92 dB and the Dice coefficient by 9.76%. Our results demonstrate that joint collaborative learning offers a superior, more robust solution for complex clinical workflows compared to isolated task processing. The code is available at https://github.com/cipi666/VLDUN.

2601.23076 2026-02-02 eess.SP cs.SY eess.SY

Learning-Based Signal Recovery in Nonlinear Systems with Spectrally Separated Interference

Jayadev Joy, Sundeep Rangan

Abstract

Upper Mid-Band (FR3, 7-24 GHz) receivers for 6G must operate over wide bandwidths in dense spectral environments, making them particularly vulnerable to strong adjacent-band interference and front-end nonlinearities. While conventional linear receivers can suppress spectrally separated interferers under ideal hardware assumptions, receiver saturation and finite-resolution quantization cause nonlinear spectral leakage that severely degrades performance in practical wideband radios. We study the recovery of a desired signal from nonlinear receiver observations corrupted by a high-power out-of-band interferer. The receiver front-end is modeled as a smooth, memoryless nonlinearity followed by additive noise and optional quantization. To mitigate these nonlinear and quantization-induced distortions, we propose a learned multi-layer Vector Approximate Message Passing (LMLVAMP) algorithm that incorporates spectral priors with neural network based denoising. Simulation results demonstrate significant performance gains over conventional methods, particularly in high-interference regimes representative of FR3 coexistence scenarios.

2601.23004 2026-02-02 eess.AS

Layer-Aware Early Fusion of Acoustic and Linguistic Embeddings for Cognitive Status Classification

Krystof Novotny, Laureano Moro-Velázquez, Jiri Mekyska

Comments 5 pages, 3 figures, paper accepted for ICASSP 2026 conference

Abstract

Speech contains both acoustic and linguistic patterns that reflect cognitive decline, and therefore models describing only one domain cannot fully capture such complexity. This study investigates how early fusion (EF) of speech and its corresponding transcription text embeddings, with attention to encoder layer depth, can improve cognitive status classification. Using a DementiaBank-derived collection of recordings (1,629 speakers; cognitively normal controls (CN), Mild Cognitive Impairment (MCI), and Alzheimer's Disease and Related Dementias (ADRD)), we extracted frame-aligned embeddings from different internal layers of wav2vec 2.0 or Whisper combined with DistilBERT or RoBERTa. Unimodal, EF and late fusion (LF) models were trained with a transformer classifier, optimized, and then evaluated across 10 seeds. Performance consistently peaked in mid encoder layers (~8–10), with the single best F1 at Whisper + RoBERTa layer 9 and the best log loss at Whisper + DistilBERT layer 10. Acoustic-only models consistently outperformed text-only variants. EF boosts discrimination for genuinely acoustic embeddings, whereas LF improves probability calibration. Layer choice critically shapes clinical multimodal synergy.

2601.22989 2026-02-02 eess.SP

Fluid Antenna Systems under Channel Uncertainty and Hardware Impairments: Trends, Challenges, and Future Research Directions

Saeid Pakravan, Mohsen Ahmadzadeh, Ming Zeng, Wessam Ajib, Ji Wang, Xingwang Li

Comments 12 pages

Abstract

Fluid antenna systems (FAS) have recently emerged as a promising paradigm for achieving spatially reconfigurable, compact, and energy-efficient wireless communications in beyond fifth-generation (B5G) and sixth-generation (6G) networks. By dynamically repositioning a liquid-based radiating element within a confined physical structure, FAS can exploit spatial diversity without relying on multiple fixed antenna elements. This spatial mobility provides a new degree of freedom for mitigating channel fading and interference, while maintaining low hardware complexity and power consumption. However, the performance of FAS in realistic deployments is strongly affected by channel uncertainty, hardware nonidealities, and mechanical constraints, all of which can substantially deviate from idealized analytical assumptions. This paper presents a comprehensive survey of the operation and design of FAS under such practical considerations. Key aspects include the characterization of spatio-temporal channel uncertainty, analysis of hardware and mechanical impairments such as RF nonlinearity, port coupling, and fluid response delay, as well as the exploration of robust design and learning-based control strategies to enhance system reliability. Finally, open research directions are identified, aiming to guide future developments toward robust, adaptive, and cross-domain FAS design for next-generation wireless networks.

2601.22938 2026-02-02 cs.CR cs.AI eess.IV eess.SP

A Real-Time Privacy-Preserving Behavior Recognition System via Edge-Cloud Collaboration

Huan Song, Shuyu Tian, Junyi Hao, Cheng Yuan, Zhenyu Jia, Jiawei Shao, Xuelong Li

Abstract

As intelligent sensing expands into high-privacy environments such as restrooms and changing rooms, the field faces a critical privacy-security paradox. Traditional RGB surveillance raises significant concerns regarding visual recording and storage, while existing privacy-preserving methods, ranging from physical desensitization to traditional cryptographic or obfuscation techniques, often compromise semantic understanding capabilities or fail to guarantee mathematical irreversibility against reconstruction attacks. To address these challenges, this study presents a novel privacy-preserving perception technology based on the AI Flow theoretical framework and an edge-cloud collaborative architecture. The proposed methodology integrates source desensitization with irreversible feature mapping. Leveraging Information Bottleneck theory, the edge device performs millisecond-level processing to transform raw imagery into abstract feature vectors via non-linear mapping and stochastic noise injection. This process constructs a unidirectional information flow that strips identity-sensitive attributes, rendering the reconstruction of original images impossible. Subsequently, the cloud platform utilizes multimodal family models to perform joint inference solely on these abstract vectors to detect abnormal behaviors. This approach fundamentally severs the path to privacy leakage at the architectural level, achieving a breakthrough from video surveillance to de-identified behavior perception and offering a robust solution for risk management in high-sensitivity public spaces.

2601.22915 2026-02-02 eess.SP

Intrinsic MIMO Particle Communication Channel with Random Advection

Fatih Merdan, Ozgur B. Akan

Comments 6 pages, 5 figures

Abstract

In this work, receiver diversity in advection-dominated diffusion-advection channels is investigated. Strong directed flow fundamentally alters the communication-theoretic properties of molecular communication (MC) systems. Specifically, advection preserves the temporal ordering and shape of transmitted pulses, enabling pulse-based and higher-order modulation schemes that are typically infeasible in purely diffusive environments. Focusing on a single transmitter and a single type of information molecule, it is demonstrated that spatially distributed receivers can observe distinct realizations of the same transmitted signal, giving rise to diversity gain. Several receiver combining strategies are evaluated and shown to improve detection performance compared to single-receiver operation, particularly in low-to-moderate signal-to-noise ratio (SNR) regimes. The results provide a structured framework for understanding receiver-side diversity in molecular communication, highlighting the role of advection as a key enabler for reliable pulse-based signaling. This perspective establishes a foundation for future studies on advanced modulation, joint equalization and detection, and multi-molecule MIMO extensions that can further enhance the performance and physical applicability of MC systems.

2601.22878 2026-02-02 eess.IV cs.CV

Development of Domain-Invariant Visual Enhancement and Restoration (DIVER) Approach for Underwater Images

Rajini Makam, Sharanya Patil, Dhatri Shankari T M, Suresh Sundaram, Narasimhan Sundararajan

Comments Submitted to IEEE Journal of Oceanic Engineering

Abstract

Underwater images suffer severe degradation due to wavelength-dependent attenuation, scattering, and illumination non-uniformity that vary across water types and depths. We propose an unsupervised Domain-Invariant Visual Enhancement and Restoration (DIVER) framework that integrates empirical correction with physics-guided modeling for robust underwater image enhancement. DIVER first applies either IlluminateNet for adaptive luminance enhancement or a Spectral Equalization Filter for spectral normalization. An Adaptive Optical Correction Module then refines hue and contrast using channel-adaptive filtering, while Hydro-OpticNet employs physics-constrained learning to compensate for backscatter and wavelength-dependent attenuation. The parameters of IlluminateNet and Hydro-OpticNet are optimized via unsupervised learning using a composite loss function. DIVER is evaluated on eight diverse datasets covering shallow, deep, and highly turbid environments, including both naturally low-light and artificially illuminated scenes, using reference and non-reference metrics. While state-of-the-art methods such as WaterNet, UDNet, and Phaseformer perform reasonably in shallow water, their performance degrades in deep, unevenly illuminated, or artificially lit conditions. In contrast, DIVER consistently achieves best or near-best performance across all datasets, demonstrating strong domain-invariant capability. DIVER yields at least a 9% improvement over SOTA methods in UCIQE. On the low-light SeaThru dataset, where color-palette references enable direct evaluation of color restoration, DIVER achieves at least a 4.9% reduction in GPMAE compared to existing methods. Beyond visual quality, DIVER also improves robotic perception by enhancing ORB-based keypoint repeatability and matching performance, confirming its robustness across diverse underwater environments.

2601.22873 2026-02-02 eess.AS cs.AI cs.CL cs.SD

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

Li Zhou, Hao Jiang, Junjie Li, Tianrui Wang, Haizhou Li

Comments Activation Steering; Emotion-Aware TTS; Speech Synthesis; Accepted by ICASSP 2026

Abstract

Achieving precise and controllable emotional expression is crucial for producing natural and context-appropriate speech in text-to-speech (TTS) synthesis. However, many emotion-aware TTS systems, including large language model (LLM)-based designs, rely on scaling fixed emotion embeddings or external guidance, limiting their ability to model emotion-specific latent characteristics. To address this gap, we present EmoShift, a lightweight activation-steering framework incorporating an EmoSteer layer, which learns a steering vector for each target emotion in the output embedding space to capture its latent offset and maintain stable, appropriate expression across utterances and categories. With only 10M trainable parameters, less than 1/30 of full fine-tuning, EmoShift outperforms zero-shot and fully fine-tuned baselines in objective and subjective evaluations, enhancing emotional expressiveness while preserving naturalness and speaker similarity. Further analysis confirms the proposed EmoSteer layer's effectiveness and reveals its potential for controllable emotional intensity in speech synthesis.
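
The general idea of activation steering, adding a per-emotion offset vector to an embedding, can be sketched in a few lines. Everything concrete here is an assumption: the 4-dimensional vectors are hand-set rather than learned, the `strength` scaling is a hypothetical knob suggested only by the abstract's remark on controllable intensity, and the code is not the EmoSteer layer itself.

```python
def steer(hidden, steering_vector, strength=1.0):
    """Shift one embedding vector along an emotion-specific direction."""
    return [h + strength * s for h, s in zip(hidden, steering_vector)]


# Hypothetical per-emotion steering vectors (learned in the paper, hand-set here).
steering_vectors = {
    "happy": [0.5, 0.0, -0.25, 0.0],
    "sad": [-0.5, 0.25, 0.0, 0.0],
}

hidden = [1.0, 2.0, 3.0, 4.0]
happy = steer(hidden, steering_vectors["happy"])
mild_sad = steer(hidden, steering_vectors["sad"], strength=0.5)
```

Because only the steering vectors would be trainable in such a design, the parameter count stays small relative to fine-tuning the whole model.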

2601.22779 2026-02-02 eess.AS cs.SD

Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization

Genshun Wan, Wenhui Zhang, Jing-Xuan Zhang, Shifu Xiong, Jianqing Gao, Zhongfu Ye

Comments accepted to ICASSP 2026

Abstract

Recent advances have demonstrated the potential of decoder-only large language models (LLMs) for automatic speech recognition (ASR). However, enabling streaming recognition within this framework remains a challenge. In this work, we propose a novel streaming ASR approach that integrates a read/write policy network with monotonic chunkwise attention (MoChA) to dynamically segment speech embeddings. These segments are interleaved with label sequences during training, enabling seamless integration with the LLM. During inference, the audio stream is buffered until the MoChA module triggers a read signal, at which point the buffered segment together with the previous token is fed into the LLM for the next token prediction. We also introduce a minimal-latency training objective to guide the policy network toward accurate segmentation boundaries. Furthermore, we adopt a joint training strategy in which a non-streaming LLM-ASR model and our streaming model share parameters. Experiments on the AISHELL-1 and AISHELL-2 Mandarin benchmarks demonstrate that our method consistently outperforms recent streaming ASR baselines, achieving character error rates of 5.1% and 5.5%, respectively. The latency optimization results in a 62.5% reduction in average token generation delay with negligible impact on recognition accuracy.

2601.22765 2026-02-02 eess.SP cs.LG

Bayesian Matrix Completion Under Geometric Constraints

Rohit Varma Chiluvuri, Santosh Nannuru

Comments 4 pages, 3 figures, Accepted to ICASSP 2026

Abstract

The completion of a Euclidean distance matrix (EDM) from sparse and noisy observations is a fundamental challenge in signal processing, with applications in sensor network localization, acoustic room reconstruction, molecular conformation, and manifold learning. Traditional approaches, such as rank-constrained optimization and semidefinite programming, enforce geometric constraints but often struggle under sparse or noisy conditions. This paper introduces a hierarchical Bayesian framework that places structured priors directly on the latent point set generating the EDM, naturally embedding geometric constraints. By incorporating a hierarchical prior on the latent point set, the model enables automatic regularization and robust noise handling. Posterior inference is performed using a Metropolis-Hastings within Gibbs sampler to handle the coupled latent point posterior. Experiments on synthetic data demonstrate improved reconstruction accuracy compared to deterministic baselines in sparse regimes.
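
Why a prior on the latent points embeds the geometric constraints can be seen from how an EDM is built. The sketch below assumes the common convention that EDM entries are squared distances (the abstract does not specify) and uses invented 2D points. Any matrix produced this way is automatically symmetric with a zero diagonal, and it is unchanged when the whole point set is translated.

```python
def edm(points):
    """Squared Euclidean distance matrix of a list of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


# Invented example: a 3-4-5 right triangle in the plane.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
D = edm(points)

# Translating every point leaves the matrix unchanged, so the latent
# points can only be recovered up to a rigid motion.
shifted = [(x + 10.0, y - 7.0) for x, y in points]
D_shifted = edm(shifted)
```

Sampling points and then forming `edm(points)` never leaves the set of valid EDMs, whereas sampling matrix entries directly would need the constraints to be enforced separately.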

2601.22732 2026-02-02 eess.IV cs.CV

Active Learning-Driven Lightweight YOLOv9: Enhancing Efficiency in Smart Agriculture

Hung-Chih Tu, Bo-Syun Chen, Yun-Chien Cheng

Abstract

This study addresses the demand for real-time detection of tomatoes and tomato flowers by agricultural robots deployed on edge devices in greenhouse environments. Under practical imaging conditions, object detection systems often face challenges such as large scale variations caused by varying camera distances, severe occlusion from plant structures, and highly imbalanced class distributions. These factors make it difficult for conventional object detection approaches that rely on fully annotated datasets to simultaneously achieve high detection accuracy and deployment efficiency. To overcome these limitations, this research proposes an active-learning-driven lightweight object detection framework, integrating data analysis, model design, and training strategy. First, the size distribution of objects in raw agricultural images is analyzed to redefine an operational target range, thereby improving learning stability under real-world conditions. Second, an efficient feature extraction module is incorporated to reduce computational cost, while a lightweight attention mechanism is introduced to enhance feature representation under multi-scale and occluded scenarios. Finally, an active learning strategy is employed to iteratively select high-information samples for annotation and training under a limited labeling budget, effectively improving the recognition performance of minority and small-object categories. Experimental results demonstrate that, while maintaining a low parameter count and inference cost suitable for edge-device deployment, the proposed method effectively improves the detection performance of tomatoes and tomato flowers in raw images. Under limited annotation conditions, the framework achieves an overall detection accuracy of 67.8% mAP, validating its practicality and feasibility for intelligent agricultural applications.

2601.21706 2026-02-02 cs.LG cs.SY eess.SY

SmartMeterFM: Unifying Smart Meter Data Generative Tasks Using Flow Matching Models

Nan Lin, Yanbo Wang, Jacco Heres, Peter Palensky, Pedro P. Vergara

Comments 10 pages, 6 figures, 6 tables

Abstract

Smart meter data is the foundation for planning and operating the distribution network. Unfortunately, such data are not always available due to privacy regulations. Meanwhile, the collected data may be corrupted due to sensor or transmission failure, or it may not have sufficient resolution for downstream tasks. A wide range of generative tasks is formulated to address these issues, including synthetic data generation, missing data imputation, and super-resolution. Despite the success of machine learning models on these tasks, dedicated models need to be designed and trained for each task, leading to redundancy and inefficiency. In this paper, by recognizing the powerful modeling capability of flow matching models, we propose a new approach to unify diverse smart meter data generative tasks with a single model trained for conditional generation. The proposed flow matching models are trained to generate challenging, high-dimensional time series data, specifically monthly smart meter data at a 15 min resolution. By viewing different generative tasks as distinct forms of partial data observations and injecting them into the generation process, we unify tasks such as imputation and super-resolution with a single model, eliminating the need for re-training. The data generated by our model not only are consistent with the given observations but also remain realistic, showing better performance against interpolation and other machine learning based baselines dedicated to the tasks.

2601.12526 2026-02-02 eess.IV cs.CV

Deep Lightweight Unrolled Network for High Dynamic Range Modulo Imaging

Brayan Monroy, Jorge Bacca

Abstract

Modulo-Imaging (MI) offers a promising alternative for expanding the dynamic range of images by resetting the signal intensity when it reaches the saturation level. Subsequently, high-dynamic range (HDR) modulo imaging requires a recovery process to obtain the HDR image. MI is a non-convex and ill-posed problem where recent recovery networks suffer in high-noise scenarios. In this work, we formulate the HDR reconstruction task as an optimization problem that incorporates a deep prior and subsequently unrolls it into an optimization-inspired deep neural network. The network employs a lightweight convolutional denoiser for fast inference with minimal computational overhead, effectively recovering intensity values while mitigating noise. Moreover, we introduce the Scaling Equivariance term that facilitates self-supervised fine-tuning, thereby enabling the model to adapt to new modulo images that fall outside the original training distribution. Extensive evaluations demonstrate the superiority of our method compared to state-of-the-art recovery algorithms in terms of performance and quality.
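
The modulo measurement model, where intensity resets on reaching the saturation level, can be shown on a 1D signal together with a naive recovery rule. This is a sketch under strong assumptions, not the paper's unrolled network: the signal is noise-free, the first sample is below the saturation level, and neighbouring samples differ by less than half that level. The values and function names are invented.

```python
def modulo_measure(x, level):
    """Wrap each intensity into [0, level), as a modulo sensor would."""
    return [v % level for v in x]


def unwrap(y, level):
    """Undo the wrapping, assuming neighbours differ by less than level / 2."""
    out = [y[0]]
    offset = 0.0
    for prev, cur in zip(y, y[1:]):
        diff = cur - prev
        if diff > level / 2:
            offset -= level   # a large upward jump means the signal wrapped down
        elif diff < -level / 2:
            offset += level   # a large downward jump means the signal wrapped up
        out.append(cur + offset)
    return out


level = 1.0
x = [0.2, 0.5, 0.9, 1.2, 1.6, 2.0, 2.3, 1.9, 1.6, 1.2, 0.8]
y = modulo_measure(x, level)
x_hat = unwrap(y, level)
```

A single noisy jump that crosses the half-level threshold would shift every later sample by a full period in this rule, which hints at why high-noise scenarios call for the learned prior the abstract describes.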

2601.00734 2026-02-02 eess.SP

Conformal Reconfigurable Intelligent Surfaces: A Cylindrical Geometry Perspective

Filippo Pepe, Ivan Iudice, Giuseppe Castaldi, Marco Di Renzo, Vincenzo Galdi

Comments 20 pages, 9 figures

Abstract

Curved reconfigurable intelligent surfaces (RISs) represent a promising frontier for next-generation wireless communication, enabling adaptive wavefront control on nonplanar platforms such as unmanned aerial vehicles and urban infrastructure. This work presents a systematic investigation of cylindrical RISs, progressing from idealized surface-impedance synthesis to practical implementations based on simple one-bit meta-atoms. Exact analytical and geometrical-optics-based models are first developed to explore fundamental design limits, followed by a semi-analytical formulation tailored to discrete, reconfigurable architectures. This model enables efficient beam synthesis using both evolutionary optimization and low-complexity strategies, including the minimum power distortionless response method, and is validated through full-wave simulations. Results confirm that one-bit RISs can achieve directive scattering with manageable sidelobe levels and minimal hardware complexity. These findings establish the viability of cylindrical RISs and open the door to their integration into dual-use wireless platforms for real-world communication scenarios.

2601.00159 2026-02-02 eess.SP

AI-Driven Channel State Information (CSI) Extrapolation for 6G: Current Situations, Challenges and Future Research

Yuan Gao, Zichen Lu, Xinyi Wu, Wenjun Yu, Shengli Liu, Jianbo Du, Yanliang Jin, Shunqing Zhang, Xiaoli Chu, Shugong Xu

Comments This manuscript has been accepted by IEEE Communications Surveys and Tutorials

Abstract

CSI extrapolation is an effective method for acquiring channel state information (CSI), essential for optimizing performance of sixth-generation (6G) communication systems. Traditional channel estimation methods face scalability challenges due to the surging overhead in emerging high-mobility, extremely large-scale multiple-input multiple-output (EL-MIMO), and multi-band systems. CSI extrapolation techniques mitigate these challenges by using partial CSI to infer complete CSI, significantly reducing overhead. Despite growing interest, a comprehensive review of state-of-the-art (SOTA) CSI extrapolation techniques is lacking. This paper addresses this gap by comprehensively reviewing the current status, challenges, and future directions of CSI extrapolation for the first time. Firstly, we analyze the performance metrics specific to CSI extrapolation in 6G, including extrapolation accuracy, adaptation to dynamic scenarios and algorithm costs. We then review both model-driven and artificial intelligence (AI)-driven approaches for time, frequency, antenna, and multi-domain CSI extrapolation. Key insights and takeaways from these methods are summarized. Given the promise of AI-driven methods in meeting performance requirements, we also examine the open-source channel datasets and simulators that could be used to train high-performance AI-driven CSI extrapolation models. Finally, we discuss the critical challenges of the existing research and propose prospective research opportunities.

2512.21572 2026-02-02 cs.LG eess.SP

RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation Models

Anthony Bolton, Wuyang Zhou, Zehua Chen, Giorgos Iacovides, Danilo Mandic

Abstract

Financial time series forecasting is particularly challenging for transformer-based time series foundation models (TSFMs) due to non-stationarity, heavy-tailed distributions, and high-frequency noise present in data. Low-rank adaptation (LoRA) has become a popular parameter-efficient method for adapting pre-trained TSFMs to downstream data domains. However, it still underperforms in financial data, as it preserves the network architecture and training objective of TSFMs rather than complementing the foundation model. To further enhance TSFMs, we propose a novel refinement module, RefineBridge, built upon a tractable Schrödinger Bridge (SB) generative framework. Given the forecasts of TSFM as generative prior and the observed ground truths as targets, RefineBridge learns context-conditioned stochastic transport maps to improve TSFM predictions, iteratively approaching the ground-truth target from even a low-quality prior. Simulations on multiple financial benchmarks demonstrate that RefineBridge consistently improves the performance of state-of-the-art TSFMs across different prediction horizons.

2512.17937 2026-02-02 eess.AS cs.SD

LIWhiz: A Non-Intrusive Lyric Intelligibility Prediction System for the Cadenza Challenge

Ram C. M. C. Shekar, Iván López-Espejo

Comments Accepted to ICASSP 2026

Abstract

We present LIWhiz, a non-intrusive lyric intelligibility prediction system submitted to the ICASSP 2026 Cadenza Challenge. LIWhiz leverages Whisper for robust feature extraction and a trainable back-end for score prediction. Tested on the Cadenza Lyric Intelligibility Prediction (CLIP) evaluation set, LIWhiz achieves a root mean square error (RMSE) of 27.07%, a 22.4% relative RMSE reduction over the STOI-based baseline, yielding a substantial improvement in normalized cross-correlation.
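
The two figures quoted above can be related by a short calculation. The sketch below assumes that "relative RMSE reduction" means (baseline - system) / baseline, which is the usual convention but is not stated in the abstract. The baseline value it derives is therefore only an implied figure under that assumption, not a number reported by the authors, and the variable names are illustrative.

```python
# Reported in the abstract (RMSE in percentage points).
rmse_system = 27.07          # LIWhiz RMSE on the CLIP evaluation set
relative_reduction = 0.224   # 22.4% relative reduction over the baseline

# Assumed definition: relative_reduction = (baseline - system) / baseline.
# Solving for the baseline gives system / (1 - relative_reduction).
implied_baseline = rmse_system / (1.0 - relative_reduction)
absolute_reduction = implied_baseline - rmse_system

print(f"implied baseline RMSE: {implied_baseline:.2f}%")
print(f"absolute RMSE reduction: {absolute_reduction:.2f} points")
```

Under that assumption the STOI-based baseline would sit a little under 35% RMSE.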

2511.05771 2026-02-02 eess.SP

Environment-Aware MIMO Channel Estimation in Pilot-Constrained Upper Mid-Band Systems

Seyed Alireza Javid, Nuria González-Prelcic

Comments Accepted at ICASSP 2026

Abstract

Accurate multiple-input multiple-output (MIMO) channel estimation is critical for next-generation wireless systems, enabling enhanced communication and sensing performance. Traditional model-based channel estimation methods suffer, however, from performance degradation in complex environments with a limited number of pilots, while purely data-driven approaches lack physical interpretability, require extensive data collection, and are usually site-specific. This paper presents a novel physics-informed neural network (PINN) framework that combines model-based channel estimation with a deep network to exploit prior information about the propagation environment and achieve superior performance under pilot-constrained scenarios. The proposed approach employs an enhanced U-Net architecture with cross-attention mechanisms to fuse initial channel estimates with received signal strength (RSS) maps to provide refined channel estimates. Comprehensive evaluation using realistic ray-tracing data from urban environments demonstrates significant performance improvements, achieving over 5 dB gain in normalized mean squared error (NMSE) compared to state-of-the-art methods, with particularly strong performance in pilot-limited scenarios and robustness across different frequencies and environments with only minimal fine-tuning. The proposed framework maintains practical computational complexity, making it viable for massive MIMO systems in upper mid-band frequencies.

2511.01431 2026-02-02 eess.SP

Robust Radar Mounting Angle Estimation in Operational Driving Conditions

Simin Zhu, Satish Ravindran, Lihui Chen, Alexander Yarovoy, Francesco Fioranelli

Comments 11 pages, 6 figures, under review at IEEE Transactions on Radar Systems

Abstract

The robust estimation of the mounting angle for millimeter-wave automotive radars installed on moving vehicles is investigated. We propose a novel signal processing pipeline that combines radar and inertial measurement unit (IMU) data to achieve accurate and reliable performance in realistic driving scenarios. Unlike previous studies, the method employs neural networks to process sparse and noisy radar measurements, reject detections from moving objects, and estimate radar motion. In addition, a measurement model is introduced to correct IMU bias and scale factor errors. Using vehicle kinematics, the radar mounting angle is then computed from the estimated radar motion and the vehicle's yaw rate. To benchmark performance, the proposed approach is comprehensively compared with two problem formulations and four estimation techniques reported in the literature. Validation is carried out on the challenging RadarScenes dataset, covering over 79 km of real-world driving. Results show that the proposed method achieves state-of-the-art accuracy and robustness, with reliable estimates obtained within approximately 25 seconds of driving. To the best of our knowledge, this is the first study to demonstrate that automotive radar mounting angles can be accurately estimated in complex, real traffic conditions, without requiring controlled environments, dedicated targets, or specially designed driving routes.

2509.17797 2026-02-02 eess.SP

SSNet: Flexible and robust channel extrapolation for fluid antenna systems enabled by an self-supervised learning framework

Yuan Gao, Yiming Liu, Runze Yu, Shengli Liu, Yanliang Jin, Shunqing Zhang, Shugong Xu, Xiaoli Chu

Abstract

Fluid antenna systems (FAS) signify a pivotal advancement in 6G communication by enhancing spectral efficiency and robustness. However, obtaining accurate channel state information (CSI) in FAS poses challenges due to its complex physical structure. Traditional methods, such as pilot-based interpolation and compressive sensing, are not only computationally intensive but also lack adaptability. Current extrapolation techniques relying on rigid parametric models do not accommodate the dynamic environment of FAS, while data-driven deep learning approaches demand extensive training and are vulnerable to noise and hardware imperfections. To address these challenges, this paper introduces a novel self-supervised learning network (SSNet) designed for efficient and adaptive channel extrapolation in FAS. We formulate the problem of channel extrapolation in FAS as an image reconstruction task. Here, a limited number of unmasked pixels (representing the known CSI of the selected ports) are used to extrapolate the masked pixels (the CSI of unselected ports). SSNet capitalizes on the intrinsic structure of FAS channels, learning generalized representations from raw CSI data, thus reducing dependency on large labelled datasets. For enhanced feature extraction and noise resilience, we propose a mixture-of-experts (MoE) module. In this setup, multiple feedforward neural networks (FFNs) operate in parallel. The outputs of the FFNs are combined using a weighted sum, with the weight of each FFN computed by a gating function that applies a softmax. Extensive simulations validate the superiority of the proposed model. Results indicate that SSNet significantly outperforms benchmark models, such as AGMAE and long short-term memory (LSTM) networks, while using a much smaller labelled dataset.
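
The gating scheme described in this abstract (parallel FFNs whose outputs are combined by softmax weights) can be illustrated with a minimal sketch. This is a generic dense mixture-of-experts layer in plain Python, not the SSNet implementation. The class names `Expert` and `MoE`, the layer sizes, the ReLU activation, and the random weights are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_layer(rng, n_in, n_out):
    """Random weights and zero bias (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class Expert:
    """A two-layer feedforward network (FFN) with a ReLU in between."""

    def __init__(self, rng, dim, hidden):
        self.w1, self.b1 = make_layer(rng, dim, hidden)
        self.w2, self.b2 = make_layer(rng, hidden, dim)

    def __call__(self, x):
        h = [max(0.0, v) for v in linear(x, self.w1, self.b1)]
        return linear(h, self.w2, self.b2)


class MoE:
    """Dense MoE: every expert runs; a softmax gate weights their outputs."""

    def __init__(self, dim, hidden, num_experts, seed=0):
        rng = random.Random(seed)
        self.experts = [Expert(rng, dim, hidden) for _ in range(num_experts)]
        self.gate_w, self.gate_b = make_layer(rng, dim, num_experts)

    def gate(self, x):
        # One score per expert, normalised so the weights sum to 1.
        return softmax(linear(x, self.gate_w, self.gate_b))

    def __call__(self, x):
        weights = self.gate(x)
        outputs = [expert(x) for expert in self.experts]
        # Weighted sum of expert outputs, element by element.
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(x))]


moe = MoE(dim=4, hidden=8, num_experts=3)
x = [0.2, -0.1, 0.7, 0.05]      # stand-in for one CSI feature vector
gate_weights = moe.gate(x)
y = moe(x)
print(gate_weights, y)
```

Because the gate output is a softmax, the combined result is always a convex combination of the expert outputs. Whether SSNet uses this dense form or a sparse top-k variant is not stated in the abstract.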

2509.15804 2026-02-02 cs.SD eess.AS

CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures

Xueping Zhang, Yechen Wang, Linxi Li, Liwei Jin, Ming Li

Comments accepted at ICASSP 2026

Abstract

Component-level audio Spoofing (Comp-Spoof) targets a new form of audio manipulation where only specific components of a signal, such as speech or environmental sound, are forged or substituted while other components remain genuine. Existing anti-spoofing datasets and methods treat an utterance or a segment as entirely bona fide or entirely spoofed, and thus cannot accurately detect component-level spoofing. To address this, we construct a new dataset, CompSpoof, covering multiple combinations of bona fide and spoofed speech and environmental sound. We further propose a separation-enhanced joint learning framework that separates audio components and applies anti-spoofing models to each one. Joint learning is employed, preserving information relevant for detection. Extensive experiments demonstrate that our method outperforms the baseline, highlighting the necessity of separating components and the importance of detecting spoofing for each component separately. Datasets and code are available at: https://github.com/XuepingZhang/CompSpoof.

2509.01125 2026-02-02 eess.SP

Enabling 6G Through Multi-Domain Channel Extrapolation: Opportunities and Challenges of Generative Artificial Intelligence

Yuan Gao, Zichen Lu, Yifan Wu, Yanliang Jin, Shunqing Zhang, Xiaoli Chu, Shugong Xu, Cheng-Xiang Wang

Abstract

Channel extrapolation has attracted wide attention due to its potential to acquire channel state information (CSI) with high accuracy and minimal overhead. This is becoming increasingly crucial as the sixth-generation (6G) mobile networks aim to support complex scenarios, for example, high-mobility communications utilizing ultra-massive multiple-input multiple-output (MIMO) technologies and broad spectrum bands, necessitating multi-domain channel extrapolation. Current research predominantly addresses channel extrapolation within a single domain, lacking a comprehensive approach to multi-domain channel extrapolation. To bridge the gap, we propose the concept of multi-domain channel extrapolation, detailing the essential performance requirements for 6G networks. These include precise channel extrapolation, adaptability to varying scenarios, and manageable computational complexity during both training and inference stages. In light of these requirements, we elaborate on the potential and challenges of incorporating generative artificial intelligence (GAI)-based models for effective multi-domain channel extrapolation. Given the ability of the Transformer to capture long-range dependencies and hidden patterns, we propose a novel Transformer encoder-like model by eliminating the positional encoding module and replacing the original multi-head attention with a multilayer perceptron (MLP) for multi-domain channel extrapolation. Simulation results indicate that this model surpasses existing baseline models in terms of extrapolation accuracy and inference speed. Ablation studies further demonstrate the effectiveness of the module design of the proposed model. Finally, we pose several open questions for the development of practical GAI-based multi-domain channel extrapolation models, including the issues of explainability, generalization, and dataset collection.

2508.14798 2026-02-02 cs.AR eess.SP

ListenToJESD204B: A Lightweight Open-Source JESD204B IP Core for FPGA-Based Ultrasound Acquisition systems

Soumyo Bhattacharjee, Federico Villani, Christian Vogt, Andrea Cossettini, Luca Benini

Comments This work has been accepted for publication in IEEE IWASI Conference proceedings. The final published version will be available via IEEE Xplore

Abstract

The demand for hundreds of tightly synchronized channels operating at tens of MSPS in ultrasound systems exceeds conventional low-voltage differential signaling links' bandwidth, pin count, and latency. Although the JESD204B serial interface mitigates these limitations, commercial FPGA IP cores are proprietary, costly, and resource-intensive. We present ListenToJESD204B, an open-source receiver IP core released under a permissive Solderpad 0.51 license for AMD Xilinx Zynq UltraScale+ devices. Written in synthesizable SystemVerilog, the core supports four GTH/GTY lanes at 12.8 Gb/s and provides cycle-accurate AXI-Stream data alongside deterministic Subclass 1 latency. It occupies only 107 configurable logic blocks (approximately 437 LUTs), representing a 79% reduction compared to comparable commercially available IP. A modular data path featuring per-lane elastic buffers, SYSREF-locked LMFC generation, and optional LFSR descrambling facilitates scaling to high lane counts. We verified protocol compliance through simulation against the Xilinx JESD204C IP in JESD204B mode and on hardware using TI AFE58JD48 ADCs. Block stability was verified by streaming 80 MSPS, 16-bit samples over two 12.8 Gb/s links for 30 minutes with no errors.

2506.10754 2026-02-02 cs.SD cs.AI eess.AS

BNMusic: Blending Environmental Noises into Personalized Music

Chi Zuo, Martin B. Møller, Pablo Martínez-Nuevo, Huayang Huang, Yu Wu, Ye Zhu

Comments This paper has been accepted by NeurIPS 2025

Abstract

Acoustic masking is a conventional audio-engineering technique for reducing the annoyance of environmental noises by covering them up with other dominant yet less intrusive sounds. However, misalignment between the dominant sound and the noise, such as mismatched downbeats, often requires an excessive volume increase to achieve effective masking. Motivated by recent advances in cross-modal generation, in this work, we introduce an alternative method to acoustic masking, aiming to reduce the noticeability of environmental noises by blending them into personalized music generated based on user-provided text prompts. Following the paradigm of music generation using mel-spectrogram representations, we propose a Blending Noises into Personalized Music (BNMusic) framework with two key stages. The first stage synthesizes a complete piece of music in a mel-spectrogram representation that encapsulates the musical essence of the noise. In the second stage, we adaptively amplify the generated music segment to further reduce noise perception and enhance the blending effectiveness, while preserving auditory quality. Our experiments with comprehensive evaluations on MusicBench, EPIC-SOUNDS, and ESC-50 demonstrate the effectiveness of our framework, highlighting the ability to blend environmental noise with rhythmically aligned, adaptively amplified, and enjoyable music segments, minimizing the noticeability of the noise, thereby improving overall acoustic experiences. Project page: https://d-fas.github.io/BNMusic_page/.