arXivDaily: arXiv daily academic digest, updated Monday through Friday
EESS (Electrical Engineering and Systems Science) 215
2603.15566 2026-03-17 cs.SE cs.AI cs.SY eess.SY

Lore: Repurposing Git Commit Messages as a Structured Knowledge Protocol for AI Coding Agents

Ivan Stetsenko

Comments 8 pages, 1 figure, 1 table. Preprint available at https://doi.org/10.5281/zenodo.19051840

As AI coding agents become both primary producers and consumers of source code, the software industry faces an accelerating loss of institutional knowledge. Each commit captures a code diff but discards the reasoning behind it: the constraints, rejected alternatives, and forward-looking context that shaped the decision. I term this discarded reasoning the Decision Shadow. This paper proposes Lore, a lightweight protocol that restructures commit messages, using native git trailers, into self-contained decision records carrying constraints, rejected alternatives, agent directives, and verification metadata. Lore requires no infrastructure beyond git, is queryable via a standalone CLI tool, and is discoverable by any agent capable of running shell commands. The paper formalizes the protocol, compares it against five competing approaches, stress-tests it against its strongest objections, and outlines an empirical validation path.
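
To make the trailer idea concrete, the following is a minimal Python sketch of how a tool might read decision metadata out of a commit message. The trailer keys (`Constraint`, `Rejected`, `Directive`, `Verified-By`) and the commit text are hypothetical illustrations, not the keys defined by the Lore protocol, and the parser is a simplification of what `git interpret-trailers --parse` does.

```python
def parse_trailers(message: str) -> dict[str, list[str]]:
    """Collect `Key: value` lines from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    trailers: dict[str, list[str]] = {}
    if len(paragraphs) < 2:
        return trailers  # a subject line alone has no trailer block
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        # Trailer keys are single tokens such as "Signed-off-by".
        if sep and key and " " not in key:
            trailers.setdefault(key, []).append(value.strip())
    return trailers


# Hypothetical commit message; the keys are illustrative, not Lore's schema.
EXAMPLE_MESSAGE = """Cap retry backoff at 30 seconds

Unbounded backoff stalled the queue during an outage.

Constraint: upstream API allows 10 requests per second
Rejected: fixed 5 second delay
Directive: keep the jitter term
Verified-By: unit tests
"""

trailers = parse_trailers(EXAMPLE_MESSAGE)
```

Because the records live in the commit message itself, the same information is reachable from any shell with plain `git log`, which is the property the abstract emphasizes.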

2603.15516 2026-03-17 eess.AS

spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender

Simon Devauchelle, David Doukhan, Rémi Uro, Lucas Ondel Yang, Valentin Pelloin, Olympia Imbert-Brégégère, Véronique Lefort, Kévin Picard, Emeline Seignobos, Albert Rilliard

Comments 16 pages, 3 figures, to be published in the Fifteenth International Conference on Language Resources and Evaluation (LREC 2026)

We present spINAch, a large diachronic corpus of French speech from radio and television archives, balanced by speakers' gender and age (20-95 years old) and spanning 60 years from 1955 to 2015. The dataset includes over 320 hours of recordings from more than two thousand speakers. The methodology for building the corpus is described, focusing on the quality of collected samples in acoustic terms. The data were automatically transcribed and phonetically aligned to allow studies at a phonemic level. More than 3 million oral vowels have been analyzed to provide their fundamental frequency and formants. The corpus, available to the community for research purposes, is valuable for describing the evolution of Parisian French through the representation of gender and age. The presented analyses also demonstrate that the diachronic nature of the corpus allows the observation of various phonetic phenomena, such as the evolution of voice pitch over time (which does not differ by gender in our data) and the neutralization of the /a/-/ɑ/ opposition in Parisian French during this period.

2603.15475 2026-03-17 cs.CV cs.LG cs.RO eess.IV

Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation

Yuanfan Zheng, Kunyu Peng, Xu Zheng, Kailun Yang

Comments Accepted to CVPR 2026. The code is available at https://github.com/zyfone/EDA-PSeg

Cross-domain panoramic semantic segmentation has attracted growing interest as it enables comprehensive 360° scene understanding for real-world applications. However, it remains particularly challenging due to severe geometric Field of View (FoV) distortions and inconsistent open-set semantics across domains. In this work, we formulate an open-set domain adaptation setting and propose the Extrapolative Domain Adaptive Panoramic Segmentation (EDA-PSeg) framework, which trains on local perspective views and tests on full 360° panoramic images, explicitly tackling both geometric FoV shifts across domains and semantic uncertainty arising from previously unseen classes. To this end, we propose the Euler-Margin Attention (EMA), which introduces an angular margin to enhance viewpoint-invariant semantic representation, while performing amplitude and phase modulation to improve generalization toward unseen classes. Additionally, we design the Graph Matching Adapter (GMA), which builds high-order graph relations to align shared semantics across FoV shifts while effectively separating novel categories through structural adaptation. Extensive experiments on four benchmark datasets under camera-shift, weather-condition, and open-set scenarios demonstrate that EDA-PSeg achieves state-of-the-art performance, robust generalization to diverse viewing geometries, and resilience under varying environmental conditions. The code is available at https://github.com/zyfone/EDA-PSeg.

2603.15468 2026-03-17 cs.IT eess.SP math.IT

DMD Prediction of MIMO Channel Using Tucker Decomposition

Irina Kopnina, Dmitry Artemasov, Sergey Matveev

Comments This work has been submitted to the IEEE for possible publication

Accurate channel state information (CSI) prediction is crucial for next-generation multiple-input multiple-output (MIMO) communication systems. Classical prediction methods often become inefficient for high-dimensional and rapidly time-varying channels. To improve prediction efficiency, it is essential to exploit the inherent low-rank tensor structure of the MIMO channel. Motivated by this observation, we propose a dynamic mode decomposition (DMD)-based prediction framework operating on the low-dimensional core tensors obtained via a Tucker decomposition. The proposed method predicts reduced-order channel cores, significantly lowering computational complexity. Simulation results demonstrate that the proposed approach preserves the dominant channel dynamics and achieves high prediction accuracy.

2603.15440 2026-03-17 cs.SD cs.AI eess.AS

Music Genre Classification: A Comparative Analysis of Classical Machine Learning and Deep Learning Approaches

Sachin Prajuli, Abhishek Karna, OmPrakash Dhakl

Comments 8 pages

Automatic music genre classification is a long-standing challenge in Music Information Retrieval (MIR); work on non-Western music traditions remains scarce. Nepali music encompasses culturally rich and acoustically diverse genres--from the call-and-response duets of Lok Dohori to the rhythmic poetry of Deuda and the distinctive melodies of Tamang Selo--that have not been addressed by existing classification systems. In this paper, we construct a novel dataset of approximately 8,000 labeled 30-second audio clips spanning eight Nepali music genres and conduct a systematic comparison of nine classification models across two paradigms. Five classical machine learning classifiers (Logistic Regression, SVM, KNN, Random Forest, and XGBoost) are trained on 51 hand-crafted audio features extracted via Librosa, while four deep learning architectures (CNN, RNN, parallel CNN-RNN, and sequential CNN followed by RNN) operate on Mel spectrograms of dimension 640 x 128. Our experiments reveal that the sequential Convolutional Recurrent Neural Network (CRNN)--in which convolutional layers feed into an LSTM--achieves the highest accuracy of 84%, substantially outperforming both the best classical models (Logistic Regression and XGBoost, both at 71%) and all other deep architectures. We provide per-class precision, recall, F1-score, confusion matrices, and ROC analysis for every model, and offer a culturally grounded interpretation of misclassification patterns that reflects genuine overlaps in Nepal's musical traditions.
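
The per-class precision, recall, and F1-score mentioned above all follow from a confusion matrix. A minimal Python sketch is given here for illustration; the three-class matrix holds made-up counts, not results from the paper, and the function name is hypothetical.

```python
def per_class_metrics(confusion):
    """Return {class_index: (precision, recall, f1)} from a square confusion matrix.

    confusion[i][j] counts clips of true class i that were predicted as class j.
    """
    n = len(confusion)
    metrics = {}
    for c in range(n):
        true_pos = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Made-up counts for three classes, for illustration only.
toy_confusion = [[8, 2, 0],
                 [1, 7, 2],
                 [1, 1, 8]]
scores = per_class_metrics(toy_confusion)
```

Off-diagonal cells are where the misclassification patterns discussed in the abstract would show up: a large `confusion[i][j]` means genre `i` is often mistaken for genre `j`.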

2603.15399 2026-03-17 eess.SY cs.SY

Spatial Characterization of Sub-Synchronous Oscillations Using Black-Box IBR Models

Muhammad Sharjeel Javaid, Gabriel Covarrubias Maureira, Ambuj Gupta, Debraj Bhattacharjee, Jianli Gao, Balarko Chaudhuri, Mark O'Malley

Comments Accepted for IEEE PES General Meeting 2026, Montreal

Power systems with high penetration of inverter-based resources (IBRs) are prone to sub-synchronous oscillations (SSO). The opaqueness of vendor-specific IBR models limits the ability to predict the severity and the spread of SSO. This paper demonstrates that black-box IBR models estimated through frequency-domain identification techniques, along with a dynamic network model, can replicate the actual oscillatory behavior. The estimated IBR models are validated against actual IBR models in a closed-loop multi-IBR test system through modal analysis by comparing closed-loop eigenvalues and participation factors. Furthermore, using output-observable right eigenvectors, spatial heatmaps are developed to visualize the spread and severity of dominant SSO modes. The case studies on the 11-bus and 39-bus test systems confirm that even with the estimated IBR models, the regions susceptible to SSO can be identified in IBR-dominated power systems.

2603.15394 2026-03-17 eess.SY cs.SY

Matched Filter-Based Molecule Source Localization in Advection-Diffusion-Driven Pipe Networks with Known Topology

Timo Jakumeit, Bastian Heinlein, Vukašin Spasojević, Vahid Jamali, Robert Schober, Maximilian Schäfer

Comments 8 pages, 6 figures; This paper has been submitted to the 13th ACM International Conference on Nanoscale Computing and Communication (ACM NanoCom 2026)

Synthetic molecular communication (MC) has emerged as a powerful framework for modeling, analyzing, and designing communication systems where information is encoded into properties of molecules. Among the envisioned applications of MC is the localization of molecule sources in pipe networks (PNs) like the human cardiovascular system (CVS), sewage networks (SNs), and industrial plants. While existing algorithms mostly focus on simplified scenarios, in this paper, we propose the first framework for source localization in complex PNs with known topology, by leveraging the mixture of inverse Gaussians for hemodynamic transport (MIGHT) model as a closed-form representation for advection-diffusion-driven MC in PNs. We propose a matched filter (MF)-based approach to identify molecule sources under realistic conditions such as unknown release times, random numbers of released molecules, sensor noise, and limited sensor sampling rate. We apply the algorithm to localize a source of viral markers in a real-world SN and show that the proposed scheme outperforms randomly guessing sources even at low signal-to-noise ratios (SNRs) at the sensor and achieves error-free localization under favorable conditions, i.e., high SNRs and sampling rates. Furthermore, by identifying clusters of frequently confused sources, reliable cluster-level localization is possible at substantially lower SNRs and sampling rates.

2603.15393 2026-03-17 math.OC cs.SY eess.SY math.DS

Unimodal self-oscillations and their sign-symmetry for discrete-time relay feedback systems with dead zone

Kang Tong, Christian Grussler, Michelle S. Chong

This paper characterizes self-oscillations in discrete-time linear time-invariant (LTI) relay feedback systems with nonnegative dead zone. Specifically, we aim to establish existence criteria for unimodal self-oscillations, defined as periodic solutions where the output exhibits a single-peaked period. Assuming that the linear part of the system is stable, with a strictly monotonically decreasing impulse response on its infinite support, we propose a novel analytical framework based on the theory of total positivity to address this problem. We demonstrate that unimodal self-oscillations subject to mild variation-based constraints exist only if the number of positive and negative values of the system's loop gain coincides within a given strictly positive period, i.e., the self-oscillation is sign-symmetric. Building upon these findings, we derive conditions for the existence of such self-oscillations, establish tight bounds on their periods, and address the question of their uniqueness.

2603.15360 2026-03-17 math.OC cs.SY eess.SY

Mitigating Renewable-Induced Risks for Green and Conventional Ammonia Producers through Coordinated Production and Futures Trading

Huayan Geng, Yangjun Zeng, Yiwei Qiu

Renewable power-to-ammonia (ReP2A), which uses hydrogen produced from renewable electricity as feedstock, is a promising pathway for decarbonizing the energy, transportation, and chemical sectors. However, variability in renewable generation causes fluctuations in hydrogen supply and ammonia production, leading to revenue instability for both ReP2A producers and conventional fossil-based gray ammonia (GA) producers in the market. Existing studies mainly rely on engineering measures, such as production scheduling, to manage this risk, but their effectiveness is constrained by physical system limits. To address this challenge, this paper proposes a financial instrument termed renewable ammonia futures and integrates it with production decisions to hedge ammonia output risk. Production and trading models are developed for both ReP2A and GA producers, with conditional value-at-risk (CVaR) used to represent risk preferences under uncertainty. A game-theoretic framework is established in which the two producers interact in coupled ammonia spot and futures markets, and a Nash bargaining mechanism coordinates their production and trading strategies. Case studies based on a real-world system show that introducing renewable ammonia futures increases the CVaR utilities of ReP2A and GA producers by 5.103% and 10.14%, respectively, improving profit stability under renewable uncertainty. Sensitivity analysis further confirms the effectiveness of the mechanism under different levels of renewable variability and capacity configurations.
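
For readers unfamiliar with CVaR, a minimal Python sketch of an empirical estimate is shown here. It assumes a profit-oriented sign convention (CVaR as the mean of the worst `alpha` fraction of equally likely profit scenarios); the convention, the value of `alpha`, and the scenario values are illustrative assumptions, not taken from the paper.

```python
import math


def cvar_lower_tail(profits, alpha):
    """Mean of the worst ceil(alpha * n) profit scenarios (equal probabilities)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(profits)                      # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    return sum(ordered[:k]) / k


# Ten made-up profit scenarios, for illustration only.
scenarios = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tail_mean = cvar_lower_tail(scenarios, alpha=0.2)  # averages the two worst: 1.0 and 2.0
```

A hedging instrument that lifts the worst scenarios raises this tail mean even if the overall average is unchanged, which is the sense in which a CVaR-based utility rewards profit stability.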

2603.15346 2026-03-17 math.OC cs.SY eess.SY math.DS

A superposition approach for the ISS Lyapunov-Krasovskii theorem with pointwise dissipation

Andrii Mironchenko, Fabian Wirth, Antoine Chaillet, Lucas Brivadis

We show that the existence of a Lyapunov-Krasovskii functional (LKF) with pointwise dissipation (i.e. dissipation in terms of the current solution norm) suffices for input-to-state stability, provided that uniform global stability can also be ensured using the same LKF. To this end, we develop a stability theory, in which the behavior of solutions is not assessed through the classical norm but rather through a specific LKF, which may provide significantly tighter estimates. We discuss the advantages of our approach by means of an example.

2603.15311 2026-03-17 eess.SP

Near-field Boundary Distance in mmWave and THz Communications with Misaligned Antenna Arrays

Peng Zhang, Vitaly Petrov, Emil Björnson

Wireless communications in the millimeter wave (mmWave) and terahertz (THz) spectrum allow harnessing large frequency bands, thus achieving ultra-high data rates. However, the inherently short wavelengths of mmWave and THz signals lead to an extended radiative near-field region, where certain canonical far-field assumptions fail. Most prior works that aim to characterize this radiative near-field region either do not consider antenna arrays on both communicating nodes or, if they do, assume perfect alignment between the arrays. However, such assumptions break down in many realistic deployments, where both sides must employ large-scale mmWave/THz antenna arrays to maintain the desired communication range, while perfect antenna alignment cannot be guaranteed, particularly under node mobility. In this work, a generalized mathematical framework is presented to characterize the radiative near-field distance in directional mmWave and THz communication systems under various realistic array rotations and misalignments. With the use of the developed framework, compact closed-form expressions are derived for the near-field boundary distance in a wide range of antenna configurations, including array-to-array and array-to-point setups, considering both linear and planar arrays. Our numerical study reveals that the presence of antenna misalignment may significantly shift the boundaries of the near-field region in mmWave and THz communication systems.

2603.15310 2026-03-17 eess.SP

On the CRLB for Blind Receiver I/Q Imbalance Estimation in OFDM Systems: Efficient Computation and Closed-Form Bounds

Moritz Tockner, Oliver Lang, Andreas Meingassner-Lang, Mario Huemer

Comments 12 pages, 4 figures, extended version of our work presented at the 2025 Asilomar Conference on Signals, Systems, and Computers

Modern mobile communication receivers are often implemented with a direct-conversion architecture, which features a number of advantages over competing designs. A notable limitation of direct-conversion architectures, however, is their sensitivity to amplitude and phase mismatches between the in-phase and quadrature signal paths. Such in-phase and quadrature-phase (I/Q) imbalances introduce undesired image components in the baseband signal, degrading link performance -- most notably by increasing the bit-error ratio. Considerable research effort has therefore been devoted to digital techniques for estimating and mitigating these impairments. Existing approaches generally fall into two categories: data-aided methods that exploit known pilots, preambles, or training sequences, and blind techniques that operate without such prior information. For data-aided estimation, Cramér-Rao lower bounds (CRLBs) have been established in the literature. In contrast, the derivation of a CRLB for the blind I/Q-imbalance estimation case is considerably more challenging, since the received data is random and typically non-Gaussian in the frequency domain. This work extends our earlier conference contribution, which introduced a CRLB derivation for the blind estimation of frequency-independent (FID) receiver I/Q imbalance using central limit theorem (CLT) arguments. The extensions include a computationally efficient method for calculating the bound, reducing complexity from cubic in the number of samples to linear in the fast Fourier transform (FFT) size, along with a simplified closed-form approximation. This approximation provides new insights into the allocation-dependent performance of existing estimation methods, motivating a pre-estimation filtering modification that drastically improves their estimation performance in certain scenarios.

2603.15288 2026-03-17 eess.AS

Neural Network-Based Time-Frequency-Bin-Wise Linear Combination of Beamformers for Underdetermined Target Source Extraction

Changda Chen, Yichen Yang, Wei Liu, Shoji Makino

Comments Accepted by ICASSP 2026

Extracting a target source from underdetermined mixtures is challenging for beamforming approaches. Recently proposed time-frequency-bin-wise switching (TFS) and linear combination (TFLC) strategies mitigate this by combining multiple beamformers in each time-frequency (TF) bin and choosing combination weights that minimize the output power. However, making this decision independently for each TF bin can weaken temporal-spectral coherence, causing discontinuities and consequently degrading extraction performance. In this paper, we propose a novel neural network-based time-frequency-bin-wise linear combination (NN-TFLC) framework that constructs minimum power distortionless response (MPDR) beamformers without explicit noise covariance estimation. The network encodes the mixture and beamformer outputs, and predicts temporally and spectrally coherent linear combination weights via a cross-attention mechanism. On dual-microphone mixtures with multiple interferers, NN-TFLC-MPDR consistently outperforms TFS/TFLC-MPDR and achieves competitive performance with TFS/TFLC built on the minimum variance distortionless response (MVDR) beamformers that require noise priors.
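
As background, the textbook MPDR beamformer that this line of work builds on has the closed form w = R^{-1} d / (d^H R^{-1} d), where R is the spatial covariance of the mixture and d is the steering vector. The Python sketch below evaluates it for the dual-microphone case; the values of `R` and `d` are made up, and this is the classical formula only, not the NN-TFLC network described in the abstract.

```python
def mpdr_weights_2mic(R, d):
    """MPDR weights w = R^{-1} d / (d^H R^{-1} d) for a 2x2 Hermitian covariance R."""
    (r00, r01), (r10, r11) = R
    det = r00 * r11 - r01 * r10
    # R^{-1} d via the explicit 2x2 matrix inverse.
    rinv_d = [(r11 * d[0] - r01 * d[1]) / det,
              (-r10 * d[0] + r00 * d[1]) / det]
    # Normalizer d^H R^{-1} d (real-valued when R is Hermitian).
    denom = sum(di.conjugate() * vi for di, vi in zip(d, rinv_d))
    return [vi / denom for vi in rinv_d]


# Made-up mixture covariance (Hermitian) and steering vector for one TF bin.
R = [[2.0 + 0j, 0.5 + 0.5j],
     [0.5 - 0.5j, 1.0 + 0j]]
d = [1.0 + 0j, 1j]

w = mpdr_weights_2mic(R, d)
# Distortionless constraint: the response w^H d toward the target equals 1.
response = sum(wi.conjugate() * di for wi, di in zip(w, d))
```

Only the mixture covariance enters the formula, which is why MPDR needs no explicit noise covariance estimate, in contrast to MVDR.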

2603.15286 2026-03-17 eess.SY cs.SY

ReLU Barrier Functions for Nonlinear Systems with Constrained Control: A Union of Invariant Sets Approach

Pouya Samanipour, Hasan A. Poonawala

Comments Accepted to ACC 2026

Certifying safety for nonlinear systems with polytopic input constraints is challenging because control barrier function (CBF) synthesis must ensure control admissibility under saturation. We propose an approximation--verification pipeline that performs convex barrier synthesis on piecewise-affine (PWA) surrogates and certifies safety for the original nonlinear system via facet-wise verification. To reduce conservatism while preserving tractability, we use a two-slope Leaky ReLU surrogate for the extended class-$\mathcal{K}$ function $\alpha(\cdot)$ and combine multiple certificates using a Union of Invariant Sets (UIS). Counterexamples are handled through local uncertainty updates. Simulations on pendulum and cart-pole systems with input saturation show larger certified invariant sets than linear-$\alpha$ designs with tractable computation time.

2603.15285 2026-03-17 eess.SP

Fast Volume Alignment by Frequency-Marched Newton

Fabian Kruse, Valentin Debarnot, Vinith Kishore, Ivan Dokmanić

We develop a fast and accurate method for 3D alignment, recovering the rotation and translation that best align a reference volume with a noisy observation. Classical matched filtering evaluates cross-correlation over a large discretized transformation space; we show that high-precision alignment can be achieved far more efficiently by treating pose estimation as a continuous optimization problem. Our starting point is a band-limited Wigner-$D$ expansion of the rotational correlation, which enables rapid evaluation and efficient closed-form gradients and Hessians. Combined with analytical control of the complexity of trigonometric-polynomial landscapes, this makes second-order optimization practical in a setting where it is often avoided due to nonconvexity and noise sensitivity. We show that Newton-type refinement is stable and effective when initialized at low angular bandwidth: a coarse low-resolution $\mathrm{SO}(3)$ search provides robust candidates, which are then refined by iterative frequency marching and Newton steps, with translations updated via FFT in an alternating scheme. We provide a deterministic convergence guarantee showing that, under verifiable spectral-decay and gap conditions, the frequency-marching scheme returns a near-optimal solution whose suboptimality is controlled by the Newton tolerance. On synthetic rotation-estimation benchmarks, the method attains sub-degree accuracy while substantially reducing runtime relative to exhaustive $\mathrm{SO}(3)$ search. Integrated into the subtomogram-averaging pipeline of RELION5, it matches the baseline reconstruction quality, reaching local resolution at the Nyquist limit, while reducing pose-refinement time by more than an order of magnitude.

2603.15234 2026-03-17 eess.SP

RIS-Aided RSMA Improves the Latency vs. Energy Trade-off in the Finite Block Length MIMO Downlink

Mohammad Soleymani, Bruno Clerckx, Robert Schober, Lajos Hanzo

Comments Accepted at IEEE Open Journal of Vehicular Technology

We simultaneously minimize the latency and improve energy efficiency (EE) of the multi-user multiple-input multiple-output (MU-MIMO) rate splitting multiple access (RSMA) downlink, aided by a reconfigurable intelligent surface (RIS). Our results show that RSMA improves the EE and may reduce the delay to 13% of that of spatial division multiple access (SDMA). Moreover, RIS and RSMA support each other synergistically, while an RIS operating without RSMA provides limited benefits in terms of latency and cannot effectively mitigate interference. Furthermore, increasing the RIS size amplifies the gains of RSMA more significantly than those of SDMA, without altering the fundamental EE-latency trade-offs. Results also show that latency increases with more stringent reliability requirements, and RSMA yields more significant gains under such conditions, making it eminently suitable for energy-efficient ultra-reliable low-latency communication (URLLC) scenarios.

2603.15184 2026-03-17 cs.LG cs.AI cs.NE eess.IV

CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds

Vaishnavi Nagabhushana, Kartikay Agrawal, Ayon Borthakur

Comments Accepted for publication in the proceedings of the Neuro for AI & AI for Neuro Workshop at AAAI 2026 (PMLR)

Although deep neural networks perform extremely well in controlled environments, they fail in real-world scenarios where data isn't available all at once, and the model must adapt to a new data distribution that may or may not follow the initial distribution. Previously acquired knowledge is lost during subsequent updates based on new data, a phenomenon commonly known as catastrophic forgetting. In contrast, the brain can learn without such catastrophic forgetting, irrespective of the number of tasks it encounters. Existing spiking neural networks (SNNs) for class-incremental learning (CIL) suffer a sharp performance drop as tasks accumulate. We here introduce CATFormer (Context Adaptive Threshold Transformer), a scalable framework that overcomes this limitation. We observe that the key to preventing forgetting in SNNs lies not only in synaptic plasticity but also in modulating neuronal excitability. At the core of CATFormer is the Dynamic Threshold Leaky Integrate-and-Fire (DTLIF) neuron model, which leverages context-adaptive thresholds as the primary mechanism for knowledge retention. This is paired with a Gated Dynamic Head Selection (G-DHS) mechanism for task-agnostic inference. Extensive evaluation on both static (CIFAR-10/100/Tiny-ImageNet) and neuromorphic (CIFAR10-DVS/SHD) datasets reveals that CATFormer outperforms existing rehearsal-free CIL algorithms across various task splits, establishing it as an ideal architecture for energy-efficient, true class-incremental learning.

2603.15180 2026-03-17 eess.SY cs.AI cs.SY

Iterative Learning Control-Informed Reinforcement Learning for Batch Process Control

Runze Lin, Ziqi Zhuo, Junghui Chen, Lei Xie, Hongye Su

A significant limitation of Deep Reinforcement Learning (DRL) is the stochastic uncertainty in actions generated during exploration-exploitation, which poses substantial safety risks during both training and deployment. In industrial process control, the lack of formal stability and convergence guarantees further inhibits adoption of DRL methods by practitioners. Conversely, Iterative Learning Control (ILC) represents a well-established autonomous control methodology for repetitive systems, particularly in batch process optimization. ILC achieves desired control performance through iterative refinement of control laws, either between consecutive batches or within individual batches, to compensate for both repetitive and non-repetitive disturbances. This study introduces an Iterative Learning Control-Informed Reinforcement Learning (IL-CIRL) framework for training DRL controllers in dual-layer batch-to-batch and within-batch control architectures for batch processes. The proposed method incorporates Kalman filter-based state estimation within the iterative learning structure to guide DRL agents toward control policies that satisfy operational constraints and ensure stability guarantees. This approach enables the systematic design of DRL controllers for batch processes operating under multiple disturbance conditions.
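
The batch-to-batch refinement that ILC performs can be illustrated with the classical P-type update u_{k+1}(t) = u_k(t) + L e_k(t). The Python sketch below applies it to a toy static plant y = g u; the plant, the gains, and the reference trajectory are hypothetical, and the sketch does not include the DRL or Kalman filter components of the proposed IL-CIRL framework.

```python
def p_type_ilc(reference, plant_gain, learning_gain, num_batches):
    """Run u_{k+1}(t) = u_k(t) + L * e_k(t) on the toy static plant y = g * u."""
    u = [0.0] * len(reference)
    max_errors = []
    for _ in range(num_batches):
        y = [plant_gain * ut for ut in u]               # run one batch
        e = [rt - yt for rt, yt in zip(reference, y)]   # tracking error of this batch
        max_errors.append(max(abs(et) for et in e))
        # Learn from this batch: correct the input used in the next batch.
        u = [ut + learning_gain * et for ut, et in zip(u, e)]
    return u, max_errors


reference = [0.0, 0.5, 1.0, 1.0, 0.5]  # hypothetical target trajectory
u_final, errors = p_type_ilc(reference, plant_gain=2.0,
                             learning_gain=0.25, num_batches=10)
```

For this toy plant the error obeys e_{k+1} = (1 - g L) e_k, so it shrinks from batch to batch whenever |1 - g L| < 1; here the factor is 0.5.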

2603.15160 2026-03-17 eess.SY cs.SY math.DS

Multi-Scale Control of Large Agent Populations: From Density Dynamics to Individual Actuation

Mario di Bernardo

We review a body of recent work by the author and collaborators on controlling the spatial organisation of large agent populations across multiple scales. A central theme is the systematic bridging of microscopic agent-level dynamics and macroscopic density descriptions, enabling control design at the most natural level of abstraction and subsequent translation across scales. We show how this multi-scale perspective provides a unified approach to both direct control, where every agent is actuated, and indirect control, where few leaders or herders steer a larger uncontrolled population. The review covers continuification-based control with robustness under limited sensing and decentralised implementation via distributed density estimation; leader-follower density regulation with dual-feedback stability guarantees and bio-inspired plasticity; optimal-transport methods for coverage control and macro-to-micro discretisation; nonreciprocal field theory for collective decision-making; mean-field control barrier functions for population-level safety; and hierarchical reinforcement learning for settings where closed-form solutions are intractable. Together, these results demonstrate the breadth and versatility of a multi-scale control framework that integrates analytical methods, learning, and physics-inspired approaches for large agent populations.

2603.15120 2026-03-17 eess.AS

How Attention Shapes Emotion: A Comparative Study of Attention Mechanisms for Speech Emotion Recognition

Marc Casals-Salvador, Federico Costa, Rodolfo Zevallos, Javier Hernando

Speech Emotion Recognition (SER) plays a key role in advancing human-computer interaction. Attention mechanisms have become the dominant approach for modeling emotional speech due to their ability to capture long-range dependencies and emphasize salient information. However, standard self-attention suffers from quadratic computational and memory complexity, limiting its scalability. In this work, we present a systematic benchmark of optimized attention mechanisms for SER, including RetNet, LightNet, GSA, FoX, and KDA. Experiments on both MSP-Podcast benchmark versions show that while standard self-attention achieves the strongest recognition performance across test sets, efficient attention variants dramatically improve scalability, reducing inference latency and memory usage by up to an order of magnitude. These results highlight a critical trade-off between accuracy and efficiency, providing practical insights for designing scalable SER systems.

2603.15105 2026-03-17 eess.SP

Dual-Domain Sparse Adaptive Filtering: Exploiting Error Memory for Improved Performance

Mohammad Salman, Hadi Zayyani, Felipe A. P. de Figueiredo, Hasan Abu Hilal, Mostafa Rashdan

Many signal processing applications such as acoustic echo cancellation and wireless channel estimation require identifying systems where only a small fraction of coefficients are actually active, i.e., sparse systems. Zero-attracting adaptive filters tackle this by adding a penalty that pulls inactive coefficients toward zero, speeding up convergence. However, these algorithms determine which coefficients to penalize based solely on their current size. This creates a problem during early adaptation since active coefficients that should eventually grow large start out small, making them look identical to truly inactive coefficients. The algorithm ends up applying strong penalties to the very coefficients it needs to develop, slowing down the initial convergence. This paper provides a solution to this problem by introducing a dual-domain approach that looks at coefficients from two perspectives simultaneously. Beyond just tracking coefficient magnitude, we introduce an error-memory vector that monitors how persistently each coefficient contributes to the adaptation error over time. If a coefficient keeps showing up in the error signal, it is probably active even if it is still small. By combining both views, the proposed dual-domain sparse adaptive filter (DD-SAF) can identify active coefficients early and eliminate penalties accordingly. Moreover, a complete theoretical analysis is derived. The analysis shows that DD-SAF maintains the same stability properties as standard least-mean-square (LMS) while achieving provably better steady-state performance than existing methods. Simulations demonstrate that the DD-SAF converges to the steady state faster and/or converges to a lower mean-square-deviation (MSD) than the standard LMS and the reweighted zero-attracting LMS (RZA-LMS) algorithms for sparse system identification settings.
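
For reference, the magnitude-only penalty that the abstract criticizes can be seen in the RZA-LMS baseline, whose update is w_k <- w_k + mu e x_k - rho sgn(w_k) / (1 + eps |w_k|). The Python sketch below implements that baseline, not the proposed DD-SAF (whose error-memory rule is not specified in the abstract); the step sizes, the sparse system, and the noise-free setup are illustrative assumptions.

```python
import random


def rza_lms(x, d, num_taps, mu=0.01, rho=1e-4, eps=10.0):
    """Reweighted zero-attracting LMS for sparse FIR system identification."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        window = [x[n - k] for k in range(num_taps)]   # most recent sample first
        error = d[n] - sum(wk * xk for wk, xk in zip(w, window))
        for k in range(num_taps):
            sign = (w[k] > 0) - (w[k] < 0)
            # LMS gradient step plus a zero attractor that depends only on |w[k]|,
            # so a small but active tap is penalized like an inactive one.
            w[k] += mu * error * window[k] - rho * sign / (1.0 + eps * abs(w[k]))
    return w


random.seed(0)
true_taps = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.5]   # hypothetical sparse system
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h * x[n - k] for k, h in enumerate(true_taps) if n - k >= 0)
     for n in range(len(x))]
estimate = rza_lms(x, d, len(true_taps))
```

The attractor term shrinks as `abs(w[k])` grows, so large taps are nearly unpenalized while small ones, active or not, are pulled toward zero.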

2603.15068 2026-03-17 eess.SP cs.LG

Generative Semantic HARQ: Latent-Space Text Retransmission and Combining

Bin Han, Yulin Hu, Hans D. Schotten

Comments Submitted to IEEE PIMRC 2026

Details
English abstract

Semantic communication conveys meaning rather than raw bits, but reliability at the semantic level remains an open challenge. We propose a semantic-level hybrid automatic repeat request (HARQ) framework for text communication, in which a Transformer-variational autoencoder (VAE) codec operates as a lightweight overlay on the conventional protocol stack. The stochastic encoder inherently generates diverse latent representations across retransmissions, providing incremental knowledge (IK) from a single model without dedicated protocol design. On the receiver side, a soft quality estimator triggers retransmissions and a quality-aware combiner merges the received latent vectors within a consistent latent space. We systematically benchmark six semantic quality metrics and four soft combining strategies under hybrid semantic distortion that mixes systematic bias with additive noise. The results suggest pairing Weighted-Average or MRC-Inspired combining with self-consistency-based HARQ triggering for the best performance.
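The idea of merging latent vectors from several retransmissions can be illustrated with a generic quality-weighted average. The sketch below is an assumption-laden illustration, not the paper's combiner: the function names are hypothetical, and the inverse-noise-variance weighting is simply the classical maximal-ratio-combining (MRC) rule, used here as one plausible reading of "MRC-Inspired".

```python
def mrc_weights(noise_vars):
    """Classical MRC-style weights: inverse of each copy's noise variance."""
    return [1.0 / v for v in noise_vars]


def combine_latents(latents, qualities):
    """Quality-weighted average of equally sized latent vectors."""
    total = sum(qualities)
    if total <= 0:
        raise ValueError("qualities must sum to a positive value")
    dim = len(latents[0])
    return [
        sum(q * z[i] for q, z in zip(qualities, latents)) / total
        for i in range(dim)
    ]
```

For example, `combine_latents([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])` weights the second copy three times as heavily as the first. How the quality scores are estimated at the receiver is not specified in the abstract.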

2603.14986 2026-03-17 eess.AS

Deep Filter Estimation from Inter-Frame Correlations for Monaural Speech Dereverberation

Ui-Hyeop Shin, Jun Hyung Kim, Jangyeon Kim, Wooseok Kim, Hyung-Min Park

Comments Submitted for review to Interspeech

Details
English abstract

Speech dereverberation in distant-microphone scenarios remains challenging due to the high correlation between reverberation and target signals, often leading to poor generalization in real-world environments. We propose IF-CorrNet, a correlation-to-filter architecture designed for robustness against acoustic variability. Unlike conventional black-box mapping methods that directly estimate complex spectra, IF-CorrNet explicitly exploits inter-frame STFT correlations to estimate multi-frame deep filters for each time-frequency bin. By shifting the learning objective from direct mapping to filter estimation, the network effectively constrains the solution space, which simplifies the training process and mitigates overfitting to synthetic data. Experimental results on the REVERB Challenge dataset demonstrate that IF-CorrNet achieves a substantial gain in the SRMR metric on RealData, confirming its robustness in suppressing reverberation and noise in practical, non-synthetic environments.
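Multi-frame deep filtering, as referred to above, generally means that each output time-frequency bin is a weighted sum of the current and past STFT frames at the same frequency. The sketch below shows only that filtering step under assumed conventions (causal taps, no conjugation, zero history before the first frame); how IF-CorrNet estimates the taps from inter-frame correlations is not described in the abstract, and the function name is hypothetical.

```python
def apply_multiframe_filter(spec, filters):
    """Apply per-bin multi-frame filters to a complex STFT.

    spec[t][f]       complex STFT value at frame t, frequency bin f
    filters[t][f][k] complex tap applied to frame t - k at bin f
    """
    out = []
    for t, frame in enumerate(spec):
        row = []
        for f in range(len(frame)):
            acc = 0j
            for k, tap in enumerate(filters[t][f]):
                if t - k >= 0:  # frames before the start are treated as zero
                    acc += tap * spec[t - k][f]
            row.append(acc)
        out.append(row)
    return out
```

With a single tap per bin this reduces to ordinary complex masking, which is one way to see why estimating filters rather than spectra constrains the solution space.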

2603.14959 2026-03-17 eess.SP

Cyclic Delay-Doppler Shift: A Simple Transmit Diversity Technique for Ultra-Reliable Communications in Doubly Selective Channels

Haoran Yin, Yu Zhou, Yanqun Tang, Di Zhang, Chi Zhang, Xizhang Wei, Jiaojiao Xiong, Fan Liu, Marwa Chafii, Mérouane Debbah

Comments Under revision in an IEEE Journal

Details
English abstract

Affine frequency division multiplexing (AFDM) and orthogonal time frequency space (OTFS) are two promising advanced waveforms proposed for reliable communications in high-mobility scenarios. In this paper, we introduce a simple transmit diversity technique, termed cyclic delay-Doppler shift (CDDS), for these two waveforms to achieve ultra-reliable communications in doubly selective channels (DSCs). Two simple CDDS schemes, named modulation-domain CDDS (MD-CDDS) and time-domain CDDS (TD-CDDS), are proposed, which perform CDDS at the transmitter before and after the modulation, respectively. We demonstrate that both proposed CDDS schemes can be implemented efficiently and flexibly by multiplying the transmit vector by a well-designed precoding matrix, which is simply a sparse phase-compensated permutation matrix. Moreover, we show theoretically and numerically that CDDS can provide MIMO-AFDM and MIMO-OTFS with optimal transmit diversity gain when a proper CDDS step is adopted. Compared with conventional transmit diversity techniques, the proposed CDDS scheme enjoys the advantages of lower channel estimation overhead, implementation complexity, and signal processing latency, making it particularly suitable for ultra-reliable communications in high-mobility scenarios.
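The abstract describes the precoder as a sparse phase-compensated permutation matrix. As a rough illustration of that structure only, the sketch below builds a matrix that cyclically delays a length-`n` vector by `delay` samples and applies a Doppler-like phase ramp with index `doppler`. The function names, the time-domain indexing, and the phase convention are assumptions; the paper's actual MD-CDDS and TD-CDDS precoders may differ.

```python
import cmath


def cyclic_delay_doppler_matrix(n, delay, doppler):
    """Permutation matrix (cyclic delay) with unit-modulus phase entries."""
    mat = [[0j] * n for _ in range(n)]
    for row in range(n):
        col = (row - delay) % n  # output sample `row` reads input sample `col`
        mat[row][col] = cmath.exp(2j * cmath.pi * doppler * row / n)
    return mat


def apply_matrix(mat, vec):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]
```

Because every row and column holds exactly one unit-modulus entry, the matrix is unitary and can be applied in linear time as an index shuffle followed by a per-sample phase rotation.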

2603.14943 2026-03-17 eess.SP

RF-Fencing: A Novel RIS-Based Service for Proactive Covert Communications

Alexandros I. Papadopoulos, Dimitrios Tyrovolas, Alexandros Pitilakis, Panagiotis D. Diamantoulakis, Antonios Lalas, Konstantinos Votis, Nikolaos V. Kantartzis, Sotiris Ioannidis, Christos Liaskos

Details
English abstract

Programmable wireless environments (PWEs), empowered by reconfigurable intelligent surfaces (RISes), have emerged as a transformative paradigm for next-generation networks, enabling deterministic control over electromagnetic (EM) propagation to enhance both performance and security. In this work, we introduce RF-Fencing, a novel RIS-enabled PWE service that enforces spatially selective control over wireless transmissions, simultaneously suppressing unwanted signal exposure while sustaining robust connectivity for legitimate users. To realize this vision, we develop SHIELD, a lightweight and scalable algorithm that orchestrates multiple RIS units by multiplexing precompiled codebook entries with real-time, low-complexity optimization. Through extensive evaluations across diverse frequencies, RIS configurations, and deployment scenarios, SHIELD demonstrates both far-field directional control and near-field quiet-zone creation, thereby enhancing network security. Our findings reveal that SHIELD effectively balances proactive covert communication with service delivery by dynamically managing multiple signal suppression and delivery areas, while enabling the realization of EM quiet zones with minimal impact on surrounding regions, ultimately establishing RF-Fencing as a practical RIS-based foundation for privacy-preserving and adaptive wireless environments in future 6G networks.

2603.14940 2026-03-17 eess.SY cs.LG cs.RO cs.SY

Intelligent Control of Differential Drive Robots Subject to Unmodeled Dynamics with EKF-based State Estimation

Amos Alwala, Yuchen Hu, Gabriel da Silva Lima, Wallace Moreira Bessa

Details
English abstract

Reliable control and state estimation of differential drive robots (DDR) operating in dynamic and uncertain environments remains a challenge, particularly when system dynamics are partially unknown and sensor measurements are prone to degradation. This work introduces a unified control and state estimation framework that combines a Lyapunov-based nonlinear controller and Adaptive Neural Networks (ANN) with Extended Kalman Filter (EKF)-based multi-sensor fusion. The proposed controller leverages the universal approximation property of neural networks to model unknown nonlinearities in real time. An online adaptation scheme updates the weights of the radial basis function (RBF) network, the architecture chosen for the ANN. The learned dynamics are integrated into a feedback linearization (FBL) control law, for which theoretical guarantees of closed-loop stability and asymptotic convergence in a trajectory-tracking task are established through a Lyapunov-like stability analysis. To ensure robust state estimation, the EKF fuses inertial measurement unit (IMU) data with odometry from a monocular camera, a 2D LiDAR, and wheel encoders. The fused state estimate drives the intelligent controller, ensuring consistent performance even under drift, wheel slip, sensor noise, and sensor failure. Gazebo simulations and real-world experiments with a DDR demonstrate the effectiveness of the approach, with improved velocity tracking performance and reductions in linear and angular velocity errors of up to $53.91\%$ and $29.0\%$, respectively, compared with the baseline FBL.
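To make the EKF component concrete, the sketch below shows a textbook prediction step for a differential drive robot under the standard unicycle kinematic model. The state layout `(x, y, theta)`, the velocity inputs, and the function names are assumptions for illustration; the abstract does not specify the paper's state vector, motion model, or measurement models.

```python
import math


def matmul(a, b):
    """Product of two small dense matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def ekf_predict(state, cov, v, omega, dt, process_noise):
    """Propagate pose (x, y, theta) and its 3x3 covariance over one step."""
    x, y, theta = state
    new_state = (
        x + v * dt * math.cos(theta),
        y + v * dt * math.sin(theta),
        theta + omega * dt,
    )
    # Jacobian of the motion model with respect to the state.
    jac = [
        [1.0, 0.0, -v * dt * math.sin(theta)],
        [0.0, 1.0, v * dt * math.cos(theta)],
        [0.0, 0.0, 1.0],
    ]
    fpf = matmul(matmul(jac, cov), transpose(jac))
    new_cov = [
        [fpf[i][j] + process_noise[i][j] for j in range(3)] for i in range(3)
    ]
    return new_state, new_cov
```

A full filter would follow each prediction with one measurement update per sensor, which is where the fusion of IMU, visual, LiDAR, and encoder information would take place.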

2603.14917 2026-03-17 eess.AS cs.AI cs.LG eess.SP

Spectrogram features for audio and speech analysis

Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song, Donny Soh

Comments 30 pages

Details
Journal ref
Analysis. Appl. Sci. 2026, 16, 572
English abstract

Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis as well. Initially, the primary motivator for spectrogram-based representations was their ability to present sound as a two-dimensional signal in the time-frequency plane, which not only provides an interpretable physical basis for analysing sound, but also unlocks the use of a wide range of machine learning techniques, such as convolutional neural networks, that had been developed for image processing. A spectrogram is a matrix characterised by the resolution and span of its two dimensions, as well as by the representation and scaling of each element. Many possibilities for these three characteristics have been explored by researchers across numerous application areas, with different settings showing affinity for various tasks. This paper reviews the use of spectrogram-based representations and surveys the state of the art to question how front-end feature representation choice allies with back-end classifier architecture for different tasks.
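The three characteristics mentioned above (time resolution, frequency resolution, and element scaling) map directly onto the parameters of a short-time Fourier transform. The sketch below is a deliberately simple pure-Python illustration using a naive DFT; the function name, the Hann window, and the default `n_fft` and `hop` values are assumptions, and a practical system would use an FFT library instead.

```python
import cmath
import math


def log_power_spectrogram(signal, n_fft=32, hop=16):
    """Return spec[t][k]: log-power (dB) at frame t and frequency bin k."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    spec = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        row = []
        for k in range(n_fft // 2 + 1):  # non-negative frequencies only
            coeff = sum(
                frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            # Element scaling: power in decibels, with a small floor.
            row.append(10.0 * math.log10(abs(coeff) ** 2 + 1e-12))
        spec.append(row)
    return spec
```

Here `hop` sets the time resolution, `n_fft` sets the frequency resolution, and the decibel conversion is one of several possible element scalings (linear magnitude and mel-warped energies are common alternatives).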

2603.14912 2026-03-17 eess.SP

Integrated Channel Sounding and Communication: Requirements, Architecture, Challenges, and Key Technologies

Nanhao Zhou, Chao Zou, Yu Zhou, Yanqun Tang, Xiaoying Zhang, Haoran Yin, Xuefeng Yin, Yuxiang Zhang, Dan Fei, Fan Jiang

Comments Under revision in an IEEE Magazine

Details
English abstract

Channel models are essential for the design, evaluation, and optimization of wireless communication systems. The emerging space-air-ground-sea integrated network (SAGSIN), characterized by diverse service applications and extended-spectrum operations, places even greater demands on highly accurate channel models. However, conventional channel sounding is limited by generalized measurement campaigns, inadequate cross-band consistency, and insufficient real-time adaptability, making it unable to meet the needs of SAGSIN for scenario-specific and high-precision channel modeling. To address this challenge, we propose a novel technological framework, termed integrated channel sounding and communication (ICSC). By deeply integrating sounding and communication, the ICSC enables efficient and real-time acquisition of dynamic channel characteristics during communication processes, supporting fine-grained site- and scenario-specific measurements. Furthermore, leveraging artificial intelligence techniques, ICSC can identify channel conditions and adapt waveform parameters in real time according to scenario variations, which in turn enhances communication performance. This article first introduces the fundamental principles of the ICSC framework, elaborates on its core concepts and key advantages, and demonstrates its feasibility through the development of an integrated verification system (IVS). Subsequently, the potential applications and opportunities of the ICSC are analyzed in depth, followed by a discussion of its future development directions and remaining challenges.

2603.14910 2026-03-17 eess.SY cs.RO cs.SY

Transformers As Generalizable Optimal Controllers

Turki Bin Mohaya, Maitham F. AL-Sunni, John M. Dolan, Peter Seiler

Comments 6 pages

Details
English abstract

We study whether optimal state-feedback laws for a family of heterogeneous Multiple-Input, Multiple-Output (MIMO) Linear Time-Invariant (LTI) systems can be captured by a single learned controller. We train one transformer policy on LQR-generated trajectories from systems with different state and input dimensions, using a shared representation with standardization, padding, dimension encoding, and masked loss. The policy maps recent state history to control actions without requiring plant matrices at inference time. Across a broad set of systems, it achieves empirically small sub-optimality relative to Linear Quadratic Regulator (LQR), remains stabilizing under moderate parameter perturbations, and benefits from lightweight fine-tuning on unseen systems. These results support transformer policies as practical approximators of near-optimal feedback laws over structured linear-system families.
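As a reminder of what "LQR-generated trajectories" involves, the sketch below computes the infinite-horizon discrete-time LQR gain for a scalar system by iterating the Riccati recursion, then rolls out state-action pairs of the kind a learned policy could be trained on. The scalar setting, the function names, and the example parameters are simplifying assumptions; the paper considers MIMO systems, for which the same recursion holds in matrix form.

```python
def scalar_dlqr(a, b, q, r, iters=500):
    """LQR gain k and Riccati solution p for x+ = a*x + b*u, cost q*x^2 + r*u^2."""
    p = q
    for _ in range(iters):
        # Discrete-time algebraic Riccati recursion (scalar form).
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p


def rollout(a, b, k, x0, steps):
    """Generate (state, action) pairs under the optimal feedback u = -k*x."""
    data, x = [], x0
    for _ in range(steps):
        u = -k * x
        data.append((x, u))
        x = a * x + b * u
    return data
```

For `a = b = q = r = 1` the Riccati solution is the golden ratio, which gives a convenient sanity check on the recursion.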

2603.14877 2026-03-17 eess.AS

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Ruiqi Yan, Wenxi Chen, Zhanxun Liu, Ziyang Ma, Haopeng Lin, Hanlin Wen, Hanke Xie, Jun Wu, Yuzhe Liang, Yuxiang Zhao, Pengchao Feng, Jiale Qian, Hao Meng, Yuhang Dai, Shunshun Yin, Ming Tao, Lei Xie, Kai Yu, Xinsheng Wang, Xie Chen

Comments submitted to Interspeech 2026, under review

Details
English abstract

Recent advances in spoken dialogue systems have brought increased attention to human-like full-duplex voice interactions. However, our comprehensive review of this field reveals several challenges, including the difficulty in obtaining training data, catastrophic forgetting, and limited scalability. In this work, we propose SoulX-Duplug, a plug-and-play streaming state prediction module for full-duplex spoken dialogue systems. By jointly performing streaming ASR, SoulX-Duplug explicitly leverages textual information to identify user intent, effectively serving as a semantic VAD. To promote fair evaluation, we introduce SoulX-Duplug-Eval, extending widely used benchmarks with improved bilingual coverage. Experimental results show that SoulX-Duplug enables low-latency streaming dialogue state control, and the system built upon it outperforms existing full-duplex models in overall turn management and latency performance. We have open-sourced SoulX-Duplug and SoulX-Duplug-Eval.