arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
EESS Electrical Engineering and Systems Science 112
2603.09952 2026-03-11 cs.LG cs.NA cs.SY eess.SY math.NA math.OC stat.ML

On the Width Scaling of Neural Optimizers Under Matrix Operator Norms I: Row/Column Normalization and Hyperparameter Transfer

Ruihan Xu, Jiajin Li, Yiping Lu

English abstract

A central question in modern deep learning is how to design optimizers whose behavior remains stable as the network width $w$ increases. We address this question by interpreting several widely used neural-network optimizers, including AdamW and Muon, as instances of steepest descent under matrix operator norms. This perspective links optimizer geometry with the Lipschitz structure of the network forward map, and enables width-independent control of both Lipschitz and smoothness constants. However, steepest-descent rules induced by standard $p \to q$ operator norms lack layerwise composability and therefore cannot provide width-independent bounds in deep architectures. We overcome this limitation by introducing a family of mean-normalized operator norms, denoted $\pmean \to \qmean$, that admit layerwise composability, yield width-independent smoothness bounds, and give rise to practical optimizers such as rescaled AdamW, row normalization, and column normalization. The resulting width-aware learning-rate scaling rules recover $\mu$P scaling \cite{yang2021tensor} as a special case and provide a principled mechanism for cross-width learning-rate transfer across a broad class of optimizers. We further show that Muon can suffer an $\mathcal{O}(\sqrt{w})$ worst-case growth in the smoothness constant, whereas a new family of row-normalized optimizers we propose achieves width-independent smoothness guarantees. Based on these observations, we propose MOGA (Matrix Operator Geometry Aware), a width-aware optimizer based only on row/column-wise normalization that enables stable learning-rate transfer across model widths. Large-scale pre-training on GPT-2 and LLaMA shows that MOGA, especially with row normalization, is competitive with Muon while being notably faster in large-token and low-loss regimes.
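As a rough illustration of what row and column normalization of a gradient matrix can look like, the sketch below rescales each row (or column) to unit Euclidean norm. The function names, the `eps` guard, and the choice of the plain Euclidean norm are assumptions made for illustration; the abstract does not give MOGA's exact update rule or its width-dependent learning-rate scaling, so this is not the authors' optimizer.

```python
import math

def row_normalize(grad, eps=1e-12):
    """Scale every row of a matrix (list of lists) to unit Euclidean norm."""
    out = []
    for row in grad:
        norm = math.sqrt(sum(x * x for x in row))
        scale = 1.0 / max(norm, eps)  # eps avoids division by zero for all-zero rows
        out.append([x * scale for x in row])
    return out

def col_normalize(grad, eps=1e-12):
    """Scale every column to unit Euclidean norm by transposing, normalizing rows, transposing back."""
    transposed = [list(col) for col in zip(*grad)]
    return [list(row) for row in zip(*row_normalize(transposed, eps))]

# Hypothetical 2x2 "gradient" used only to exercise the functions.
g = [[3.0, 4.0], [0.0, 2.0]]
rows = row_normalize(g)  # each row now has norm 1
cols = col_normalize(g)  # each column now has norm 1
```

A width-aware optimizer would additionally scale the step by a factor depending on the layer dimensions; that factor is left out here because the abstract does not state it.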

2603.09942 2026-03-11 eess.SY cs.AI cs.NI cs.SY

Towards Flexible Spectrum Access: Data-Driven Insights into Spectrum Demand

Mohamad Alkadamani, Amir Ghasemi, Halim Yanikomeroglu

Comments 7 pages, 5 figures. Presented at IEEE VTC 2024, Washington, DC. Published in the IEEE conference proceedings

English abstract

In the diverse landscape of 6G networks, where wireless connectivity demands surge and spectrum resources remain limited, flexible spectrum access becomes paramount. The success of crafting such schemes hinges on our ability to accurately characterize spectrum demand patterns across space and time. This paper presents a data-driven methodology for estimating spectrum demand variations over space and identifying key drivers of these variations in the mobile broadband landscape. By leveraging geospatial analytics and machine learning, the methodology is applied to a case study in Canada to estimate spectrum demand dynamics in urban regions. Our proposed model captures 70% of the variability in spectrum demand when trained on one urban area and tested on another. These insights empower regulators to navigate the complexities of 6G networks and devise effective policies to meet future network demands.

2603.09918 2026-03-11 eess.SY cs.SY

Emergency Locator Transmitters in the Era of More Electric Aircraft: A Comprehensive Review of Energy, Integration and Safety Challenges

Juana M. Martínez-Heredia, Adrián Portos, Marcel Štěpánek, Francisco Colodro

English abstract

The progressive electrification of aircraft systems under the more electric aircraft (MEA) paradigm is reshaping the design and qualification constraints of safety-critical avionics. Emergency locator transmitters (ELTs), which are essential for post-accident localization and search and rescue (SAR) operations, have evolved from legacy 121.5/243 MHz beacons to digitally encoded 406 MHz systems, typically retaining 121.5 MHz as a homing signal in combined units. In parallel, the modernization of the Cospas-Sarsat infrastructure, especially MEOSAR, together with multi-constellation global navigation satellite system (GNSS) integration and second-generation beacon capabilities, is reducing detection latency and enabling richer distress messaging. However, MEA platforms impose stricter constraints on available power, thermal management, wiring density, and electromagnetic compatibility (EMC). As a result, ELT performance increasingly depends not only on the device itself, but also on its installation conditions and on the aircraft's overall electrical environment. This review summarizes the ELT architectures and activation/operational cycles, outlines key technological milestones, and consolidates the main integration challenges for MEA, with emphasis on energy autonomy, battery qualification frameworks, EMC and installation practices, and survivability-driven failure modes (e.g., antenna/feedline damage, mounting, and post-impact shielding). Finally, emerging trends, including ELT for distress tracking (DT), energy-based designs, advanced health monitoring, and certification-ready pathways for next-generation SAR services, are discussed, highlighting research directions that can deliver demonstrable, certifiable gains in reliability, energy efficiency, and robust integration for future electrified aircraft.

2603.09916 2026-03-11 eess.SY cs.AI cs.SY

AI-Enabled Data-driven Intelligence for Spectrum Demand Estimation

Colin Brown, Mohamad Alkadamani, Halim Yanikomeroglu

Comments Presented at an IEEE ICC 2025 Workshop and published in the conference proceedings

English abstract

Accurately forecasting spectrum demand is a key component for efficient spectrum resource allocation and management. With the rapid growth in demand for wireless services, mobile network operators and regulators face increasing challenges in ensuring adequate spectrum availability. This paper presents a data-driven approach leveraging artificial intelligence (AI) and machine learning (ML) to estimate and manage spectrum demand. The approach uses multiple proxies of spectrum demand, drawing from site license data and derived from crowdsourced data. These proxies are validated against real-world mobile network traffic data to ensure reliability, achieving an R$^2$ value of 0.89 for an enhanced proxy. The proposed ML models are tested and validated across five major Canadian cities, demonstrating their generalizability and robustness. These contributions assist spectrum regulators in dynamic spectrum planning, enabling better resource allocation and policy adjustments to meet future network demands.
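For readers unfamiliar with the R$^2$ figure quoted above, the following minimal sketch computes the coefficient of determination between observed traffic and a proxy-based prediction. The numbers are invented purely to exercise the function and have no relation to the paper's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical values: measured traffic vs. demand predicted from a proxy.
traffic = [10.0, 20.0, 30.0, 40.0]
proxy_prediction = [12.0, 18.0, 33.0, 37.0]
score = r_squared(traffic, proxy_prediction)  # SS_res = 26, SS_tot = 500
```

An R$^2$ of 0.89 would mean the residual sum of squares is 11% of the total variance of the traffic data around its mean.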

2603.09908 2026-03-11 cs.RO cs.SY eess.SY

NanoBench: A Multi-Task Benchmark Dataset for Nano-Quadrotor System Identification, Control, and State Estimation

Syed Izzat Ullah, Jose Baca

Comments 9 pages, 6 figures

English abstract

Existing aerial-robotics benchmarks target vehicles from hundreds of grams to several kilograms and typically expose only high-level state data. They omit the actuator-level signals required to study nano-scale quadrotors, where low-Reynolds number aerodynamics, coreless DC motor nonlinearities, and severe computational constraints invalidate models and controllers developed for larger vehicles. We introduce NanoBench, an open-source multi-task benchmark collected on the commercially available Crazyflie 2.1 nano-quadrotor (takeoff weight 27 g) in a Vicon motion capture arena. The dataset contains over 170 flight trajectories spanning hover, multi-frequency excitation, standard tracking, and aggressive maneuvers across multiple speed regimes. Each trajectory provides synchronized Vicon ground truth, raw IMU data, onboard extended Kalman filter estimates, PID controller internals, and motor PWM commands at 100 Hz, alongside battery telemetry at 10 Hz, aligned with sub-0.5 ms consistency. NanoBench defines standardized evaluation protocols, train/test splits, and open-source baselines for three tasks: nonlinear system identification, closed-loop controller benchmarking, and onboard state estimation assessment. To our knowledge, it is the first public dataset to jointly provide actuator commands, controller internals, and estimator outputs with millimeter-accurate ground truth on a commercially available nano-scale aerial platform.

2603.09904 2026-03-11 eess.SY cs.SY

Dynamic Average Consensus with Privacy Guarantees and Its Application to Battery Energy Storage Systems

Mihitha Maithripala, Chenyang Qiu, Zongli Lin

English abstract

A privacy-preserving dynamic average consensus (DAC) algorithm is proposed that achieves consensus while preventing external eavesdroppers from inferring the reference signals and their derivatives. During the initialization phase, each agent generates a set of sinusoidal signals with randomly selected frequencies and exchanges them with its neighboring agents to construct a masking signal. Each agent masks its reference signals using this composite masking signal before executing the DAC update rule. It is shown that the developed scheme preserves the convergence properties of the conventional DAC framework while preventing information leakage to external eavesdroppers. Furthermore, the developed algorithm is applied to state-of-charge (SoC) balancing in a networked battery energy storage system to demonstrate its practical applicability. Simulation results validate the theoretical findings.

2603.09894 2026-03-11 eess.SY cs.SY

A Survey on Cloud-Based 6G Deployments: Current Solutions, Future Directions and Open Challenges

Tolga O. Atalay, Alireza Famili, Amirreza Ghafoori, Angelos Stavrou

Comments 47 pages, 403 citations, 21 figures, journal

English abstract

The next generation of cellular networks is designed to provide ubiquitous connectivity to a wide range of devices. As Telecommunication Service Providers (TSPs) increasingly collaborate with public cloud providers to deploy 5G and beyond networks, a fundamental shift is underway, from hardware-bound Physical Network Functions (PNFs) to cloud-native, containerized deployments managed through platforms like Kubernetes. While this transition promises greater scalability, flexibility, and cost efficiency, it also introduces a complex set of technical and operational challenges that must be thoroughly understood before large-scale cellular deployments can take place in cloud environments. In this survey, we present a structured taxonomy that categorizes the design space of cloud-based cellular deployments across four dimensions: deployment architecture, resource management and orchestration, multi-tenancy and isolation, and economic and ownership models. Using this taxonomy as a foundation, we critically analyze six key investigation areas (security and privacy, scalability and elasticity, performance and latency, cost optimization, resilience and fault management, and compliance and sovereignty), examining each through a cloud-native lens. To benchmark the state of industry adoption, we examine the deployment strategies of leading Infrastructure-as-a-Service (IaaS) providers, namely Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Finally, we identify emerging trends such as AI-driven orchestration, quantum-safe protocols for virtualized network functions, and serverless networking for 6G, while articulating the open challenges that remain in realizing robust, scalable cloud-based cellular networks.

2603.09893 2026-03-11 eess.SP

Efficient, Adaptive Near-Field Beam Training based on Linear Bandit

Junchi Liu, Zijun Wang, Rui Zhang

Comments This paper has been submitted to IEEE Wireless Communications Letters

English abstract

This letter proposes a linear bandit-based beam training framework for near-field communication under multipath channels. By leveraging Thompson Sampling (TS), the framework adaptively balances exploration and exploitation to maximize cumulative beamforming gain under limited pilot overhead. To ensure data-efficient learning, we incorporate a correlated Gaussian prior in the DFT domain, using a Gaussian kernel to capture spatial correlations and near-field energy leakage. We develop three TS strategies: codebook-constrained search for rapid convergence via structural regularization, continuous-space search to achieve near-optimal performance, and a two-stage hybrid refinement scheme that balances convergence speed and estimation accuracy. Simulation results show that the proposed framework reduces pilot overhead by up to 90% while achieving more than a 2 dB SNR gain over baselines in multipath environments. Furthermore, the continuous-space search is shown to be asymptotically optimal, approaching the full-CSI bound when the pilot overhead is unconstrained.
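To make the idea of Thompson Sampling with a correlated Gaussian prior more concrete, here is a generic textbook-style sketch of linear Thompson Sampling over a small codebook. It is not the letter's algorithm: the one-hot arms, real-valued gains, kernel length scale, noise variance, gain values, and all function names are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random

def gaussian_kernel_prior(n, length_scale):
    """Prior covariance with entries exp(-(i - j)^2 / (2 * length_scale^2))."""
    return [[math.exp(-((i - j) ** 2) / (2.0 * length_scale ** 2))
             for j in range(n)] for i in range(n)]

def cholesky(a, jitter=1e-9):
    """Lower-triangular factor of a; the jitter keeps the diagonal positive."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 0.0) + jitter) if i == j else s / low[j][j]
    return low

def sample_posterior(mean, cov, rng):
    """Draw one sample from the Gaussian N(mean, cov)."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

def posterior_update(mean, cov, x, y, noise_var):
    """Rank-one Bayesian update for one observation y = x . theta + noise."""
    n = len(mean)
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = sum(x[i] * cov_x[i] for i in range(n)) + noise_var
    resid = y - sum(x[i] * mean[i] for i in range(n))
    new_mean = [mean[i] + cov_x[i] * resid / denom for i in range(n)]
    new_cov = [[cov[i][j] - cov_x[i] * cov_x[j] / denom for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov

# Hypothetical setup: 8 codebook beams as one-hot arms with made-up real gains.
rng = random.Random(0)
n_beams, noise_var = 8, 0.05
true_gain = [0.1, 0.3, 0.9, 0.4, 0.1, 0.0, 0.0, 0.1]
arms = [[1.0 if i == a else 0.0 for i in range(n_beams)] for a in range(n_beams)]
mean, cov = [0.0] * n_beams, gaussian_kernel_prior(n_beams, 1.0)

for _ in range(40):  # each iteration stands for one pilot transmission
    theta = sample_posterior(mean, cov, rng)  # sample a plausible gain profile
    best = max(range(n_beams),
               key=lambda a: sum(x * t for x, t in zip(arms[a], theta)))
    y = true_gain[best] + rng.gauss(0.0, math.sqrt(noise_var))  # noisy measurement
    mean, cov = posterior_update(mean, cov, arms[best], y, noise_var)
```

Because the prior couples neighboring beams, measuring one beam also shifts the posterior of its neighbors, which is the data-efficiency mechanism the abstract attributes to the correlated prior.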

2603.09878 2026-03-11 eess.SY cs.SY

Field Free Novel Architecture for Spintronic Flash Analog to Digital Converter

Abin Francis, Nikhil Kumar, Prince Philip

Comments 9 pages including 2 pages of references, 11 figures and 2 tables. Invited and presented at a conference (ICMAGMA, 2024)

English abstract

A 3-bit Analog-to-Digital Converter (ADC) is designed using perpendicular Spin Orbit Torque Magnetic Tunnel Junctions (SOT MTJs). A sampled analog input signal is transmitted as a spin orbit torque current (Iin) to a perpendicular SOT MTJ, and deterministic switching is supported by the Voltage Controlled Magnetic Anisotropy (VCMA) and Spin Transfer Torque (STT) switching methods. Analog-to-digital conversion is done by comparing the input signal with the varied critical currents of the SOT MTJs. The critical current of each SOT MTJ is governed by the varying widths of the Heavy Metal (HM). In the 3-bit ADC, there are two sets of 7 SOT MTJs for quantizing the input value: a conversion set and a dummy set for comparing the change in resistance state. As the input signal passes through the conversion set, each SOT MTJ switches from the Parallel (P) to the AntiParallel (AP) state if the input signal exceeds its critical current. The change in state of the conversion set is converted to thermometer codes by a StrongARM latch comparator, which compares the resistance with that of the dummy-set SOT MTJs, all of which are in the P (low-resistance) state. A novel architecture is proposed for increasing throughput by using the dummy set as a conversion set and the conversion set as a dummy set, thus eliminating the reset step from analog-to-digital conversion. By improving the SOT-MTJ and timing blocks, a field-free spin flash ADC with a power consumption of 476 μW and a conversion rate of 304.1 MHz is produced.
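The quantization step described above (seven comparisons producing a thermometer code for a 3-bit output) can be sketched behaviorally as follows. The evenly spaced, normalized thresholds stand in for the MTJ critical currents and are hypothetical; the sketch models only the logic, not the device physics or the proposed set-swapping architecture.

```python
def flash_adc(i_in, thresholds):
    """Return (thermometer code, integer code) for an input current.

    Each threshold plays the role of one MTJ's critical current:
    the corresponding bit is 1 when the input exceeds it.
    """
    thermometer = [1 if i_in > t else 0 for t in thresholds]
    code = sum(thermometer)  # number of switched junctions = output code
    return thermometer, code

# Seven hypothetical normalized critical currents for a 3-bit converter.
thresholds = [k / 8 for k in range(1, 8)]
thermo, code = flash_adc(0.40, thresholds)
bits = format(code, "03b")  # 3-bit binary string of the output code
```

With an input of 0.40, the first three thresholds (0.125, 0.25, 0.375) are exceeded, so the thermometer code is `[1, 1, 1, 0, 0, 0, 0]` and the output is `011`.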

2603.09859 2026-03-11 cs.LG cs.AI cs.NI cs.SY eess.SY

A Graph-Based Approach to Spectrum Demand Prediction Using Hierarchical Attention Networks

Mohamad Alkadamani, Halim Yanikomeroglu, Amir Ghasemi

Comments 7 pages, 6 figures. Presented at IEEE GLOBECOM 2025, Taiwan. To appear in the conference proceedings

English abstract

The surge in wireless connectivity demand, coupled with the finite nature of spectrum resources, compels the development of efficient spectrum management approaches. Spectrum sharing presents a promising avenue, although it demands precise characterization of spectrum demand for informed policy-making. This paper introduces HR-GAT, a hierarchical resolution graph attention network model, designed to predict spectrum demand using geospatial data. HR-GAT adeptly handles complex spatial demand patterns and resolves issues of spatial autocorrelation that usually challenge standard machine learning models, often resulting in poor generalization. Tested across five major Canadian cities, HR-GAT improves predictive accuracy of spectrum demand by 21% over eight baseline models, underscoring its superior performance and reliability.

2603.09840 2026-03-11 eess.IV cs.CV

CycleULM: A unified label-free deep learning framework for ultrasound localisation microscopy

Su Yan, Clara Rodrigo Gonzalez, Vincent C. H. Leung, Herman Verinaz-Jadan, Jiakang Chen, Matthieu Toulemonde, Kai Riemer, Jipeng Yan, Clotilde Vié, Qingyuan Tan, Peter D. Weinberg, Pier Luigi Dragotti, Kevin G. Murphy, Meng-Xing Tang

Comments 43 pages, 14 figures, 2 tables, journal

English abstract

Super-resolution ultrasound via microbubble (MB) localisation and tracking, also known as ultrasound localisation microscopy (ULM), can resolve microvasculature beyond the acoustic diffraction limit. However, significant challenges remain in localisation performance and data acquisition and processing time. Deep learning methods for ULM have shown promise to address these challenges; however, they remain limited by in vivo label scarcity and the simulation-to-reality domain gap. We present CycleULM, the first unified label-free deep learning framework for ULM. CycleULM learns a physics-emulating translation between the real contrast-enhanced ultrasound (CEUS) data domain and a simplified MB-only domain, leveraging the power of CycleGAN without requiring paired ground truth data. With this translation, CycleULM removes dependence on high-fidelity simulators or labelled data, and makes MB localisation and tracking substantially easier. Deployed as modular plug-and-play components within existing pipelines or as an end-to-end processing framework, CycleULM delivers substantial performance gains across both in silico and in vivo datasets. Specifically, CycleULM improves image contrast (contrast-to-noise ratio) by up to 15.3 dB and sharpens CEUS resolution with a 2.5× reduction in the full width at half maximum of the point spread function. CycleULM also improves MB localisation performance, with up to +40% recall, +46% precision, and a -14.0 μm mean localisation error, yielding more faithful vascular reconstructions. Importantly, CycleULM achieves real-time processing throughput at 18.3 frames per second with order-of-magnitude speed-ups (up to ~14.5×). By combining label-free learning, performance enhancement, and computational efficiency, CycleULM provides a practical pathway toward robust, real-time ULM and accelerates its translation to clinical applications.

2603.09814 2026-03-11 eess.SY cs.SY

Learning-Augmented Primal-Dual Control Design for Secondary Frequency Regulation

Yixuan Yu, Rajni K. Bansal, Yan Jiang, Pengcheng You

English abstract

Frequency stability is fundamental to the secure operation of power systems. With growing uncertainty and volatility introduced by renewable generation, secondary frequency regulation must now deliver enhanced performance not only in the steady state but also during transients. This paper presents a systematic framework to embed learning in the design of a primal-dual controller that provides provable (potentially exponential) stability and steady-state optimality, while simultaneously improving key transient metrics, including frequency nadir and control effort, in a data-driven manner. In particular, we employ the primal-dual dynamics of an optimization problem that encodes steady-state objectives to realize secondary frequency control with an asymptotic stability guarantee. To augment the transient performance of the controller via learning, a change of variables on control inputs, which will be deployed by neural networks, is proposed such that under mild conditions, stability and steady-state optimality are preserved. It further allows us to define a learning goal that accounts for the exponential convergence rate, frequency nadir, and accumulated control effort, and to use sample trajectories to enhance these metrics. Simulation results validate the theory and demonstrate superior transient performance of the learning-augmented primal-dual controller.

2603.09808 2026-03-11 eess.SP

A Hybrid Model-Assisted Approach for Path Loss Prediction in Suburban Scenarios

Chenlong Wang, Bo Ai, Ruiming Chen, Ruisi He, Mi Yang, Yuxin Zhang, Weirong Liu, Liu Liu

English abstract

Accurate path loss prediction is crucial for wireless network planning and optimization in suburban environments with complex terrain variation and diverse land cover. This paper proposes a model-assisted hybrid path loss prediction method that introduces an environment-adaptive compensation on top of the classic close-in free-space reference distance (CI) path loss model. By jointly predicting the path loss exponent and a compensation term, the proposed approach dynamically adjusts the empirical trend. To improve the effectiveness of environmental representation, three environmental image organization schemes are constructed and evaluated. Experiments on measurement data collected on Pingtan Island show that the proposed method outperforms the CI model and a conventional model-assisted baseline, achieving a test root mean square error of 4.04 dB.
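For reference, the CI model mentioned above is commonly written as the free-space path loss at a 1 m reference distance plus a term that grows with 10·n·log10(d), where n is the path loss exponent. The sketch below evaluates that expression and adds a compensation term in dB. Treating the compensation as a simple additive offset is an assumption; the abstract does not say how the predicted term enters the model, and the carrier frequency and distances are illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def fspl_db(freq_hz, dist_m=1.0):
    """Free-space path loss in dB at the given distance."""
    return 20.0 * math.log10(4.0 * math.pi * freq_hz * dist_m / SPEED_OF_LIGHT)

def ci_path_loss_db(freq_hz, dist_m, n, compensation_db=0.0, d0_m=1.0):
    """CI model: FSPL at reference distance d0 plus 10*n*log10(d/d0).

    compensation_db is a hypothetical additive correction standing in for
    the environment-adaptive term; in a learned model, n and the correction
    would be predicted per link from environmental features.
    """
    return (fspl_db(freq_hz, d0_m)
            + 10.0 * n * math.log10(dist_m / d0_m)
            + compensation_db)

# Illustrative link: 3.5 GHz carrier, 100 m separation.
free_space_like = ci_path_loss_db(3.5e9, 100.0, n=2.0)           # about 83.3 dB
obstructed = ci_path_loss_db(3.5e9, 100.0, n=3.0, compensation_db=2.5)
```

With n = 2 and no compensation, the CI model reduces to free-space path loss at the same distance, which is a convenient sanity check.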

2603.09807 2026-03-11 physics.optics cs.ET cs.SY eess.SY

Experimental Characterization of Biological Tissue Dielectric Properties through THz Time-Domain Spectroscopy

Elisabetta Marini, Silvia Mura, Marco Hernandez, Matti Hamalainen, Maurizio Magarini

Comments To be published in EAI BODYNETS 2025

English abstract

Terahertz (THz) radiation provides a non-ionizing, highly sensitive probe of the dielectric properties of biological tissues. In this study, we present a comprehensive experimental characterization of dielectric properties using pork skin tissue, a widely used surrogate for human tissue, as a biological sample. Measurements are conducted employing THz time-domain spectroscopy in the 0.1-11 THz frequency range with photoconductive antennas for both signal generation and detection. Frequency-dependent refractive indices, absorption, and complex permittivity are extracted from transmitted time-domain signals. Our results confirm strong absorption and low transmittance at low THz frequencies due to water content, while highlighting frequency-dependent dispersion and narrowband transmission features at higher frequencies. This work provides one of the first extended-frequency datasets of biological tissue dielectric properties, supporting realistic channel modeling for the design and development of intra-body nanosensor networks in the THz band.

2603.09791 2026-03-11 cs.ET eess.SP

Trade-Offs in FMCW Radar-Based Respiration and Heart Rate Variability

Silvia Mura, Davide Scazzoli, Lorenzo Fineschi, Maurizio Magarini

Comments To be published in EAI BODYNETS 2025

English abstract

This study presents a comprehensive experimental assessment of a low-cost frequency-modulated continuous-wave (FMCW) multiple-input multiple-output (MIMO) radar for non-contact vital sign monitoring, focusing on respiratory rate (RR) and heart rate (HR) estimation. The influence of sensing distance and number of transmitted chirps on measurement accuracy is systematically quantified. Results exhibit a U-shaped error profile with optimal performance near 70 cm, achieving mean absolute errors of 0.8 bpm for RR and 3.2 bpm for HR. Accuracy deteriorates at short (<60 cm) and long (>100 cm) distances due to multipath, near-field, and signal-to-noise effects. Increasing chirp count enhances performance: RR errors converge asymptotically for ≥96 chirps, while HR requires at least 96 chirps for stable detection. Variability metrics, including heart and respiratory rate variability, remain less accurate (>15-30% error), indicating limited capability in capturing instantaneous fluctuations. These findings define a fundamental trade-off: the radar ensures robust estimation of average RR and HR but exhibits restricted precision in high-resolution beat-to-beat and breath-to-breath monitoring.

2603.09760 2026-03-11 cs.CV cs.RO eess.IV

PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments

Guoliang Zhu, Wanjun Jia, Caoyang Shao, Yuheng Zhang, Zhiyong Li, Kailun Yang

Comments The source code and benchmark dataset will be made publicly available at https://github.com/GL-ZHU925/PanoAffordanceNet

English abstract

Global perception is essential for embodied agents in 360° spaces, yet current affordance grounding remains largely object-centric and restricted to perspective views. To bridge this gap, we introduce a novel task: Holistic Affordance Grounding in 360° Indoor Environments. This task faces unique challenges, including severe geometric distortions from Equirectangular Projection (ERP), semantic dispersion, and cross-scale alignment difficulties. We propose PanoAffordanceNet, an end-to-end framework featuring a Distortion-Aware Spectral Modulator (DASM) for latitude-dependent calibration and an Omni-Spherical Densification Head (OSDH) to restore topological continuity from sparse activations. By integrating multi-level constraints comprising pixel-wise, distributional, and region-text contrastive objectives, our framework effectively suppresses semantic drift under low supervision. Furthermore, we construct 360-AGD, the first high-quality panoramic affordance grounding dataset. Extensive experiments demonstrate that PanoAffordanceNet significantly outperforms existing methods, establishing a solid baseline for scene-level perception in embodied intelligence. The source code and benchmark dataset will be made publicly available at https://github.com/GL-ZHU925/PanoAffordanceNet.

2603.09737 2026-03-11 cs.CV cs.RO eess.IV

$M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs

Kaixin Lin, Kunyu Peng, Di Wen, Yufan Chen, Ruiping Liu, Kailun Yang

Comments The source code will be publicly released at https://github.com/qixi7up/M2-Occ

English abstract

Semantic occupancy prediction enables dense 3D geometric and semantic understanding for autonomous driving. However, existing camera-based approaches implicitly assume complete surround-view observations, an assumption that rarely holds in real-world deployment due to occlusion, hardware malfunction, or communication failures. We study semantic occupancy prediction under incomplete multi-camera inputs and introduce $M^2$-Occ, a framework designed to preserve geometric structure and semantic coherence when views are missing. $M^2$-Occ addresses two complementary challenges. First, a Multi-view Masked Reconstruction (MMR) module leverages the spatial overlap among neighboring cameras to recover missing-view representations directly in the feature space. Second, a Feature Memory Module (FMM) introduces a learnable memory bank that stores class-level semantic prototypes. By retrieving and integrating these global priors, the FMM refines ambiguous voxel features, ensuring semantic consistency even when observational evidence is incomplete. We introduce a systematic missing-view evaluation protocol on the nuScenes-based SurroundOcc benchmark, encompassing both deterministic single-view failures and stochastic multi-view dropout scenarios. Under the safety-critical missing back-view setting, $M^2$-Occ improves the IoU by 4.93%. As the number of missing cameras increases, the robustness gap further widens; for instance, under the setting with five missing views, our method boosts the IoU by 5.01%. These gains are achieved without compromising full-view performance. The source code will be publicly released at https://github.com/qixi7up/M2-Occ.

2603.09729 2026-03-11 q-bio.NC cs.RO cs.SY eess.SY

Efficient and robust control with spikes that constrain free energy

André Urbano, Pablo Lanillos, Sander Keemink

English abstract

Animal brains exhibit remarkable efficiency in perception and action, while being robust to both external and internal perturbations. The means by which brains accomplish this remains, for now, poorly understood, hindering our understanding of animal and human cognition, as well as our own implementation of efficient algorithms for control of dynamical systems. A potential candidate for a robust mechanism of state estimation and action computation is the free energy principle, but existing implementations of this principle have largely relied on conventional, biologically implausible approaches without spikes. We propose a novel, efficient, and robust spiking control framework with realistic biological characteristics. The resulting networks function as free energy constrainers, in which neurons only fire if they reduce the free energy of their internal representation. The networks offer efficient operation through highly sparse activity while matching performance with other similar spiking frameworks, and have high resilience against both external (e.g. sensory noise or collisions) and internal perturbations (e.g. synaptic noise and delays or neuron silencing) that such a network would be faced with when deployed by either an organism or an engineer. Overall, our work provides a novel mathematical account for spiking control through constraining free energy, providing both better insight into how brain networks might leverage their spiking substrate and a new route for implementing efficient control algorithms in neuromorphic hardware.

2603.09725 2026-03-11 eess.AS

A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition

Dimme de Groot, Yuanyuan Zhang, Jorge Martinez, Odette Scharenborg

Comments Submitted to Interspeech 2026

English abstract

We present DRES: a 1.5-hour Dutch realistic elicited (semi-spontaneous) speech dataset from 80 speakers recorded in noisy, public indoor environments. DRES was designed as a test set for the evaluation of state-of-the-art (SOTA) automatic speech recognition (ASR) and speech enhancement (SE) models in a real-world scenario: a person speaking in a public indoor space with background talkers and noise. The speech was recorded with a four-channel linear microphone array. In this work we evaluate the speech quality of five well-known single-channel SE algorithms and the recognition performance of eight SOTA off-the-shelf ASR models before and after applying SE on the speech of DRES. We found that five out of the eight ASR models have WERs lower than 22% on DRES, despite the challenging conditions. In contrast to recent work, we did not find a positive effect of modern single-channel SE on ASR performance, emphasizing the importance of evaluating in realistic conditions.
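Since the results above are reported as word error rates (WERs), the sketch below shows the standard definition: the word-level edit distance between reference and hypothesis divided by the number of reference words. The Dutch sentence pair is invented for illustration and is not taken from DRES; the function assumes a non-empty reference and whitespace tokenization, with no text normalization.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)

# Hypothetical example: one substitution and one deletion over six words.
score = wer("de trein vertrekt om negen uur", "de trein vertrek om uur")
```

Here the score is 2/6, roughly 33%; a WER below 22% therefore corresponds to fewer than about one error per five reference words.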

2603.09714 2026-03-11 cs.SD cs.AI cs.CL eess.AS

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

Chih-Kai Yang, Yun-Shao Tsai, Yu-Kai Guo, Ping-Le Tsai, Yen-Ting Piao, Hung-Wei Chen, Ting-Lin Hsiao, Yun-Man Hsu, Ke-Han Lu, Hung-yi Lee

Comments 6 pages, 3 figures, 3 tables. Dataset: https://huggingface.co/Multi-Audio-Grounding

English abstract

While multi-audio understanding is critical for large audio-language models (LALMs), it remains underexplored. We introduce MUGEN, a comprehensive benchmark evaluating this capability across speech, general audio, and music. Our experiments reveal consistent weaknesses in multi-audio settings, and performance degrades sharply as the number of concurrent audio inputs increases, identifying input scaling as a fundamental bottleneck. We further investigate training-free strategies and observe that Audio-Permutational Self-Consistency, which diversifies the order of audio candidates, helps models form more robust aggregated predictions, yielding up to 6.28% accuracy gains. Combining this permutation strategy with Chain-of-Thought further raises the accuracy gain to 6.74%. These results expose blind spots in current LALMs and provide a foundation for evaluating complex auditory comprehension.
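One plausible reading of the permutation-based strategy is: present the same audio candidates in several different orders, map each answer back to the original candidate, and take a majority vote. The sketch below follows that reading. The `toy_model` function is a hypothetical stand-in for an LALM with a position bias, and enumerating every permutation (rather than sampling a few) is an assumption made to keep the example deterministic; neither detail comes from the paper.

```python
from collections import Counter
from itertools import permutations

def permutation_self_consistency(candidates, model):
    """Query the model under every candidate order and majority-vote the answers.

    model(shuffled) must return the position it picks within the shuffled list;
    votes are recorded against the original candidate indices.
    """
    votes = Counter()
    for order in permutations(range(len(candidates))):
        shuffled = [candidates[i] for i in order]
        picked_pos = model(shuffled)
        votes[order[picked_pos]] += 1  # map position back to original index
    winner, _ = votes.most_common(1)[0]
    return winner, votes

def toy_model(shuffled):
    """Hypothetical biased model: finds 'clip_b' unless it is presented last."""
    target_pos = shuffled.index("clip_b")
    return target_pos if target_pos != len(shuffled) - 1 else 0

candidates = ["clip_a", "clip_b", "clip_c"]
winner, votes = permutation_self_consistency(candidates, toy_model)
```

The toy model answers incorrectly whenever the target clip comes last, yet the aggregated vote still selects index 1 (`clip_b`) with four of six votes. With many candidates, sampling a fixed number of random orders would replace the full enumeration.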

2603.09671 2026-03-11 eess.SY cs.SY

Embedded Model Predictive Control for EMS-type Maglev Vehicles

Arnim Kargl, Mario Hermle, Zhiqiang Zhang, Yanmin Li, Dainan Zhao, Yong Cui, Peter Eberhard

English abstract

Current developments in high-speed magnetic levitation technology using the principle of electromagnetic suspension (EMS) focus on reaching vehicle speeds of more than 600 km/h. With increasing vehicle speeds, however, updated control algorithms need to be investigated to reliably stabilize the system and meet the demands in terms of ride comfort. This article examines the modern and popular approach of model predictive control and its application to the magnetic levitation control system. The key aspects investigated are the parameterization of the model predictive controller and its implementation on embedded, resource-constrained hardware. The results reveal that model predictive control is capable of robustly stabilizing the highly nonlinear and constrained system even at very high speed. Furthermore, processor-in-the-loop studies are carried out to validate the designed control algorithms on a microcontroller.

2603.09657 2026-03-11 cs.CV cs.AI cs.ET eess.IV

When to Lock Attention: Training-Free KV Control in Video Diffusion

Tianyi Zeng, Jincheng Gao, Tianyi Wang, Zijie Meng, Miao Zhang, Jun Yin, Haoyuan Sun, Junfeng Jiao, Christian Claudel, Junbo Tan, Xueqian Wang

Comments 18 pages, 9 figures, 3 tables

English abstract

Maintaining background consistency while enhancing foreground quality remains a core challenge in video editing. Injecting full-image information often leads to background artifacts, whereas rigid background locking severely constrains the model's capacity for foreground generation. To address this issue, we propose KV-Lock, a training-free framework tailored for DiT-based video diffusion models. Our core insight is that the hallucination metric (variance of denoising prediction) directly quantifies generation diversity, which is inherently linked to the classifier-free guidance (CFG) scale. Building upon this, KV-Lock leverages diffusion hallucination detection to dynamically schedule two key components: the fusion ratio between cached background key-values (KVs) and newly generated KVs, and the CFG scale. When hallucination risk is detected, KV-Lock strengthens background KV locking and simultaneously amplifies conditional guidance for foreground generation, thereby mitigating artifacts and improving generation fidelity. As a training-free, plug-and-play module, KV-Lock can be easily integrated into any pre-trained DiT-based model. Extensive experiments validate that our method outperforms existing approaches, improving foreground quality while maintaining high background fidelity across various video editing tasks.
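
The scheduling idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the KV-Lock implementation: the variance thresholds, the linear ramp, the parameter ranges, and the function names `schedule` and `fuse_kv` are all hypothetical. The abstract only says that a variance-based hallucination metric drives both the KV fusion ratio and the CFG scale.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def schedule(
    predictions: Sequence[float],
    var_low: float = 0.01,    # assumed: below this, risk counts as minimal
    var_high: float = 0.10,   # assumed: above this, risk counts as maximal
    lock_range: Tuple[float, float] = (0.2, 0.9),  # assumed fusion-ratio range
    cfg_range: Tuple[float, float] = (5.0, 9.0),   # assumed CFG-scale range
) -> Tuple[float, float]:
    """Map the variance of denoising predictions to (lock ratio, CFG scale).

    Higher variance is treated as higher hallucination risk, which raises
    both the background lock ratio and the guidance scale.
    """
    risk = pvariance(predictions)
    t = min(1.0, max(0.0, (risk - var_low) / (var_high - var_low)))
    lock = lock_range[0] + t * (lock_range[1] - lock_range[0])
    cfg = cfg_range[0] + t * (cfg_range[1] - cfg_range[0])
    return lock, cfg


def fuse_kv(cached: Sequence[float], fresh: Sequence[float], lock: float) -> List[float]:
    """Blend cached background KVs with newly generated KVs elementwise."""
    return [lock * c + (1.0 - lock) * f for c, f in zip(cached, fresh)]


calm_lock, calm_cfg = schedule([0.5, 0.5, 0.5])         # zero variance
risky_lock, risky_cfg = schedule([0.0, 1.0, 0.0, 1.0])  # variance 0.25
```

A lock ratio near 1 keeps the cached background KVs almost unchanged, while a ratio near 0 lets the newly generated KVs dominate.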

2603.09644 2026-03-11 eess.SP cs.IT math.IT

Site-Specific Finetuning of Neural Receivers with Real-World 5G NR Measurements

Nuri Berke Baytekin, Reinhard Wiesmayr, Sebastian Cammerer, Chris Dick, Christoph Studer

Comments This work has been submitted to the 2026 IEEE 27th International Workshop on Signal Processing and Artificial Intelligence in Wireless Communications (IEEE SPAWC 2026)

English abstract

Finetuning wireless receivers to a specific deployment scenario can yield significant error-rate performance improvements without increasing processing complexity. However, site-specific finetuning has so far only been demonstrated on synthetic channel data and lacks real-world benchmarks. In this work, we empirically study site-specific finetuning of neural receivers using real-world 5G NR physical uplink shared channel (PUSCH) data collected with an over-the-air testbed at ETH Zurich across three scenarios: (i) a small laboratory, (ii) a large office floor, and (iii) a high-mobility outdoor environment. Our results confirm substantial error-rate performance improvements from site-specific finetuning, consistent with earlier findings based on synthetic channel data. Moreover, we demonstrate that these improvements generalize across different user-equipment hardware and deployment scenarios.

2603.09627 2026-03-11 eess.AS

Speech-Omni-Lite: Portable Speech Interfaces for Vision-Language Models

Dehua Tao, Xuan Luo, Daxin Tan, Kai Chen, Lanqing Hong, Jing Li, Ruifeng Xu, Xiao Chen

English abstract

While large-scale omni-models have demonstrated impressive capabilities across various modalities, their strong performance heavily relies on massive multimodal data and incurs substantial computational costs. This work introduces Speech-Omni-Lite, a cost-efficient framework for extending pre-trained Visual-Language (VL) backbones with speech understanding and generation capabilities, while fully preserving the backbones' vision-language performance. Specifically, the VL backbone is equipped with two lightweight, trainable plug-and-play modules, a speech projector and a speech token generator, while keeping the VL backbone fully frozen. To mitigate the scarcity of spoken QA corpora, a low-cost data construction strategy is proposed to generate Question-Text Answer-Text-Speech (QTATS) data from existing ASR speech-text pairs, facilitating effective speech generation training. Experimental results show that, even with only thousands of hours of speech training data, Speech-Omni-Lite achieves excellent spoken QA performance, which is comparable to omni-models trained on millions of hours of speech data. Furthermore, the learned speech modules exhibit strong transferability across VL backbones.

2603.09617 2026-03-11 eess.SY cs.SY

Constrained finite-time stabilization by model predictive control: an infinite control horizon framework

Bing Zhu, Xiaozhuoer Yuan, Zewei Zheng, Zongyu Zuo

Comments 10 pages, 5 figures

English abstract

Existing results on finite-time model predictive control (MPC) often rely on a terminal equality constraint, switching inside a one-step region, or a terminal cost with a short control horizon, leading to limited initial feasibility. This paper proposes an infinite-horizon MPC framework for the constrained finite-time stabilization of discrete-time systems, overcoming limitations found in existing finite-time MPC results. The proposed framework is built upon a terminal cost strategy, but extends it by replacing the short-horizon terminal cost with the sum of stage costs over an infinite control horizon. This design choice significantly enlarges the initial feasibility region and avoids the need for terminal equality constraints or switching strategies during implementation. It is proved that the proposed finite-time MPC guarantees finite-time stabilization performance once the state trajectory enters the predefined terminal set. The infinite-horizon finite-time MPC is shown to be equivalently implementable as a finite-horizon MPC with a terminal cost, thereby ensuring computational tractability. The proposed finite-time MPC is systematically extended and shown to be applicable to both constrained multi-input linear systems and a class of constrained nonlinear systems that are feedback linearizable.

2603.09590 2026-03-11 cs.CR eess.SP

Benchmarking Dataset for Presence-Only Passive Reconnaissance in Wireless Smart-Grid Communications

Bochra Al Agha, Razane Tajeddine

English abstract

Benchmarking presence-only passive reconnaissance in smart-grid communications is challenging because the adversary is receive-only, yet nearby observers can still alter propagation through additional shadowing and multipath that reshapes channel coherence. Public smart-grid cybersecurity datasets largely target active protocol- or measurement-layer attacks and rarely provide propagation-driven observables with tiered topology context, which limits reproducible evaluation under strictly passive threat models. This paper introduces an IEEE-inspired, literature-anchored benchmark dataset generator for passive reconnaissance over a tiered Home Area Network (HAN), Neighborhood Area Network (NAN), and Wide Area Network (WAN) communication graph with heterogeneous wireless and wireline links. Node-level time series are produced through a physically consistent channel-to-metrics mapping where channel state information (CSI) is represented via measurement-realistic amplitude and phase proxies that drive inferred signal-to-noise ratio (SNR), packet error behavior, and delay dynamics. Passive attacks are modeled only as windowed excess attenuation and coherence degradation with increased channel innovation, so reliability and latency deviations emerge through the same causal mapping without labels or feature shortcuts. The release provides split-independent realizations with burn-in removal, strictly causal temporal descriptors, adjacency-weighted neighbor aggregates and deviation features, and federated-ready per-node train, validation, and test partitions with train-only normalization metadata. Baseline federated experiments highlight technology-dependent detectability and enable standardized benchmarking of graph-temporal and federated detectors for passive reconnaissance.

2603.09579 2026-03-11 eess.SP

Low-Rank Cyclostationarity Predictive Routing Is Almost as Good as Real-Time Data-based Routing

Oriel Singer, Ilai Bistritz, Giseung Park, Woohyeon Byeon, Youngchul Sung, Amir Leshem

Comments 4 figures, 2 tables

English abstract

Dynamic shortest-path routing, using real-time traffic data, enables path selection responsive to evolving conditions. Nevertheless, transportation planning tasks such as adaptive congestion pricing, fleet routing, and long-term operational decisions rely on offline traffic estimators. To address this problem, we develop a spatiotemporal predictor based on a low-rank decomposition of the traffic matrix and the temporal subspace coefficients. Using a recent large-scale measurement campaign over the Seoul road network, we show that our proposed predictor incurs an average excess travel time of less than 1.5 minutes. Moreover, our predictor's tail of the excess travel time distribution matches that of a near-real-time predictor. Results based on one year of traffic data are also demonstrated in simulations.
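
As a rough illustration of what a low-rank traffic model looks like, the sketch below factors a small travel-time matrix into rank-1 factors by power iteration. It is not the authors' predictor: the layout (rows as road links, columns as time-of-day slots), the choice of rank 1, the function name `rank1_factors`, and the numbers are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def rank1_factors(X: List[List[float]], iters: int = 100) -> Tuple[List[float], List[float]]:
    """Return (u, v) such that X[i][j] is approximated by u[i] * v[j].

    Plain power iteration on X^T X; `v` is a unit-norm temporal profile and
    `u` holds one coefficient per row. Assumes X is non-zero with
    non-negative entries, so the all-ones start is not orthogonal to the
    leading singular vector.
    """
    rows, cols = len(X), len(X[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(X[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(X[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return u, v


# Hypothetical travel times in minutes: 2 road links x 3 time-of-day slots.
X = [[10.0, 20.0, 10.0],
     [5.0, 10.0, 5.0]]
u, v = rank1_factors(X)
predicted = [[u[i] * v[j] for j in range(3)] for i in range(2)]
```

Because this toy matrix is exactly rank 1, `predicted` reproduces `X` up to floating-point error. Real traffic data would need a higher rank and would leave a residual.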

2603.09577 2026-03-11 cs.IT cs.CR cs.DC cs.SC eess.SP math.IT

Randomized Distributed Function Computation (RDFC): Ultra-Efficient Semantic Communication Applications to Privacy

Onur Günlü

English abstract

We establish the randomized distributed function computation (RDFC) framework, in which a sender transmits just enough information for a receiver to generate a randomized function of the input data. Describing RDFC as a form of semantic communication, which can be essentially seen as a generalized remote-source-coding problem, we show that security and privacy constraints naturally fit this model, as they generally require a randomization step. Using strong coordination metrics, we ensure (local differential) privacy for every input sequence and prove that such guarantees can be met even when no common randomness is shared between the transmitter and receiver. This work provides lower bounds on Wyner's common information (WCI), which is the communication cost when common randomness is absent, and proposes numerical techniques to evaluate the other corner point of the RDFC rate region for continuous-alphabet random variables with unlimited shared randomness. Experiments illustrate that a sufficient amount of common randomness can reduce the semantic communication rate by up to two orders of magnitude compared to the WCI point, while RDFC without any shared randomness still outperforms lossless transmission by a large margin. A finite blocklength analysis further confirms that the privacy parameter gap between the asymptotic and non-asymptotic RDFC methods closes exponentially fast with input length. Our results position RDFC as an energy-efficient semantic communication strategy for privacy-aware distributed computation systems.
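
The remark that privacy constraints generally require a randomization step can be made concrete with binary randomized response, a textbook local-differential-privacy mechanism. It is used here purely as an illustration and is not the RDFC scheme of the paper; the function names are hypothetical.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report `bit` truthfully with probability keep_probability(epsilon), else flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def likelihood_ratio(epsilon: float) -> float:
    """Worst-case ratio P(output | input=0) / P(output | input=1)."""
    p = keep_probability(epsilon)
    return p / (1.0 - p)


rng = random.Random(0)
reports = [randomized_response(1, 1.0, rng) for _ in range(10)]
```

The likelihood ratio equals `exp(epsilon)`, which is the defining bound of epsilon-local differential privacy for a binary input.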

2603.09508 2026-03-11 eess.AS

A Fast Solver for Interpolating Stochastic Differential Equation Diffusion Models for Speech Restoration

Bunlong Lay, Timo Gerkmann

English abstract

Diffusion Probabilistic Models (DPMs) are a well-established class of diffusion models for unconditional image generation, while SGMSE+ is a well-established conditional diffusion model for speech enhancement. One of the downsides of diffusion models is that solving the reverse process requires many evaluations of a large neural network. Although advanced fast sampling solvers have been developed for DPMs, they are not directly applicable to models such as SGMSE+ due to differences in their diffusion processes. Specifically, DPMs transform between the data distribution and a standard Gaussian distribution, whereas SGMSE+ interpolates between the target distribution and a noisy observation. This work first develops a formalism of interpolating Stochastic Differential Equations (iSDEs) that includes SGMSE+, and second proposes a solver for iSDEs. The proposed solver enables fast sampling with as few as 10 neural network evaluations across multiple speech restoration tasks.

2603.09505 2026-03-11 eess.AS

End-to-End Direction-Aware Keyword Spotting with Spatial Priors in Noisy Environments

Rui Wang, Zhifei Zhang, Yu Gao, Xiaofeng Mou, Yi Xu

Comments Submitted for review to Interspeech 2026

English abstract

Keyword spotting (KWS) is crucial for many speech-driven applications, but robust KWS in noisy environments remains challenging. Conventional systems often rely on single-channel inputs and a cascaded pipeline separating front-end enhancement from KWS. This precludes joint optimization, inherently limiting performance. We present an end-to-end multi-channel KWS framework that exploits spatial cues to improve noise robustness. A spatial encoder learns inter-channel features, while a spatial embedding injects directional priors; the fused representation is processed by a streaming backbone. Experiments in simulated noisy conditions across multiple signal-to-noise ratios (SNRs) show that spatial modeling and directional priors each yield clear gains over baselines, with their combination achieving the best results. These findings validate end-to-end multi-channel spatial modeling, indicating strong potential for target-speaker-aware detection in complex acoustic scenarios.