arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2602.04834 2026-02-05 physics.optics eess.IV

ConvRML: High-Quality Lensless Imaging with Random Multi-Focal Lenslets

Leyla A. Kabuli, Clara S. Hung, Vasilisa Ponomarenko, Eric Markley, Laura Waller

Comments 28 pages, 11 figures

Abstract

Mask-based lensless imagers use simple optics and computational reconstruction to design compact form factor cameras with compressive imaging ability. However, these imagers generally suffer from poor reconstruction quality. Here, we describe several advances in both hardware and software that result in improved lensless imaging quality. First, we use a precision-manufactured random multi-focal lenslet (RML) phase mask to produce improved measurements with reduced multiplexing. Next, we implement a ConvNeXt-based reconstruction architecture, which provides up to 6.68 dB improvement in peak signal-to-noise ratio over state-of-the-art attention-based architectures. Finally, we establish a parallel imaging setup that simultaneously images a scene with RML, diffuser and lens systems, with which we collect datasets with 100,000 measurements for each system, to be used for reconstruction model training and evaluation. Using this standardized system, we quantify the improved measurement quality of the RML compared to a diffuser using the modulation transfer function and mutual information. Our ConvRML system benefits from both the optical and the computational developments presented in this work, and our contributions establish resources to support continued development of high-quality, compact, and compressive lensless imagers.
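The 6.68 dB figure above is a difference in peak signal-to-noise ratio (PSNR). As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR definition for images flattened to lists of floats. The function name `psnr` and the default peak value of 1.0 are illustrative assumptions, not details taken from the paper.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the assumed peak intensity (1.0 for images scaled to [0, 1]).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    # Mean squared error between the reference and the reconstruction.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of 6.68 dB at a fixed peak value corresponds to an MSE that is smaller by a factor of 10^(0.668), roughly 4.7.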

2602.04803 2026-02-05 eess.SP

Safe-NEureka: a Hybrid Modular Redundant DNN Accelerator for On-board Satellite AI Processing

Riccardo Tedeschi, Luigi Ghionda, Alessandro Nadalini, Yvan Tortorella, Arpan Suravi Prasad, Luca Benini, Davide Rossi, Francesco Conti

Comments 22 pages, 13 figures, ACM journal format

Abstract

Low Earth Orbit (LEO) constellations are revolutionizing the space sector, with on-board Artificial Intelligence (AI) becoming pivotal for next-generation satellites. AI acceleration is essential for safety-critical functions such as autonomous Guidance, Navigation, and Control (GNC), where errors cannot be tolerated, and performance-critical processing of high-bandwidth sensor data, where occasional errors are tolerable. Consequently, AI accelerators for satellites must combine robust protection against radiation-induced faults with high throughput. This paper presents Safe-NEureka, a Hybrid Modular Redundant Deep Neural Network (DNN) accelerator for heterogeneous RISC-V systems. It operates in two modes: a redundancy mode utilizing Dual Modular Redundancy (DMR) with hardware-based recovery, and a performance mode repurposing redundant datapaths to maximize parallel throughput. Furthermore, its memory interface is protected by Error Correction Codes (ECCs), and the controller by Triple Modular Redundancy (TMR). Implementation in GlobalFoundries 12nm technology shows a 96% reduction in faulty executions in redundancy mode, with a manageable 15% area overhead. In performance mode, the architecture achieves near-baseline speeds on 3x3 dense convolutions with a 5% throughput and 11% efficiency reduction, compared to 48% and 53% in redundancy mode. This flexibility ensures high overheads are limited to critical tasks, establishing Safe-NEureka as a versatile solution for space applications.
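The two redundancy schemes named in this abstract can be illustrated with a small software analogy. This is only a sketch of the general DMR and TMR ideas: the accelerator implements them in hardware, and the function names `dmr_execute` and `tmr_vote` and the retry limit are hypothetical.

```python
def dmr_execute(compute, inputs, max_retries=3):
    """Dual modular redundancy: run the computation twice and compare.

    A mismatch signals a fault; recovery here is simply re-execution.
    """
    for _ in range(max_retries + 1):
        first = compute(inputs)
        second = compute(inputs)
        if first == second:
            return first  # both copies agree, accept the result
    raise RuntimeError("persistent mismatch between redundant executions")


def tmr_vote(a, b, c):
    """Triple modular redundancy: return the majority of three redundant copies."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority among redundant copies")
```

The trade-off is visible even in this toy form: DMR can only detect a disagreement and must redo the work, whereas TMR masks a single faulty copy without re-execution at the cost of a third replica.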

2602.04801 2026-02-05 eess.SY cs.SY

SQP-Based Cable-Tension Allocation for Multi-Drone Load Transport

Lamberto Vazquez-Soqui, Fatima Oliva-Palomo, Diego Mercado-Ravell, Pedro Castillo

Abstract

Multi-Agent Aerial Load Transport Systems (MAATS) offer greater payload capacity and fault tolerance than single-drone solutions. However, they have an underdetermined tension allocation problem that leads to uneven energy distribution, cable slack, or collisions between drones and cables. This paper presents a real-time optimization layer that improves a hierarchical load-position-attitude controller by incorporating a Sequential Quadratic Programming (SQP) algorithm. The SQP formulation minimizes the sum of squared cable tensions while imposing a cable-alignment penalty that discourages small inter-cable angles, thereby preventing tether convergence without altering the reference trajectory. We tested the method under nominal conditions in numerical simulations of four quadrotors. These simulations show that the SQP routine runs in a few milliseconds on standard hardware, indicating feasibility for real-time use. A sensitivity analysis confirms that the gain of the cable-alignment penalty can be tuned online, enabling a controllable trade-off between safety margin and energy consumption with no measurable degradation of tracking performance in simulation. This framework provides a scalable path to safe and energy-balanced cooperative load transport in practical deployments.
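One generic way to write an allocation problem of this kind is sketched below. The abstract does not give the paper's formulation, so every symbol here is an assumption: the tension vectors t_i, the desired net force F_d from the load controller, the inter-cable angle θ_ij, a decreasing penalty function p, and its gain κ.

```latex
\begin{aligned}
\min_{\mathbf{t}_1,\dots,\mathbf{t}_n \in \mathbb{R}^3}\quad
  & \sum_{i=1}^{n} \lVert \mathbf{t}_i \rVert^{2}
    \;+\; \kappa \sum_{i<j} p(\theta_{ij}),
  \qquad
  \cos\theta_{ij} = \frac{\mathbf{t}_i^{\top}\mathbf{t}_j}
                         {\lVert \mathbf{t}_i \rVert\,\lVert \mathbf{t}_j \rVert} \\
\text{subject to}\quad
  & \sum_{i=1}^{n} \mathbf{t}_i = \mathbf{F}_d
\end{aligned}
```

With κ = 0 the minimizer is the equal split t_i = F_d / n, in which all cables are parallel. That is exactly the tether convergence the alignment penalty is meant to discourage, and the penalty is also what makes the problem nonlinear in the decision variables.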

2602.04776 2026-02-05 cs.SD cs.CL eess.AS

Speaker-Aware Simulation Improves Conversational Speech Recognition

Máté Gedeon, Péter Mihajlik

Abstract

Automatic speech recognition (ASR) for conversational speech remains challenging due to the limited availability of large-scale, well-annotated multi-speaker dialogue data and the complex temporal dynamics of natural interactions. Speaker-aware simulated conversations (SASC) offer an effective data augmentation strategy by transforming single-speaker recordings into realistic multi-speaker dialogues. However, prior work has primarily focused on English data, leaving questions about the applicability to lower-resource languages. In this paper, we adapt and implement the SASC framework for Hungarian conversational ASR. We further propose C-SASC, an extended variant that incorporates pause modeling conditioned on utterance duration, enabling a more faithful representation of local temporal dependencies observed in human conversation while retaining the simplicity and efficiency of the original approach. We generate synthetic Hungarian dialogues from the BEA-Large corpus and combine them with real conversational data for ASR training. Both SASC and C-SASC are evaluated extensively under a wide range of simulation configurations, using conversational statistics derived from CallHome, BEA-Dialogue, and GRASS corpora. Experimental results show that speaker-aware conversational simulation consistently improves recognition performance over naive concatenation-based augmentation. While the additional duration conditioning in C-SASC yields modest but systematic gains, most notably in character-level error rates, its effectiveness depends on the match between source conversational statistics and the target domain. Overall, our findings confirm the robustness of speaker-aware conversational simulation for Hungarian ASR and highlight the benefits and limitations of increasingly detailed temporal modeling in synthetic dialogue generation.

2602.04725 2026-02-05 cs.LG eess.SP

Benchmarking and Enhancing PPG-Based Cuffless Blood Pressure Estimation Methods

Neville Mathew, Yidan Shen, Renjie Hu, Maham Rahimi, George Zouridakis

Abstract

Cuffless blood pressure screening based on easily acquired photoplethysmography (PPG) signals offers a practical pathway toward scalable cardiovascular health assessment. Despite rapid progress, existing PPG-based blood pressure estimation models have not consistently achieved the established clinical numerical limits such as AAMI/ISO 81060-2, and prior evaluations often lack the rigorous experimental controls necessary for valid clinical assessment. Moreover, the publicly available datasets commonly used are heterogeneous and lack physiologically controlled conditions for fair benchmarking. To enable fair benchmarking under physiologically controlled conditions, we created a standardized benchmarking subset NBPDB comprising 101,453 high-quality PPG segments from 1,103 healthy adults, derived from MIMIC-III and VitalDB. Using this dataset, we systematically benchmarked several state-of-the-art PPG-based models. The results showed that none of the evaluated models met the AAMI/ISO 81060-2 accuracy requirements (mean error < 5 mmHg and standard deviation < 8 mmHg). To improve model accuracy, we modified these models and added patient demographic data such as age, sex, and body mass index as additional inputs. Our modifications consistently improved performance across all models. In particular, the MInception model reduced error by 23% after adding the demographic data and yielded mean absolute errors of 4.75 mmHg (SBP) and 2.90 mmHg (DBP), achieving accuracy comparable to the numerical limits defined by AAMI/ISO accuracy standards. Our results show that existing PPG-based BP estimation models lack clinical practicality under standardized conditions, while incorporating demographic information markedly improves their accuracy and physiological validity.
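The numerical limits quoted in this abstract (mean error below 5 mmHg, standard deviation below 8 mmHg) can be checked with a few lines of code. The sketch below uses the thresholds exactly as the abstract states them. The function name `meets_aami_limits` is hypothetical, and using the absolute value of the mean error and the sample standard deviation are assumptions about how the limits are applied.

```python
import statistics


def meets_aami_limits(predicted, reference, max_mean_error=5.0, max_std=8.0):
    """Check paired blood-pressure estimates (mmHg) against the quoted limits.

    Requires at least two paired readings so a sample standard deviation exists.
    """
    errors = [p - r for p, r in zip(predicted, reference)]
    mean_error = statistics.fmean(errors)   # signed bias of the estimator
    std_error = statistics.stdev(errors)    # spread of the errors around the bias
    return abs(mean_error) < max_mean_error and std_error < max_std
```

Note that the limits are stated on the mean and standard deviation of the signed error, whereas the abstract reports mean absolute errors for MInception. The two metrics are related but not interchangeable, which may be why the result is described as "comparable to" the limits.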

2602.04720 2026-02-05 math.DS cs.SY eess.SY

On Data-Driven Unbiased Predictors using the Koopman Operator

Roland Schurig, Pieter van Goor, Karl Worthmann, Rolf Findeisen

Comments This paper is currently under review for ECC 2026

Abstract

The Koopman operator and its data-driven approximations, such as extended dynamic mode decomposition (EDMD), are widely used for analysing, modelling, and controlling nonlinear dynamical systems. However, when the true Koopman eigenfunctions cannot be identified from finite data, multi-step predictions may suffer from structural inaccuracies and systematic bias. To address this issue, we analyse the first and second moments of the multi-step prediction residual. By decomposing the residual into contributions from the one-step approximation error and the propagation of accumulated inaccuracies, we derive a closed-form expression characterising these effects. This analysis enables the development of a novel and computationally efficient algorithm that enforces unbiasedness and reduces variance in the resulting predictor. The proposed method is validated in numerical simulations, showing improved uncertainty properties compared to standard EDMD. These results lay the foundation for uncertainty-aware and unbiased Koopman-based prediction frameworks that can be extended to controlled and stochastic systems.
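The split between one-step error and propagated inaccuracy can be illustrated with a standard telescoping identity. This is a generic sketch, not necessarily the expression derived in the paper. The symbols are assumptions: K is the finite-dimensional EDMD matrix, ψ the dictionary of observables, and x_0, ..., x_n a trajectory of the true system.

```latex
K^{n}\psi(x_0) - \psi(x_n)
  \;=\; \sum_{j=0}^{n-1} K^{\,n-1-j}\, r_j ,
\qquad
r_j \;:=\; K\psi(x_j) - \psi(x_{j+1})
```

Expanding the sum shows that all intermediate terms cancel, leaving only the first and last. Each one-step residual r_j is propagated through the remaining n-1-j applications of K, so a nonzero mean of r_j translates into a systematic bias of the n-step prediction.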

2602.04704 2026-02-05 eess.SP

Resilient Channel Charting Under Varying Radio Link Availability

Jonas Pirkl, Jonathan Ott, Maximilian Stahlke, George Yammine, Tobias Feigl, Christopher Mutschler

Abstract

Channel charting (CC) has become a key technology for RF-based localization, enabling unsupervised radio fingerprinting, even in non-line-of-sight scenarios, with a minimum of reference position labels. However, most CC models assume fixed-size inputs, such as a constant number of antennas or channel measurements. In practical systems, antennas may fail, signals may be blocked, or antenna sets may change during handovers, making fixed-input architectures fragile. Existing radio-fingerprinting approaches address this by training separate models for each antenna configuration, but the resulting training effort scales prohibitively with the array size. We propose Adaptive Positioning (AdaPos), a CC architecture that natively handles variable numbers of channel measurements. AdaPos combines convolutional feature extraction with a transformer-based encoder using learnable antenna identifiers and self-attention to fuse arbitrary subsets of CSI inputs. Experiments on two public real-world datasets (SISO and MIMO) show that AdaPos maintains state-of-the-art accuracy under missing-antenna conditions and replaces roughly 57 configuration-specific models with a single unified model. With AdaPos and our novel training strategies, we provide resilience to both individual antenna failures and full-array outages.

2602.04681 2026-02-05 eess.SP

HFMCA: Orthonormal Feature Learning for EEG-based Brain Decoding

Yinghao Wang, Lintao Xu, Shujian Yu, Enzo Tartaglione, Van-Tam Nguyen

Abstract

Electroencephalography (EEG) analysis is critical for brain-computer interfaces and neuroscience, but the intrinsic noise and high dimensionality of EEG signals hinder effective feature learning. We propose a self-supervised framework based on the Hierarchical Functional Maximal Correlation Algorithm (HFMCA), which learns orthonormal EEG representations by enforcing feature decorrelation and reducing redundancy. This design enables robust capture of essential brain dynamics for various EEG recognition tasks. We validate HFMCA on two benchmark datasets, SEED and BCIC-2A, where pretraining with HFMCA consistently outperforms competitive self-supervised baselines, achieving notable gains in classification accuracy. Across diverse EEG tasks, our method demonstrates superior cross-subject generalization under leave-one-subject-out validation, advancing state-of-the-art by 2.71% on SEED emotion recognition and 2.57% on BCIC-2A motor imagery classification.

2602.04656 2026-02-05 eess.SY cs.SY

Safe Adaptive Control of Parabolic PDE-ODE Cascades

Yun Jiang, Ji Wang

Abstract

In this paper, we propose a safe adaptive boundary control strategy for a class of parabolic partial differential equation-ordinary differential equation (PDE-ODE) cascaded systems with parametric uncertainties in both the PDE and ODE subsystems. The proposed design is built upon an adaptive Control Barrier Function (aCBF) framework that incorporates high-relative-degree CBFs together with a batch least-squares identification (BaLSI)-based adaptive control that guarantees exact parameter identification in finite time. The proposed control law ensures that: (i) if the system output state initially lies within a prescribed safe set, safety is maintained for all time; otherwise, the output is driven back into the safe region within a preassigned finite time; and (ii) convergence to zero of all plant states is achieved. Numerical simulations are provided to demonstrate the effectiveness of the proposed approach.

2602.04650 2026-02-05 eess.SP cs.LG

Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models

Ariel Rodrigez, Alejandro Lancho, Amir Weiss

Comments 6 pages, 6 figures, 1 table, accepted at the 2026 IEEE International Conference on Communications

Abstract

The increasingly crowded radio frequency (RF) spectrum forces communication signals to coexist, creating heterogeneous interferers whose structure often departs from Gaussian models. Recovering the interference-contaminated signal of interest in such settings is a central challenge, especially in single-channel RF processing. Existing data-driven methods often assume that the interference type is known, yielding ensembles of specialized models that scale poorly with the number of interferers. We show that detect-then-separate (DTS) strategies admit an analytical justification: within a Gaussian mixture framework, a plug-in maximum a posteriori detector followed by type-conditioned optimal estimation achieves asymptotic minimum mean-square error optimality under a mild temporal-diversity condition. This makes DTS a principled benchmark, but its reliance on multiple type-specific models limits scalability. Motivated by this, we propose a unified joint model (UJM), in which a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. Using tailored UNet architectures for baseband (complex-valued) RF signals, we compare DTS and UJM on synthetic and recorded interference types, showing that a capacity-matched UJM can match oracle-aided DTS performance across diverse signal-to-interference-and-noise ratios, interference types, and constellation orders, including mismatched training and testing type-uncertainty proportions. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader regimes.

2602.04623 2026-02-05 eess.SP

Total Variation Sparse Bayesian Learning for Block Sparsity via Majorization-Minimization

Yanbin He, Geethu Joseph

Comments Submitted to EUSIPCO

Abstract

Block sparsity is a widely exploited structure in sparse recovery, offering significant gains when signal blocks are known. Yet, practical signals often exhibit unknown block boundaries and isolated non-zero entries, which challenge traditional approaches. A promising method to handle such complex sparsity patterns is the difference-of-logs total variation (DoL-TV) regularized sparse Bayesian learning (SBL). However, due to the complex form of the DoL-TV term, the resulting optimization problem is hard to solve. This paper develops a new optimization framework for the DoL-TV SBL cost function. By introducing an exponential reparameterization of the SBL hyperparameters, we reveal a novel structure that admits a majorization-minimization formulation and naturally extends to unknown noise variance estimation. Sparse recovery results on both synthetic data and extended source direction-of-arrival estimation demonstrate improved accuracy and runtime performance compared to benchmark methods.

2602.04609 2026-02-05 cs.LG cs.SY eess.SY

Resilient Load Forecasting under Climate Change: Adaptive Conditional Neural Processes for Few-Shot Extreme Load Forecasting

Chenxi Hu, Yue Ma, Yifan Wu, Yunhe Hou

Abstract

Extreme weather can substantially change electricity consumption behavior, causing load curves to exhibit sharp spikes and pronounced volatility. If forecasts are inaccurate during those periods, power systems are more likely to face supply shortfalls or localized overloads, forcing emergency actions such as load shedding and increasing the risk of service disruptions and public-safety impacts. This problem is inherently difficult because extreme events can trigger abrupt regime shifts in load patterns, while relevant extreme samples are rare and irregular, making reliable learning and calibration challenging. We propose AdaCNP, a probabilistic forecasting model for data-scarce conditions. AdaCNP learns similarity in a shared embedding space. For each target point, it evaluates how relevant each historical context segment is to the current condition and reweights the context information accordingly. This design highlights the most informative historical evidence even when extreme samples are rare. It enables few-shot adaptation to previously unseen extreme patterns. AdaCNP also produces predictive distributions for risk-aware decision-making without expensive fine-tuning on the target domain. We evaluate AdaCNP on real-world power-system load data and compare it against a range of representative baselines. The results show that AdaCNP is more robust during extreme periods, reducing the mean squared error by 22% relative to the strongest baseline while achieving the lowest negative log-likelihood, indicating more reliable probabilistic outputs. These findings suggest that AdaCNP can effectively mitigate the combined impact of abrupt distribution shifts and scarce extreme samples, providing more trustworthy forecasting for resilient power system operation under extreme events.
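The reweighting step described in this abstract (score each historical context segment against the target in a shared embedding space, then weight the context accordingly) resembles attention-style aggregation. The sketch below is an assumption about its general shape, not the AdaCNP architecture: dot-product similarity, a softmax over the scores, the `temperature` parameter, and the name `reweight_context` are all illustrative.

```python
import math


def reweight_context(target_embedding, context_embeddings, context_values,
                     temperature=1.0):
    """Weight context values by their embedding similarity to the target.

    Returns the normalized weights and the weighted summary of the context.
    """
    # Dot-product similarity between the target and every context segment.
    scores = [
        sum(t * c for t, c in zip(target_embedding, emb)) / temperature
        for emb in context_embeddings
    ]
    # Numerically stable softmax turns scores into weights that sum to one.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    summary = sum(w * v for w, v in zip(weights, context_values))
    return weights, summary
```

A lower `temperature` concentrates the weight on the few most similar segments, which is the behavior one would want when only a handful of past extreme events resemble the current condition.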

2602.04578 2026-02-05 eess.SY cs.SY

Reinforcement Learning-based Home Energy Management with Heterogeneous Batteries and Stochastic EV Behaviour

Meng Yuan, Ye Wang, Xinghuo Yu, Torsten Wik, Changfu Zou

Abstract

The widespread adoption of photovoltaic (PV), electric vehicles (EVs), and stationary energy storage systems (ESS) in households increases system complexity while simultaneously offering new opportunities for energy regulation. However, effectively coordinating these resources under uncertainties remains challenging. This paper proposes a novel home energy management framework based on deep reinforcement learning (DRL) that can jointly minimise energy expenditure and battery degradation while guaranteeing occupant comfort and EV charging requirements. Distinct from existing studies, we explicitly account for the heterogeneous degradation characteristics of stationary and EV batteries in the optimisation, alongside stochastic user behaviour regarding arrival time, departure time, and driving distance. The energy scheduling problem is formulated as a constrained Markov decision process (CMDP) and solved using a Lagrangian soft actor-critic (SAC) algorithm. This approach enables the agent to learn optimal control policies that enforce physical constraints, including indoor temperature bounds and target EV state of charge upon departure, despite stochastic uncertainties. Numerical simulations over a one-year horizon demonstrate the effectiveness of the proposed framework in satisfying physical constraints while eliminating thermal oscillations and achieving significant economic benefits. Specifically, the method reduces the cumulative operating cost substantially compared to two standard rule-based baselines while simultaneously decreasing battery degradation costs by 8.44%.
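Lagrangian methods for constrained MDPs typically fold the constraint into the reward through a multiplier that is adapted during training. The sketch below shows that generic dual-ascent mechanism only. It is not the paper's algorithm, and the function names, the step size, and the scalar constraint cost are assumptions.

```python
def update_multiplier(multiplier, constraint_cost, limit, step_size=0.1):
    """Projected dual ascent: raise the multiplier while the constraint is violated.

    The max(0, ...) projection keeps the multiplier non-negative.
    """
    return max(0.0, multiplier + step_size * (constraint_cost - limit))


def penalized_reward(reward, constraint_cost, multiplier):
    """Reward seen by the actor-critic learner after Lagrangian relaxation."""
    return reward - multiplier * constraint_cost
```

While the average constraint cost (for example, a comfort or state-of-charge violation) exceeds its limit, the multiplier grows and the penalized reward increasingly discourages violations. Once the constraint is satisfied, the multiplier decays back toward zero and the agent focuses on cost again.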

2602.04568 2026-02-05 eess.SY cs.SY math.OC

Peak Bounds for the Estimation Error under Sensor Attacks

Axel Stafström, Daniel Arnström, Adam Miksits, David Umsonst

Comments 7 pages, 3 figures, accepted at the American Control Conference 2026

Abstract

This paper investigates bounds on the estimation error of a linear system affected by norm-bounded disturbances and full sensor attacks. The system is equipped with a detector that evaluates the norm of the innovation signal to detect faults, and the attacker wants to avoid detection. We utilize induced $L_\infty$ system norms, also called peak-to-peak norms, to compare the estimation error bounds under nominal operations and under attack. This leads to a sufficient condition for when the bound on the estimation error is smaller during an attack than during nominal operation. This condition is independent of the attack strategy and depends only on the attacker's desire to remain undetected and (indirectly) the observer gain. Therefore, we investigate both an observer design method that seeks to reduce the error bound under attack while keeping the nominal error bound low, and detector threshold tuning. As a numerical illustration, we show how a sensor attack can deactivate a robust safety filter based on control barrier functions if the attacked error bound is larger than the nominal one. We also statistically evaluate our observer design method and the effect of the detector threshold.
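For intuition about peak-to-peak norms, consider a scalar discrete-time analog. The paper's setting may be continuous-time and multivariable, so this is an illustration only. For a stable single-input single-output linear system, the induced infinity-norm gain equals the sum of the absolute values of its impulse response. The function name and the system x[k+1] = a x[k] + b u[k], y[k] = c x[k] are assumptions chosen for the example.

```python
def peak_to_peak_gain(a, b, c, terms=10_000):
    """Approximate the peak-to-peak gain of x[k+1] = a*x[k] + b*u[k], y[k] = c*x[k].

    Sums |h[k]| over the impulse response h[k] = c * a**(k-1) * b for k >= 1.
    """
    if abs(a) >= 1:
        raise ValueError("system must be stable (|a| < 1)")
    gain = 0.0
    impulse = c * b  # first nonzero impulse-response sample
    for _ in range(terms):
        gain += abs(impulse)
        impulse *= a
    return gain
```

For this scalar system the sum has the closed form |c b| / (1 - |a|). The gain bounds the worst-case output peak for any input with unit peak, which is the kind of guarantee the estimation error bounds in the abstract rely on.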

2602.04465 2026-02-05 eess.SP

An Information-Theoretic Detector for Multiple Scatterers in SAR Tomography

Pia Addabbo, Diego Reale, Antonio Pauciullo, Gianfranco Fornaro, Danilo Orlando

Abstract

Persistent scatterer interferometry and Synthetic Aperture Radar (SAR) Tomography are powerful tools for the detection and time monitoring of persistent scatterers. They have been proven to be effective in urban scenarios, especially for 3-D reconstruction of buildings and infrastructure and for monitoring of deformation. In urban areas, the occurrence of layover leads to the presence of multiple contributions within the same image pixel from scatterers located at different heights. In the context of SAR Tomography, this problem can be addressed by considering a multiple hypothesis test to detect the presence of feasible multiple scatterers [1][2]. In the present paper, we consider this problem in the framework of information theory and exploit the theoretical tool developed in [3] to design a one-stage adaptive architecture for multiple hypothesis testing problems in the context of SAR Tomography. Moreover, we resort to the compressive sensing approach for the estimation of the unknown parameters under each hypothesis. This architecture has been verified on both simulated and real data, including in comparison with suitable counterparts.

2602.04410 2026-02-05 eess.SP

Rigid Body Localization via Gaussian Belief Propagation with Quadratic Angle Approximation

Niclas Führling, Hyeon Seok Rou, Giuseppe Abreu, David González G., Osvaldo Gonsa

Abstract

Gaussian belief propagation (GaBP) is a technique that relies on linearized error and input-output models to yield low-complexity solutions to complex estimation problems, which has been recently shown to be effective in the design of range-based GaBP schemes for stationary and moving rigid body localization (RBL) in three-dimensional (3D) space, as long as an accurate prior on the orientation of the target rigid body is available. In this article, we present a novel range-based RBL scheme via GaBP that removes the latter limitation. To this end, the proposed method incorporates a quadratic angle approximation to linearize the relative orientation between the prior and the target rigid body, enabling high-precision estimates of corresponding rotation angles even for large deviations. Leveraging the resulting linearized model, we derive the corresponding message-passing (MP) rules to obtain estimates of the translation vector and rotation matrix of the target rigid body, relative to a prior reference frame. Numerical results corroborate the good performance of the proposed angle approximation itself, as well as the consequent RBL performance in terms of root mean square errors (RMSEs) in comparison to the state-of-the-art (SotA), while maintaining a low computational complexity.

2601.13252 2026-02-05 cs.RO cs.SY eess.SY

Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints

Mahmud S. Zango, Jianglin Lan

Comments 30 pages, 5 figures, 2 tables. Review article

Abstract

Autonomous navigation for nano-scale unmanned aerial vehicles (nano-UAVs) is governed by extreme Size, Weight, and Power (SWaP) constraints (weight < 50 g and a sub-100 mW onboard processor), distinguishing it fundamentally from standard robotic paradigms. This review synthesizes the state-of-the-art in sensing, computing, and control architectures designed specifically for these sub-100 mW computational envelopes. We critically analyse the transition from classical geometry-based methods to emerging "Edge AI" paradigms, including quantized deep neural networks deployed on ultra-low-power System-on-Chips (SoCs) and neuromorphic event-based control. Beyond algorithms, we evaluate the hardware-software co-design requisite for autonomy, covering advancements in dense optical flow, optimized Simultaneous Localization and Mapping (SLAM), and learning-based flight control. While significant progress has been observed in visual navigation and relative pose estimation, our analysis reveals persistent gaps in long-term endurance, robust obstacle avoidance in dynamic environments, and the "Sim-to-Real" transfer of reinforcement learning policies. This survey provides a roadmap for bridging these gaps, advocating for hybrid architectures that fuse lightweight classical control with data-driven perception to enable fully autonomous, agile nano-UAVs in GPS-denied environments.

2601.00459 2026-02-05 cs.LG eess.SP

Combining Residual U-Net and Data Augmentation for Dense Temporal Segmentation of Spike Wave Discharges in Single-Channel EEG

Saurav Sengupta, Scott Kilianski, Suchetha Sharma, Sakina Lashkeri, Ashley McHugh, Mark Beenhakker, Donald E. Brown

Abstract

Manual annotation of spike-wave discharges (SWDs), the electrographic hallmark of absence seizures, is labor-intensive for long-term electroencephalography (EEG) monitoring studies. While machine learning approaches show promise for automated detection, they often struggle with cross-subject generalization due to high inter-individual variability in seizure morphology and signal characteristics. In this study, we compare the performance of 15 machine learning classifiers on our own manually annotated dataset of 961 hours of EEG recordings from C3H/HeJ mice, including 22,637 labeled SWDs, and find that a 1D U-Net performs the best. We then improve its performance by employing residual connections and data augmentation strategies combining amplitude scaling, Gaussian noise injection, and signal inversion during training to enhance cross-subject generalization. We also compare our method, named AugUNet1D, to a recently published time- and frequency-based algorithmic approach called "Twin Peaks" and show that AugUNet1D performs better on our dataset. AugUNet1D, pretrained on our manually annotated data or untrained, is made public for other users.
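The three augmentations named in this abstract (amplitude scaling, Gaussian noise injection, and signal inversion) are simple to express for a single-channel segment. The sketch below is a minimal illustration. The parameter ranges, the inversion probability, and the function name `augment_segment` are assumptions, not the settings used for AugUNet1D.

```python
import random


def augment_segment(signal, rng, scale_range=(0.8, 1.2), noise_std=0.05,
                    invert_prob=0.5):
    """Return an augmented copy of a 1D EEG segment (list of floats).

    One scale factor and one polarity are drawn per segment; noise is drawn
    per sample. The segment length is unchanged, so per-sample labels stay aligned.
    """
    scale = rng.uniform(*scale_range)                         # amplitude scaling
    sign = -1.0 if rng.random() < invert_prob else 1.0        # signal inversion
    return [sign * scale * x + rng.gauss(0.0, noise_std)      # Gaussian noise
            for x in signal]
```

Passing a seeded generator, for example `augment_segment(segment, random.Random(0))`, keeps the augmentation reproducible across training runs.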

2512.13144 2026-02-05 cs.CV cs.LG eess.IV

Weight Space Correlation Analysis: Quantifying Feature Utilization in Deep Learning Models

Chun Kit Wong, Paraskevas Pegios, Nina Weng, Emilie Pi Fogtmann Sejer, Martin Grønnebæk Tolsgaard, Anders Nymark Christensen, Aasa Feragen

Comments 26 pages

Abstract

Deep learning models in medical imaging are susceptible to shortcut learning, relying on confounding metadata (e.g., scanner model) that is often encoded in image embeddings. The crucial question is whether the model actively utilizes this encoded information for its final prediction. We introduce Weight Space Correlation Analysis, an interpretable methodology that quantifies feature utilization by measuring the alignment between the classification heads of a primary clinical task and auxiliary metadata tasks. We first validate our method by successfully detecting artificially induced shortcut learning. We then apply it to probe the feature utilization of an SA-SonoNet model trained for Spontaneous Preterm Birth (sPTB) prediction. Our analysis confirmed that while the embeddings contain substantial metadata, the sPTB classifier's weight vectors were highly correlated with clinically relevant factors (e.g., birth weight) but decoupled from clinically irrelevant acquisition factors (e.g., scanner). Our methodology provides a tool to verify model trustworthiness, demonstrating that, in the absence of induced bias, the clinical model selectively utilizes features related to the genuine clinical signal.
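One simple way to measure alignment between two linear classification heads that share an embedding space is the cosine similarity of their weight vectors. The abstract does not specify the exact alignment measure used, so treating it as cosine similarity is an assumption made for this sketch, as is the function name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("weight vectors must be non-zero")
    return dot / (norm_u * norm_v)
```

A value near zero between the clinical head and a metadata head (for example, a scanner classifier) would suggest the clinical prediction does not lean on the embedding directions that encode that metadata, while a value near plus or minus one would indicate heavy reuse of the same directions.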

2511.12268 2026-02-05 eess.IV cs.CV

Patient-Aware Multimodal RGB-HSI Fusion via Incremental Heuristic Meta-Learning for Oral Lesion Classification

Rupam Mukherjee, Rajkumar Daniel, Soujanya Hazra, Shirin Dasgupta, Subhamoy Mandal

Comments 6 pages, 3 figures, 2 tables

Abstract

Early detection of oral cancer and potentially malignant diseases is a major challenge in low-resource settings due to the scarcity of annotated data. We provide a unified approach for four-class oral lesion classification that incorporates deep learning, spectral analysis, and demographic data. A pathologist-verified subset of oral cavity images was curated from a publicly available dataset. Oral cavity pictures were processed using a fine-tuned ConvNeXt-v2 network for deep embeddings before being translated into the hyperspectral domain using a reconstruction algorithm. Haemoglobin-sensitive, textural, and spectral descriptors were obtained from the reconstructed hyperspectral cubes and combined with demographic data. Multiple machine-learning models were evaluated using patient-specific validation. Finally, an incremental heuristic meta-learner (IHML) was developed that merged calibrated base classifiers via probabilistic feature stacking and uncertainty-aware abstraction of multimodal representations with patient-level smoothing. By decoupling evidence extraction from decision fusion, IHML stabilizes predictions in heterogeneous, small-sample medical datasets. On an unseen test set, our proposed model achieved a macro F1 of 66.23% and an overall accuracy of 64.56%. The findings demonstrate that RGB-to-hyperspectral reconstruction and ensemble meta-learning improve diagnostic robustness in real-world oral lesion screening.
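The headline metric in this abstract is macro F1, which averages the per-class F1 scores with equal weight so that rare lesion classes count as much as common ones. A minimal sketch of the standard definition follows. The function name is illustrative and the labels can be any hashable, sortable values.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # every label in the union occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

Because accuracy weights every sample equally while macro F1 weights every class equally, the two can diverge on imbalanced data, which is why the abstract reports both.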

2510.13616 2026-02-05 cs.RO cs.SY eess.SY

Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor

Preston Fairchild, Claudia Chen, Xiaobo Tan

Comments For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing

Abstract

Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial not only for ensuring a proper grip on the object, but also for avoiding damage to or bruising of the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper and a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values, which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
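
The abstract does not describe how the steady-state value is estimated from the transient. Purely as an illustration, and assuming (this is not from the paper) that the sensor output relaxes like a first-order exponential sampled at equal intervals, three samples are enough to extrapolate the final value with Aitken's delta-squared formula. The function name and the synthetic signal below are hypothetical.

```python
import math

def extrapolate_steady_state(y0: float, y1: float, y2: float) -> float:
    """Estimate y_ss for y_k = y_ss + A * r**k from three equally spaced samples.

    Assumption (not from the paper): a first-order exponential transient.
    """
    second_diff = y2 - 2.0 * y1 + y0
    if abs(second_diff) < 1e-12:
        # No curvature: the signal has settled or is not exponential,
        # so fall back to the latest sample.
        return y2
    # Aitken's delta-squared extrapolation
    return y0 - (y1 - y0) ** 2 / second_diff

# Synthetic transient settling at 5.0 with a 0.8 s time constant, sampled every 0.1 s
samples = [5.0 - 4.0 * math.exp(-k * 0.1 / 0.8) for k in range(3)]
estimate = extrapolate_steady_state(*samples)
print(round(estimate, 6))
```

On noise-free exponential data the estimate is available after only three samples, long before the signal itself settles. Real sensor data would likely need filtering or a least-squares fit over more samples.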

2509.14764 2026-02-05 eess.SP cs.SD

Efficient Solutions for Mitigating Initialization Bias in Unsupervised Self-Adaptive Auditory Attention Decoding

Yuanyuan Yao, Simon Geirnaert, Tinne Tuytelaars, Alexander Bertrand

Abstract

Decoding the attended speaker in a multi-speaker environment from electroencephalography (EEG) has attracted growing interest in recent years, with neuro-steered hearing devices as a driver application. Current approaches typically rely on ground-truth labels of the attended speaker during training, necessitating calibration sessions for each user and each EEG set-up to achieve optimal performance. While unsupervised self-adaptive auditory attention decoding (AAD) for stimulus reconstruction has been developed to eliminate the need for labeled data, it suffers from an initialization bias that can compromise performance. Although an unbiased variant has been proposed to address this limitation, it introduces substantial computational complexity that scales with data size. This paper presents three computationally efficient alternatives that achieve comparable performance, but with a significantly lower and constant computational cost. The code for the proposed algorithms is available at https://github.com/YYao-42/Unsupervised_AAD.

2508.18998 2026-02-05 eess.AS

MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR

Junjie Li, Jing Peng, Yangui Fang, Shuai Wang, Kai Yu

Comments 5 pages, 3 figures, accepted to ICASSP 2026

Abstract

LLM-based ASR overcomes multilingual data scarcity by projecting speech representations into the LLM space to leverage its robust semantic and reasoning capabilities. However, while previous approaches typically enhance performance by scaling data or model parameters, a single projector often struggles to effectively align representations across different languages. In this work, we propose an MoE-based projector named MOSA (Mixture of Simple Adapters). By aggregating multiple simple adapters, this architecture enables different experts to specialize in learning either language-shared or language-specific knowledge. This approach not only mitigates parameter interference between languages but also facilitates positive transfer from high-resource to low-resource languages, effectively alleviating data scarcity issues. Experimental results demonstrate that MOSA-Base achieves a 15.4% relative reduction in average WER compared to the Ideal-LLM Base, consistently outperforming it across all languages. Notably, MOSA achieves a 13.3% WER reduction over the Ideal-LLM Base while utilizing only 60% of its parameters. These findings highlight MOSA's superior parameter efficiency and robustness against data imbalance, suggesting that a mixture of simple adapters is more suitable for multilingual LLM-based ASR than complex single-adapter designs.
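
The idea of aggregating several simple adapters through a router can be sketched as follows. This is a minimal toy version, assuming linear adapters and dense softmax gating; the actual MOSA adapters, routing scheme, and dimensions are not given in the abstract, and all names and sizes here are hypothetical.

```python
import math
import random

def linear(x, weight):
    """Multiply vector x (length d_in) by weight (d_out rows of d_in values)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def mixture_of_adapters(x, experts, router):
    """Project one speech frame x as a router-weighted sum of linear adapters."""
    gates = softmax(linear(x, router))          # one gate per expert
    outputs = [linear(x, w) for w in experts]   # each expert maps d_in -> d_out
    d_out = len(outputs[0])
    mixed = [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(d_out)]
    return mixed, gates

random.seed(0)
d_in, d_out, n_experts = 4, 3, 2                # hypothetical toy sizes
experts = [random_matrix(d_out, d_in) for _ in range(n_experts)]
router = random_matrix(n_experts, d_in)
projected, gates = mixture_of_adapters([0.5, -0.2, 0.1, 0.9], experts, router)
print(projected, gates)
```

Because the gates depend on the input frame, different inputs can lean on different adapters, which is the mechanism that lets experts specialize in shared or language-specific structure.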

2508.08153 2026-02-05 eess.SY cs.SY math.OC

Robust Adaptive Discrete-Time Control Barrier Certificate

Changrui Liu, Anil Alan, Shengling Shi, Bart De Schutter

Comments 11 pages with updated simulations and illustrative figures. The conditions used for verifying barrier functions and online safe control are separated, which is a crucial and overlooked point in the literature on adaptive safe control. The paper is submitted to Automatica

Abstract

This work develops a robust adaptive control strategy for discrete-time systems using Control Barrier Functions (CBFs) to ensure safety under parametric model uncertainty and disturbances. A key contribution of this work is establishing a barrier function certificate in discrete time for general online parameter estimation algorithms. This barrier function certificate guarantees positive invariance of the safe set despite disturbances and parametric uncertainty without access to the true system parameters. In addition, real-time implementation and inherent robustness guarantees are provided. The proposed robust adaptive safe control framework demonstrates that the parameter estimation module can be designed separately from the CBF-based safety filter, simplifying the development of safe adaptive controllers for discrete-time systems. The resulting safe control approach guarantees that the system remains within the safe set while adapting to model uncertainties, making it a promising strategy for discrete-time safety-critical systems.
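
For readers unfamiliar with discrete-time CBFs, a commonly used condition is h(x_{k+1}) >= (1 - gamma) * h(x_k) with 0 < gamma <= 1, which keeps h nonnegative once it starts nonnegative. The sketch below applies it as a safety filter to a toy scalar system x_{k+1} = x_k + u_k with safe set x <= x_max. The system, the parameter values, and the function name are assumptions for illustration; the sketch includes no disturbance or parameter estimation, so it does not reproduce the paper's robust adaptive certificate.

```python
def safe_input(x: float, u_nominal: float, x_max: float, gamma: float) -> float:
    """Clip u so that h(x + u) >= (1 - gamma) * h(x), with h(x) = x_max - x."""
    # h(x + u) = x_max - x - u >= (1 - gamma) * (x_max - x)
    # is equivalent to u <= gamma * (x_max - x)
    u_limit = gamma * (x_max - x)
    return min(u_nominal, u_limit)

x, x_max, gamma = 0.0, 1.0, 0.5
trajectory = [x]
for _ in range(20):
    # The nominal controller always pushes toward the boundary with u = 0.4
    u = safe_input(x, u_nominal=0.4, x_max=x_max, gamma=gamma)
    x = x + u
    trajectory.append(x)
print(trajectory[:4], trajectory[-1])
```

The filter leaves the nominal input untouched far from the boundary and shrinks it as x approaches x_max, so the state converges toward the boundary without crossing it.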

2507.21669 2026-02-05 eess.SY cs.SY

Data-Driven Greenhouse Climate Regulation in Lettuce Cultivation Using BiLSTM and GRU Predictive Control

Soumo Emmanuel Arnaud, Marcello Calisti, Athanasios Polydoros

Abstract

Efficient greenhouse management is essential for sustainable food production, but the high energy demand for climate regulation poses significant economic and environmental challenges. While traditional process-based greenhouse models exist, they are often too complex or imprecise for reliable control. To address this, our study introduces a novel data-driven predictive control framework using Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks within a Model Predictive Control (MPC) architecture. Training data were generated from a validated dynamic model simulating lettuce cultivation under various environmental conditions. The LSTM and GRU networks were trained to predict future greenhouse states (including temperature, humidity, CO2 concentration, and crop dry matter), with robustness confirmed via 10-fold cross-validation. These networks were embedded into an online MPC controller to optimize heating, ventilation, and CO2 injection, aiming to minimize energy consumption and maximize crop yield while respecting biological constraints. Results showed that both the LSTM- and GRU-based controllers significantly outperformed a conventional MPC baseline. For example, humidity violations dropped from 54.77% (MPC) to 15.45% (GRU) and 17.71% (LSTM), while day-night temperature deviations were kept below 2 °C. The GRU controller further achieved up to 40% lower computation time than its LSTM counterpart, confirming its real-time feasibility. Overall, the proposed GRU-driven predictive control approach offers a robust and computationally efficient solution for intelligent greenhouse climate automation under practical operational constraints.

2507.15958 2026-02-05 eess.IV cs.AI cs.CV

Quantization-Aware Neuromorphic Architecture for Skin Disease Classification on Resource-Constrained Devices

Haitian Wang, Xinyu Wang, Yiren Wang, Bo Miao, Atif Mansoor

Abstract

On-device skin lesion analysis is constrained by the compute and energy cost of conventional CNN inference and by the need to update models as new patient data become available. Neuromorphic processors provide event-driven sparse computation and support on-chip incremental learning, yet deployment is often hindered by CNN-to-SNN conversion failures, including non-spike-compatible operators and accuracy degradation under class imbalance. We propose QANA, a quantization-aware CNN backbone embedded in an end-to-end pipeline engineered for conversion-stable neuromorphic execution. QANA replaces conversion-fragile components with spike-compatible transformations by bounding intermediate activations and aligning normalization with low-bit quantization, reducing conversion-induced distortion that disproportionately impacts rare classes. Efficiency is achieved through Ghost-based feature generation under tight FLOP budgets, while spatially-aware efficient channel attention and squeeze-and-excitation recalibrate channels without heavy global operators that are difficult to map to spiking cores. The resulting quantized projection head produces SNN-ready logits and enables incremental updates on edge hardware without full retraining or data offloading. On HAM10000, QANA achieves 91.6% Top-1 accuracy and 91.0% macro F1, improving the strongest converted SNN baseline by 3.5 percentage points in Top-1 accuracy (a 4.0% relative gain) and by 12.0 points in macro F1 (a 15.2% relative gain). On a clinical dataset, QANA achieves 90.8% Top-1 accuracy and 81.7% macro F1, improving the strongest converted SNN baseline by 3.2 points in Top-1 accuracy (a 3.7% relative gain) and by 3.6 points in macro F1 (a 4.6% relative gain). When deployed on BrainChip Akida, QANA runs in 1.5 ms per image with 1.7 mJ per image, corresponding to 94.6% lower latency and 99.0% lower energy than its GPU-based CNN implementation.
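
The abstract quotes each improvement both in percentage points and as a relative gain. The short check below shows how the two relate, assuming the baseline score is simply the QANA score minus the absolute gain. The function name is hypothetical; the numbers are those reported above.

```python
def relative_gain(new_score: float, gain_points: float) -> float:
    """Relative gain (%) implied by an absolute gain in percentage points."""
    baseline = new_score - gain_points
    return 100.0 * gain_points / baseline

# (QANA score in %, absolute gain in percentage points), as reported in the abstract
reported = {
    "HAM10000 Top-1": (91.6, 3.5),
    "HAM10000 macro F1": (91.0, 12.0),
    "Clinical Top-1": (90.8, 3.2),
    "Clinical macro F1": (81.7, 3.6),
}
for name, (score, points) in reported.items():
    baseline = score - points
    print(f"{name}: baseline {baseline:.1f}%, relative gain {relative_gain(score, points):.1f}%")
```

Rounded to one decimal place, the computed relative gains agree with the 4.0%, 15.2%, 3.7%, and 4.6% figures quoted in the abstract.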

2507.14169 2026-02-05 eess.SP cs.IT math.IT

CQI-Based Interference Prediction for Link Adaptation in Industrial Sub-networks

Pramesh Gautam, Ravi Sharan Bhagavathula, Paolo Baracca, Carsten Bockelmann, Thorsten Wild, Armin Dekorsy

Abstract

We propose a novel interference prediction scheme to improve link adaptation (LA) in densely deployed industrial sub-networks (SNs) with high-reliability and low-latency communication (HRLLC) requirements. The proposed method aims to improve the LA framework by predicting and leveraging the heavy-tailed interference probability density function (pdf). Interference is modeled as a latent vector of available channel quality indicator (CQI), using a vector discrete-time state-space model (vDSSM) at the SN controller, where the CQI is subjected to compression, quantization, and delay-induced errors. To robustly estimate interference power values under these impairments, we employ a low-complexity, outlier-robust, sparse Student-t process regression (SPTPR) method. This is integrated into a modified unscented Kalman filter, which recursively refines the predicted interference using CQI, enabling accurate estimation and compensating for protocol feedback delays, which is crucial for accurate LA. Numerical results show that the proposed method achieves over 10x lower complexity compared to a similar non-parametric baseline. It also maintains a BLER below the 90th percentile target of 1e-6 while delivering performance comparable to a state-of-the-art supervised technique using only CQI reports.

2506.22411 2026-02-05 eess.SP

19.3 GHz Acoustic Filter with High Close-in Rejection in Tri-layer Thin-Film Lithium Niobate

Omar Barrera, Sinwoo Cho, Jack Kramer, Vakhtang Chulukhadze, Tzu-Hsuan Hsu, Ruochen Lu

Comments 4 pages, 5 figures

Abstract

Acoustic filters are preferred front-end solutions at sub-6 GHz due to their superior frequency selectivity compared to electromagnetic (EM) counterparts. With the ongoing development of 5G and the evolution toward 6G, there is a growing need to extend acoustic filter technologies into frequency range 3 (FR3), which spans 7 to 24 GHz, to accommodate emerging high-frequency bands. However, scaling acoustic filters beyond 10 GHz presents significant challenges, as conventional platforms suffer from increased insertion loss (IL) and degraded out-of-band (OoB) rejection at higher frequencies. Recent innovations have led to the emergence of periodically poled piezoelectric lithium niobate (P3F LN) laterally excited bulk acoustic resonators (XBARs), offering low-loss and high electromechanical coupling performance above 10 GHz. This work presents the first tri-layer P3F LN filter operating at 19.3 GHz, achieving a low IL of 2.2 dB, a 3-dB fractional bandwidth (FBW) of 8.5%, and an impressive 49 dB close-in rejection. These results demonstrate strong potential for integration into FR3 diplexers.
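
To put the reported figures in absolute terms, the snippet below converts them under two assumptions that the abstract does not state explicitly: that the FBW is the 3-dB bandwidth divided by the 19.3 GHz center frequency, and that the IL and rejection values are power attenuations in dB. The variable names are hypothetical.

```python
center_ghz = 19.3       # operating frequency from the abstract
fbw = 0.085             # 3-dB fractional bandwidth (8.5%)
il_db = 2.2             # insertion loss
rejection_db = 49.0     # close-in rejection

# Assumes FBW = bandwidth / center frequency
bandwidth_ghz = fbw * center_ghz
# Fraction of input power transmitted in the passband
passband_power_fraction = 10 ** (-il_db / 10)
# Fraction of input power leaking through at the rejected frequencies
stopband_power_fraction = 10 ** (-rejection_db / 10)

print(f"3-dB bandwidth: {bandwidth_ghz:.2f} GHz")
print(f"In-band power transmitted: {passband_power_fraction:.1%}")
print(f"Close-in leakage: {stopband_power_fraction:.2e}")
```

Under these assumptions the passband is roughly 1.64 GHz wide and passes about 60% of the input power, while close-in signals are attenuated to roughly one part in 80,000.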

2506.16231 2026-02-05 eess.AS cs.SD

EDNet: A Versatile Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training

Doyeop Kwak, Youngjoon Jang, Seongyu Kim, Joon Son Chung

Comments Accepted by IEEE Transactions on Audio, Speech and Language Processing. Copyright IEEE. The final version will appear in IEEE Xplore

Abstract

Speech signals in real-world environments are frequently affected by various distortions such as additive noise, reverberation, and bandwidth limitation, which may appear individually or in combination. Traditional speech enhancement methods typically rely on either masking, which focuses on suppressing non-speech components while preserving observable structure, or mapping, which seeks to recover clean speech through direct transformation of the input. Each approach offers strengths in specific scenarios but may be less effective outside its target conditions. We propose the Erase and Draw Network (EDNet), a versatile speech enhancement framework designed to handle a broad range of distortion types without prior assumptions about task or input characteristics. EDNet consists of two main components: (1) the Gating Mamba (GM) module, which adaptively combines masking and mapping through a learnable gating mechanism that selects between suppression (Erase) and reconstruction (Draw) based on local signal features, and (2) Phase Shift-Invariant Training (PSIT), a shift-tolerant supervision strategy that improves phase estimation by enabling dynamic alignment during training while remaining compatible with standard loss functions. Experimental results on denoising, dereverberation, bandwidth extension, and multi-distortion enhancement tasks show that EDNet consistently achieves strong performance across conditions, demonstrating its architectural flexibility and adaptability to diverse task settings.

2506.00934 2026-02-05 cs.SD cs.AI eess.AS

GRAM: Spatial general-purpose audio representation models for real-world applications

Goksenin Yuksel, Marcel van Gerven, Kiki van der Heijden

Comments Revise with RealSELD

Abstract

Audio foundation models learn general-purpose audio representations that facilitate a wide range of downstream tasks. While the performance of these models has greatly increased for conventional single-channel, dry audio clips, their success in real-world acoustic environments with reverberation and noise is limited. Furthermore, most audio foundation models ignore the spatial dimension of real-world acoustic environments, ruling out tasks involving sound localization. To address these limitations, we propose GRAM: a general-purpose real-world audio model that employs a multi-channel masked autoencoder to efficiently learn spatial audio representations. We evaluated GRAM and other audio foundation models in a standardized manner on high-quality simulations of naturalistic, spatial acoustic environments as well as recordings of real-world environments and release these two complementary benchmark task suites: NatHEAR and RealSELD. Our results demonstrate that GRAM outperforms all state-of-the-art self-supervised audio foundation models on NatHEAR and the clean, single-channel version HEAR, while using only a fraction of the training data. GRAM also shows state-of-the-art localization performance in simulated environments and generalizes efficiently to real-world recordings in RealSELD. Taken together, GRAM presents a significant advance toward robust spatial audio foundation models for real-world environments.