arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday
2603.05489 2026-03-06 cs.AR cs.CY cs.LO cs.SY eess.SY

NL2GDS: LLM-aided interface for Open Source Chip Design

Max Eland, Jeyan Thiyagalingam, Dinesh Pamunuwa, Roshan Weerasekera

Comments 10 pages, 6 figures

Abstract

The growing complexity of hardware design and the widening gap between high-level specifications and register-transfer level (RTL) implementation hinder rapid prototyping and system design. We introduce NL2GDS (Natural Language to Layout), a novel framework that leverages large language models (LLMs) to translate natural language hardware descriptions into synthesizable RTL and complete GDSII layouts via the open-source OpenLane ASIC flow. NL2GDS employs a modular pipeline that captures informal design intent, generates HDL using multiple LLM engines and verifies them, and orchestrates automated synthesis and layout. Evaluations on ISCAS'85 and ISCAS'89 benchmark designs demonstrate up to 36% area reduction, 35% delay reduction, and 70% power savings compared to baseline designs, highlighting its potential to democratize ASIC design and accelerate hardware innovation.

2603.05385 2026-03-06 cs.RO cs.SY eess.SY

Accelerating Sampling-Based Control via Learned Linear Koopman Dynamics

Wenjian Hao, Yuxuan Fang, Zehui Lu, Shaoshuai Mou

Abstract

This paper presents an efficient model predictive path integral (MPPI) control framework for systems with complex nonlinear dynamics. To improve the computational efficiency of classic MPPI while preserving control performance, we replace the nonlinear dynamics used for trajectory propagation with a learned linear deep Koopman operator (DKO) model, enabling faster rollout and more efficient trajectory sampling. The DKO dynamics are learned directly from interaction data, eliminating the need for analytical system models. The resulting controller, termed MPPI-DK, is evaluated in simulation on pendulum balancing and surface vehicle navigation tasks, and validated on hardware through reference-tracking experiments on a quadruped robot. Experimental results demonstrate that MPPI-DK achieves control performance close to MPPI with true dynamics while substantially reducing computational cost, enabling efficient real-time control on robotic platforms.
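The computational saving comes from the rollout step: with a linear model in a lifted space, propagating all sampled trajectories reduces to batched matrix products. A minimal NumPy sketch of one MPPI update under that assumption follows; the lifted state `z0`, the matrices `A` and `B`, and the cost on lifted states are hypothetical placeholders, not the authors' MPPI-DK implementation or their learned DKO model.

```python
import numpy as np

def mppi_step(z0, u_nom, A, B, cost_fn, n_samples=256, sigma=0.5, lam=1.0, rng=None):
    """One MPPI update of a nominal control sequence using linear lifted dynamics.

    z0:      lifted initial state, shape (nz,) (the lifting itself is assumed done)
    u_nom:   nominal controls, shape (T, nu)
    A, B:    hypothetical linear lifted model z_{t+1} = A z_t + B u_t
    cost_fn: maps a batch of lifted states (K, nz) to stage costs (K,)
    """
    rng = np.random.default_rng() if rng is None else rng
    T, nu = u_nom.shape
    # Sample Gaussian control perturbations for all K rollouts at once.
    eps = rng.normal(0.0, sigma, size=(n_samples, T, nu))
    u = u_nom[None] + eps
    z = np.tile(z0, (n_samples, 1))
    costs = np.zeros(n_samples)
    for t in range(T):
        # Linear propagation: one batched matrix product per step, no nonlinear simulation.
        z = z @ A.T + u[:, t] @ B.T
        costs += cost_fn(z)
    # Path-integral weights; subtracting the minimum cost keeps exp() numerically stable.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # The cost-weighted average perturbation updates the nominal sequence.
    return u_nom + np.einsum("k,ktu->tu", w, eps)
```

In a receding-horizon loop, one would apply the first control, shift the sequence by one step, and repeat.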

2603.05354 2026-03-06 cs.CL eess.AS

Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

Carlos Carvalho, Francisco Teixeira, Thomas Rolland, Alberto Abad

Comments submitted for review for INTERSPEECH2026 conference

Abstract

Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech foundation models, which are typically adapted through domain-specific fine-tuning, resulting in multiple customised checkpoints, for which repeating full fine-tuning when new data becomes available is computationally prohibitive. In this work, we study model merging for multi-domain ASR and benchmark 11 merging algorithms for 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, as well as English and multilingual performance. We further propose BoostedTSV-M, a new merging algorithm based on TSV-M that mitigates rank collapse via singular-value boosting and improves numerical stability. Overall, our approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.
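For orientation, the simplest member of the family of merging methods is task arithmetic: each fine-tuned checkpoint contributes a task vector (its weights minus the shared base weights), and the merged model adds a scaled sum of those vectors to the base. The sketch below assumes checkpoints stored as dicts of NumPy arrays with identical keys; it is a baseline illustration only and does not implement TSV-M or the proposed BoostedTSV-M.

```python
import numpy as np

def task_arithmetic_merge(base, finetuned, scale=1.0):
    """Merge fine-tuned checkpoints by adding their scaled task vectors to the base.

    base:      dict mapping parameter name -> array (shared pre-trained weights)
    finetuned: list of dicts with the same keys (domain-specific checkpoints)
    scale:     hypothetical merging coefficient applied to the summed task vectors
    """
    merged = {}
    for name, w0 in base.items():
        # Task vector = fine-tuned weights minus base weights, per checkpoint.
        task_vectors = [ft[name] - w0 for ft in finetuned]
        merged[name] = w0 + scale * np.sum(task_vectors, axis=0)
    return merged
```

Methods such as TSV-M operate on the per-layer task matrices (for example via their singular value decompositions) rather than simply summing them.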

2603.05159 2026-03-06 cs.CV eess.IV

Generic Camera Calibration using Blurry Images

Zezhun Shi

Abstract

Camera calibration is the foundation of 3D vision. Generic camera calibration can yield more accurate results than parametric camera calibration. However, calibrating a generic camera model using printed calibration boards requires far more images than parametric calibration, making motion blur practically unavoidable for individual users. As a first attempt to address this problem, we draw on geometric constraints and a local parametric illumination model to simultaneously estimate feature locations and spatially varying point spread functions, while resolving the translational ambiguity that need not be considered in conventional image deblurring tasks. Experimental results validate the effectiveness of our approach.

2603.03229 2026-03-06 cs.LG eess.SP

Inverse Reconstruction of Shock Time Series from Shock Response Spectrum Curves using Machine Learning

Adam Watts, Andrew Jeon, Destry Newton, Ryan Bowering

Comments Extended journal-style manuscript. 27 pages, 13 figures

Abstract

The shock response spectrum (SRS) is widely used to characterize the response of single-degree-of-freedom (SDOF) systems to transient accelerations. Because the mapping from acceleration time history to SRS is nonlinear and many-to-one, reconstructing time-domain signals from a target spectrum is inherently ill-posed. Conventional approaches address this problem through iterative optimization, typically representing signals as sums of exponentially decayed sinusoids, but these methods are computationally expensive and constrained by predefined basis functions. We propose a conditional variational autoencoder (CVAE) that learns a data-driven inverse mapping from SRS to acceleration time series. Once trained, the model generates signals consistent with prescribed target spectra without requiring iterative optimization. Experiments demonstrate improved spectral fidelity relative to classical techniques, strong generalization to unseen spectra, and inference speeds three to six orders of magnitude faster. These results establish deep generative modeling as a scalable and efficient approach for inverse SRS reconstruction.
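To see why the inverse problem is ill-posed, it helps to write the forward map. The sketch below computes a maximax absolute-acceleration SRS by simulating a base-excited SDOF oscillator for each natural frequency and keeping only the peak response; discarding timing and phase in that `max` is what makes the mapping many-to-one. The 5% damping default and the state-space formulation are assumptions for illustration, not taken from the paper.

```python
import numpy as np
from scipy import signal

def shock_response_spectrum(t, accel, freqs_hz, zeta=0.05):
    """Maximax absolute-acceleration SRS of a base-excited SDOF oscillator.

    t:        equally spaced time samples starting at 0
    accel:    base acceleration time history, same length as t
    freqs_hz: natural frequencies at which to evaluate the spectrum
    zeta:     damping ratio (assumed 5% by default)
    """
    srs = []
    for fn in freqs_hz:
        wn = 2.0 * np.pi * fn
        # States: relative displacement z and relative velocity z' of the mass.
        A = np.array([[0.0, 1.0], [-wn**2, -2.0 * zeta * wn]])
        B = np.array([[0.0], [-1.0]])
        # Output: absolute acceleration of the mass, -(wn^2 z + 2 zeta wn z').
        C = np.array([[-wn**2, -2.0 * zeta * wn]])
        D = np.array([[0.0]])
        _, y, _ = signal.lsim((A, B, C, D), accel, t)
        # Only the peak magnitude survives; timing and phase are discarded.
        srs.append(np.max(np.abs(y)))
    return np.array(srs)
```

Many different acceleration histories share the same peaks at every frequency, which is why a generative inverse model rather than a deterministic inverse is a natural fit.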

2603.02813 2026-03-06 eess.AS

Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge

Dhanya E, Ankita Meena, Manas Nanivadekar, Noumida A, Victor Azad, Ashwini Nagaraj Shenoy, Pratik Roy Chowdhuri, Shobhit Banga, Vanshika Chhabra, Chitralekha Bhat, Shareef babu Kalluri, Srikanth Raj Chetupalli, Deepu Vijayasenan, Sriram Ganapathy

Comments Submitted for review to Interspeech 2026

Abstract

The DIarization and Speech Processing for LAnguage understanding in Conversational Environments - Medical (DISPLACE-M) challenge introduces a conversational AI benchmark for understanding goal-oriented, real-world medical dialogues. The challenge addresses multi-speaker interactions between frontline health workers and care seekers, characterized by spontaneous, noisy and overlapping speech. As part of the challenge, a medical conversational dataset comprising 40 hours of development and 15 hours of blind evaluation recordings was released. We provided baseline systems for four tasks (speaker diarization, automatic speech recognition, topic identification and dialogue summarization) to enable consistent benchmarking. System performance is evaluated using diarization error rate (DER), time-constrained minimum-permutation word error rate (tcpWER) and ROUGE-L. This paper describes the Phase-I evaluation (data, tasks and baseline systems) along with a summary of the evaluation results.
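Of the listed metrics, tcpWER extends the ordinary word error rate with a time constraint and a search over speaker permutations. As a point of reference only, the plain WER underlying it is a word-level Levenshtein distance normalized by the reference length; the sketch below implements that simpler metric, not tcpWER itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference and first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `word_error_rate("the patient has fever", "the patient had fever")` counts one substitution over four reference words.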

2602.18452 2026-03-06 cs.SD cs.LG eess.AS

RA-QA: A Benchmarking System for Respiratory Audio Question Answering Under Real-World Heterogeneity

Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo

Abstract

As conversational multimodal AI tools are increasingly adopted to process patient data for health assessment, robust benchmarks are needed to measure progress and expose failure modes under realistic conditions. Despite the importance of respiratory audio for mobile health screening, respiratory audio question answering remains underexplored, with existing studies evaluated narrowly and lacking real-world heterogeneity across modalities, devices, and question types. We hence introduce the Respiratory-Audio Question-Answering (RA-QA) benchmark, including a standardized data generation pipeline, a comprehensive multimodal QA collection, and a unified evaluation protocol. RA-QA harmonizes public RA datasets into a collection of 9 million format-diverse QA pairs covering diagnostic and contextual attributes. We benchmark classical ML baselines alongside multimodal audio-language models, establishing reproducible reference points and showing how current approaches fail under heterogeneity.

2512.04551 2026-03-06 cs.SD cs.AI eess.AS

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Cong Wang, Yizhong Geng, Yuhua Wen, Qifei Li, Yingming Gao, Ruimin Wang, Chunfeng Wang, Hao Li, Ya Li, Wei Chen

Comments Submitted for review to Interspeech 2026

Abstract

Speech emotion recognition (SER) is an important technology in human-computer interaction. However, achieving high performance is challenging due to emotional complexity and scarce annotated data. To tackle these challenges, we propose a multi-loss learning (MLL) framework integrating an energy-adaptive mixup (EAM) method and a frame-level attention module (FLAM). The EAM method leverages SNR-based augmentation to generate diverse speech samples capturing subtle emotional variations. FLAM enhances frame-level feature extraction for multi-frame emotional cues. Our MLL strategy combines Kullback-Leibler divergence, focal, center, and supervised contrastive loss to optimize learning, address class imbalance, and improve feature separability. We evaluate our method on four widely used SER datasets: IEMOCAP, MSP-IMPROV, RAVDESS, and SAVEE. The results demonstrate our method achieves state-of-the-art performance, suggesting its effectiveness and robustness.
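The SNR-based augmentation can be illustrated by mixing two waveforms at a chosen signal-to-noise ratio. The sketch below, assuming mono NumPy arrays at a common sample rate, scales the second signal so that the power ratio equals `snr_db`; how EAM chooses the SNR from signal energy, and how it mixes labels, is not described in the abstract and is not reproduced here.

```python
import numpy as np

def mix_at_snr(primary, secondary, snr_db):
    """Add `secondary` to `primary`, scaled so that P_primary / P_secondary = snr_db (in dB).

    Both inputs are 1-D arrays; the longer one is truncated to the shorter length.
    Returns the mixture and the gain applied to `secondary`.
    """
    n = min(len(primary), len(secondary))
    p, s = primary[:n], secondary[:n]
    p_pow = np.mean(p ** 2)
    s_pow = np.mean(s ** 2) + 1e-12  # small constant avoids division by zero for silence
    # Solve 10*log10(p_pow / (gain^2 * s_pow)) = snr_db for the gain.
    gain = np.sqrt(p_pow / (s_pow * 10 ** (snr_db / 10)))
    return p + gain * s, gain
```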

2510.16834 2026-03-06 cs.SD cs.AI cs.LG eess.AS

Schrödinger Bridge Mamba for One-Step Speech Enhancement

Jing Yang, Sirui Wang, Chao Wu, Lei Guo, Fan Fan

Comments Revised version. Submitted to Interspeech 2026

Abstract

We present Schrödinger Bridge Mamba (SBM), a novel model for efficient speech enhancement that integrates the Schrödinger Bridge (SB) training paradigm with the Mamba architecture. Experiments on joint denoising and dereverberation tasks demonstrate that SBM outperforms strong generative and discriminative methods on multiple metrics with only one step of inference, while achieving a competitive real-time factor for streaming feasibility. Ablation studies reveal that the SB paradigm consistently yields improved performance over conventional mapping across diverse architectures. Furthermore, Mamba exhibits stronger performance under the SB paradigm than Multi-Head Self-Attention (MHSA) and Long Short-Term Memory (LSTM) backbones. These findings highlight the synergy between the Mamba architecture and SB trajectory-based training, providing a high-quality solution for real-world speech enhancement. Demo page: https://sbmse.github.io

2509.15001 2026-03-06 eess.AS cs.LG cs.SD

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

Théo Charlot, Tarek Kunze, Maxime Poli, Alejandrina Cristia, Emmanuel Dupoux, Marvin Lavechin

Comments 5 pages, 1 figure

Abstract

Child-centered daylong recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, a self-supervised speech model trained on 13,000 hours of multilingual child-centered recordings spanning 40+ languages. Evaluated on voice type classification -- distinguishing target children from female adults, male adults, and other children, a key preprocessing step for analyzing naturalistic language experiences -- BabyHuBERT-VTC achieves F1-scores from 52.1% to 74.4% across six corpora, consistently outperforming W2V2-LL4300 (English daylongs) and HuBERT (clean adult speech). Notable gains include 13.2 and 15.9 absolute F1 points over HuBERT on Vanuatu and Solomon Islands, demonstrating effectiveness on underrepresented languages. We share code and model to support researchers working with child-centered recordings across diverse linguistic contexts.

2507.09995 2026-03-06 eess.IV cs.CV

Graph-Based Multi-Modal Light-weight Network for Adaptive Brain Tumor Segmentation

Guohao Huo, Ruiting Dai, Zitong Wang, Junxin Kong, Hao Tang

Abstract

Multi-modal brain tumor segmentation remains challenging for practical deployment due to the high computational costs of mainstream models. In this work, we propose GMLN-BTS, a Graph-based Multi-modal interaction Lightweight Network for brain tumor segmentation. Our architecture achieves high-precision, resource-efficient segmentation through three key components. First, a Modality-Aware Adaptive Encoder (M2AE) facilitates efficient multi-scale semantic extraction. Second, a Graph-based Multi-Modal Collaborative Interaction Module (G2MCIM) leverages graph structures to model complementary cross-modal relationships. Finally, a Voxel Refinement UpSampling Module (VRUM) integrates linear interpolation with multi-scale transposed convolutions to suppress artifacts and preserve boundary details. Experimental results on BraTS 2017, 2019, and 2021 benchmarks demonstrate that GMLN-BTS achieves state-of-the-art performance among lightweight models. With only 4.58M parameters, our method reduces parameter count by 98% compared to mainstream 3D Transformers while significantly outperforming existing compact approaches.

2603.05279 2026-03-06 cs.RO cs.SY eess.SY

From Code to Road: A Vehicle-in-the-Loop and Digital Twin-Based Framework for Central Car Server Testing in Autonomous Driving

Chengdong Wu, Sven Kirchner, Nils Purschke, Axel Torschmied, Norbert Kroth, Yinglei Song, André Schamschurko, Erik Leo Haß, Kuo-Yi Chao, Yi Zhang, Nenad Petrovic, Alois C. Knoll

Comments 8 pages; Accepted for publication at the 37th IEEE Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026

Abstract

Simulation is one of the most essential parts of the development stage of automotive software. However, purely virtual simulations often struggle to accurately capture all real-world factors due to modeling limitations. To address this challenge, this work presents a test framework, based on Vehicle-in-the-Loop (ViL) and digital twin technology, for automotive software running on a centralized E/E architecture (in our case, a central car server). The framework couples a physical test vehicle on a dynamometer test bench with its synchronized virtual counterpart in a simulation environment. Our approach provides a safe, reproducible, realistic, and cost-effective platform for validating autonomous driving algorithms with a centralized architecture. This test method eliminates the need to test individual physical ECUs and their communication protocols separately. In contrast to traditional ViL methods, the proposed framework runs the full autonomous driving software directly on the vehicle hardware after the simulation process, eliminating flashing and intermediate layers while enabling seamless virtual-physical integration and accurately reflecting centralized E/E behavior. In addition, incorporating mixed testing in both simulated and physical environments reduces the need for full hardware integration during the early stages of automotive development. Experimental case studies demonstrate the effectiveness of the framework in different test scenarios. These findings highlight the potential to reduce development and integration efforts for testing autonomous driving pipelines in the future.

2603.05270 2026-03-06 eess.AS cs.AI

Visual-Informed Speech Enhancement Using Attention-Based Beamforming

Chihyun Liu, Jiaxuan Fan, Mingtung Sun, Michael Anthony, Mingsian R. Bai, Yu Tsao

Comments 15 pages, 14 figures

Journal ref
IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 4941-4955, 2025
Abstract

Recent studies have demonstrated that incorporating auxiliary information, such as speaker voiceprint or visual cues, can substantially improve Speech Enhancement (SE) performance. However, single-channel methods often yield suboptimal results in low signal-to-noise ratio (SNR) conditions, under high reverberation, or in complex scenarios involving dynamic speakers, overlapping speech, or non-stationary noise. To address these issues, we propose a novel Visual-Informed Neural Beamforming Network (VI-NBFNet), which integrates microphone array signal processing and deep neural networks (DNNs) using multimodal input features. The proposed network leverages a pretrained visual speech recognition model to extract lip movements as input features, which are used for voice activity detection (VAD) and target speaker identification. The system is intended to handle both static and moving speakers by introducing a supervised end-to-end beamforming framework equipped with an attention mechanism. The experimental results demonstrate that, compared to several baseline methods, the proposed audiovisual system achieves better SE performance and robustness in both stationary and dynamic speaker scenarios.

2603.05268 2026-03-06 cs.RO cs.SY eess.SY

Curve-Induced Dynamical Systems on Riemannian Manifolds and Lie Groups

Saray Bakker, Martin Schonger, Tobias Löw, Javier Alonso-Mora, Sylvain Calinon

Comments Preprint, 14 pages, video linked in the paper, Saray Bakker and Martin Schonger contributed equally as first authors and are listed alphabetically

Abstract

Deploying robots in household environments requires safe, adaptable, and interpretable behaviors that respect the geometric structure of tasks. This structure is often represented on Lie groups and Riemannian manifolds, for example poses on SE(3) or symmetric positive definite (SPD) matrices encoding stiffness or damping. In this context, dynamical system-based approaches offer a natural framework for generating such behavior, providing stability and convergence while remaining responsive to changes in the environment. We introduce Curve-induced Dynamical systems on Smooth Manifolds (CDSM), a real-time framework for constructing dynamical systems directly on Riemannian manifolds and Lie groups. The proposed approach constructs a nominal curve on the manifold and generates a dynamical system that combines a tangential component, which drives motion along the curve, and a normal component, which attracts the state toward the curve. We provide a stability analysis of the resulting dynamical system and validate the method quantitatively. On an S2 benchmark, CDSM demonstrates improved trajectory accuracy, reduced path deviation, and faster generation and query times compared to state-of-the-art methods. Finally, we demonstrate the practical applicability of the framework on both a robotic manipulator, where poses on SE(3) and damping matrices on SPD(n) are adapted online, and a mobile manipulator.

2603.05251 2026-03-06 eess.SP

On Dual-Fed Pinching Antenna Systems with In-Waveguide Attenuation

Ximing Xie, Hao Qin, Fang Fang, Xianbin Wang

Abstract

Pinching antenna systems (PAS) have recently emerged as a promising architecture for flexible and reconfigurable wireless communications. However, their performance is fundamentally constrained by in-waveguide attenuation, which is non-negligible in practical dielectric waveguides and can severely degrade the achievable data rate, particularly for long waveguides. To overcome this limitation, we propose a dual-fed PAS (DF-PAS), in which each waveguide is equipped with two feed points located at the two ends, enabling dynamic feed-point selection based on user locations. This design effectively shortens the in-waveguide propagation distance and mitigates attenuation-induced power loss without modifying the waveguide structure or the PA actuation mechanism. We investigate the DF-PAS in both single- and multi-waveguide scenarios. For the single-waveguide case, we derive closed-form high-SNR approximations of the ergodic rate and obtain closed-form solutions for the optimal PA position and feed-point selection under time-division multiple access (TDMA). We then extend DF-PAS to a multi-waveguide scenario, where we first derive closed-form high-SNR approximations of the ergodic rate and then formulate a joint optimization problem over feed-point selection, PA placement, and beamforming under general orthogonal multiple access (OMA). To solve this problem efficiently, we develop a two-phase optimization framework that integrates greedy feed-point switching, gradient-based PA placement, and WMMSE-based beamforming. Simulation results demonstrate that the proposed DF-PAS consistently outperforms conventional single-fed PAS (SF-PAS) across various network configurations, validating its effectiveness as a practical and scalable solution for mitigating in-waveguide attenuation in PAS-enabled wireless networks.
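The intuition behind dual feeding can be shown with a simple loss model. Assuming in-waveguide attenuation of `alpha_db_per_m` dB per meter (i.e. exponential power decay with distance), the sketch below selects whichever end feed gives the shorter in-waveguide path to a pinching antenna at position `x_pa` on a waveguide of length `length`. This only illustrates the feed-point selection idea; the paper's ergodic-rate analysis and joint optimization are not reproduced, and all names are hypothetical.

```python
import numpy as np

def select_feed(x_pa, length, alpha_db_per_m):
    """Choose the feed end with less in-waveguide attenuation.

    x_pa:           pinching antenna position along the waveguide (m), 0 <= x_pa <= length
    length:         waveguide length (m)
    alpha_db_per_m: assumed attenuation coefficient (dB/m)
    Returns (feed index, loss in dB): 0 = feed at x = 0, 1 = feed at x = length.
    """
    distances = np.array([x_pa, length - x_pa])
    # Loss in dB grows linearly with distance under an exponential power-decay model.
    loss_db = alpha_db_per_m * distances
    k = int(np.argmin(loss_db))
    return k, float(loss_db[k])
```

For a PA at 8 m on a 10 m waveguide, a single feed at x = 0 would traverse 8 m of waveguide, while the dual-fed design switches to the far end and traverses only 2 m.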

2603.05247 2026-03-06 eess.IV cs.CV physics.med-ph

ICHOR: A Robust Representation Learning Approach for ASL CBF Maps with Self-Supervised Masked Autoencoders

Xavier Beltran-Urbano, Yiran Li, Xinglin Zeng, Katie R. Jobson, Manuel Taso, Christopher A. Brown, David A. Wolk, Corey T. McMillan, Ilya M. Nashrallah, Paul A. Yushkevich, Ze Wang, John A. Detre, Sudipto Dolui

Abstract

Arterial spin labeling (ASL) perfusion MRI allows direct quantification of regional cerebral blood flow (CBF) without exogenous contrast, enabling noninvasive measurements that can be repeated without constraints imposed by contrast injection. ASL is increasingly acquired in research studies and clinical MRI protocols. Building on successes in structural imaging, recent efforts have implemented deep-learning-based methods to improve image quality, enable automated quality control, and derive robust quantitative and predictive biomarkers from ASL-derived CBF. However, progress has been limited by variable image quality, substantial inter-site, vendor and protocol differences, and limited availability of the labeled datasets needed to train models that generalize across cohorts. To address these challenges, we introduce ICHOR, a self-supervised pre-training approach for ASL CBF maps that learns transferable representations using 3D masked autoencoders. ICHOR is pretrained via masked image modeling using a Vision Transformer backbone and can be used as a general-purpose encoder for downstream ASL tasks. For pre-training, we curated one of the largest ASL datasets to date, comprising 11,405 ASL CBF scans from 14 studies spanning multiple sites and acquisition protocols. We evaluated the pre-trained ICHOR encoder on three downstream diagnostic classification tasks and one ASL CBF map quality prediction regression task. Across all evaluations, ICHOR outperformed existing neuroimaging self-supervised pre-training methods adapted to ASL. Pre-trained weights and code will be made publicly available.
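Masked image modeling on 3D volumes typically splits the volume into non-overlapping cubic patches and hides a random subset before encoding. The NumPy sketch below shows that patchify-and-mask step; the patch size and the 75% mask ratio are assumptions borrowed from common MAE practice, not ICHOR's reported settings.

```python
import numpy as np

def patchify_3d(volume, p):
    """Split a (D, H, W) volume into flattened, non-overlapping p x p x p patches."""
    D, H, W = volume.shape
    if D % p or H % p or W % p:
        raise ValueError("volume dimensions must be divisible by the patch size")
    x = volume.reshape(D // p, p, H // p, p, W // p, p)
    # Bring the three patch-grid axes to the front, then flatten each patch.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, p ** 3)

def random_masking(n_patches, mask_ratio, rng):
    """Return sorted indices of visible and masked patches for one volume."""
    n_keep = int(round(n_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(n_patches)
    return np.sort(perm[:n_keep]), np.sort(perm[n_keep:])
```

In an MAE-style setup, the encoder would see only `patches[visible]`, and the reconstruction loss would be computed on `patches[masked]`.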

2603.05239 2026-03-06 eess.SY cs.SY math.OC

Computing Scaled Relative Graphs of Discrete-time LTI Systems from Data

Talitha Nauta, Richard Pates

Comments 11 pages, 3 figures, submitted for possible publication

Abstract

Graphical methods for system analysis have played a central role in control theory. A recently emerging tool in this field is the Scaled Relative Graph (SRG). In this paper, we further extend its applicability by showing how the SRG of discrete-time linear time-invariant (LTI) systems can be computed exactly from their state-space representation using linear matrix inequalities. We additionally propose a fully data-driven approach that computes the SRG exclusively from input-output data. Furthermore, we introduce a robust version of the SRG, which can be computed from noisy data trajectories and contains the SRG of the actual system.

2603.05220 2026-03-06 eess.IV cs.IT math.IT

Adaptive Sampling for Storage of Progressive Images on DNA

Xavier Pic, Nimesh Pinnamaneni, Raja Appuswamy

Abstract

The short lifespan of traditional data storage media, coupled with an exponential increase in storage demand, has made long-term archival a fundamental problem in the data storage industry and beyond. Consequently, researchers are looking for innovative media solutions that can store data over long time periods at a very low cost. DNA molecules, with their high density, long lifespan, and low energy needs, have emerged as a viable alternative medium for digital data archival. However, current DNA data storage technologies face challenges with respect to cost and reliability. Thus, coding rate and error robustness are critical to scaling DNA storage and making it technologically and economically achievable. Moreover, the DNA molecules that encode different files are often located in the same oligo pool. Without random access solutions at the oligo level, it is very impractical to decode a specific file from these mixed pools, as all oligos must first be sequenced and decoded before a target file can be retrieved, which greatly increases the read cost. This paper introduces a solution to efficiently encode and store images in DNA molecules, which aims to reduce the read cost of retrieving a reduced-resolution version of an image. This image storage system is based on the Progressive Decoding Functionality of the JPEG2000 codec but can be adapted to any conventional progressive codec. Each resolution layer is encoded into a set of oligos using the JPEG DNA VM codec, a DNA-based coder designed to retrieve a file with high reliability. Depending on the desired resolution, the set of oligos and the portion of each oligo to be sequenced and decoded are adjusted accordingly. These oligos are selected at sequencing time using the adaptive sampling method provided by Nanopore sequencers, making this a PCR-free random access solution.

2603.05183 2026-03-06 eess.IV

Limited-Angle CT Reconstruction Using Multi-Volume Latent Consistency Model

Hinako Isogai, Naruki Murahashi, Mitsuhiro Nakamura, Megumi Nakao

Abstract

Limited-angle computed tomography (LACT) reconstruction is an inverse problem with severe ill-posedness arising from missing projection angles, and it is difficult to restore high-precision images without sufficient prior knowledge. In recent years, machine learning methods, notably diffusion models, have demonstrated strong image generation capabilities. However, accurate restoration of the three-dimensional structures of organs and vessels and preservation of contrast remain challenging, and the impact of diverse clinical imaging conditions, such as field of view (FOV) and projection angle range, on reconstruction accuracy has not been sufficiently investigated. In this study, we propose a multi-volume latent diffusion model that uses three-dimensional latent representations obtained from multiple effective fields of view as guidance for LACT reconstruction in practical clinical settings. The proposed method achieves fast and stable inference by introducing consistency models into the latent space, and preserves organ boundary information and internal structures with high precision under different FOV conditions through a multi-volume encoder that acquires latent variables at different scales from the global region and the central region. Evaluation experiments demonstrated that the proposed method achieved more accurate synthetic CT image generation than existing methods. Under the limited-angle condition of 60 degrees, an MAE of 10.12 HU and an SSIM of 0.9677 were achieved, and under the extreme limited-angle condition of 30 degrees, an MAE of 16.69 HU and an SSIM of 0.9393 were achieved. Furthermore, stable reconstruction performance was demonstrated even for unknown projection angle conditions not included during training, confirming the applicability to diverse imaging conditions in clinical practice.

2603.05157 2026-03-06 cs.CV cs.LG eess.IV

The Impact of Preprocessing Methods on Racial Encoding and Model Robustness in CXR Diagnosis

Dishantkumar Sutariya, Eike Petersen

Comments Preprint accepted for publication at BVM 2026 (https://www.bvm-conf.org/)

Abstract

Deep learning models can identify racial identity with high accuracy from chest X-ray (CXR) recordings. Thus, there is widespread concern about the potential for racial shortcut learning, where a model inadvertently learns to systematically bias its diagnostic predictions as a function of racial identity. Such racial biases threaten healthcare equity and model reliability, as models may systematically misdiagnose certain demographic groups. Since racial shortcuts are diffuse - non-localized and distributed throughout the whole CXR recording - image preprocessing methods may influence racial shortcut learning, yet the potential of such methods for reducing biases remains underexplored. Here, we investigate the effects of image preprocessing methods including lung masking, lung cropping, and Contrast Limited Adaptive Histogram Equalization (CLAHE). These approaches aim to suppress spurious cues encoding racial information while preserving diagnostic accuracy. Our experiments reveal that simple bounding box-based lung cropping can be an effective strategy for reducing racial shortcut learning while maintaining diagnostic model performance, bypassing frequently postulated fairness-accuracy trade-offs.
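The two spatial preprocessing options differ in what they discard: lung masking zeroes every pixel outside the segmented lungs, while bounding-box cropping keeps the rectangular region enclosing them. A minimal NumPy sketch of both, assuming a boolean lung mask is already available from some segmentation model (not shown); the function names and the optional `margin` parameter are hypothetical.

```python
import numpy as np

def apply_lung_mask(image, mask):
    """Lung masking: zero every pixel outside the boolean lung mask."""
    return np.where(mask, image, 0)

def crop_to_lung_bbox(image, mask, margin=0):
    """Bounding-box cropping: keep the rectangle enclosing the lung mask (plus a margin)."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return image  # no lungs found; leave the image unchanged
    r0 = max(rows[0] - margin, 0)
    r1 = min(rows[-1] + margin, image.shape[0] - 1)
    c0 = max(cols[0] - margin, 0)
    c1 = min(cols[-1] + margin, image.shape[1] - 1)
    return image[r0:r1 + 1, c0:c1 + 1]
```

Cropping still retains some non-lung anatomy inside the box, whereas masking removes it entirely; the abstract reports that the simpler cropping strategy was the effective one.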

2603.05154 2026-03-06 eess.SP stat.AP

Revitalizing AR Process Simulation of Non-Gaussian Radar Clutter via Series-Based Analytic Continuation

Xingxing Liao, Junhao Xie

Comments 13 pages, 12 figures

Abstract

Due to its conceptual simplicity, the linear filtering framework, notably the autoregressive (AR) process, has a long history in simulating clutter sequences with specified probability density functions (PDFs) and autocorrelation functions (ACFs). However, linear filtering inevitably distorts the input distribution, which may lead to inaccurate PDF reproduction or restrict applicability to very simple ACFs. To address these challenges, this study proposes a series-based analytic continuation strategy that revitalizes AR process clutter simulation by accurately precomputing the input pre-distortion required to compensate for AR filtering. First, the moments and cumulants of the AR input are derived based on the input-output relationship of the AR process, facilitating the moment and cumulant expansions of the Laplace transform (LT) and the logarithmic LT around zero, respectively. Second, both series expansions are analytically continued via the Padé approximation (PA) to recover the LT over the full complex plane. Notably, the PA-based continuation of the moment expansion, a conventional choice, can be highly inaccurate when the LT exhibits strong oscillations. By contrast, given that the logarithmic LT generally has a simpler structure, the continuation of the cumulant expansion provides a more stable and accurate alternative. Third, the LT recovered from the cumulant expansion facilitates fast simulation of the non-Gaussian white AR input sequence via a random variable transformation method, thereby enabling an efficient AR process. Finally, simulations demonstrate that the proposed strategy enables accurate and fast simulation of non-Gaussian correlated clutter sequences.
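The first step rests on a standard identity: an AR(p) output is a linear filter of its i.i.d. input, y_t = Σ_m h_m x_{t-m}, so its cumulants satisfy κ_n(y) = κ_n(x) Σ_m h_m^n. Inverting this gives the input cumulants required for a target output. The sketch below computes them with a truncated impulse response; it covers only this pre-distortion step, not the Padé continuation or the random variable transformation, and the function name and truncation length are assumptions.

```python
import numpy as np
from scipy import signal

def input_cumulants(target_output_cumulants, ar_coeffs, n_terms=2000):
    """Cumulants of the i.i.d. AR input needed to produce the given output cumulants.

    target_output_cumulants: dict {order n: kappa_n of the output y}
    ar_coeffs:               [a_1, ..., a_p] in y_t = sum_k a_k y_{t-k} + x_t (stable model)
    n_terms:                 truncation length of the impulse response
    """
    impulse = np.zeros(n_terms)
    impulse[0] = 1.0
    # Impulse response h_m of 1 / (1 - a_1 z^-1 - ... - a_p z^-p).
    h = signal.lfilter([1.0], np.r_[1.0, -np.asarray(ar_coeffs)], impulse)
    # kappa_n(y) = kappa_n(x) * sum_m h_m^n, solved for kappa_n(x).
    return {n: k / np.sum(h ** n) for n, k in target_output_cumulants.items()}
```

For AR(1) with a = 0.5, Σ h_m² = 1/(1 - 0.25) = 4/3, so a unit output variance requires an input variance of 0.75.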

2603.05133 2026-03-06 eess.IV

Anti-Aliasing Snapshot HDR Imaging Using Non-Regular Sensing

Teresa Stürzenhofäcker, Moritz Klimm, Jürgen Seiler, André Kaup

Abstract

Snapshot HDR imaging captures the full dynamic range of a scene in a single exposure, making it essential for video and dynamic environments where motion prevents the use of multi-exposure techniques or complex hardware set-ups. This work presents a snapshot HDR imaging sensor based on spatially varying apertures, implemented by combining two differently sized prototype pixels. The different light integration areas physically extend the dynamic range towards the lower end, compared to a standard high-resolution sensor. A non-regular pixel arrangement is proposed to mitigate aliasing and to overcome the loss in spatial resolution associated with the increased light integration area of the larger prototype pixel. Subsequent reconstruction in the Fourier domain, where natural images can be sparsely represented, allows the image to be recovered with high detail. The image acquisition approach with the proposed non-regular HDR sensor is simulated and analysed with special emphasis on spatial resolution. The results suggest that the snapshot HDR sensor layout is an effective way to acquire images with high dynamic range and free from aliasing artefacts.

2603.05127 2026-03-06 eess.SP

A Fully Open-source Implementation of an Analog 8-PAM Demapper for High-speed Communications

Mohamed Aiham Hemza, Alex Alvarado, Krzysztof Herman, Piyush Kaul

Comments 5 pages, 5 figures

Abstract

Spectrally efficient communication systems rely on multi-level modulation formats. At the receiver, a demodulator is often used to extract soft information about the transmitted bits. Such a demodulator is typically implemented in the digital domain; however, analog implementations are also possible. In this paper, we design and simulate an analog 8-ary pulse-amplitude modulation (8-PAM) demapper in IHP SG13G2 SiGe BiCMOS technology. We generalize and improve a 4-PAM design available in the literature and propose a fully MOSFET-based 8-PAM design. Our design and simulations rely entirely on open-source IC design tools. Our results show an energy efficiency of 0.33 pJ/bit at a data rate of 1 Gbit/s.
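
As a point of reference for what a soft demapper computes, the following sketch evaluates max-log log-likelihood ratios (LLRs) for 8-PAM in floating point. It assumes levels {±1, ±3, ±5, ±7}, binary-reflected Gray labelling, an AWGN channel with known noise variance, and the max-log approximation; none of these details is stated in the abstract, and the analog circuit may approximate a different mapping. For scale, 0.33 pJ/bit at 1 Gbit/s corresponds to a power of about 0.33 mW.

```python
import numpy as np

LEVELS = np.arange(-7, 8, 2)                         # assumed 8-PAM amplitudes -7, -5, ..., 7
LABELS = np.array([i ^ (i >> 1) for i in range(8)])  # binary-reflected Gray labels
BITS = 3

def maxlog_llrs(y, sigma2):
    """Max-log LLRs, log P(b=0|y) / P(b=1|y), for one sample y (AWGN with variance sigma2)."""
    d2 = (y - LEVELS) ** 2                           # squared distance to every level
    llrs = []
    for k in range(BITS):
        bit = (LABELS >> (BITS - 1 - k)) & 1         # k = 0 is the most significant bit
        llrs.append((d2[bit == 1].min() - d2[bit == 0].min()) / (2 * sigma2))
    return np.array(llrs)
```

A positive LLR favours bit 0. At y = 7, the outermost level carries label 100, so the sketch returns a negative LLR for the first bit and positive LLRs for the other two.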

2603.05115 2026-03-06 eess.SY cs.SY

Trajectory Tracking for Uncrewed Surface Vessels with Input Saturation and Dynamic Motion Constraints

Ram Milan Kumar Verma, Shashi Ranjan Kumar, Hemendra Arya

Comments 32 pages, 7 figures

Abstract

This work addresses constrained motion control of uncrewed surface vessels. Constraints are imposed on the vehicle states and inputs because of physical limitations, mission requirements, and safety considerations. We develop a nonlinear feedback controller that uses log-type Barrier Lyapunov Functions to enforce static and dynamic motion constraints. The proposed scheme uniquely addresses asymmetric constraints on position and heading alongside symmetric constraints on surge, sway, and yaw rates. Additionally, a smooth input saturation model is incorporated into the design to guarantee stability under actuator bounds, which, if unaccounted for, can lead to severe performance degradation and poor tracking. A rigorous Lyapunov stability analysis shows that the closed-loop system remains stable and that all state variables remain within their prescribed bounds at all times, provided the initial conditions also lie within those bounds. Numerical simulations demonstrate that the proposed strategies are effective for surface vessels without violating the motion and actuator constraints.
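
The two ingredients named above can be sketched in a few lines. The asymmetric log-type barrier below follows a common form from the barrier Lyapunov function literature, and the tanh function is one common smooth model of actuator saturation; the abstract does not specify which forms the authors use, so treat the function names and the bounds `k_a`, `k_b`, `u_max` as illustrative assumptions.

```python
import math

def asymmetric_log_blf(z, k_a, k_b):
    """Log-type barrier Lyapunov function for a tracking error constrained to -k_a < z < k_b.

    V grows without bound as z approaches either bound, so keeping V bounded keeps z inside.
    """
    if not -k_a < z < k_b:
        raise ValueError("error outside the constraint set")
    k = k_b if z > 0 else k_a          # use the bound on the side the error currently lies
    return 0.5 * math.log(k * k / (k * k - z * z))

def smooth_saturation(v, u_max):
    """Smooth, differentiable approximation of clip(v, -u_max, u_max) using tanh."""
    return u_max * math.tanh(v / u_max)
```

The same error z = 0.99 yields a much larger V when the tighter bound (1.0) is on its side than when the looser bound (2.0) is, which is how the asymmetric form encodes different limits in each direction.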

2603.05091 2026-03-06 eess.AS

Voice Timbre Attribute Detection with Compact and Interpretable Training-Free Acoustic Parameters

Aemon Yat Fei Chiu, Yujia Xiao, Qiuqiang Kong, Tan Lee

Comments Under review

Abstract

Voice timbre attribute detection (vTAD) is the task of determining the relative intensity of timbre attributes between speech utterances. Voice timbre is a crucial yet inherently complex component of speech perception. While deep neural network (DNN) embeddings perform well in speaker modelling, they often act as black-box representations with limited physical interpretability and high computational cost. In this work, a compact acoustic parameter set is investigated for vTAD. The set captures important acoustic measures and their temporal dynamics, which are found to be crucial for the task. Despite its simplicity, the acoustic parameter set is competitive: it outperforms conventional cepstral features and supervised DNN embeddings and approaches state-of-the-art self-supervised models. Importantly, the studied set requires no trainable parameters, incurs negligible computation, and offers explicit interpretability for analysing the physical traits behind human timbre perception.
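
The abstract does not list the parameters in the set, so the sketch below only illustrates the generic idea of pairing an acoustic parameter track (for example, a frame-level F0 or energy contour, assumed here) with its temporal dynamics: a regression-based delta over ±N frames, followed by simple utterance-level statistics. No step has trainable parameters. The function names and the choice of mean and standard deviation as functionals are assumptions, not the authors' feature definition.

```python
import numpy as np

def deltas(x, N=2):
    """Regression-based delta of a 1-D frame sequence x; edge frames are replicated."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    padded = np.pad(x, N, mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # d_t = sum_n n * (x_{t+n} - x_{t-n}) / (2 * sum_n n^2)
    return sum(n * (padded[N + n: N + n + T] - padded[N - n: N - n + T])
               for n in range(1, N + 1)) / denom

def functionals(track):
    """Utterance-level summary: mean and std of a parameter track and of its deltas."""
    d = deltas(track)
    return np.array([np.mean(track), np.std(track), np.mean(d), np.std(d)])
```

On a linear ramp the interior deltas equal the slope exactly, which is a quick sanity check that the regression window is aligned correctly.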

2603.05058 2026-03-06 cs.CV cs.AI eess.IV

A 360-degree Multi-camera System for Blue Emergency Light Detection Using Color Attention RT-DETR and the ABLDataset

Francisco Vacalebri-Lloret, Lucas Banchero, Jose J. Lopez, Jose M. Mossi

Comments 16 pages, 17 figures. Submitted to IEEE Transactions on Intelligent Vehicles

Abstract

This study presents an advanced system for detecting the blue lights of emergency vehicles, developed using ABLDataset, a curated dataset of images of European emergency vehicles under various climatic and geographic conditions. The system uses four fisheye cameras, each with a 180-degree horizontal field of view, mounted on the sides of the vehicle. A calibration process enables azimuthal localization of the detections. Additionally, a comparative analysis of major deep neural network algorithms was conducted, including YOLO (v5, v8, and v10), RetinaNet, Faster R-CNN, and RT-DETR. RT-DETR was selected as the base model and enhanced with a color attention block, achieving an accuracy of 94.7 percent and a recall of 94.1 percent on the test set, with field-test detections at distances of up to 70 meters. Furthermore, the system estimates the approach angle of the emergency vehicle relative to the center of the car using geometric transformations. Designed for integration into a multimodal system that combines visual and acoustic data, the system has demonstrated high efficiency, offering a promising approach to enhancing Advanced Driver Assistance Systems (ADAS) and road safety.
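
Azimuthal localization from a single fisheye camera can be sketched with an ideal equidistant projection model (r = fθ). The sketch assumes the optical centre is at the image centre, the 180-degree horizontal field of view spans the full image width, the camera is level, and its mounting yaw relative to the vehicle's forward axis is known (for example 0, 90, 180, and 270 degrees for four cameras; these values are hypothetical). It ignores the small offset between each camera and the car's centre, which matters little at long range, and it does not reproduce the authors' calibration.

```python
import math

def pixel_to_vehicle_azimuth(u, v, width, height, cam_yaw_deg, fov_deg=180.0):
    """Azimuth in degrees, clockwise from the vehicle's forward axis, of a detection at (u, v).

    Hypothetical ideal equidistant fisheye: r = f * theta, centred in the image,
    with the full horizontal FOV spanning the image width and a level camera.
    """
    cx, cy = width / 2.0, height / 2.0
    f = (width / 2.0) / math.radians(fov_deg / 2.0)  # pixels per radian
    dx, dy = u - cx, v - cy
    theta = math.hypot(dx, dy) / f                   # angle from the optical axis
    phi = math.atan2(dy, dx)                         # direction within the image plane
    x = math.sin(theta) * math.cos(phi)              # camera frame: x right, y down, z forward
    z = math.cos(theta)
    az_cam = math.degrees(math.atan2(x, z))          # horizontal angle, positive to the right
    return (cam_yaw_deg + az_cam) % 360.0
```

For example, a detection at the centre of a right-facing camera (yaw 90) maps to 90 degrees, and one at the right edge of a front camera (yaw 0) also maps to 90 degrees, which is where adjacent 180-degree views overlap.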

2603.05023 2026-03-06 eess.SP

Label Hijacking in Track Consensus-Based Distributed Multi-Target Tracking

Helena Calatrava, Shuo Tang, Pau Closas

Comments 8 pages, 7 figures; This work has been submitted to the IEEE for possible publication

Abstract

Distributed multi-target tracking (DMTT) in limited field-of-view (FoV) sensor networks commonly suffers from label inconsistency, whereby different nodes disagree on the identity of the same target. Recent track-consensus DMTT (TC-DMTT) strategies mitigate this issue by enforcing kinematic and label agreement through metric-based track matching. Nevertheless, their behavior under adversarial conditions remains largely unexplored. In this paper, we reveal identity-level vulnerabilities in TC-DMTT and introduce the concept of label hijacking: an attack in which an adversary injects spoofed tracks to corrupt target identities across the network. Drawing on an analogy to classical pull-off deception in radar, we formalize a notion of attack stealthiness and derive an optimization-based strategy for crafting such attacks. A three-sensor network case study demonstrates the impact of the proposed attack on label consistency and tracking accuracy, showing successful target impersonation. Overall, this work highlights the need to rethink robustness at the consensus layer in DMTT frameworks.
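
The following toy example illustrates the vulnerability in metric-based track matching: a node assigns received labels to its local tracks by minimizing the total Euclidean distance, and a spoofed track injected closer to a local track than the legitimate one captures that track's label. The brute-force matcher, gate value, and coordinates are hypothetical stand-ins; the TC-DMTT consensus update and the paper's stealthiness-constrained attack optimization are not modeled.

```python
import itertools
import math

def match_labels(local, received, gate=5.0):
    """Assign received labels to local tracks by minimum total (gated) distance.

    local, received: dicts mapping label -> (x, y). Returns {local_label: received_label or None}.
    Brute force over assignments, so only suitable for a handful of tracks.
    """
    loc = list(local)
    rec = list(received) + [None] * len(loc)     # None = leave the local track unmatched
    best, best_cost = None, math.inf
    for perm in itertools.permutations(rec, len(loc)):
        cost = sum(gate if r is None else min(math.dist(local[l], received[r]), gate)
                   for l, r in zip(loc, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return {l: (r if r is not None and math.dist(local[l], received[r]) < gate else None)
            for l, r in zip(loc, best)}

local = {"T1": (0.0, 0.0), "T2": (10.0, 0.0)}
honest = {"T1": (0.5, 0.0), "T2": (10.2, 0.0)}
spoofed = dict(honest, X=(0.1, 0.0))             # adversary injects track "X" next to T1
```

Here `match_labels(local, honest)` keeps both identities, while `match_labels(local, spoofed)` hands local track T1 the adversary's label "X" because the spoofed track lowers the total matching cost.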

2603.05021 2026-03-06 eess.SY cs.IT cs.SY math.DS math.IT math.OC

Formal Entropy-Regularized Control of Stochastic Systems

Menno van Zutphen, Giannis Delimpaltadakis, Duarte J. Antunes

Abstract

Analyzing and controlling system entropy is a powerful tool for regulating the predictability of control systems. Applications benefiting from such approaches range from reinforcement learning and data security to human-robot collaboration. In continuous-state stochastic systems, accurate entropy analysis and control remain a challenge. In recent years, finite-state abstractions of continuous systems have enabled control synthesis with formal performance guarantees on objectives such as stage costs. However, these results do not extend to entropy-based performance measures. We solve this problem by first bounding the entropy of system discretizations using traditional formal-abstraction results, and then deriving an additional bound on the difference between the entropy of a continuous distribution and that of its discretization. The resulting theory enables formal entropy-aware controller synthesis that trades predictability against control performance while preserving formal guarantees for the original continuous system. More specifically, we minimize a linear combination of a generic cumulative cost and the KL divergence of the system trajectory distribution to the uniform distribution, which serves as our system entropy metric. The bound we derive on the difference between the KL divergence to uniform of a given continuous distribution and that of its discretization may also be relevant in more general information-theoretic contexts. A set of case studies illustrates the effectiveness of the method.
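
Two standard facts give intuition for why a discretization bound on KL divergence to uniform is useful. For a distribution p over N cells, KL(p || uniform) = log N - H(p) exactly. For a smooth density binned with width Δ, the discrete entropy satisfies H ≈ h - log Δ, so with N = |X|/Δ cells the discrete KL to uniform approaches the continuous value log|X| - h as Δ shrinks. The sketch below checks this numerically for a standard Gaussian on [-8, 8] (an arbitrary illustrative choice); it does not reproduce the paper's bound.

```python
import math

def gaussian_bin_probs(lo, hi, n_bins, sigma=1.0):
    """Probabilities of N(0, sigma^2) on n_bins equal-width cells of [lo, hi]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    edges = [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]
    return [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]

def kl_to_uniform(probs):
    """KL(p || uniform over len(probs) cells) = log N - H(p), in nats."""
    H = -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(len(probs)) - H

lo, hi, sigma = -8.0, 8.0, 1.0
h = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)   # differential entropy of N(0, sigma^2)
kl_continuous = math.log(hi - lo) - h                     # KL to the uniform density on [lo, hi]
for n in (16, 160, 1600):
    print(n, kl_to_uniform(gaussian_bin_probs(lo, hi, n, sigma)), kl_continuous)
```

The printed discrete values converge to the continuous one as the number of cells grows; a formal bound on this gap is what allows guarantees computed on an abstraction to transfer to the continuous system.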

2603.04988 2026-03-06 eess.SY cs.SY

A Unified Hybrid Control Architecture for Multi-DOF Robotic Manipulators

Xinyu Qiao, Yongyang Xiong, Yu Han, Keyou You

Comments 10 pages, 6 figures

Abstract

Multi-degree-of-freedom (DOF) robotic manipulators exhibit strongly nonlinear, high-dimensional, and coupled dynamics, posing significant challenges for controller design. To address these issues, this work proposes a unified hybrid control architecture that integrates model predictive control (MPC) with feedback regulation, together with a stability analysis of the proposed scheme. The proposed approach mitigates the optimization difficulty associated with high-dimensional nonlinear systems and enhances overall control performance. Furthermore, a hardware implementation scheme based on machine learning (ML) is proposed to achieve high computational efficiency while maintaining control accuracy. Finally, simulation and hardware experiments under external disturbances validate the proposed architecture, demonstrating its superior performance, hardware feasibility, and generalization capability for multi-DOF manipulation tasks.
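
One way to picture an MPC-plus-feedback architecture is a planner that computes a feedforward input sequence and a predicted trajectory, with a fast PD loop correcting deviations from that prediction. The sketch below does this for a single joint modeled as a discrete double integrator with unconstrained, least-squares MPC. The model, horizon, weights, gains, and function names are all assumptions for illustration; the paper's multi-DOF dynamics, constraints, stability conditions, and ML-based hardware implementation are not reproduced.

```python
import numpy as np

dt, H = 0.01, 20
A = np.array([[1.0, dt], [0.0, 1.0]])    # assumed joint model: double integrator, state [q, qdot]
B = np.array([[0.5 * dt ** 2], [dt]])

# Batch prediction over the horizon: X = Phi @ x0 + Gamma @ U, rows 2k:2k+2 hold x_{k+1}.
Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(H)])
Gamma = np.zeros((2 * H, H))
for k in range(H):
    for j in range(k + 1):
        Gamma[2 * k: 2 * k + 2, j: j + 1] = np.linalg.matrix_power(A, k - j) @ B

def mpc_plan(x0, q_ref, r_weight=1e-4):
    """Unconstrained linear MPC: least-squares inputs tracking a constant position q_ref."""
    C = np.kron(np.eye(H), np.array([[1.0, 0.0]]))   # select positions from stacked states
    M = np.vstack([C @ Gamma, np.sqrt(r_weight) * np.eye(H)])
    b = np.concatenate([q_ref * np.ones(H) - C @ Phi @ x0, np.zeros(H)])
    U = np.linalg.lstsq(M, b, rcond=None)[0]
    return U, Phi @ x0 + Gamma @ U                   # feedforward inputs and predicted states

def hybrid_step(x, x_pred, u_ff, kp=50.0, kd=10.0):
    """MPC feedforward plus PD feedback on the deviation from the MPC-predicted state."""
    return u_ff + kp * (x_pred[0] - x[0]) + kd * (x_pred[1] - x[1])
```

The design choice being illustrated is the split of responsibilities: the optimization only has to produce a nominal plan, while the cheap feedback term reacts to disturbances between replanning instants.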

2603.04962 2026-03-06 eess.SY cs.SY

Design of Grid Forming Multi Timescale Coordinated Control Strategies for Dynamic Virtual Power Plants

Yan Tong, Qin Wang, Sihao Chen, Xue Hu, Zhaoyuan Wu

Abstract

As the penetration of distributed energy resources (DERs) continues to rise, the frequency and voltage support traditionally provided by synchronous machines declines. This weakens grid stability and increases the need for fast, adaptive control, especially in weak grids. However, most virtual power plants (VPPs) rely on static aggregation and plan-based resource allocation strategies. These methods overlook differences in device response times and limit flexibility for ancillary services. To address this issue, we propose a dynamic virtual power plant (DVPP) that coordinates heterogeneous resources across multiple timescales using grid-forming control. We first contrast grid-following and grid-forming converters: grid-following designs rely on a phase-locked loop, which can undermine stability in weak grids, whereas our DVPP applies virtual synchronous generator control at the aggregate level to provide effective inertia and damping. Then, we introduce a dynamic participation factor framework that measures each device's contribution through the frequency/active-power and voltage/reactive-power loops. Exploiting device heterogeneity, we adopt a banded allocation strategy: slow resources manage steady-state and low-frequency regulation, intermediate resources smooth transitions, and fast resources deliver rapid response and high-frequency damping. Comparative simulations demonstrate that this coordinated, timescale-aware approach enhances stability and ancillary service performance compared with conventional VPPs.
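
The banded allocation can be illustrated with a common filter-based split, assumed here rather than taken from the paper: two first-order low-pass filters with time constants tau_slow > tau_mid divide an aggregate power request into slow, intermediate, and fast components that sum exactly to the request. The time constants and function names are hypothetical, and the paper's dynamic participation factors and virtual synchronous generator control are not modeled.

```python
def lowpass(signal, dt, tau):
    """First-order low-pass filter, forward-Euler discretization (assumes dt < tau)."""
    y, out = signal[0], []
    for x in signal:
        y += dt / tau * (x - y)
        out.append(y)
    return out

def banded_allocation(p_request, dt, tau_slow=10.0, tau_mid=1.0):
    """Split a power request into slow, intermediate, and fast bands that sum to the request."""
    lp_slow = lowpass(p_request, dt, tau_slow)
    lp_mid = lowpass(p_request, dt, tau_mid)
    slow = lp_slow                                           # steady-state / low-frequency part
    mid = [m - s for m, s in zip(lp_mid, lp_slow)]           # transitional part
    fast = [p - m for p, m in zip(p_request, lp_mid)]        # rapid, high-frequency part
    return slow, mid, fast

# Unit step in the requested power, sampled every 10 ms for 50 s.
p = [0.0] + [1.0] * 5000
slow, mid, fast = banded_allocation(p, dt=0.01)
```

Immediately after the step, almost the entire request lands in the fast band; after several slow time constants, the slow band carries nearly all of it, mirroring the handover from fast to slow resources described above.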