arXivDaily: arXiv daily academic digest, updated Monday through Friday
EESS (Electrical Engineering and Systems Science) 167
2604.24726 2026-04-28 cs.CE cs.SY eess.SY

VEHRON: A Configuration-Driven BEV Simulation Framework for Subsystem-Level Studies

Subramanyam Natarajan

Comments 12 pages, 3 figures, 5 tables; software paper

Abstract

In practical early-stage battery-electric vehicle studies, analysis workflows may become fragmented across spreadsheets, notebooks, and project-specific scripts, making reuse, audit, and extension harder. VEHRON is an open-source Python framework for a deterministic, traceable workflow built around prescribed-speed longitudinal simulation of battery-electric vehicles using validated YAML configuration, packaged drive-cycle resources, interchangeable subsystem models, and auditable case outputs. VEHRON currently runs as a command-line workflow in which a vehicle definition and a testcase definition are combined to execute a simulation, emit a flat time series, and write a case package containing copied inputs, resolved configuration, summary metadata, and standard plots. Architecturally, VEHRON is organized around a small simulation engine, a shared state bus, a registry of model selections, schema-based configuration loading, and extension points for custom battery and HVAC models loaded from external Python files. VEHRON currently focuses on battery-electric longitudinal simulation with low-order battery, thermal, auxiliary-load, and HVAC models. This paper explains how VEHRON is structured, how it is used, which models it implements, and where its present limits lie. Source code is available at https://github.com/vehron-dev/vehron, with archived release metadata recorded under DOI https://doi.org/10.5281/zenodo.19820111.
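
A prescribed-speed (backward-facing) longitudinal simulation can be sketched in a few lines. The speed trace is an input, the road-load equation gives wheel and battery power at each step, and state of charge (SOC) is integrated into a flat time series. This is a minimal illustration under assumed parameters (mass, drag area, fixed efficiencies, constant auxiliary load). The names `Vehicle` and `simulate` are hypothetical and are not VEHRON's API or configuration schema.

```python
from dataclasses import dataclass

RHO_AIR = 1.2  # air density, kg/m^3
G = 9.81       # gravitational acceleration, m/s^2


@dataclass
class Vehicle:
    # Hypothetical parameters; these are not VEHRON's YAML keys.
    mass_kg: float = 1800.0
    cd_a_m2: float = 0.6        # drag coefficient x frontal area
    crr: float = 0.01           # rolling-resistance coefficient
    eta_drive: float = 0.90     # battery-to-wheel efficiency when driving
    eta_regen: float = 0.60     # wheel-to-battery efficiency when braking
    aux_w: float = 500.0        # constant auxiliary (e.g. HVAC) load
    capacity_wh: float = 60000.0


def simulate(vehicle, speeds_mps, dt_s=1.0, soc0=0.9):
    """Prescribed-speed simulation: speed is an input, and the required
    battery power and SOC are computed from it step by step."""
    soc = soc0
    rows = []  # flat time series, one dict per step
    for k in range(1, len(speeds_mps)):
        v_prev, v = speeds_mps[k - 1], speeds_mps[k]
        v_mid = 0.5 * (v_prev + v)
        accel = (v - v_prev) / dt_s
        f_aero = 0.5 * RHO_AIR * vehicle.cd_a_m2 * v_mid ** 2
        f_roll = vehicle.crr * vehicle.mass_kg * G if v_mid > 0 else 0.0
        f_inertia = vehicle.mass_kg * accel
        p_wheel = (f_aero + f_roll + f_inertia) * v_mid
        # Traction draws more than wheel power; regen recovers only part of it
        if p_wheel >= 0:
            p_batt = p_wheel / vehicle.eta_drive
        else:
            p_batt = p_wheel * vehicle.eta_regen
        p_batt += vehicle.aux_w
        soc -= p_batt * dt_s / 3600.0 / vehicle.capacity_wh
        rows.append({"t_s": k * dt_s, "v_mps": v, "p_batt_w": p_batt, "soc": soc})
    return rows
```

Swapping the fixed efficiencies or the constant auxiliary load for separate model objects is the kind of interchangeability the abstract describes, though VEHRON's actual model interfaces may differ.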

2604.24714 2026-04-28 math.AT eess.IV q-bio.NC

Homology-based Morphometry of Brain Atrophy: Methods and Applications

Donato Quiccione, Mariam Pirashvili, Nathan Broomhead, Sean J. Fallon

Abstract

Understanding the structure of the brain, and how it changes with time and disease, is a core goal of structural neuroimaging. Contemporary approaches to structural brain analysis are dominated by voxel-wise, mass-univariate methods such as voxel-based morphometry (VBM). However, these techniques require images to be normalized to a standard template, which can obscure subject-specific geometric features. Normalization to a common stereotactic space can also be problematic when comparing groups with substantial brain pathology, lesions, or other anatomical abnormalities. Here, we introduce two complementary pipelines based on persistent homology (PH), a tool from topological data analysis, to quantify multiscale geometric features of structural T1-weighted MRI scans. Pipeline 1 quantifies regional thinning by applying the Euclidean distance transform to tissue masks in a slice-wise manner. Pipeline 2 uses \(\alpha\)-filtrations to measure structural similarity between pairs of scans, capturing sulcal widening and ventricular enlargement. Synthetic experiments with controlled induced lesions showed that Pipeline 1 is best suited to between-subject analyses, whereas Pipeline 2 is better suited to within-subject designs. Applied to real-world data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), Pipeline 1 separated Alzheimer's disease (AD) from cognitively normal (CN) participants using single-modality T1-weighted MRI without nonlinear registration (ROC-AUC = 0.895), with peak effects localized to medial temporal regions. Pipeline 2 captured disease-related longitudinal change, with follow-up scans remaining closest to their own baselines and AD subjects showing greater short-interval change than CN subjects. Together, these pipelines provide interpretable topological biomarkers for cross-sectional group comparisons and longitudinal tracking.
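
The slice-wise distance transform in Pipeline 1 can be illustrated with a brute-force sketch on a toy binary mask. Each tissue pixel receives its Euclidean distance to the nearest non-tissue pixel, so thinner tissue yields smaller values. The helper names are hypothetical; a real pipeline would use an optimized transform such as `scipy.ndimage.distance_transform_edt`, and the persistent-homology step is not shown.

```python
import math


def edt_slice(mask):
    """Brute-force Euclidean distance transform of one 2D binary slice.

    Foreground pixels (1) get the distance to the nearest background pixel (0);
    background pixels get 0. Quadratic in the pixel count, so only for toy slices.
    """
    background = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if v == 0]
    out = []
    for r, row in enumerate(mask):
        out_row = []
        for c, v in enumerate(row):
            if v == 0:
                out_row.append(0.0)
            elif not background:
                out_row.append(math.inf)  # no background anywhere in the slice
            else:
                out_row.append(min(math.hypot(r - br, c - bc)
                                   for br, bc in background))
        out.append(out_row)
    return out


def edt_volume(volume):
    """Apply the 2D transform independently to each slice of a 3D mask."""
    return [edt_slice(s) for s in volume]
```

For a 3x3 block of tissue inside a 5x5 slice, the centre pixel gets distance 2 and the block's corner pixels get distance 1.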

2604.24706 2026-04-28 eess.SY cs.LG cs.RO cs.SY

Exploiting Differential Flatness for Efficient Learning-based Model Predictive Control of Constrained Multi-Input Control Affine Systems

Tobias A. Farger, Adam W. Hall, Angela P. Schoellig

Comments Accepted for publication in 2026 European Control Conference

Abstract

Learning-based control techniques use data from past trajectories to control systems with uncertain dynamics. However, learning-based controllers are often computationally inefficient, limiting their practicality. To address this limitation, we propose a learning-based controller that exploits differential flatness, a property of many robotic systems. Recent research on using flatness for learning-based control is limited in that it either (i) ignores input constraints, (ii) applies only to single-input systems, or (iii) is tailored to specific platforms. In contrast, our approach uses a system extension and a block-diagonal cost formulation to control general multi-input nonlinear control-affine systems. Furthermore, it satisfies input constraints and half-space flat-state constraints and guarantees a probabilistic Lyapunov decrease using only two sequential convex optimizations. We show that our approach performs similarly to, but is several times more efficient than, a Gaussian process model predictive controller in simulation, and achieves competitive tracking in real hardware experiments.

2604.24691 2026-04-28 eess.SY cs.SY

Reachability Analysis of the State Transition and State Covariance Matrices for an LTV System

Fengjiao Liu, Yixiao Zhang, Panagiotis Tsiotras

Comments 12 pages, 2 figures

Abstract

In this paper, we study the reachability of two closely related matrices appearing in the analysis of linear time-varying (LTV) systems over a finite time interval, namely, the closed-loop state transition matrix under state feedback control and the state covariance matrix starting from a given initial state covariance matrix. Under a mild assumption, we first characterize the set of closed-loop terminal state transition matrices reachable from the identity matrix using controls of the state feedback form. Then, we characterize the set of terminal state covariance matrices reachable from any given positive definite initial state covariance matrix when the LTV system is not necessarily controllable. Both results are based on the solutions of the corresponding matrix Riccati differential equations (RDEs).

2604.24663 2026-04-28 math.OC cs.LG cs.SY eess.SY

Dual Control of Linear Systems from Bilinear Observations with Belief Space Model Predictive Control

Daniel Cao, Beixi Du, Andrew Lowitt, Sunmook Choi, Sarah Dean, Yahya Sattar

Abstract

We study finite-horizon quadratic control of linear systems with bilinear observations, in which the control input affects not only the state dynamics but also the partial observations of the state. In this setting, the separation principle can fail because control inputs influence the future quality of state estimates. State estimation requires an input-dependent Kalman filter whose gain and error covariance evolve as functions of the control inputs. To address this challenge, we propose a belief-space model predictive control ($\texttt{B-MPC}$) method that plans directly over both the estimated state and its error covariance. In particular, $\texttt{B-MPC}$ plans with a deterministic surrogate of the belief evolution defined by the input-dependent Kalman filter. Through numerical experiments in two synthetic settings, we show that $\texttt{B-MPC}$ can outperform both the separation-principle controller and its MPC variant in favorable regimes, and that these gains are accompanied by lower estimation covariance and more uncertainty-aware action choices.
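
The coupling between control and estimation can be seen in a scalar sketch that assumes an observation \(y_t=(c_0+c_1u_t)x_t+v_t\). The Kalman gain and error covariance depend on the input through the observation gain \(h=c_0+c_1u_t\), so inputs that make the state more observable shrink the covariance. The model, parameter values, and function names are illustrative assumptions, not the B-MPC method itself.

```python
def kf_covariance_step(p, u, a=1.0, w=0.1, c0=0.2, c1=1.0, r=0.5):
    """One measurement update and one time update of the error covariance for

        x_{t+1} = a x_t + (known input terms) + process noise   (variance w)
        y_t     = (c0 + c1 u_t) x_t + measurement noise          (variance r)

    Known additive input terms do not affect the covariance, so they are omitted.
    """
    h = c0 + c1 * u                  # input-dependent observation gain
    k = p * h / (h * h * p + r)      # Kalman gain
    p_post = (1.0 - k * h) * p       # filtered covariance
    p_next = a * a * p_post + w      # predicted covariance
    return p_next, k


def covariance_trajectory(p0, inputs, **params):
    """Predicted error covariances along a given input sequence."""
    ps = [p0]
    for u in inputs:
        p_next, _ = kf_covariance_step(ps[-1], u, **params)
        ps.append(p_next)
    return ps
```

Comparing `covariance_trajectory(1.0, [0.0] * 20)` with `covariance_trajectory(1.0, [2.0] * 20)` shows the second sequence ending with a smaller covariance. A planner that only tracks the state estimate cannot see this effect, which is why a belief-space planner also propagates the covariance.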

2604.24606 2026-04-28 cs.RO cs.SY eess.SY

Hybrid A*-Based Reverse Path-Planning of a Vehicle with Trailer System

Xincheng Cao, Haochong Chen, Bilin Aksun-Guvenc, Levent Guvenc, Brian Link, Peter J Richmond, Dokyung Yim, Shihong Fan, John Harber

Abstract

Reverse parking of a vehicle with trailer system is difficult for human drivers because of the multi-body nature of the system and the unintuitive controls required to orient the trailer properly. The problem is further complicated by the presence of other vehicles that the trailer and its connected vehicle must avoid during the reverse parking maneuver. While reverse-motion path-planning methods for vehicles with trailers exist, there is a lack of results that also offer collision avoidance as part of the algorithm. This paper hence proposes a modified Hybrid A*-based algorithm that accommodates the vehicle-trailer system as well as collision avoidance with other vehicles and obstacles in the parking environment. One novelty of the proposed approach is its adaptability to the vehicle with trailer system, where the limits of usable steering input that prevent jackknife incidents vary with the system configuration. The other contribution is the addition of collision avoidance functionality, which the standard Hybrid A* algorithm lacks. The method is developed and presented first, followed by simulation case studies that demonstrate the efficacy of the proposed approach.
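
Hybrid A* expands search nodes by integrating a kinematic model over short motion primitives. The sketch assumes a textbook car-trailer model with the hitch on the rear axle. It prunes reversing primitives whose hitch angle exceeds a fixed jackknife limit. The paper's configuration-dependent steering limits and its collision checks are not reproduced, and all names and dimensions are hypothetical.

```python
import math


def step(state, v, delta, dt, wheelbase=3.0, hitch_len=5.0):
    """Euler step of a kinematic car + trailer (hitch assumed on the rear axle).

    state = (x, y, theta, theta_t): rear-axle position, car heading, trailer heading.
    v < 0 means reversing; delta is the front steering angle.
    """
    x, y, th, tht = state
    x += v * math.cos(th) * dt
    y += v * math.sin(th) * dt
    th_new = th + v * math.tan(delta) / wheelbase * dt
    tht += v * math.sin(th - tht) / hitch_len * dt
    return (x, y, th_new, tht)


def hitch_angle(state):
    """Car heading minus trailer heading, wrapped to (-pi, pi]."""
    _, _, th, tht = state
    return math.atan2(math.sin(th - tht), math.cos(th - tht))


def expand(state, v, steering_set, dt, max_hitch=math.radians(60), n_sub=10):
    """Hybrid A*-style successors; primitives that jackknife are discarded."""
    successors = []
    for delta in steering_set:
        s = state
        ok = True
        for _ in range(n_sub):
            s = step(s, v, delta, dt / n_sub)
            if abs(hitch_angle(s)) > max_hitch:
                ok = False
                break
        if ok:
            successors.append((delta, s))
    return successors
```

Reversing straight keeps the hitch angle at zero. Reversing with a sharp steering angle drives the hitch angle away from zero, which is the unstable behaviour the jackknife check guards against.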

2604.24538 2026-04-28 eess.SP

Quantization-Aware EE Optimization and SE-EE Tradeoff for MiLAC-Aided MU-MISO Beamforming

Yuchen Zhang, Pinjun Zheng, Tareq Y. Al-Naffouri

Comments This paper has been submitted to the IEEE for possible publication

Abstract

In large antenna arrays, hardware power consumption becomes a dominant design constraint, making energy efficiency (EE) a first-class objective alongside spectral efficiency (SE). Microwave linear analog computer (MiLAC)-aided beamforming, whose front end is a passive reciprocal stream-to-antenna network, addresses this tension by reducing the active radio-frequency chain count to the stream number, at a moderate SE cost. Despite this promise, no EE optimization framework has been established for MiLAC-aided beamforming that accounts for digital-to-analog converter quantization noise and post-quantized transmit power. We fill this gap for downlink multiuser multiple-input single-output (MU-MISO) systems by formulating quantization-aware EE maximization over the MiLAC-feasible beamformer and characterizing the resulting SE-EE tradeoff. Three contributions follow. First, we prove a row-space optimality property of the effective MiLAC-aided beamformer, yielding an equivalent reduced-dimension reformulation whose complexity scales with the stream number rather than the antenna number. Second, we develop a low-complexity Dinkelbach-weighted minimum mean-square error algorithm aided by projected gradient descent that is guaranteed to converge to a stationary point. Third, we cast the SE-EE tradeoff as a multi-objective problem and trace its Pareto boundary via a weighted-sum method that combines an alternative reduced-dimension coordinate with auxiliary-variable successive convex approximation, yielding convex per-iteration subproblems with guaranteed convergence. Numerical results on a DeepMIMO v4 deployment show MiLAC-aided beamforming substantially improves EE over digital and hybrid benchmarks at a moderate SE cost and significantly expands the achievable SE-EE operating region.
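
Dinkelbach's method, which the paper combines with WMMSE and projected gradient descent, maximizes a ratio \(N(x)/D(x)\). It repeatedly solves \(\max_x N(x)-\lambda D(x)\) and updates \(\lambda\leftarrow N(x)/D(x)\) until that maximum reaches zero. A single-link toy problem with \(\mathrm{EE}(p)=\log_2(1+gp)/(P_c+p)\) shows the iteration, since its inner problem has a closed form. The toy ignores quantization noise and the MiLAC structure, and the function name is hypothetical.

```python
import math


def dinkelbach_ee(g, p_circuit, p_max, tol=1e-10, max_iter=100):
    """Maximize EE(p) = log2(1 + g p) / (p_circuit + p) over 0 <= p <= p_max."""

    def rate(p):
        return math.log2(1.0 + g * p)

    def power(p):
        return p_circuit + p

    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        # Inner problem: maximize rate(p) - lam * power(p), concave in p.
        # Stationary point: g / ((1 + g p) ln 2) = lam, then clip to [0, p_max].
        if lam > 0:
            p = 1.0 / (lam * math.log(2.0)) - 1.0 / g
            p = min(max(p, 0.0), p_max)
        else:
            p = p_max
        f = rate(p) - lam * power(p)   # F(lam); zero at the optimal ratio
        lam = rate(p) / power(p)       # Dinkelbach update
        if abs(f) < tol:
            break
    return p, lam
```

The returned `lam` is the maximal energy efficiency and `p` the corresponding transmit power. The iteration typically converges in a handful of steps, which is what makes the Dinkelbach outer loop cheap compared with the inner solver.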

2604.24518 2026-04-28 eess.SY cs.RO cs.SY math.OC

Sliding Mode Control for Safe Trajectory Tracking with Moving Obstacles Avoidance: Experimental Validation on Planar Robots

Shubham Sawarkar, P Sangeerth, S Saharsh, Pushpak Jagtap

Abstract

This paper presents a unified control framework for robust trajectory tracking and moving obstacle avoidance applicable to a broad class of mobile robots. By formulating a generalized kinematic transformation, we convert diverse vehicle dynamics into a strict feedback form, facilitating the design of a Sliding Mode Control (SMC) strategy for precise and robust reference tracking. To ensure operational safety in dynamic environments, the tracking controller is integrated with a Collision Cone Control Barrier Function (C3BF) based safety filter. The proposed architecture guarantees asymptotic tracking in the presence of external disturbances while strictly enforcing collision avoidance constraints. The novelty of this work lies in designing a sliding mode controller for ground robots such as Ackermann-drive vehicles, which has not been done before. The efficacy and versatility of the approach are validated through numerical simulations and extensive real-world experiments on three distinct platforms: an Ackermann-steered vehicle, a differential drive robot, and a quadrotor drone. Videos of the experiments are available at https://youtu.be/dWcxwum96vk
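
The core sliding mode idea can be sketched on a one-dimensional double integrator \(\ddot x=u+d\) with a bounded disturbance. The controller drives the sliding variable \(s=\dot e+\lambda e\) toward zero with a switching gain \(k>\sup|d|\). The sketch assumes a saturated (boundary-layer) switching term to reduce chattering, which gives ultimately bounded rather than exactly asymptotic tracking. The generalized kinematic transformation and the C3BF safety filter are not included, and gains and names are hypothetical.

```python
import math


def sat(z):
    """Saturation used in place of sign() inside a boundary layer."""
    return max(-1.0, min(1.0, z))


def simulate_smc(t_end=10.0, dt=1e-3, lam=2.0, k=3.0, phi=0.05):
    """Track x_ref(t) = sin(t) for x'' = u + d(t), |d| <= 0.5 (unknown to the controller)."""
    x, xd = 1.0, 0.0  # start off the reference
    t = 0.0
    errors = []
    while t < t_end:
        r, rd, rdd = math.sin(t), math.cos(t), -math.sin(t)
        e, ed = x - r, xd - rd
        s = ed + lam * e                          # sliding variable
        u = rdd - lam * ed - k * sat(s / phi)     # equivalent + switching control
        d = 0.5 * math.sin(3.0 * t)               # matched disturbance
        # Closed loop on the surface: s' = -k sat(s / phi) + d
        x += xd * dt
        xd += (u + d) * dt
        t += dt
        errors.append(abs(e))
    return errors
```

Because \(k\) exceeds the disturbance bound, \(s\) reaches the boundary layer in finite time, after which \(e\) decays at rate \(\lambda\) to a small residual set by \(\phi\).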

2604.24501 2026-04-28 cs.NI cs.SY eess.SY

TARMM: Scaling Delay-Critical Edge AI Offloading in 5G O-RAN via Temporal Graph Mobility Management

Peihao Yan, Yun Chen, Jie Lu, Qijun Wang, Huacheng Zeng

Abstract

Emerging delay-critical edge AI applications, such as VR perception and real-time video analytics, impose stringent latency and reliability requirements on 5G networks. However, existing mobility management mechanisms are largely reactive and fail to adapt to dynamic network conditions, resulting in suboptimal handover decisions and degraded performance. In this paper, we present TARMM, a 5G Open Radio Access Network (O-RAN) system that optimizes user mobility management for delay-critical edge AI offloading. The core of TARMM is a temporal graph model that captures the spatiotemporal dynamics of the RAN across users and cells, enabling near real-time handover decisions. Building on this representation, we design a multi-agent reinforcement learning (MARL) framework with rule-based action masking and proactive resource preparation to ensure safe, stable, and efficient handovers. We implement TARMM on a multi-cell indoor 5G O-RAN testbed and evaluate it using diverse VR workloads. Extensive experiments show that TARMM reduces tail latency by up to 44% and packet loss by up to 56% compared to state-of-the-art approaches.

2604.24440 2026-04-28 cs.FL cs.SY eess.SY

Minimum Reachability Probabilities in Rectangular Automata with Random Clocks

Joanna Delicaris, Erika Ábrahám, Anne Remke

Comments This paper is accepted for publication (without appendix) in the Proceedings of the 32nd International Symposium on Model Checking Software (SPIN 2026). The appendix was part of the submission and provides additional material which is not included in the SPIN publication

Abstract

Control applications for cyber-physical systems must make reliably safe control decisions in the presence of continuous dynamics as well as stochastic uncertainty. Providing safety guarantees for such systems requires formal modeling and analysis techniques that capture these aspects. For modeling, in this paper we consider rectangular automata with random clocks under prophetic scheduling. For this model class, existing methods can compute only upper bounds on reachability probabilities, enabling optimistic, best-case safety reasoning. We complement this view by introducing a novel method to compute lower bounds, thereby enabling worst-case analysis that is essential for safety-critical applications. Although both upper and lower bounds rely on reachability analysis, they are not dual: computing lower bounds requires an explicit separation of stochastic and nondeterministic choices along executions. We implement our approach and demonstrate its practical feasibility on an electric vehicle charging scenario, showing that meaningful worst-case guarantees can be obtained.

2604.24404 2026-04-28 cs.CR eess.SP

From Spoofing to Trust: Emergency Alerts Spoofing Testbed and Cross-Cell Verification

Abdallah Abou Hasna, Nada Chendeb, Ammar El Falou

Comments To appear in the IEEE Vehicular Technology Conference (VTC)-Spring

Abstract

Public warning systems (PWS) in cellular networks enable authorities to broadcast emergency alerts to all mobile phones in a geographic region in the event of threats such as earthquakes or severe weather. If an attacker can imitate these alerts and transmit a forged warning containing fake news or phishing links, the impact could range from public panic to user compromise. In this work, we present the first open-source 5G emergency alert spoofing attack, implemented by modifying the OpenAirInterface (OAI) radio access network (RAN) code and executed using a software-defined radio, complemented by a custom network management system to automate network and warning configuration. We conduct a detailed analysis of how different smartphones behave under various conditions. Our findings show that devices readily display spoofed alerts, and that the alerting mechanism enables multiple practical attack scenarios beyond simple warning display. Finally, to address this threat, we propose and implement a lightweight cross-cell verification mechanism in OAI, in which the device compares the received warning with neighboring cell broadcasts to flag single-source alerts as suspicious.
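
The cross-cell check can be sketched as a grouping step. It assumes the device has already collected the warning broadcasts it decoded from its serving and neighboring cells. Alerts with identical content seen in at least two distinct cells are treated as corroborated, and single-source alerts are flagged. The tuple fields, threshold, and function name are hypothetical and do not reflect the OAI implementation.

```python
from collections import defaultdict


def verify_alerts(observations, min_cells=2):
    """observations: iterable of (cell_id, message_id, serial, text) tuples.

    Returns {(message_id, serial, text): "corroborated" | "suspicious"}, where an
    alert is corroborated only if at least `min_cells` distinct cells broadcast it.
    """
    cells_per_alert = defaultdict(set)
    for cell_id, message_id, serial, text in observations:
        cells_per_alert[(message_id, serial, text)].add(cell_id)
    return {
        alert: "corroborated" if len(cells) >= min_cells else "suspicious"
        for alert, cells in cells_per_alert.items()
    }
```

An attacker with a single rogue cell can still inject a warning, but it arrives from one source only and is flagged. An attacker would need to spoof several neighboring cells consistently to evade the check.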

2604.24401 2026-04-28 cs.SD cs.AI cs.CL eess.AS

All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

Leonardo Haw-Yang Foo, Chih-Kai Yang, Chen-An Li, Ke-Han Lu, Hung-yi Lee

Comments 6 pages, 3 figures, 5 tables

Abstract

Large audio-language models (LALMs) show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, the benchmark fails as a measure of auditory understanding. We present a diagnostic framework using two axes: text prior, which measures answerability from text and general knowledge alone, and audio reliance, which assesses actual dependency on the acoustic signal. Evaluating eight LALMs across three benchmarks, we find that models retain 60-72% of their full audio scores even without any audio input. Moreover, among items that require audio, only 3.0-4.2% need the complete audio clip; the majority can be resolved using localized fragments. These findings challenge the assumption that benchmark performance equals robust audio understanding, and we conclude with practical guidelines for improving evaluation reliability and benchmark design.

2604.24394 2026-04-28 eess.SY cs.SY

A Realistic Discrete Event Simulation model for Ambulance Location and Deployment within a regional Emergency Medical Service

Alberto De Santis, Stefania Iannazzo, Fabio Ingravalle, Stefano Lucidi, Massimo Maurici, Giulia Riccardi, Massimo Roma, Antonio Vinci

Comments 41 pages, 12 figures

Abstract

The objective of Emergency Medical Services (EMSs) is to respond promptly to citizens' calls for first aid, providing pre-hospital care and, if necessary, transferring patients by ambulance to an appropriate Emergency Department (ED). The efficiency of such a system strongly depends on the deployment of ambulance home bases, i.e., locations where ambulances and their crews are strategically positioned, ready to respond to emergency calls. This paper presents a general Discrete Event Simulation (DES) model designed to capture the stochastic behaviour and workflow of regional ambulance emergency systems. The proposed model integrates information collected from different sources and reproduces the operation of the ambulance system very accurately, allowing a more comprehensive and realistic analysis. To show the applicability and reliability of the proposed general model, a case study provided by the Azienda Regionale Emergenza Sanitaria ARES 118 (an Italian regional Emergency Medical Services authority) is presented. It concerns a territory within the Lazio region of Italy that includes a medium-sized city along with sparsely populated areas. The reported scenario analyses highlight how managers can use the proposed model to improve the effectiveness and responsiveness of the entire regional EMS system.

2604.24386 2026-04-28 cs.SD eess.AS

An event-based sequence modeling approach to recognizing non-triad chords with oversegmentation minimization

Leekyung Kim, Jonghun Park

Comments accepted to ICASSP 2026

Abstract

Automatic chord recognition (ACR) extracts time-aligned chord labels from music audio recordings. Despite recent advances, ACR still struggles with oversegmentation, data scarcity, and class imbalance, especially in recognizing complex chords such as non-triads, which are underrepresented in existing datasets. To address these challenges, we reformulate ACR as a segment-level sequence-to-sequence prediction task, where chord sequences are predicted auto-regressively rather than frame by frame. This design mitigates excessive segmentation by detecting chord changes only at segment boundaries. We further introduce two types of token representations and an encoder pre-training method, both specifically designed for time-aligned chord modeling. Experimental results show that our model improves performance in both chord recognition and segmentation, with notable gains for complex and infrequent chord types. These findings demonstrate the effectiveness of segment-level sequence modeling, structured tokenization, and representation learning for advancing chord recognition systems.
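
The difference between frame-wise and segment-level outputs can be made concrete by collapsing frame labels into segments. A single spurious frame inside a held chord splits it into extra segments, which is the oversegmentation a segment-level decoder avoids. The event-style tokenization shown is a hypothetical illustration, not one of the paper's two token representations.

```python
from itertools import groupby


def frames_to_segments(labels, hop_s):
    """Collapse frame-wise chord labels into (start_s, end_s, label) segments."""
    segments = []
    t = 0  # frame index at the start of the current run
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((t * hop_s, (t + n) * hop_s, label))
        t += n
    return segments


def segments_to_tokens(segments):
    """Hypothetical event tokens: a boundary-time token followed by a chord token."""
    tokens = []
    for start, _, label in segments:
        tokens += [f"<t={start:.2f}>", label]
    return tokens
```

With a 0.5 s hop, eight frames of `C:maj` then `G:7` give two segments. Replacing one `C:maj` frame with `A:min` gives four, even though only a single frame changed.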

2604.24369 2026-04-28 eess.SP cs.NI

Beam Scheduling for Cross-Layer ISAC: A Deep Reinforcement Learning Approach

Xiyu Wang, Gilberto Berardinelli, Hei Victor Cheng, Petar Popovski, Ramoni Adeogun

Abstract

Resource allocation in integrated sensing and communication (ISAC) systems needs to be optimized to balance the requirements of the communication and sensing modules, considering complicated cross-layer data traffic and queue status in dynamic multi-user environments. This paper studies beam allocation for cross-layer ISAC that achieves low-latency communication and minimizes sensing-parameter estimation error. To handle the complex coupling between practical data buffer dynamics and varying wireless channels, we propose a deep reinforcement learning (DRL)-assisted approach. Rather than relying on explicit channel state information, the DRL-assisted beam allocation reduces feedback overhead by leveraging sensing observations. Simulation results verify that the DRL framework effectively takes buffer status into account and adapts to the wireless environment while allocating resources. The proposed multi-beam scheme improves overall throughput with only modest delay increases. Finally, the DRL-assisted beam management achieves both communication and sensing performance close to that of the genie-aided benchmark with perfect angle-of-departure (AoD) knowledge. These contributions advance the state of the art in intelligent resource management for ISAC systems.

2604.24347 2026-04-28 eess.IV cs.CV cs.LG

Semantic Segmentation for Histopathology using Learned Regularization based on Global Proportions

Yangping Li, Thomas Pinetz, Michael Hölzel, Marieta Toma, Alexander Effland

Abstract

In pathology, the spatial distribution and proportions of tissue types are key indicators of disease progression, and are more readily available than fine-grained annotations. However, these assessments are rarely mapped to pixel-wise segmentation. The task is fundamentally underdetermined, as many spatially distinct segmentations can satisfy the same global proportions in the absence of pixel-wise constraints. To address this, we introduce Variational Segmentation from Label Proportions (VSLP), a two-stage framework that infers dense segmentations from global label proportions without any pixel-level annotations. This framework first leverages a pre-trained transformer model with test-time augmentation to produce a pixel-wise confidence estimate. In the second stage, these estimates are fused by solving a variational optimization problem that incorporates a Wasserstein data fidelity term alongside a learned regularizer. Unlike end-to-end networks, our variational method can visualize the fidelity-regularization energy, resulting in more interpretable segmentation. We validate our approach on two public datasets, achieving superior performance over existing weakly supervised and unsupervised methods. For one of these datasets, proportions have been estimated by an experienced pathologist to provide a realistic benchmark for the community. Furthermore, the method scales to an in-house dataset with noisy pathologist labels, substantially outperforming state-of-the-art methods, thereby demonstrating practical applicability. The code and data will be made publicly available upon acceptance at https://github.com/xiaoliangpi/VSLP.

2604.24303 2026-04-28 eess.SP

Two-Layer Microwave Linear Analog Computer (MiLAC)-aided Multi-user MISO Networks

Xiaohua Zhou, Tianyu Fang, Yijie Mao, Bruno Clerckx

Abstract

Microwave linear analog computer (MiLAC)-aided transmit beamforming, which processes transmitted symbols entirely in the analog domain, has recently emerged as a promising alternative to fully digital or hybrid beamforming architectures for single-user multi-antenna systems. However, recent studies have shown that deploying a single lossless and reciprocal MiLAC at the transmitter cannot achieve the same capacity as fully digital beamforming in multi-user scenarios. To address this limitation, we propose a novel two-layer MiLAC-aided beamforming architecture at the transmitter for a downlink multi-user multiple-input single-output (MISO) network. Leveraging microwave network theory, we first prove that lossless and reciprocal two-layer MiLAC-aided beamforming can achieve the same performance as digital beamforming, and we derive a closed-form mapping from digital beamforming to two-layer MiLAC analog beamforming. Furthermore, we formulate a sum-rate maximization problem and develop an efficient optimization framework to jointly optimize the power allocation and the scattering matrices for the proposed two-layer MiLAC architecture. Numerical results validate our theoretical findings and demonstrate that two-layer MiLAC achieves the same sum-rate performance as fully digital beamforming.

2604.24294 2026-04-28 eess.SY cs.SY

AI-Native Autonomous Infrastructure (ANAI): A Formal Framework for the Next General-Purpose Technology

Hidir Selcuk Nogay

Comments 18 pages, 4 figures

Abstract

Artificial intelligence is increasingly described as a candidate next-generation general-purpose technology (GPT). However, existing interpretations predominantly emphasize performance scaling rather than structural transformation. This paper introduces a formal framework for evaluating AI as a systemic infrastructural transition rather than merely a computational breakthrough. We propose the concept of AI-Native Autonomous Infrastructure (ANAI), defined as a regime in which decision autonomy becomes embedded within critical infrastructures. The framework operationalizes this transition through three quantitative constructs: the Autonomy Index (AIx), the Infrastructure Coupling Coefficient (ICC), and the Technological Transition Potential (TTP). We formalize the joint scaling dynamics of autonomy and infrastructural embedding, derive threshold conditions for paradigm transition, and introduce a phase-space representation of systemic transformation. A temporal transition model further illustrates how nonlinear coevolution between autonomy and infrastructure integration produces superlinear growth in transition potential. Unlike prior GPT cycles, the ANAI regime exhibits a recursive energy-computation feedback loop in which AI systems both increase computational demand and optimize the infrastructures that sustain them. This feedback mechanism accelerates infrastructural embedding and differentiates AI-driven transformation from previous technological revolutions. By shifting the analytical focus from model performance to infrastructural autonomy and coupling intensity, this study offers a conceptual and mathematical foundation for assessing whether artificial intelligence constitutes the next general-purpose technology.

2604.17457 2026-04-28 math.OC cs.AI cs.SY eess.SY

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Donghwan Lee

Abstract

Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon>0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\bar\rho+\varepsilon)^k\), where \(\bar\rho\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When \(\bar\rho<\gamma\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\bar\rho<\gamma\) holds or fails.
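
The separation between policy identification and value convergence can be observed numerically. The sketch runs Q-VI on a hypothetical two-state, two-action deterministic MDP and reports two quantities. The first is the iteration after which the tie-broken greedy policy never changes. The second is the first iteration whose sup-norm distance to \(Q^*\) falls below a tolerance, with \(Q^*\) approximated by a long run. It does not compute the joint spectral radius \(\bar\rho\).

```python
def q_value_iteration(rewards, next_state, gamma, n_iter):
    """Synchronous Q-VI for a deterministic finite MDP; returns every iterate."""
    n_s, n_a = len(rewards), len(rewards[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    history = [q]
    for _ in range(n_iter):
        v = [max(row) for row in q]
        q = [[rewards[s][a] + gamma * v[next_state[s][a]] for a in range(n_a)]
             for s in range(n_s)]
        history.append(q)
    return history


def greedy(q):
    """Greedy policy with ties broken toward the lowest action index."""
    return [max(range(len(row)), key=lambda a: (row[a], -a)) for row in q]


def identification_vs_convergence(rewards, next_state, gamma, n_iter=2000, tol=1e-6):
    history = q_value_iteration(rewards, next_state, gamma, n_iter)
    q_star = history[-1]  # proxy for Q* after many iterations
    pi_star = greedy(q_star)
    # First iteration from which the greedy policy stays equal to pi_star
    k_policy = 0
    for k, q in enumerate(history):
        if greedy(q) != pi_star:
            k_policy = k + 1
    # First iteration whose sup-norm error to the proxy Q* is below tol
    k_conv = next(
        k for k, q in enumerate(history)
        if max(abs(x - y) for ra, rb in zip(q, q_star) for x, y in zip(ra, rb)) < tol
    )
    return pi_star, k_policy, k_conv
```

With rewards `[[1, 0], [2, 0]]`, transitions `[[0, 1], [1, 0]]`, and \(\gamma=0.9\), the greedy policy settles within a few iterations. The sup-norm error needs on the order of \(\log(\mathrm{tol})/\log\gamma\) iterations.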

2601.02455 2026-04-28 cs.SD cs.CL eess.AS

Diagnostic-Driven Layer-Wise Compensation for Post-Training Quantization of Encoder-Decoder ASR Models

Xinyu Wang, Ziyu Zhao, Yajie Luo, Yihong Wu, Liheng Ma, Jingrui Tian, Lei Ding, Xiao-Wen Chang, Peng Lu

Comments 9 pages, 4 figures, 3 tables

Abstract

Deploying Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires aggressive low-bit weight quantization. Layer-wise post-training quantization is practical and effective, but it suffers from cross-layer error accumulation. Existing compensation methods typically use a single global strength for all layers, which is ill-suited to encoder-decoder ASR models whose acoustic encoder and linguistic decoder exhibit markedly different sensitivities to quantization noise. We propose FADE, a diagnostic-driven framework that assigns each layer an adaptive compensation coefficient by combining two complementary signals: an intrinsic vulnerability score from weight geometry and a calibration reliability score from the data-driven solution. The resulting layer-wise coefficient balances local quantization fidelity against cross-layer error correction, enabling tailored compensation without retraining or hyperparameter search. Experiments on Whisper, Moonshine, and Qwen3-ASR across four benchmarks show that FADE consistently improves mean Word Error Rate over strong baselines at both 3- and 4-bit precision while substantially reducing run-to-run variance.
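
FADE adjusts compensation on top of layer-wise post-training quantization. The sketch shows the baseline step it starts from: symmetric uniform quantization of each output row at \(b\) bits, and the resulting output error of one layer. The per-row scaling and helper names are assumptions, and FADE's vulnerability and reliability scores are not implemented.

```python
def quantize_symmetric(weights, bits):
    """Symmetric uniform quantization of one weight row with a per-row scale.

    Integer levels lie in [-(2^(b-1) - 1), 2^(b-1) - 1]; returns the
    dequantized weights and the scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    deq = [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]
    return deq, scale


def layer_output_error(weight_rows, x, bits):
    """Squared error in y = W x caused by quantizing W, with no compensation."""
    err = 0.0
    for row in weight_rows:
        q_row, _ = quantize_symmetric(row, bits)
        y = sum(w * xi for w, xi in zip(row, x))
        y_q = sum(w * xi for w, xi in zip(q_row, x))
        err += (y - y_q) ** 2
    return err
```

At 3 bits only seven levels are available per row, so the rounding error per weight (at most half the scale) is much larger than at 4 bits. Errors like this accumulate across layers, and compensation methods try to correct them.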

2512.16318 2026-04-28 eess.AS

Learning Filters in Feedback Delay Networks from Noisy Room Impulse Responses

Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Comments Submitted to the Journal of Audio Engineering Society

Abstract

Recursion is a fundamental concept in the design of filters and audio systems. In particular, artificial reverberation systems that use delay networks depend on recursive paths to control both echo density and the decay rate of modal components. The differentiable digital signal processing framework has shown promise in automatically tuning recursive and non-recursive elements using gradient-based optimization with perceptually or physically motivated loss functions, such as energy decay or spectrogram differences. These representations are highly sensitive to model mismatches, which can lead to spurious loss minima. In particular, discrepancies in background noise can result in inaccurate attenuation estimates. This paper addresses the problem of tuning the recursive attenuation filters of a feedback delay network when targets are noisy. We analyze the loss profile associated with different optimization objectives and propose a method that explicitly models noise, improving the accuracy of the estimated attenuation filters under low signal-to-noise conditions. We demonstrate the effectiveness of the proposed approach through statistical analysis on both synthetic and real target data. Furthermore, we identify the sensitivity of attenuation-filter parameter tuning to perturbations in frequency-independent parameters. These findings provide practical guidelines for more robust and reproducible gradient-based optimization of feedback delay networks.
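
A minimal feedback delay network illustrates the recursive structure being tuned. This sketch assumes four delay lines and an orthogonal Householder feedback matrix. It also assumes frequency-independent per-line gains \(g_i=10^{-3m_i/(f_sT_{60})}\), which make every recirculation path decay by 60 dB over \(T_{60}\) seconds. The paper instead learns frequency-dependent attenuation filters, and the delays and \(T_{60}\) used here are hypothetical.

```python
def householder(n):
    """Orthogonal (and symmetric) feedback matrix A = I - (2/n) * ones(n, n)."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)] for i in range(n)]


def attenuation_gains(delays, fs, t60):
    """Broadband per-line gains g_i = 10^(-3 m_i / (fs * t60))."""
    return [10.0 ** (-3.0 * m / (fs * t60)) for m in delays]


def fdn_impulse_response(delays, fs, t60, n_samples):
    n = len(delays)
    a = householder(n)
    g = attenuation_gains(delays, fs, t60)
    buffers = [[0.0] * m for m in delays]  # one circular buffer per delay line
    ptrs = [0] * n
    out = []
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Delay-line outputs (written m_i samples ago), attenuated
        s = [g[i] * buffers[i][ptrs[i]] for i in range(n)]
        out.append(sum(s))
        # Feed back through A, add the input, and advance each line
        for i in range(n):
            buffers[i][ptrs[i]] = x + sum(a[i][j] * s[j] for j in range(n))
            ptrs[i] = (ptrs[i] + 1) % delays[i]
    return out
```

Because \(g_i=r^{m_i}\) with a common per-sample factor \(r\), the response equals \(r^t\) times that of a lossless FDN. The noise floor in a measured target breaks exactly this exponential envelope.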

2511.19084 2026-04-28 eess.SY cs.SY math.OC

PolyOCP.jl -- A Julia Package for Stochastic OCPs and MPC

Ruchuan Ou, Learta Januzi, Jonas Schießl, Michael Heinrich Baumann, Lars Grüne, Timm Faulwasser

Abstract

The consideration of stochastic uncertainty in optimal and predictive control is a well-explored topic. Recently, polynomial chaos expansions (PCE) have received considerable attention for problems involving stochastically uncertain system parameters, and also for problems with additive stochastic i.i.d. disturbances. While a number of open-source PCE toolboxes exist, no tailored open-source code is available in Julia for solving optimal control problems (OCPs) with additive stochastic i.i.d. disturbances. Hence, this paper introduces the toolbox PolyOCP.jl, which efficiently solves stochastic OCPs for linear systems subject to a large class of disturbance distributions. We explain the main mathematical concepts behind the PCE transcription of stochastic OCPs and how they are provided in the toolbox. We draw upon two examples to illustrate the functionalities of PolyOCP.jl.
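A minimal sketch shows why PCE suits additive i.i.d. disturbances. Take the scalar system x_{k+1} = a x_k + u_k + σ ξ_k with ξ_k i.i.d. standard normal. Each state is then exactly a combination of the constant basis function and the first-order Hermite polynomials H_1(ξ_j) = ξ_j. Propagating the coefficients therefore yields exact means and variances. This is Python, not the PolyOCP.jl API; `propagate_pce` and the numerical values are hypothetical.

```python
import numpy as np

def propagate_pce(a, x0, u, sigma):
    """Propagate PCE coefficients of x_{k+1} = a x_k + u_k + sigma * xi_k.

    Basis: {1, xi_0, ..., xi_{N-1}}, with xi_j i.i.d. standard normal (orthonormal
    Hermite polynomials of degree <= 1). Row k of `coeffs` holds the coefficients of x_k.
    """
    n_steps = len(u)
    coeffs = np.zeros((n_steps + 1, n_steps + 1))
    coeffs[0, 0] = x0                       # deterministic initial state
    for k in range(n_steps):
        coeffs[k + 1] = a * coeffs[k]       # linear dynamics act on every coefficient
        coeffs[k + 1, 0] += u[k]            # the input only shifts the constant (mean) term
        coeffs[k + 1, k + 1] += sigma       # each new disturbance gets its own basis direction
    mean = coeffs[:, 0]
    var = (coeffs[:, 1:] ** 2).sum(axis=1)  # orthonormal basis: variance = sum of squares
    return mean, var

a, x0, sigma = 0.9, 1.0, 0.1
u = np.zeros(20)
mean, var = propagate_pce(a, x0, u, sigma)

# Monte Carlo cross-check of the final-step statistics.
rng = np.random.default_rng(1)
x = np.full(200_000, x0)
for k in range(len(u)):
    x = a * x + u[k] + sigma * rng.standard_normal(x.size)
print(mean[-1], x.mean())                   # both close to 0.9**20 (about 0.1216)
print(var[-1], x.var())
```

In an OCP transcription, the PCE coefficients become deterministic decision-dependent variables. Chance constraints and expected costs can then be written in terms of them instead of samples.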

2509.13989 2026-04-28 eess.AS

Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems

Yi-Cheng Lin, Huang-Cheng Chou, Tzu-Chieh Wei, Kuan-Yu Chen, Hung-yi Lee

Comments Accepted to ICASSP 2026

Abstract

Instruction-guided text-to-speech (ITTS) enables users to control speech generation through natural language prompts, offering a more intuitive interface than traditional TTS. However, the alignment between user style instructions and listener perception remains largely unexplored. This work first presents a perceptual analysis of ITTS controllability across two expressive dimensions (adverbs of degree and graded emotion intensity) and collects human ratings on speaker age and word-level emphasis attributes. To comprehensively reveal the instruction-perception gap, we provide a data collection with large-scale human evaluations, named the Expressive VOice Control (E-VOC) corpus. Furthermore, we find that (1) gpt-4o-mini-tts is the most reliable ITTS model, with strong alignment between instructions and generated utterances across acoustic dimensions; (2) the five analyzed ITTS systems tend to generate adult voices even when the instructions ask for child or elderly voices; and (3) fine-grained control remains a major challenge, indicating that most ITTS systems have substantial room for improvement in interpreting slightly different attribute instructions.

2509.13330 2026-04-28 eess.SY cs.SY

A hybrid dynamic model and parameter estimation method for accurately simulating overhead cranes with friction

Jorge Vicente-Martinez, Edgar Ramirez-Laboreo

Comments 10 pages, 12 figures. Major changes in all the sections

Abstract

This paper presents a new approach to accurately simulating 3D overhead cranes with friction. Although nonlinear friction dynamics have a significant impact on these systems, accurately modeling this phenomenon in simulation is challenging. Traditional methods often rely on imprecise approximations of friction or require excessive computation time for reliable results. To address this, we present a hybrid dynamic model that balances high-fidelity friction modeling against computational efficiency. Furthermore, we present a step-by-step algorithm for estimating all unknown system parameters, including friction. This methodology is based on Bayesian linear regression and least-squares (LS) estimation. Finally, experimental validation with a laboratory crane confirms the effectiveness of the proposed modeling and estimation approach.
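The abstract names Bayesian linear regression as part of the estimation method. Here is a minimal sketch of that technique with a conjugate Gaussian prior and known noise variance. It is applied to a toy trolley force balance F = m·a + b·v + F_c·sign(v), which is linear in the unknowns (m, b, F_c). The model, the parameter values, and `bayesian_linear_regression` are assumptions; the paper's hybrid friction model and its step-by-step algorithm are not reproduced.

```python
import numpy as np

def bayesian_linear_regression(phi, y, noise_var, prior_mean, prior_cov):
    """Gaussian posterior N(mean, cov) of theta for y = phi @ theta + e, e ~ N(0, noise_var)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + phi.T @ phi / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + phi.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(0)
true_theta = np.array([5.0, 2.0, 1.5])   # mass [kg], viscous [N s/m], Coulomb [N] (toy values)
n = 500
vel = rng.uniform(-1.0, 1.0, n)
acc = rng.uniform(-2.0, 2.0, n)
phi = np.column_stack([acc, vel, np.sign(vel)])   # regressors for (m, b, F_c)
noise_var = 0.05 ** 2
force = phi @ true_theta + rng.normal(0.0, np.sqrt(noise_var), n)

mu, cov = bayesian_linear_regression(phi, force, noise_var,
                                     prior_mean=np.zeros(3), prior_cov=100.0 * np.eye(3))
print(mu, np.sqrt(np.diag(cov)))          # posterior means and standard deviations
```

Unlike plain least squares, the posterior covariance directly reports how well each parameter is identified by the recorded motion.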

2509.06027 2026-04-28 cs.SD cs.AI eess.AS

DreamAudio: Customized Text-to-Audio Generation with Diffusion Models

Yi Yuan, Xubo Liu, Haohe Liu, Xiyuan Kang, Zhuo Chen, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

Comments Latest arXiv version. Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing. Demos are available at https://yyua8222.github.io/DreamAudio_demopage/

Abstract

With the development of large-scale diffusion-based and language-modeling-based generative models, impressive progress has been achieved in text-to-audio generation. Despite producing high-quality outputs, existing text-to-audio models mainly aim to generate semantically aligned sound and fall short of controlling fine-grained acoustic characteristics of specific sounds. As a result, users who need specific sound content may find it difficult to generate the desired audio clips. In this paper, we present DreamAudio for customized text-to-audio generation (CTTA). Specifically, we introduce a new framework that is designed to enable the model to identify auditory information from user-provided reference concepts for audio generation. Given a few reference audio samples containing personalized audio events, our system can generate new audio samples that include these specific events. In addition, two types of datasets are developed for training and testing the proposed systems. The experiments show that DreamAudio generates audio samples that are highly consistent with the customized audio features and aligned well with the input text prompts. Furthermore, DreamAudio offers comparable performance in general text-to-audio tasks. We also provide a human-involved dataset containing audio events from real-world CTTA cases as the benchmark for customized generation tasks.

2509.03525 2026-04-28 cs.CL cs.AI eess.AS

Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies

Fatemeh Taherinezhad, Mohamad Javad Momeni Nezhad, Sepehr Karimi, Sina Rashidi, Ali Zolnour, Maryam Dadkhah, Yasaman Haghbin, Hossein AzadMaleki, Maryam Zolnoori

Journal ref https://ai.jmir.org/2026/1/e82608

Abstract

Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on its recordings. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.
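The abstract does not spell out how class-centroid demonstrations are chosen. One plausible reading, sketched here as an assumption, works in three steps. First, embed the labeled training transcripts. Second, compute each class's mean embedding. Third, take the examples most similar to that centroid as in-context demonstrations. The embedding source, `select_centroid_demos`, and the synthetic data are hypothetical.

```python
import numpy as np

def select_centroid_demos(embeddings, labels, k_per_class):
    """Return, per class, indices of the k examples closest (cosine) to the class centroid."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chosen = {}
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        centroid = emb[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        sims = emb[idx] @ centroid                        # cosine similarity to the centroid
        chosen[cls] = idx[np.argsort(-sims)[:k_per_class]]  # most "typical" examples first
    return chosen

# Synthetic stand-ins for transcript embeddings of two classes (e.g. control vs. dementia).
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(+1.0, 1.0, (50, 16)), rng.normal(-1.0, 1.0, (50, 16))])
labels = np.array([0] * 50 + [1] * 50)
demos = select_centroid_demos(emb, labels, k_per_class=3)
print(demos)
```

The intuition behind this policy is that centroid-near examples are prototypical of their class. This may give the model a cleaner contrast than randomly drawn or query-similar demonstrations.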

2507.00714 2026-04-28 eess.SP

Physical Layer Group Key Generation With the Aid of Reconfigurable Intelligent Surfaces

Vahid Shahiri, Guyue Li, Hamid Behroozi

Comments Manuscript under review

Abstract

Reconfigurable intelligent surfaces (RIS) can alter the wireless environment by modifying the impinging signal. While RIS has been extensively studied for enhancing wireless communications, its potential for facilitating group key generation (GKG) remains unexplored. In this study, we use the RIS to make the aggregate reflected channels of different user terminals (UTs) as similar as possible, so that common group secret keys can be extracted from their channels. Specifically, the RIS adjusts its parameters to enable GKG based on the physical channels of the UTs. Our method exploits the channel state information (CSI) already gathered at the RIS to design the phase shifts, and it imposes no additional probing burden on the network. We consider both passive RIS (PRIS) and active RIS (ARIS) for generating the group keys. PRIS is widely adopted in physical layer key generation (PLKG) studies because it uses passive elements, whereas ARIS demonstrates superior capability in aligning the aggregate reflected channels among nodes in the GKG scenario, as shown in this study. We use optimization methods such as successive convex approximation (SCA) and semidefinite relaxation with Gaussian randomization (SDR-GR) to solve the resulting optimization problems. Unlike most studies in the literature, our scheme can also achieve a high GKG rate in static environments. Finally, we evaluate the proposed method in terms of normalized mean squared error (NMSE), key error rate (KER), key generation rate (KGR), and key randomness. Our numerical results verify that, for an equal power budget, ARIS significantly outperforms PRIS in NMSE and KER, achieving a KGR more than four times higher.
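Two of the metrics named above, NMSE and KER, can be illustrated with a minimal sketch. It assumes a single-threshold (median) quantizer that maps each channel-gain sample to one bit, a common PLKG baseline. The paper's actual quantizer, reconciliation, and RIS optimization are not shown. The synthetic channels (shared component plus user-specific mismatch) and all helper names are hypothetical.

```python
import numpy as np

def nmse(h_ref, h_other):
    """Normalized mean squared error between two channel vectors."""
    return np.sum(np.abs(h_ref - h_other) ** 2) / np.sum(np.abs(h_ref) ** 2)

def quantize_to_bits(gains):
    """One bit per sample: 1 if the gain exceeds the median, else 0."""
    return (gains > np.median(gains)).astype(np.uint8)

def key_error_rate(bits_a, bits_b):
    """Fraction of key bits on which two users disagree."""
    return np.mean(bits_a != bits_b)

rng = np.random.default_rng(0)
n = 256
common = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def user_channel(mismatch):
    """Aggregate channel of one user: shared part plus independent mismatch."""
    noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return common + mismatch * noise

for mismatch in (0.1, 0.5):
    h1, h2 = user_channel(mismatch), user_channel(mismatch)
    ker = key_error_rate(quantize_to_bits(np.abs(h1)), quantize_to_bits(np.abs(h2)))
    print(f"mismatch={mismatch}: NMSE={nmse(h1, h2):.3f}, KER={ker:.3f}")
```

The sketch shows the mechanism the abstract relies on. The more similar the users' aggregate channels (lower NMSE), the fewer bit disagreements (lower KER) remain after quantization.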

2504.11589 2026-04-28 eess.SP cs.SY eess.SY

Accelerated Recovery with RIS: Designing Wireless Resilience in Mission-Critical Environments

Kevin Weinberger, Robert-Jeron Reifert, Aydin Sezgin, Mehdi Bennis

Comments 6 pages, 3 figures, submitted to Asilomar 2026

Abstract

As 6G and beyond redefine connectivity, wireless networks are becoming the foundation of critical operations, making resilience more essential than ever. With this shift, wireless systems can not only take on vital services previously handled by wired infrastructure but also enable innovative applications that would not be possible with wired systems. As a result, there is a pressing demand for strategies that can adapt to dynamic channel conditions, interference, and unforeseen disruptions, ensuring seamless and reliable performance in an increasingly complex environment. Despite considerable research, existing resilience assessments lack comprehensive key performance indicators (KPIs), especially KPIs that quantify adaptability, which are vital for identifying a system's capacity to rapidly adapt and reallocate resources. In this work, we bridge this gap by proposing a novel framework that explicitly quantifies adaptation performance by augmenting the gradient of the system's rate function. To further enhance network resilience, we integrate reconfigurable intelligent surfaces (RISs) into our framework, owing to their capability to dynamically reshape the propagation environment while providing alternative channel paths. Numerical results show that gradient augmentation enhances resilience by improving adaptability under adverse conditions while proactively preparing for future disruptions.

2411.09764 2026-04-28 eess.SY cs.SY

ModelPredictiveControl.jl: advanced process control made easy in Julia

Francis Gagnon, Alex Thivierge, André Desbiens, Fredrik Bagge Carlson

Comments 11 pages, 12 figures, 1 table

Abstract

Proprietary closed-source software is still the norm in advanced process control. Transparency and reproducibility are key aspects of scientific research. Free and open-source toolkits can contribute to the development, sharing, and advancement of new and efficient control approaches, and the industrial sector will certainly benefit from them. This paper presents ModelPredictiveControl.jl, an open-source software package for designing model predictive controllers in the Julia programming language. It is designed to be easy to use and modular, while providing advanced features like nonlinear control and moving horizon estimation. It relies on powerful control-system, mathematical-optimization, and automatic-differentiation frameworks to simplify the construction and testing of state estimators and predictive controllers. It also integrates with the standard plotting library to quickly visualize closed-loop data. The paper presents the main functionalities and illustrates them with two case studies in simulation. The first example is a continuously stirred tank reactor described by linear dynamics. The second implements nonlinear, economic, and successive-linearization model predictive controllers for an inverted pendulum. The solving times are benchmarked against equivalent implementations in MATLAB to show the efficiency of the package.
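The receding-horizon idea behind a linear model predictive controller fits in a minimal sketch. It builds the condensed prediction X = Φx₀ + ΓU, solves the unconstrained quadratic cost in closed form, and applies only the first move at each step. This is Python, not the ModelPredictiveControl.jl API. The double-integrator plant, weights, and horizon are assumptions, and constraints and state estimation are omitted.

```python
import numpy as np

def condensed_matrices(A, B, horizon):
    """Stack predictions x_1..x_N as X = Phi @ x0 + Gamma @ U."""
    n, m = B.shape
    Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(horizon)])
    Gamma = np.zeros((horizon * n, horizon * m))
    for i in range(horizon):          # block row i predicts x_{i+1}
        for j in range(i + 1):        # which depends on inputs u_0..u_i
            Gamma[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    return Phi, Gamma

def mpc_gain(A, B, Q, R, horizon):
    """Gain K such that the first optimal move of the unconstrained MPC is u0 = -K x0."""
    n, m = B.shape
    Phi, Gamma = condensed_matrices(A, B, horizon)
    Qbar = np.kron(np.eye(horizon), Q)
    Rbar = np.kron(np.eye(horizon), R)
    H = Gamma.T @ Qbar @ Gamma + Rbar     # Hessian of the quadratic cost in U
    F = Gamma.T @ Qbar @ Phi
    K_full = np.linalg.solve(H, F)        # optimal sequence U* = -K_full @ x0
    return K_full[:m]                     # receding horizon: keep only the first move

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])     # double integrator: position, velocity
B = np.array([[0.5 * dt**2], [dt]])
K = mpc_gain(A, B, Q=np.diag([1.0, 0.1]), R=np.array([[0.01]]), horizon=20)

x = np.array([1.0, 0.0])
for _ in range(100):
    u = -K @ x        # without constraints, re-planning each step reduces to this fixed gain
    x = A @ x + B @ u
print(x)              # state driven close to the origin
```

Once input or state constraints are added, the closed form no longer applies. A QP (or NLP for nonlinear models) must then be solved at every step, which is where optimization back ends like those the package relies on come in.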

2302.11969 2026-04-28 eess.SP

XL-MIMO Channel Modeling and Prediction for Wireless Power Transfer

Benjamin J. B. Deutschmann, Thomas Wilding, Maximilian Graber, Klaus Witrisal

Abstract

Massive antenna arrays form physically large apertures with a beam-focusing capability, leading to outstanding wireless power transfer (WPT) efficiency paired with low radiation levels outside the focusing region. However, leveraging these features requires accurate knowledge of the multipath propagation channel and overcoming the (Rayleigh) fading channel present in typical application scenarios. For that, reciprocity-based beamforming is an optimal solution that estimates the actual channel gains from pilot transmissions on the uplink. But this solution is unsuitable for passive backscatter nodes that are not capable of sending any pilots in the initial access phase. Using measured channel data from an extremely large-scale MIMO (XL-MIMO) testbed, we compare geometry-based planar wavefront and spherical wavefront beamformers with a reciprocity-based beamformer, to address this initial access problem. We also show that we can predict specular multipath components (SMCs) based only on geometric environment information. We demonstrate that a transmit power of 1 W is sufficient to transfer more than 1 mW of power to a device located at a distance of 12.3 m when using a 40 × 25 array at 3.8 GHz. The geometry-based beamformer exploiting predicted SMCs suffers a loss of only 2 dB compared with perfect channel state information.
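This sketch compares spherical-wavefront (near-field) and planar-wavefront (far-field) conjugate beamformers for a 40 × 25 array at 3.8 GHz and a device about 12.3 m away; those figures come from the abstract. Several things are assumptions: half-wavelength element spacing, a pure line-of-sight free-space channel (no multipath or SMC prediction), and the device position. The helper names are hypothetical.

```python
import numpy as np

C = 299_792_458.0
FREQ = 3.8e9
LAM = C / FREQ
K_WAVE = 2 * np.pi / LAM

# 40 x 25 planar array in the x-z plane with half-wavelength spacing (assumed).
nx, nz = 40, 25
xs = (np.arange(nx) - (nx - 1) / 2) * LAM / 2
zs = (np.arange(nz) - (nz - 1) / 2) * LAM / 2
elems = np.array([[x, 0.0, z] for x in xs for z in zs])

def los_channel(point):
    """Free-space LoS channel up to a constant: amplitude 1/d, phase -k d."""
    d = np.linalg.norm(elems - point, axis=1)
    return np.exp(-1j * K_WAVE * d) / d

def array_gain(h, w):
    """Received power |h^T w|^2 for a unit-norm weight vector w."""
    return np.abs(h @ w) ** 2

user = np.array([3.0, 12.3, 0.5])          # hypothetical device position (metres)
h = los_channel(user)

# Spherical wavefront: matched to the exact element-to-device distances.
w_sph = np.conj(h) / np.linalg.norm(h)

# Planar wavefront: phases from the direction to the device only (far-field assumption).
u_dir = user / np.linalg.norm(user)
w_pla = np.exp(-1j * K_WAVE * (elems @ u_dir)) / np.sqrt(len(elems))

loss_db = 10 * np.log10(array_gain(h, w_sph) / array_gain(h, w_pla))
print(f"planar-wavefront loss vs. spherical: {loss_db:.2f} dB")
```

With roughly a 1.6 m × 1 m aperture under these assumptions, 12.3 m lies well inside the Fraunhofer distance. The planar model's ignored wavefront curvature therefore costs gain here, which is why spherical-wavefront models matter for near-field focusing.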