arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2604.25897 2026-04-29 cs.RO cs.LG cs.SY eess.SY

Variational Neural Belief Parameterizations for Robust Dexterous Grasping under Multimodal Uncertainty

Clinton Enwerem, Shreya Kalyanaraman, John S. Baras, Calin Belta

Comments 11 pages, 10 figures

Abstract

Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poorly, obstruct gradient-based optimization, and estimate Conditional Value-at-Risk (CVaR) with high-variance approximations. We instead formulate grasp acquisition as variational inference over latent contact parameters and object pose, representing the belief with a differentiable Gaussian mixture. We use Gumbel-Softmax component selection and location-scale reparameterization to express samples as smooth functions of the belief parameters, enabling pathwise gradients through a differentiable CVaR surrogate for direct optimization of tail robustness. In simulation, our variational neural belief improves robust grasp success under contact-parameter uncertainty and exogenous force perturbations while reducing planning time by roughly an order of magnitude relative to particle-filter model-predictive control. On a serial-chain robot arm with a multifingered hand, we validate grasp-and-lift success under object-pose uncertainty against a Gaussian baseline. Both methods succeed on the tested perturbations, but our controller terminates in fewer steps and less wall-clock time while achieving a higher tactile grasp-quality proxy. Our learned belief also calibrates risk more accurately, keeping mean absolute calibration error below 0.14 across tested simulation regimes, compared with 0.58 for a Cross-Entropy Method planner.
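
The contrast drawn above between expected-quality and tail-sensitive objectives can be illustrated with a plain empirical CVaR estimate. This is only a minimal sketch: it is the standard sample-based estimator, not the differentiable surrogate described in the abstract, and the loss values and the name `empirical_cvar` are hypothetical.

```python
import math

def empirical_cvar(losses, alpha):
    """Mean of the worst alpha-fraction of sampled losses (higher loss = worse)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(losses, reverse=True)        # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))   # number of samples in the tail
    return sum(ordered[:k]) / k

# Hypothetical grasp losses for ten sampled contact realizations (illustrative only).
losses = [0.1, 0.2, 0.1, 0.9, 0.3, 0.2, 0.8, 0.1, 0.2, 0.1]
mean_loss = sum(losses) / len(losses)             # what an expected-quality objective sees
cvar_20 = empirical_cvar(losses, alpha=0.2)       # average of the worst 20% of outcomes
```

On this toy data the mean loss looks modest while the tail average is much larger, which is the kind of gap a risk-sensitive objective is meant to expose.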

2604.25887 2026-04-29 cs.CV cs.AI cs.RO cs.SY eess.SY

No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control

Anas Gamal Aly, Hala ElAarag

Comments © Anas Gamal Aly and Hala ElAarag, 2026. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in Proceedings of the 2026 ACM Southeast Conference (ACMSE 2026)

Abstract

Current pedestrian crossing signals operate on fixed timing without adjustment to pedestrian behavior, which can leave vulnerable road users (VRUs) such as the elderly, disabled, or distracted pedestrians stranded when the light changes. We introduce No Pedestrian Left Behind (NPLB), a real-time adaptive traffic signal system that monitors VRUs in crosswalks and automatically extends signal timing when needed. We evaluated five state-of-the-art object detection models on the BGVP dataset, with YOLOv12 achieving the highest mean Average Precision at 50% (mAP@0.5) of 0.756. NPLB integrates our fine-tuned YOLOv12 with ByteTrack multi-object tracking and an adaptive controller that extends pedestrian phases when remaining time falls below a critical threshold. Through 10,000 Monte Carlo simulations, we demonstrate that NPLB improves VRU safety by 71.4%, reducing stranding rates from 9.10% to 2.60%, while requiring signal extensions in only 12.1% of crossing cycles.
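
The reported 71.4% figure is consistent with a relative reduction in the stranding rate, and the extension rule can be sketched as a simple threshold check. The function names, the 3-second threshold, and the 5-second extension below are assumptions for illustration; the abstract does not state the controller's actual parameters.

```python
def relative_reduction(before, after):
    """Fractional reduction of a rate relative to its starting value."""
    return (before - after) / before

def extended_phase(remaining_s, vru_in_crosswalk, threshold_s=3.0, extension_s=5.0):
    """Return the remaining pedestrian-phase time after an optional extension.

    threshold_s and extension_s are hypothetical values, not taken from the paper.
    """
    if vru_in_crosswalk and remaining_s < threshold_s:
        return remaining_s + extension_s
    return remaining_s

# Stranding rates quoted in the abstract: 9.10% before, 2.60% after.
reduction_pct = 100.0 * relative_reduction(9.10, 2.60)
```

Here `reduction_pct` rounds to 71.4, matching the stated safety improvement.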

2604.25815 2026-04-29 eess.SY cs.SY math.AP

Backstepping Observer for the Quasilinear Heat Equation with Linear Design Gains: Beyond Local Stability

Mohamed Camil Belhadjoudja, Kirsten A. Morris

Comments This is a working document of a work in progress

Abstract

We consider the one-dimensional quasilinear heat equation with state-dependent heat capacity and thermal conductivity, and design a boundary-output observer based on the backstepping design for a linear heat equation with constant coefficients. Viewing the quasilinear system as a perturbation of the linear one, we establish exponential stability of the origin for the observation error dynamics in $H^1$, with an explicit region of attraction depending on the system parameters, observer gains, and the mismatch between the nonlinear diffusivity and the constant design diffusivity. Importantly, the observation error converges to zero rather than merely to a neighborhood scaling with this mismatch, even though, in contrast to backstepping-based stabilization of nonlinear PDEs, the mismatch need not decay along trajectories and may remain bounded away from zero, acting as a persistent state-dependent multiplicative perturbation. A technical challenge was to perform a sufficiently fine Lyapunov analysis that does not yield overly conservative results such as mere boundedness of the observation error. Interestingly, while in the linear case the relationship between one of the backstepping observer gains and the convergence rate is monotonic, we show that in the nonlinear setting this is no longer the case: there may exist an optimal value of that gain, beyond which further increases deteriorate the system's performance. Such behavior cannot be predicted without our analysis: one might expect a priori the decay rate to be freely tunable at the expense of a region of attraction that shrinks to zero as the prescribed rate tends to infinity. However, our Lyapunov analysis (supported by numerical experiments) reveals that this intuition is incorrect.

2604.25777 2026-04-29 eess.SP cs.DC

SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission

Ce Zheng, Xinghan Wang, Jiahong Ning, Yuxuan Shi, Ning Huang, Tingting Yang

Comments IEEE International Symposium on Information Theory (ISIT), 2026

Abstract

Federated inference enhances LLM performance in edge computing through weighted averaging of distributed model predictions. However, autoregressive LLM inference requires frequent full-model forward passes across workers, severely limiting decoding throughput. Distributed deployment further aggravates this due to a communication bottleneck: each worker must transmit full token probability distributions per draft token, dominating end-to-end latency. To address these challenges, we introduce speculative decoding to enable parallel LLM processing and propose a top-K compressed transmission scheme with two server-side reconstruction strategies. We theoretically analyze the robustness of our method in terms of local reconstruction error, aggregation bias, and acceptance-rate bias, and derive corresponding bounds. Experiments demonstrate that our scheme achieves high generation fidelity while significantly reducing communication overhead.
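
The top-K compressed transmission idea can be sketched as follows. The abstract does not say what the two server-side reconstruction strategies are, so the two shown here (spreading the missing mass uniformly, and zeroing then renormalizing) are assumptions chosen for illustration, as are all function names.

```python
def compress_top_k(probs, k):
    """Worker side: keep only the k most probable tokens as {token_id: probability}."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return {i: probs[i] for i in ranked[:k]}

def reconstruct_uniform(sparse, vocab_size):
    """Hypothetical strategy: spread the untransmitted mass evenly over unsent tokens."""
    missing = 1.0 - sum(sparse.values())
    n_rest = vocab_size - len(sparse)
    fill = missing / n_rest if n_rest > 0 else 0.0
    return [sparse.get(i, fill) for i in range(vocab_size)]

def reconstruct_renormalized(sparse, vocab_size):
    """Hypothetical strategy: treat unsent tokens as zero and renormalize the rest."""
    total = sum(sparse.values())
    return [sparse.get(i, 0.0) / total for i in range(vocab_size)]

# Toy five-token distribution; only two entries are transmitted.
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
sparse = compress_top_k(probs, k=2)
uniform = reconstruct_uniform(sparse, len(probs))
renorm = reconstruct_renormalized(sparse, len(probs))
```

Both reconstructions return valid probability vectors, but they distort the original distribution differently, which is the kind of reconstruction error the abstract says is bounded analytically.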

2604.25757 2026-04-29 cs.CR cs.AI cs.RO cs.SY eess.SY

Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms

Thomas J. Neubert, Laxima Niure Kandel, Berker Peköz

Comments Camera ready accepted for presentation at and publication in the proceedings of 2026 56th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W): Dependable and Secure Autonomous Systems (DSAS)

Abstract

Open, unclassified research on secure autonomy is constrained by limited access to operational platforms, contested communications infrastructure, and representative adversarial test conditions. This paper presents a threat-oriented digital twinning methodology for cybersecurity evaluation of learning-enabled autonomous platforms. The approach is instantiated as an open-source, modular twin of a representative autonomy stack with separated sensing, autonomy, and supervisory-control functions; confidence-gated multi-modal perception; explicit command and telemetry trust boundaries; and runtime hold-safe behavior. The contribution is methodological: a reproducible design pattern that translates threat analysis into observable, controllable tests for spoofing, replay, malformed-input injection, degraded sensing, and adversarial ML stress. Although the implemented proxy is ground based, the architecture is intentionally framed around stack elements shared with UAV and space systems, including constrained onboard compute, intermittent or high-latency links, probabilistic perception, and mission-critical recovery behavior. The result is an implementable research scaffold for dependable and secure autonomy studies across UAV and space domains.

2604.25738 2026-04-29 eess.SY cs.SY

Local Shifted Passivity Analysis of the Single-Machine Infinite-Bus System

Xinyuan Jiang

Comments 14 pages

Abstract

This letter presents a shifted passivity analysis of the single-machine infinite-bus system in the stationary ($\alpha\beta$) reference frame. We study the attractivity of a periodic synchronous steady state with constant rotor frequency and formulate shifted passivity with respect to this motion. A port-Hamiltonian representation of the machine dynamics is used to construct a local shifted passivity condition from the error Hamiltonian and a correction term adapted to the synchronous steady state. For the infinite-bus interconnection, the resulting dissipation inequality leads to a sufficient stability condition expressed in terms of field excitation magnitude, damping, inertia, and steady-state current. This condition implies local asymptotic stability of the synchronous steady state and yields a sublevel-set estimate of its region of attraction under an additional small-inertia condition. A distinctive feature of the analysis is that it preserves the periodic structure of the rotor angle and provides a compact passivity-based stability certificate for the stationary-frame model.

2604.25728 2026-04-29 eess.SP

Joint Design of Doppler-Resilient Unimodular Discrete-Phase Waveforms and Receiving Filters for MIMO Radars

Junpeng Ma, Yuke Li, Junbo Wang, Yongxing Zhou

Comments 14 pages, 7 figures

Abstract

Designing Doppler-resilient unimodular discrete phase-coded waveforms (DPWs) with low delay-Doppler sidelobes is critical for multiple-input multiple-output (MIMO) radar. Existing block coordinate descent (BCD) methods suffer from high computational cost for designing long sequences or large waveform sets. Meanwhile, learning-based alternatives such as the soft-quantization network (SQN) only address correlation optimization in the delay domain, without considering ambiguity function (AF) optimization in the joint delay-Doppler domain. To address these issues, this paper proposes a novel Doppler-resilient DPW design framework, termed SQNGD, for joint transmit-receive optimization that simultaneously optimizes the auto-AF, cross-AF (CAF), and signal-to-noise ratio loss (SNRL) under unimodular constraints. To solve the multi-objective optimization problem (MOOP), a joint transmit-receive design and an alternating optimization strategy are developed. The transmit waveforms are optimized via soft-quantization-based differentiable parameterization, while the receive filters are updated by gradient descent (GD) with an energy constraint and SNRL penalty. An FFT-accelerated evaluation of the AF and CAF is further incorporated, reducing the optimization time by 1.9x - 11x compared with the state-of-the-art (SOTA) majorization-minimization-coordinate descent (MMCD) method. Numerical results show that SQNGD achieves a peak sidelobe level (PSL) of approximately -43 dB over the Doppler range [-0.5,0.5] and -31 dB over [-600,600], respectively, outperforming MMCD by 5.85 dB and 3.45 dB, while maintaining the same SNRL of 0.5 dB.

2604.25685 2026-04-29 eess.IV cs.CV

Robustness Evaluation of a Foundation Segmentation Model Under Simulated Domain Shifts in Abdominal CT: Implications for Health Digital Twin Deployment

Sanghati Basu

Comments 8 Pages, 5 Tables, 2 Figures

Abstract

Foundation segmentation models such as the Segment Anything Model (SAM) have demonstrated strong generalization across natural images; however, their robustness under clinically realistic medical imaging domain shifts remains insufficiently quantified. We present a systematic slice-level robustness audit of SAM (ViT-B) for spleen segmentation in abdominal CT using 1,051 nonempty slices from 41 volumes in the Medical Segmentation Decathlon. A standardized ground-truth-derived bounding-box protocol was used to isolate encoder robustness from prompt uncertainty. Controlled perturbations simulating inter-scanner variability, including Gaussian noise, blur, contrast scaling, gamma correction, and resolution mismatch, were applied across ten conditions. The clean baseline achieved a mean Dice score of 0.9145 (95% CI: [0.909, 0.919]) with a failure rate of 0.67%. Across all perturbations, the absolute mean ΔDice remained below 0.01. Paired Wilcoxon signed-rank tests with Benjamini-Hochberg false discovery rate correction identified statistically significant but small-magnitude changes under selected conditions, while McNemar analysis showed no significant increase in failure probability. These findings indicate that SAM exhibits stable segmentation behavior under moderate CT domain shifts, supporting its role as a robust foundation baseline for medical image segmentation research. As health digital twins increasingly incorporate foundation segmentation models for anatomical modeling and organ-level monitoring, formal characterization of robustness under real-world imaging variability is a necessary step toward trustworthy deployment.

2604.25680 2026-04-29 cs.CV eess.IV

Exploring Remote Photoplethysmography for Neonatal Pain Detection from Facial Videos

Ashutosh Dhamaniya, Anup Kumar Gupta, Trishna Saikia, Puneet Gupta

Comments 25 pages, 9 figures, 10 tables. Proposed rPPG-based method for neonatal pain detection from facial videos, with multimodal (rPPG + audio) analysis and extensive ablation studies on the iCOPEvid dataset

Abstract

Unaddressed pain in neonates can lead to adverse effects, including delayed development and slower weight gain, emphasising the need for more objective and reliable pain assessment methods. Hence, automated methods using behavioural and physiological pain indicators have been developed to aid healthcare professionals in the Neonatal ICU. Traditional contact-based methods for physiological parameter estimation are unsuitable for long-term monitoring and increase the risk of spreading diseases like COVID-19. We introduce a novel approach using remote photoplethysmography (rPPG) to estimate pulse signals in a non-contact manner and employ them for neonatal pain detection. The temporal signals acquired from regions-of-interest (ROIs) affected by skin deformations may exhibit lower quality and provide erroneous rPPG signals. Therefore, we incorporated a quality parameter to select the temporal signals obtained from ROIs that are least affected by skin deformations. Further, we employed signal-to-noise ratio as a fitness parameter to extract the rPPG signal corresponding to the clip that is least affected by noise. Experimental findings demonstrate that the rPPG signals provide useful information for neonatal pain detection, and signals extracted from the blue colour channel outperform those extracted from other colour channels. We also show that combining rPPG and audio features provides better results than individual modalities.

2604.25650 2026-04-29 cs.SE cs.SY eess.SY

Using Large Language Models for Black-Box Testing of FMU-Based Simulations

Abdullah Mughees, Gaadha Sudheerbabu, Tanwir Ahmad, Dragos Truscan, Mikael Manngård, Kristian Klemets

Abstract

We propose a human-in-the-loop approach for black-box testing of Functional Mock-up Units (FMUs) using Large Language Models (LLMs). The goal is to reduce the manual effort in defining test scenarios for dynamic simulation models and to improve the interpretability of results. The approach takes the functional and interface specifications of an FMU as input, and prompts an LLM to generate structured scenario goals in Given-When-Then format that define the initial input conditions of the simulation, a possible change in those conditions, and the expected output behaviour of the system in response to those changes. The corresponding scenario plans specify input patterns and add assertion oracles that describe the expected output patterns defined in the scenario goals. The approach generates a complete input time series for the scenario plans, runs the FMU simulation, and evaluates assertions on the recorded outputs. It produces human-readable logs and plots that show statistics for each scenario with overlays, aggregate pass rates, and per-goal outcomes. The generated scenarios and results are stored for evaluation and later re-execution. We evaluate the approach on a Lube Oil Cooling system and discuss design choices that make the approach practical for everyday use. Results suggest that LLM-assisted scenario generation can facilitate automatic test design and verification of dynamic simulation models.

2604.25624 2026-04-29 eess.AS

UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition

Chong-Xin Gan, Peter Bell, Man-Wai Mak, Zhe Li, Zezhong Jin, Zilong Huang, Kong Aik Lee

Comments Submitted to Interspeech 2026

Abstract

The joint training of speech enhancement and speaker embedding networks is widely adopted for speaker recognition in noisy acoustic environments. While effective, this paradigm often fails to leverage the generalization and robustness benefits inherent in large-scale speech enhancement pre-training. Moreover, maintaining the speaker information in the denoised speech is not an explicit objective of the speech enhancement process. To address these limitations, we propose a scalable UNet-based Fusion framework (UF-EMA) that treats the noisy and enhanced speech as a multi-channel input, thereby enabling the speaker encoder to exploit speaker information effectively. In addition, an Exponential Moving Average (EMA) strategy is applied to a speaker encoder pre-trained on clean speech to mitigate overfitting and facilitate a smooth transition from clean to noisy conditions. Experimental results on multiple noise-contaminated test sets showcase the superiority of the proposed approach.
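
For readers unfamiliar with exponential moving averages over model parameters, a generic update step looks like the sketch below. How the paper applies it to the pre-trained speaker encoder is not detailed in the abstract, so the function name, the flat parameter list, and the decay values are assumptions.

```python
def ema_update(ema_params, new_params, decay=0.999):
    """One EMA step, element-wise: ema <- decay * ema + (1 - decay) * new."""
    return [decay * e + (1.0 - decay) * n for e, n in zip(ema_params, new_params)]

# Toy example: the averaged parameters drift slowly toward the new values.
ema = [1.0, -2.0]
for _ in range(3):
    ema = ema_update(ema, [0.0, 0.0], decay=0.5)
```

A decay close to 1 keeps the averaged parameters near their starting point for longer, which is one way to read the "smooth transition" mentioned in the abstract.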

2604.25592 2026-04-29 q-bio.NC eess.SP

A geometry aware framework enhances noninvasive mapping of whole human brain dynamics

Song Wang, Kexin Lou, Chen Wei, Zhiyuan Sheng, Jiahao Tang, Kaining Peng, Xinke Shen, Shuhao Mei, Liang Chen, Dongfeng Gu, Quanying Liu

Abstract

Non-invasive electrophysiology lacks methods that accurately reconstruct whole-brain spatiotemporal dynamics while incorporating individual cortical geometry, leaving current electroencephalography and magnetoencephalography source imaging limited by simplistic or biologically implausible priors. Here, we show that embedding participant-specific Geometric Basis Functions (GBFs), eigenmodes derived from each individual's cortical surface, provides a powerful anatomic constraint that resolves the inverse problem and improves reconstruction fidelity. The method reconstructs neural sources as linear combinations of geometric basis functions, thereby aligning source estimates with the geometric organization of neural dynamics. We validate GBF across the Meta-Source Benchmark, task-evoked data, resting-state networks, intracranial stimulation, and epilepsy data. The results demonstrate that GBF yields high localization accuracy and captures fast spatiotemporal dynamics consistent with anatomical pathways. These findings suggest that both spontaneous and evoked whole-brain activity can be described by hundreds of geometric modes, providing a compact yet accurate representation of neural sources. By linking cortical geometry to electrophysiological dynamics, GBF offers a versatile source imaging tool for both scientific and clinical applications.

2604.25591 2026-04-29 eess.AS cs.AI cs.CL cs.LG cs.SD

Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models

Chun-Yi Kuan, Wei-Ping Huang, Hung-yi Lee

Comments Manuscript in progress

Abstract

Recent audio-aware large language models (ALLMs) have demonstrated strong capabilities across diverse audio understanding and reasoning tasks, but they still frequently produce hallucinated or overly confident outputs. While uncertainty estimation has been extensively studied in text-only LLMs, it remains largely unexplored for ALLMs, where audio-conditioned generation introduces additional challenges such as perceptual ambiguity and cross-modal grounding. In this work, we present the first systematic empirical study of uncertainty estimation in ALLMs. We benchmark five representative methods, including predictive entropy, length-normalized entropy, semantic entropy, discrete semantic entropy, and P(True), across multiple models and diverse evaluation settings spanning general audio understanding, reasoning, hallucination detection, and unanswerable question answering. Our results reveal two key findings. First, semantic-level and verification-based methods consistently outperform token-level baselines on general audio reasoning benchmarks. Second, on trustworthiness-oriented benchmarks, the relative effectiveness of uncertainty methods becomes notably more model- and benchmark-dependent, indicating that conclusions drawn from general reasoning settings do not straightforwardly transfer to hallucination and unanswerable-question scenarios. We further explore uncertainty-based adaptive inference as a potential downstream application. We hope this study provides a foundation for future research on reliable, uncertainty-aware audio-language systems.
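
Two of the token-level baselines named above can be sketched from sampled token log-probabilities. This follows the commonly used Monte Carlo definitions and is only an assumption about how the study computes them; the function names and the sample values are hypothetical.

```python
import math

def predictive_entropy(seq_logprobs):
    """Monte Carlo estimate: negative mean of the sampled sequence log-likelihoods."""
    totals = [sum(tokens) for tokens in seq_logprobs]
    return -sum(totals) / len(totals)

def length_normalized_entropy(seq_logprobs):
    """Same estimate, but each sequence log-likelihood is divided by its length."""
    per_token = [sum(tokens) / len(tokens) for tokens in seq_logprobs]
    return -sum(per_token) / len(per_token)

# Two hypothetical sampled answers, given as per-token log-probabilities.
samples = [
    [math.log(0.5), math.log(0.5)],  # two tokens, each with probability 0.5
    [math.log(0.25)],                # one token with probability 0.25
]
pe = predictive_entropy(samples)
lne = length_normalized_entropy(samples)
```

Both answers have the same total likelihood, so the unnormalized score treats them alike, while the length-normalized score no longer penalizes the longer answer for its extra token.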

2604.25541 2026-04-29 eess.SP cs.RO

Bridging the Indoor-Outdoor Gap: Cross-Technology Ranging for Seamless Robot Navigation

Paul Schwarzbach

Abstract

Mobile robots that move between outdoor and indoor environments still struggle with consistent positioning. Satellite-based and terrestrial ranging each work well in their home domains, but combining them at the raw measurement level has received little attention, and the building boundary is precisely where both classes degrade. This paper reports preliminary observations from the HYMN dataset, which time-synchronizes raw measurements from GNSS, Ultra-Wideband (UWB), WiFi Fine Time Measurement (FTM), and Bluetooth Low Energy (BLE) against millimeter-level ground truth in an industrial setting. Per-zone measurement availability and ranging-residual behavior are characterised. The two technology classes turn out to be complementary, and the indoor-outdoor transition is where their weaknesses overlap. The dataset is publicly available.

2604.22821 2026-04-29 cs.SD cs.LG eess.AS

Audio2Tool: Speak, Call, Act -- A Dataset for Benchmarking Speech Tool Use

Ramit Pahwa, Apoorva Beedu, Parivesh Priye, Rutu Gandhi, Saloni Takawale, Aruna Baijal, Zengli Yang

Abstract

Voice assistants increasingly rely on Speech Language Models (SpeechLMs) to interpret spoken queries and execute complex tasks, yet existing benchmarks lack domain breadth, acoustic diversity, and compositional reasoning complexity to evaluate tool-calling performance. We introduce Audio2Tool, a large-scale dataset comprising approximately 30,000 queries designed to assess tool-calling capabilities of SpeechLMs across three primary domains: Smart Car, Smart Home, and Wearables. Our benchmark features a multi-tier complexity hierarchy, ranging from simple direct commands to complex multi-intent and needle-in-a-haystack extraction to isolate distinct failure modes. To ensure realism, we employ zero-shot voice cloning text-to-speech synthesis and diverse noise profiles to simulate in-the-wild conditions. Evaluations of state-of-the-art SpeechLMs and ASR-LLM pipelines show strong performance on simple commands but significant degradation under compositional and acoustic challenges. Code and dataset are publicly available on the project page: https://audio2tool.github.io/.

2512.22578 2026-04-29 eess.SP

A Novel Geometry-Aware GPR-Based Energy-Efficient and Low-Overhead Channel Estimation Scheme

Syed Luqman Shah, Nurul Huda Mahmood

Comments Submitted for possible publication in IEEE

Abstract

Accurate channel state information (CSI) acquisition under tight pilot and training-energy constraints is essential for next-generation wireless networks. In this work, we model the wireless channel as a proper complex Gaussian process over the transmit and receive antenna arrays, reducing pilot overhead and training energy by estimating the CSI from partial observations. We formulate the CSI acquisition problem as a highly underdetermined Bayesian linear inverse problem. We develop a Gaussian process regression (GPR) framework that reconstructs the full CSI from sparse and noisy observations by extrapolating to the unknown entries. To incorporate propagation information into the GPR prior, we introduce a novel array-geometry-based kernel and prove that it is Hermitian positive semidefinite. The proposed kernel better captures the channel spatial correlations through richer hyperparameters. Our GPR-based CSI extrapolation approach learns the channel hyperparameters online from sparse, noisy pilot measurements within each coherence block. Numerical results show that the proposed estimator reduces pilot overhead by up to 75 percent and total training energy by up to 93.75 percent, while maintaining lower normalized mean-square error and higher spectral efficiency in the low-to-moderate signal-to-noise-ratio regime.

2512.05552 2026-04-29 eess.SY cs.SY

Inverse Linear-Quadratic Gaussian Differential Games

Lucas Günther, Felix Thömmes, Karl Handwerker, Balint Varga, Sören Hohmann

Abstract

This paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. The objective is to recover cost function parameters of all players, as well as noise scaling parameters of the stochastic system, consistent with observed trajectories. The proposed framework combines (i) estimation of the feedback strategies, (ii) identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations, and (iii) maximum likelihood estimation of the noise scaling parameters. Simulation results demonstrate that the approach recovers parameters, yielding trajectories that closely match the observed trajectories.

2509.08470 2026-04-29 eess.AS cs.AI

Joint Learning using Mixture-of-Expert-Based Representation for Speech Enhancement and Robust Emotion Recognition

Jing-Tong Tzeng, Carlos Busso, Chi-Chun Lee

Comments Accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP)

Abstract

Speech emotion recognition (SER) plays a critical role in building emotion-aware speech systems, but its performance degrades significantly under noisy conditions. Although speech enhancement (SE) can improve robustness, it often introduces artifacts that obscure emotional cues and adds computational overhead to the pipeline. Multi-task learning (MTL) offers an alternative by jointly optimizing SE and SER tasks. However, conventional shared-backbone models frequently suffer from gradient interference and representational conflicts between tasks. To address these challenges, we propose the Sparse Mixture-of-Experts Representation Integration Technique (Sparse MERIT), a flexible MTL framework that applies frame-wise expert routing over self-supervised speech representations. Sparse MERIT incorporates task-specific gating networks that dynamically select from a shared pool of experts for each frame, enabling parameter-efficient and task-adaptive representation learning. Experiments on the MSP-Podcast corpus show that Sparse MERIT consistently outperforms baseline models on both SER and SE tasks. Under the most challenging condition of -5 dB signal-to-noise ratio (SNR), Sparse MERIT improves SER F1-macro by an average of 12.0% over a baseline relying on a SE pre-processing strategy, and by 3.4% over a naive MTL baseline, with statistical significance on unseen noise conditions. For SE, Sparse MERIT improves segmental SNR (SSNR) by 28.2% over the SE pre-processing baseline and by 20.0% over the naive MTL baseline. These results demonstrate that Sparse MERIT provides robust and generalizable performance for both emotion recognition and enhancement tasks in noisy environments.

2501.17653 2026-04-29 cs.LG cs.CE eess.SP

Drivetrain simulation using variational autoencoders

Pallavi Sharma, Jorge-Humberto Urrea-Quintero, Bogdan Bogdan, Adrian-Dumitru Ciotec, Laura Vasilie, Henning Wessels, Matteo Skull

Comments 27 pages

Abstract

This work proposes variational autoencoders (VAEs) to predict a vehicle's jerk signals from torque demand in the context of limited real-world drivetrain datasets. We implement both unconditional and conditional VAEs, trained on experimental data from two variants of a fully electric SUV with differing torque and drivetrain configurations. The VAEs synthesize jerk signals that capture characteristics from multiple drivetrain scenarios by leveraging the learned latent space. A performance comparison with baseline physics-based and hybrid models confirms the effectiveness of the VAEs, without requiring detailed system parametrization. Unconditional VAEs generate realistic jerk signals without prior system knowledge, while conditional VAEs enable the generation of signals tailored to specific torque inputs. This approach reduces the dependence on costly and time-intensive real-world experiments and extensive manual modeling. The results support the integration of generative models such as VAEs into drivetrain simulation pipelines, both for data augmentation and for efficient exploration of complex operational scenarios, with the potential to streamline validation and accelerate vehicle development.

2501.10842 2026-04-29 eess.SY cs.SY

BOOST: Microgrid Sizing using Ordinal Optimization

Mohamad Chehade, Sami Karaki

Abstract

Sizing a residential microgrid efficiently requires solving a coupled design-and-operation problem: photovoltaic (PV) and battery capacities should be chosen in a way that reflects how the system will actually be dispatched over time. This paper proposes BOOST, or Battery-solar Ordinal Optimization Sizing Technique, which combines ordinal optimization (OO) with mixed-integer linear programming (MILP). OO is used to screen a large set of candidate battery/PV designs with a simple linear model and then re-evaluate only the most promising designs with a more accurate MILP that captures diesel commitment logic. Relative to the original short paper, this expanded manuscript retains the full methodological narrative but refreshes the quantitative section using a new synthetic benchmark dataset suite generated from the released clean reimplementation. The suite contains five yearly synthetic datasets/configurations: base, cheap battery, cheap PV, expensive diesel, and high peak tariff. On the base synthetic dataset, the best accurate design is a 500 kWh battery with 1833.3 kW of PV, achieving 13.169 c/kWh, while BOOST improves upon dynamic programming and greedy baselines. Across the full 10 x 10 design grid, the LP and MILP rankings are effectively identical (rho = 1.000), the paper-style choice of N = 90 and s = 18 recovers the global accurate optimum, and the OO-based workflow reduces runtime by 51.8% relative to exhaustive accurate evaluation on the refreshed synthetic benchmark run. Because these added datasets are synthetic, they should be read as methodological stress tests rather than as direct empirical claims about any specific real-world site. Code is available at https://github.com/MFHChehade/Microgrid-Optimization.
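
The screen-then-refine workflow described above can be sketched generically. The toy cost functions below stand in for the LP and MILP models and are purely hypothetical, as are the function and parameter names; only the overall pattern (rank N candidates with a cheap model, re-evaluate the best s with an accurate one) follows the abstract.

```python
import random

def ordinal_optimization(designs, cheap_cost, accurate_cost, n_screen, n_select, seed=0):
    """Screen n_screen sampled designs with the cheap model, then pick the best
    of the top n_select according to the accurate model."""
    rng = random.Random(seed)
    screened = rng.sample(designs, min(n_screen, len(designs)))
    shortlist = sorted(screened, key=cheap_cost)[:n_select]
    return min(shortlist, key=accurate_cost)

# Hypothetical 10 x 10 grid of (battery index, PV index) designs.
designs = [(b, p) for b in range(10) for p in range(10)]

def accurate(design):
    """Stand-in for the expensive model: a bowl with its minimum at (3, 7)."""
    b, p = design
    return (b - 3) ** 2 + (p - 7) ** 2

def cheap(design):
    """Stand-in for the simple model: the accurate cost plus a small bias."""
    return accurate(design) + 0.5 * ((design[0] + design[1]) % 2)

# Screening the full grid keeps the example deterministic.
best_full = ordinal_optimization(designs, cheap, accurate,
                                 n_screen=len(designs), n_select=18)
```

Only 18 of the 100 designs reach the accurate model here, which is where the runtime saving comes from; passing a smaller `n_screen` (such as the N = 90 quoted above) screens a random subset instead.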

2412.02315 2026-04-29 eess.SY cs.SY

Topology Reconstruction of a Resistor Network with Limited Boundary Measurements: An Optimization Approach

Shivanagouda Biradar, Deepak U Patil

Details
English abstract

We consider the problem of reconstructing the topology and edge resistance values of an unknown circular planar passive resistive network from a limited set of resistance distance measurements. We develop a multistage topology reconstruction method, assuming that the number of boundary and interior nodes, the maximum and minimum edge conductance, and the Kirchhoff index are known a priori. First, a maximal circular planar electrical network consisting of edges with resistors and switches is constructed; no interior nodes are considered. A sparse difference-of-convex program $\mathbf{\Pi}_1$, accompanied by a round-down algorithm, is posed to determine the switch positions. The solution gives a topology that is then used to develop a heuristic method for placing the interior nodes. The heuristic method consists of reformulating $\mathbf{\Pi}_1$ as a difference-of-convex program $\mathbf{\Pi}_2$ with relaxed edge weight constraints and a quadratic cost. The interior node placement thus obtained may lead to a non-planar topology. We then use the modified Auslander, Parter, and Goldstein algorithm to obtain a set of planar network topologies and re-optimize the edge weights by solving $\mathbf{\Pi}_3$ for each topology. The optimization problems posed are difference-of-convex programs as a consequence of the triangle inequality and Kalmanson's inequality constraints. A numerical example demonstrates the proposed method.

2410.06723 2026-04-29 eess.IV cs.CV cs.LG

Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts

Fredrik K. Gustafsson, Mattias Rantalainen

Details
English abstract

Pathology foundation models (PFMs) have emerged as powerful pretrained encoders for computational pathology, but their robustness under clinically relevant distribution shifts remains insufficiently understood. We benchmark the robustness of recent PFMs in the setting of prostate cancer grading from whole-slide images (WSIs). Using the PANDA dataset, we evaluate PFMs as frozen patch-level feature extractors within weakly supervised slide-level grading models, and assess robustness to two important forms of distribution shift: shifts in WSI image appearance across collection sites, and shifts in the label distribution over cancer grade groups. Across in-distribution settings, PFMs consistently achieve strong performance and clearly outperform a natural-image baseline. Under cross-site transfer from Radboud to Karolinska, however, performance drops substantially for all models, showing that large-scale pretraining alone does not guarantee robust downstream generalization. In contrast, PFMs are less sensitive to label-distribution shift, indicating that visually grounded domain shift is the dominant challenge. Representation analysis further supports these findings by revealing persistent domain separation between sites across all PFMs. While grade-related structure is present, it is comparatively weak, indicating that domain-related variation dominates in the learned feature space. Together, these results provide a comprehensive benchmark of PFMs under distribution shift and highlight an important practical message: although PFMs provide strong representations, generalizability remains constrained by the quality and diversity of the data used to train downstream prediction models.

2211.12080 2026-04-29 cs.SD eess.AS

Robust Training for Speaker Verification against Noisy Labels

Zhihua Fang, Liang He, Hanhan Ma, Xiaochen Guo, Lin Li

Comments Accepted by INTERSPEECH 2023

Details
Journal ref
Interspeech 2023
English abstract

The deep learning models used for speaker verification rely heavily on large amounts of data and correct labeling. However, noisy (incorrect) labels often occur, which degrades the performance of the system. In this paper, we propose a novel two-stage learning method to filter out noisy labels from speaker datasets. Since a DNN will first fit data with clean labels, we first train the model with all data for several epochs. Then, based on this model, the model predictions are compared with the labels using our proposed OR-Gate with top-k mechanism to select the data with clean labels, and the selected data are used to train the model. This process is iterated until the training is completed. We have demonstrated the effectiveness of this method in filtering noisy labels through extensive experiments and have achieved excellent performance on VoxCeleb (1 and 2) with different added noise rates.

1803.11131 2026-04-29 eess.SP cs.NA math.NA

Novel Fourier Quadrature Transforms and Analytic Signal Representations for Nonlinear and Non-stationary Time Series Analysis

Pushpendra Singh

Comments 25 pages, 13 figures

Details
Journal ref
Royal Society Open Science, November 28, 2018
English abstract

The Hilbert transform (HT) and the associated Gabor analytic signal (GAS) representation are well-known and widely used mathematical formulations for modeling and analysis of signals in various applications. In this study, to obtain the quadrature component of a signal as the HT does, we propose novel discrete Fourier cosine quadrature transforms (FCQTs) and discrete Fourier sine quadrature transforms (FSQTs), designated as Fourier quadrature transforms (FQTs). Using these FQTs, we propose sixteen Fourier quadrature analytic signal (FQAS) representations with the following properties: (1) the real part of eight FQAS representations is the original signal and the imaginary part of each representation is the FCQT of the real part; (2) the imaginary part of eight FQAS representations is the original signal and the real part of each representation is the FSQT of the imaginary part; (3) like the GAS, the Fourier spectrum of all FQAS representations has only positive frequencies; however, unlike the GAS, the real and imaginary parts of FQAS representations are not orthogonal. The Fourier decomposition method (FDM) is an adaptive data analysis approach to decompose a signal into a set of Fourier intrinsic band functions. This study also proposes new formulations of the FDM using the discrete cosine transform with GAS and FQAS representations, and demonstrates its efficacy for improved time-frequency-energy representation and analysis of many real-life nonlinear and non-stationary signals.
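
For readers unfamiliar with the baseline that the FQTs are compared against, the following is a minimal sketch of the classical discrete Gabor analytic signal only (not the proposed FCQT/FSQT constructions, whose definitions are not given in the abstract). It uses a naive O(n^2) DFT from the standard library for self-containment; the helper names `dft`, `idft`, and `analytic_signal` are illustrative.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic_signal(x):
    """Classical analytic signal: keep DC (and Nyquist), double the positive
    frequencies, and zero the negative ones. The real part is the input and
    the imaginary part is its discrete Hilbert transform."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    spectrum = dft(x)
    return idft([spectrum[k] * weights[k] for k in range(n)])

# Example: a cosine at bin 3 of a 32-sample frame.
n = 32
signal = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
z = analytic_signal(signal)
```

For this input the imaginary part of `z` is the matching sine, which is the quadrature component that the abstract refers to.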

2604.25527 2026-04-29 eess.SY cs.SY

Multi-layer barrier adaptation of the discrete-time super-twisting controller

Antoine Thibault Vié, Leonid Fridman, Roberto Galeazzi, Dimitrios Papageorgiou

Comments 6 pages, accepted to 18th International Workshop on Variable Structure Systems

Details
English abstract

In digital sliding mode control implementations, discretization-induced chattering and inter-sample blindness can severely degrade closed-loop performance, especially in the case of fast perturbations. This paper addresses these challenges for a discrete-time implementation of the super-twisting sliding mode controller. Building upon recent results on barrier-function-modulated super-twisting algorithms, a nested architecture employing multiple barriers is discretized using an eigenvalue-based exact matching approach. The resulting discrete-time controller preserves the adaptive and robustness properties established in continuous time, while ensuring consistent stability behavior at the sampling level. The proposed framework is validated through numerical simulations. The results highlight the effectiveness of multi-layer barrier adaptation for discrete-time sliding mode control applications.

2604.25473 2026-04-29 eess.SY cs.SY

Complex-Vector Power and Cross-Phase Unbalance in Three-Phase Systems

Juan Carlos Bravo-Rodríguez, Juan Carlos del-Pino-López, Francisco Casado-Machado

Comments 8 pages, 1 figure, submitted to IEEE Trans. on Power Delivery

Details
English abstract

Unbalanced three-phase systems still lack a compact phasor-domain representation of power that makes phase asymmetry explicit while remaining consistent with established apparent-power definitions. This paper addresses that point through a complex-vector power formulation for sinusoidal steady-state operation. The proposed representation supplements the classical dot-product expression of complex power with the cross product of voltage and current phasors, thereby retaining the usual active and reactive terms while making explicit a cross-phase unbalance vector that captures antisymmetric interphase relations. In this way, apparent power is separated into intraphase and cross-phase contributions, and its norm is preserved under the power-invariant Fortescue transformation. The formulation is extended to three-phase four-wire systems by introducing equivalent coordinates that preserve the effective apparent-power norm for the chosen voltage reference. Only standard complex numbers and matrices are required. Numerical examples show operating conditions in which a non-negligible part of the apparent-power structure is associated with cross-phase unbalance and cannot be inferred from active and reactive power alone. The proposed formulation thus provides a compact phasor-based descriptor of unbalance that complements established apparent-power theories by making explicit a component that is not accessible from scalar apparent-power representations.
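
The idea of pairing a dot product with a cross product of phasor vectors can be illustrated with the complex Lagrange identity, which splits the product of voltage and current norms into a dot-product term and pairwise antisymmetric terms. The sketch below is an assumption-laden illustration of that identity only: the function names and the specific form of the pairwise terms are chosen here for demonstration, and the paper's actual cross-phase unbalance vector may be defined differently.

```python
import cmath
import math

def phasor(magnitude, angle_deg):
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))

def dot_power(v, i):
    """Classical complex power S = sum_k V_k * conj(I_k) = P + jQ."""
    return sum(vk * ik.conjugate() for vk, ik in zip(v, i))

def cross_terms(v, i):
    """Pairwise antisymmetric products V_a*I_b - V_b*I_a for a < b
    (illustrative definition; an assumption, not the paper's notation)."""
    n = len(v)
    return [v[a] * i[b] - v[b] * i[a] for a in range(n) for b in range(a + 1, n)]

def norm(vec):
    return math.sqrt(sum(abs(x) ** 2 for x in vec))

# Hypothetical operating point: balanced voltages, unbalanced currents.
v = [phasor(230, 0), phasor(230, -120), phasor(230, 120)]
i = [phasor(10, -20), phasor(5, -150), phasor(8, 100)]

s = dot_power(v, i)                      # active and reactive part
cross = cross_terms(v, i)                # cross-phase part
apparent_sq = (norm(v) * norm(i)) ** 2   # squared product of the norms
split_sq = abs(s) ** 2 + sum(abs(c) ** 2 for c in cross)
```

Under this definition `apparent_sq` and `split_sq` agree, and the cross terms vanish when every phase sees the same admittance, so a nonzero cross part signals interphase asymmetry that P and Q alone do not show.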

2604.25468 2026-04-29 eess.SY cs.SY math.OC math.ST stat.TH

Distributed adaptive estimation for stochastic large regression models

Die Gan, Siyu Xie, Zhixin Liu, Xuebo Zhang

Comments 13 pages, submitted to IEEE TAC

Details
English abstract

This paper studies the distributed adaptive estimation problems for stochastic large regression models with an infinite number of parameters. By constructing a recursive local cost function, we propose a novel distributed recursive least squares algorithm to estimate the unknown system parameters, where the growth rate of regressors' dimension is characterized by a non-decreasing positive function. The almost sure convergence of the proposed algorithm is established under a cooperative excitation condition, which incorporates the temporal information and the spatial information to reflect the cooperative effect among multiple agents. Moreover, we analyze the prediction error by establishing the asymptotic upper bound of the accumulated regret without any excitation conditions. The main difficulty of theoretical analysis lies in how to analyze properties of the product of non-independent and non-stationary random matrices, whose dimensions change over time simultaneously. Some techniques, such as stochastic Lyapunov function, double-array martingale theory and algebraic graph theory, are employed to deal with the above issue. Our theoretical results are derived without imposing independence or stationarity assumptions on the regression vectors, thereby not excluding the correlated feedback signals.

2604.25453 2026-04-29 eess.SP

Polarization-diverse Detection at Microwave Frequencies Using A Passive Metasurface Aperture

Md. Abrar A Mushfik, Mohammad Ali Kaisar, Mohiminul Islam Bhuiyan Sahed, Idban Alamzadeh

Comments 9 pages, Journal (TAP)

Details
English abstract

Metasurfaces' ability to control electromagnetic wave propagation has led to a rapid paradigm shift in wireless operation. These metasurfaces are often called reconfigurable intelligent surfaces (RISs) due to active tuning elements distributed across the meta-atoms comprising the metasurface array. However, each of these dynamic meta-atoms requires additional DC power lines and biasing circuitry for active tuning. Additionally, achieving polarization-diverse operation using compact metasurface configurations is challenging due to the complexity involved in polarization detection. To address these limitations, we propose a passive metasurface array architecture that is both polarization-sensitive and capable of altering radiation patterns with frequency diversity. In particular, we designed a polarization-sensitive meta-atom model with added randomness in the scattering behavior and extended it to a polarization-diverse, frequency-selective array. By capturing the electric fields scattered from the metasurface, we can numerically acquire the polarization information of the incoming signal. The proposed polarization-diverse array can simplify polarization measurement techniques and may find application in polarization-sensitive sensing and imaging operations.

2604.25441 2026-04-29 cs.SD cs.CL eess.AS

Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost

Venkata Pushpak Teja Menta

Comments 9 pages, 6 figures, 6 tables. Companion paper to PSP benchmark. Code: https://github.com/praxelhq/praxy ; Model: https://huggingface.co/Praxel/praxy-voice-r6 ; Demo: https://huggingface.co/spaces/Praxel/praxy-voice-demo

Details
English abstract

Commercial TTS systems produce near-native Indic audio, but the best open-source bases (Chatterbox, Indic Parler-TTS, IndicF5) trail them on measured phonological dimensions, and the most widely adopted multilingual base (Chatterbox, 23 languages) does not even tokenise Telugu or Tamil. We ask: what is the minimum intervention that brings such a non-Indic-native base to commercial-class output on Telugu, Tamil, and Hindi, without training a new acoustic decoder and without any commercial TTS training data? We combine three pieces: (1) BUPS, a Brahmic Unified Phoneme Space that deterministically romanises seven Indic scripts to ISO-15919 so Chatterbox's Latin tokeniser can process them; (2) a LoRA adapter on only the text-token predictor (Chatterbox's t3), trained on ~1,220h of licensed Indic audio with a Hindi-proxy language_id; (3) a voice-prompt recovery recipe -- an 8-11s same-language reference clip plus three sampling overrides (exaggeration 0.7, temperature 0.6, min_p 0.1; "Config B") -- that recovers commercial-class acoustic output with no acoustic-decoder training. On Hindi, the LoRA regresses accuracy and we instead use vanilla Chatterbox + Config B, giving a two-branch deployment. Evaluated on 10-utterance pilot sets with the companion PSP benchmark, Praxy Voice matches or slightly leads commercial baselines: 26.7% retroflex collapse on Telugu (vs Sarvam Bulbul 33.3%), 71% Tamil-zha collapse (vs commercial trio's 86%), 0.025 LLM-WER on Hindi (tied with Cartesia Sonic-3). For intra-sentential code-mix we add a third branch (IndicF5 + native-script transliteration) that drops code-mix LLM-WER from 0.80-0.85 to 0.14-0.27 across Hi/Te/Ta. We release R6 LoRA weights (Apache-2.0), inference code and router (MIT), and a Gradio demo.

2604.25430 2026-04-29 eess.SY cs.SY eess.SP

A Miniaturized Broadband 1-Bit Coding Reconfigurable Intelligent Surface for NLOS UE Localization and Uplink Communication

Khagendra Joshi, Deepak Kumar Sahoo, Kamalesh Kumar K, Debidas Kundu, Vivek A. Bohara, Amalendu Patnaik

Details
English abstract

In this paper, a broadband 1-bit coding metasurface-based reconfigurable intelligent surface (RIS) is presented. The unit cell of the metasurface consists of a wide dipole modified with interdigital capacitors and loaded with an SMP 1340-040LF PIN diode. The proposed element offers cell miniaturization and a stable angular response. A phase difference of $180^\circ \pm 30^\circ$ is achieved between the ON and OFF states over the 4.85-6.05 GHz range for normal incidence of a TE-polarized wave, whereas for oblique incidence up to $45^\circ$ the response remains fairly stable, with a reflection loss of less than 3 dB and a phase difference of $180^\circ \pm 50^\circ$. The RF is isolated from the DC on the bias lines using properly designed butterfly-shaped radial stubs. Using this unit cell, a prototype with an array of 16 $\times$ 10 elements is constructed. A low-cost microcontroller-based control circuit is designed, which can be plugged in to bias the PIN diodes of such an array. The theoretically calculated and full-wave simulated radiation patterns of the array are validated by experiments inside an anechoic chamber. Furthermore, the capability of the RIS for non-line-of-sight (NLOS) user equipment (UE) localization and robust uplink communication is demonstrated using an LTE communication framework. This shows the great potential of our RIS for applications such as unmanned aerial vehicle (UAV) localization and uplink communication at NLOS or extended range.