arXivDaily arXiv每日学术速递 周一至周五更新
重置
EESS电气与系统 110
2604.08521 2026-04-10 math.OC cs.SY eess.SY

Discounted MPC and infinite-horizon optimal control under plant-model mismatch: Stability and suboptimality

Robert H. Moldenhauer, Karl Worthmann, Romain Postoyan, Dragan Nešić, Mathieu Granzotto

Comments Submitted to 65th IEEE Conference on Decision and Control as a regular paper

详情
英文摘要

We study closed-loop stability and suboptimality for MPC and infinite-horizon optimal control solved using a surrogate model that differs from the real plant. We employ a unified framework based on quadratic costs to analyze both finite- and infinite-horizon problems, encompassing discounted and undiscounted scenarios alike. Plant-model mismatch bounds proportional to states and controls are assumed, under which the origin remains an equilibrium. Under continuity of the model and cost-controllability, exponential stability of the closed loop can be guaranteed. Furthermore, we give a suboptimality bound for the closed-loop cost recovering the optimal cost of the surrogate. The results reveal a tradeoff between horizon length, discounting and plant-model mismatch. The robustness guarantees are uniform over the horizon length, meaning that larger horizons do not require successively smaller plant-model mismatch.

2604.08495 2026-04-10 math.OC cs.MA cs.RO cs.SY eess.SY

Density-Driven Optimal Control: Convergence Guarantees for Stochastic LTI Multi-Agent Systems

Kooktae Lee

详情
英文摘要

This paper addresses the decentralized non-uniform area coverage problem for multi-agent systems, a critical task in missions with high spatial priority and resource constraints. While existing density-based methods often rely on computationally heavy Eulerian PDE solvers or heuristic planning, we propose Stochastic Density-Driven Optimal Control (D$^2$OC). This is a rigorous Lagrangian framework that bridges the gap between individual agent dynamics and collective distribution matching. By formulating a stochastic MPC-like problem that minimizes the Wasserstein distance as a running cost, our approach ensures that the time-averaged empirical distribution converges to a non-parametric target density under stochastic LTI dynamics. A key contribution is the formal convergence guarantee established via reachability analysis, providing a bounded tracking error even in the presence of process and measurement noise. Numerical results verify that Stochastic D$^2$OC achieves robust, decentralized coverage while outperforming previous heuristic methods in optimality and consistency.

2604.08450 2026-04-10 cs.SD eess.AS

DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection

Yassine El Kheir, Arnab Das, Yixuan Xiao, Xin Wang, Feidi Kallel, Enes Erdem Erdogan, Ngoc Thang Vu, Tim Polzehl, Sebastian Moeller

Comments Deepfense Toolkit

详情
英文摘要

Speech deepfake detection is a well-established research field with different models, datasets, and training strategies. However, the lack of standardized implementations and evaluation protocols limits reproducibility, benchmarking, and comparison across studies. In this work, we present DeepFense, a comprehensive, open-source PyTorch toolkit integrating the latest architectures, loss functions, and augmentation pipelines, alongside over 100 recipes. Using DeepFense, we conducted a large-scale evaluation of more than 400 models. Our findings reveal that while carefully curated training data improves cross-domain generalization, the choice of pre-trained front-end feature extractor dominates overall performance variance. Crucially, we show severe biases in high-performing models regarding audio quality, speaker gender, and language. DeepFense is expected to facilitate real-world deployment with the necessary tools to address equitable training data selection and front-end fine-tuning.

2604.08415 2026-04-10 eess.AS

Ring Mixing with Auxiliary Signal-to-Consistency-Error Ratio Loss for Unsupervised Denoising in Speech Separation

Matthew Maciejewski, Samuele Cornell

Comments Submitted to Interspeech 2026

详情
英文摘要

Noisy speech separation systems are typically trained on fully-synthetic mixtures, limiting generalization to real-world scenarios. Though training on mixtures of in-domain (thus often noisy) speech is possible, we show that this leads to undesirable optima where mixture noise is retained in the estimates, due to the inseparability of the background noises and the loss function's symmetry. To address this, we propose ring mixing, a batch strategy of using each source in two mixtures, alongside a new Signal-to-Consistency-Error Ratio (SCER) auxiliary loss penalizing inconsistent estimates of the same source from different mixtures, breaking symmetry and incentivizing denoising. On a WHAM!-based benchmark, our method can reduce residual noise by upwards of half, effectively learning to denoise from only noisy recordings. This opens the door to training more generalizable systems using in-the-wild data, which we demonstrate via systems trained using naturally-noisy speech from VoxCeleb.

2604.08412 2026-04-10 cs.SD cs.AI eess.AS

Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI

David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi

详情
英文摘要

We study device-addressed speech detection under pre-ASR edge deployment constraints, where systems must decide whether to forward audio before transcription under strict latency and compute limits. We show that, in multi-speaker environments with temporally ambiguous utterances, this task is more effectively modelled as a sequential routing problem over interaction history than as an utterance-local classification task. We formalize this as Sequential Device-Addressed Routing (SDAR) and present the Selective Attention System (SAS), an on-device implementation that instantiates this formulation. On a held-out 60-hour multi-speaker English test set, the primary audio-only configuration achieves F1=0.86 (precision=0.89, recall=0.83); with an optional camera, audio+video fusion raises F1 to 0.95 (precision=0.97, recall=0.93). Removing causal interaction history (Stage~3) reduced F1 from 0.95 to 0.57+/-0.03 in the audio+video configuration under our evaluation protocol. Among the tested components, this was the largest observed ablation effect, indicating that short-horizon interaction history carries substantial decision-relevant information in the evaluated setting. SAS runs fully on-device on ARM Cortex-A class hardware (<150 ms latency, <20 MB footprint). All results are from internal evaluation on a proprietary dataset evaluated primarily in English; a 5-hour evaluation subset may be shared for independent verification (Section 8.8).

2604.08403 2026-04-10 eess.SY cs.SY

Data-Driven Power Flow for Radial Distribution Networks with Sparse Real-Time Data

Oleksii Molodchyk, Omid Mokhtari, Samuel Chevalier, Mads R. Almassalkhi, Timm Faulwasser

Comments 8 pages, 5 figures

详情
英文摘要

Real-time control of distribution networks requires accurate information about the system state. In practice, however, such information is difficult to obtain because real-time measurements are available only at a limited number of locations. This paper proposes a novel data-driven power flow (DDPF) framework for balanced radial distribution networks. The proposed algorithm combines the behavioral approach with the DistFlow model and leverages offline historical data to solve power flow problems using only a limited set of real-time measurements. To design DDPF under sparse measurement conditions, we develop a sensor placement problem based on optimal network reductions. This allows us to determine sensor locations subject to a predefined sensor budget and to explicitly account for the radial nature of distribution networks. Unlike approaches that rely on full observability, the proposed framework is designed for practical distribution grids with sparse measurement availability. This enables data-driven power flow for real-time operation while reducing the number of required sensors. On several test cases, the proposed DDPF algorithm could demonstrate accurate voltage magnitude predictions, with a maximum error less than 0.001 p.u., with as little as 25% of total locations equipped with sensors.

2604.08384 2026-04-10 eess.AS cs.AI

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs

Jing Peng, Chenghao Wang, Yi Yang, Lirong Qian, Junjie Li, Yu Xi, Shuai Wang, Kai Yu

详情
英文摘要

Speech LLM post-training increasingly relies on efficient cross-modal alignment and robust low-resource adaptation, yet collecting large-scale audio-text pairs remains costly. Text-only alignment methods such as TASU reduce this burden by simulating CTC posteriors from transcripts, but they provide limited control over uncertainty and error rate, making curriculum design largely heuristic. We propose \textbf{TASU2}, a controllable CTC simulation framework that simulates CTC posterior distributions under a specified WER range, producing text-derived supervision that better matches the acoustic decoding interface. This enables principled post-training curricula that smoothly vary supervision difficulty without TTS. Across multiple source-to-target adaptation settings, TASU2 improves in-domain and out-of-domain recognition over TASU, and consistently outperforms strong baselines including text-only fine-tuning and TTS-based augmentation, while mitigating source-domain performance degradation.

2604.08359 2026-04-10 eess.AS

Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework

Hsiang-Cheng Yang, You-Jin Li, Rong Chao, Yu Tsao, Borching Su, Shao-Yi Chien

Comments Accepted to IEEE ICASSP 2026

详情
英文摘要

This paper presents a Gaze-Guided Audio-Visual Speech Enhancement (GG-AVSE) framework to address the cocktail party problem. A major challenge in conventional AVSE is identifying the listener's intended speaker in multi-talker environments. GG-AVSE addresses this issue by exploiting gaze direction as a supervisory cue for target-speaker selection. Specifically, we propose the GG-VM module, which combines gaze signals with a YOLO5Face detector to extract the target speaker's facial features and integrates them with the pretrained AVSEMamba model through two strategies: zero-shot merging and partial visual fine-tuning. For evaluation, we introduce the AVSEC2-Gaze dataset. Experimental results show that GG-AVSE achieves substantial performance gains over gaze-free baselines: a 10.08% improvement in PESQ (2.370 to 2.609), a 5.18% improvement in STOI (0.8802 to 0.9258), and a 23.69% improvement in SI-SDR (9.16 to 11.33). These results confirm that gaze provides an effective cue for resolving target-speaker ambiguity and highlight the scalability of GG-AVSE for real-world applications.

2604.08330 2026-04-10 eess.SP cs.IT math.IT

Group-invariant moments under tomographic projections

Amnon Balanov, Tamir Bendory, Dan Edidin

详情
英文摘要

Let $f:\mathbb{R}^n\to\mathbb{R}$ be an unknown object, and suppose the observations are tomographic projections of randomly rotated copies of $f$ of the form $Y = P(R\cdot f)$, where $R$ is Haar-uniform in $\mathrm{SO}(n)$ and $P$ is the projection onto an $m$-dimensional subspace, so that $Y:\mathbb{R}^m\to\mathbb{R}$. We prove that, whenever $d\le m$, the $d$-th order moment of the projected data determines the full $d$-th order Haar-orbit moment of $f$, independently of the ambient dimension $n$. We further provide an explicit algorithmic procedure for recovering the latter from the former. As a consequence, any identifiability result for the unprojected model based on $d$-th order group-invariant moment extends directly to the tomographic setting at the same moment order. In particular, for $n=3$, $m=2$, and $d=2$, our result recovers a classical result in the cryo-EM literature: the covariance of the 2D projection images determines the second order rotationally invariant moment of the underlying 3D object.

2604.08329 2026-04-10 eess.IV cs.MM

DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning

Eren Çetin, Lucas Relic, Yuanyi Xue, Markus Gross, Christopher Schroers, Roberto Azevedo

详情
英文摘要

We present a perceptually-driven video compression framework integrating implicit neural representations (INRs) and pre-trained video diffusion models to address the extremely low bitrate regime (<0.05 bpp). Our approach exploits the complementary strengths of INRs, which provide a compact video representation, and diffusion models, which offer rich generative priors learned from large-scale datasets. The INR-based conditioning replaces traditional intra-coded keyframes with bit-efficient neural representations trained to estimate latent features and guide the diffusion process. Our joint optimization of INR weights and parameter-efficient adapters for diffusion models allows the model to learn reliable conditioning signals while encoding video-specific information with minimal parameter overhead. Our experiments on UVG, MCL-JCV, and JVET Class-B benchmarks demonstrate substantial improvements in perceptual metrics (LPIPS, DISTS, and FID) at extremely low bitrates, including improvements on BD-LPIPS up to 0.214 and BD-FID up to 91.14 relative to HEVC, while also outperforming VVC and previous strong state-of-the-art neural and INR-only video codecs. Moreover, our analysis shows that INR-conditioned diffusion-based video compression first composes the scene layout and object identities before refining textural accuracy, exposing the semantic-to-visual hierarchy that enables perceptually faithful compression at extremely low bitrates.

2604.08328 2026-04-10 eess.SY cs.SY

Data-Driven Moving Horizon Estimators for Linear Systems with Sample Complexity Analysis

Peihu Duan, Jiabao He, Yuezu Lv, Guanghui Wen

详情
英文摘要

This paper investigates the state estimation problem for linear systems subject to Gaussian noise, where the model parameters are unknown. By formulating and solving an optimization problem that incorporates both offline and online system data, a novel data-driven moving horizon estimator (DDMHE) is designed. We prove that the expected 2-norm of the estimation error of the proposed DDMHE is ultimately bounded. Further, we establish an explicit relationship between the system noise covariances and the estimation error of the proposed DDMHE. Moreover, through a sample complexity analysis, we show how the length of the offline data affects the estimation error of the proposed DDMHE. We also quantify the performance gap between the proposed DDMHE using noisy data and the traditional moving horizon estimator with known system matrices. Finally, the theoretical results are validated through numerical simulations.

2604.08327 2026-04-10 math.OC cs.SY eess.SY

Finite-time Reachability for Constrained, Partially Uncontrolled Nonlinear Systems

Ram Padmanabhan, Melkior Ornik

Comments 7 pages, 4 figures

详情
英文摘要

This paper presents a technique to drive the state of a constrained nonlinear system to a specified target state in finite time, when the system suffers a partial loss in control authority. Our technique builds on a recent method to control constrained nonlinear systems by building a simple, linear driftless approximation at the initial state. We construct a partition of the finite time horizon into successively smaller intervals, and design controlled inputs based on the approximate dynamics in each partition. Under conditions that bound the length of the time horizon, we prove that these inputs result in bounded error from the target state in the original nonlinear system. As successive partitions of the time horizon become shorter, the error reduces to zero despite the effect of uncontrolled inputs. A simulation example on the model of a fighter jet demonstrates that the designed sequence of controlled inputs achieves the target state despite the system suffering a loss of control authority over one of its inputs.

2604.08309 2026-04-10 eess.SY cs.SY

Bayesian Inference for Estimating Generation Costs in Electricity Markets

Matthias Pirlet, Adrien Bolland, Alexandre Huynen, Quentin Louveaux, Gilles Louppe, Damien Ernst

详情
英文摘要

Estimating generation costs from observed electricity market data is essential for market simulation, strategic bidding, and system planning. To that end, we model the relationship between generation costs and production schedules with a latent variable model. Estimating generation costs from observed schedules is then formulated as Bayesian inference. A prior distribution encodes an initial belief on parameters, and the inference consists of updating the belief with the posterior distribution given observations. We use balanced neural posterior estimation (BNPE) to learn this posterior. Validation on the IEEE RTS-96 test system shows that marginal costs are recovered with narrow credible intervals, while start-up costs remain largely unidentifiable from schedules alone. The method is benchmarked against an inverse-optimization algorithm that exhibits larger parameter errors without uncertainty quantification.

2604.08306 2026-04-10 eess.SP

Temporal Graph Neural Network for ISAC Target Detection and Tracking

Saiedeh Maboud Sanaie, Marcus Grossmann, Markus Landmann, Thomas Dallmann

Comments 4 pages, 6 figures

详情
英文摘要

Integrated sensing and communication (ISAC) is a key enabler of 6G, supporting environment-aware services. A fundamental sensing task in this setting is reliable multi-target detection and tracking. This paper proposes a temporal graph neural network (TGNN)-based tracking method that exploits delay and Doppler information from the wireless channel. The delay-Doppler map is modeled as a sequence of graphs, and tracking is formulated as a temporal node classification problem, enabling joint clustering and data association of dynamic targets. Using ray-tracing-based channel outputs as ground truth, the method is evaluated across multiple scenes with varying target positions, velocities, and trajectories and is compared with a Kalman filter baseline. Results demonstrate reduced normalized mean squared error (NMSE) in delay and Doppler, leading to more accurate multi-target tracking.

2604.08305 2026-04-10 eess.IV cs.AI cs.CV cs.ET cs.LG q-bio.QM

HistDiT: A Structure-Aware Latent Conditional Diffusion Model for High-Fidelity Virtual Staining in Histopathology

Aasim Bin Saleem, Amr Ahmed, Ardhendu Behera, Hafeezullah Amin, Iman Yi Liao, Mahmoud Khattab, Pan Jia Wern, Haslina Makmur

Comments Accepted to ICPR 2026

详情
英文摘要

Immunohistochemistry (IHC) is essential for assessing specific immune biomarkers like Human Epidermal growth-factor Receptor 2 (HER2) in breast cancer. However, the traditional protocols of obtaining IHC stains are resource-intensive, time-consuming, and prone to structural damages. Virtual staining has emerged as a scalable alternative, but it faces significant challenges in preserving fine-grained cellular structures while accurately translating biochemical expressions. Current state-of-the-art methods still rely on Generative Adversarial Networks (GANs) or standard convolutional U-Net diffusion models that often struggle with "structure and staining trade-offs". The generated samples are either structurally relevant but blurry, or texturally realistic but have artifacts that compromise their diagnostic use. In this paper, we introduce HistDiT, a novel latent conditional Diffusion Transformer (DiT) architecture that establishes a new benchmark for visual fidelity in virtual histological staining. The novelty introduced in this work is, a) the Dual-Stream Conditioning strategy that explicitly maintains a balance between spatial constraints via VAE-encoded latents and semantic phenotype guidance via UNI embeddings; b) the multi-objective loss function that contributes to sharper images with clear morphological structure; and c) the use of the Structural Correlation Metric (SCM) to focus on the core morphological structure for precise assessment of sample quality. Consequently, our model outperforms existing baselines, as demonstrated through rigorous quantitative and qualitative evaluations.

2604.08272 2026-04-10 cs.CV eess.IV

Preventing Overfitting in Deep Image Prior for Hyperspectral Image Denoising

Panagiotis Gkotsis, Athanasios A. Rontogiannis

Comments 7 pages, 5 figures

详情
英文摘要

Deep image prior (DIP) is an unsupervised deep learning framework that has been successfully applied to a variety of inverse imaging problems. However, DIP-based methods are inherently prone to overfitting, which leads to performance degradation and necessitates early stopping. In this paper, we propose a method to mitigate overfitting in DIP-based hyperspectral image (HSI) denoising by jointly combining robust data fidelity and explicit sensitivity regularization. The proposed approach employs a Smooth $\ell_1$ data term together with a divergence-based regularization and input optimization during training. Experimental results on real HSIs corrupted by Gaussian, sparse, and stripe noise demonstrate that the proposed method effectively prevents overfitting and achieves superior denoising performance compared to state-of-the-art DIP-based HSI denoising methods.

2604.08270 2026-04-10 eess.SY cs.SY math.OC

Bandwidth reduction methods for packetized MPC over lossy networks

Alberto Mingoia, Matthias Pezzutto, Fernando S Barbosa, David Umsonst

Comments Accepted at the European Control Conference 2026; 8 pages; 5 figures

详情
英文摘要

We study the design of an offloaded model predictive control (MPC) operating over a lossy communication channel. We introduce a controller design that utilizes two complementary bandwidth-reduction methods. The first method is a multi-horizon MPC formulation that decreases the number of optimization variables, and therefore the size of transmitted input trajectories. The second method is a communication-rate reduction mechanism that lowers the frequency of packet transmissions. We derive theoretical guarantees on recursive feasibility and constraint satisfaction under minimal assumptions on packet loss, and we establish reference-tracking performance for the rate-reduction strategy. The proposed methods are validated using a hardware-in-the-loop setup with a real 5G network, demonstrating simultaneous improvements in bandwidth efficiency and computational load.

2604.08244 2026-04-10 cs.NI cs.SY eess.SY

FORSLICE: An Automated Formal Framework for Efficient PRB-Allocation towards Slicing Multiple Network Services

Debarpita Banerjee, Sumana Ghosh, Snigdha Das, Shilpa Budhkar, Rana Pratap Sircar

详情
英文摘要

Network slicing is a modern 5G technology that provides efficient network experience for diverse use cases. It is a technique for partitioning a single physical network infrastructure into multiple virtual networks, called slices, each equipped for specific services and requirements. In this work, we particularly deal with radio access network (RAN) slicing and resource allocation to RAN slices. In 5G, physical resource blocks (PRBs) being the fundamental units of radio resources, our main focus is to allocate PRBs to the slices efficiently. While addressing a spectrum of needs for multiple services or the same services with multi-priorities, we need to ensure two vital system properties: i) fairness to every service type (i.e., providing the required resources and a desired range of throughput) even after prioritizing a particular service type, and ii) PRB-optimality or minimizing the unused PRBs in slices. These serve as the core performance evaluation metrics for PRB-allocation in our work. We adopt the 3-layered hierarchical PRB-partitioning technique for allocating PRBs to network slices. The case-specific, AI-based solution of the state-of-the-art method lacks sufficient correctness to ensure consistent system performance. To achieve guaranteed correctness and completeness, we leverage formal methods and propose the first approach for a fair and optimal PRB distribution to RAN slices. We formally model the PRB-allocation problem as a 3-layered framework, FORSLICE, specifically by employing satisfiability modulo theories. Next, we apply formal verification to ensure that the desired system properties: fairness and PRB-optimality, are satisfied by the model. The proposed method offers an efficient, versatile and automated approach compatible with all 3-layered hierarchical network structure configurations, yielding significant system property improvements compared to the baseline.

2604.08240 2026-04-10 eess.SY cs.SY

From Cut-In to Rated: Multi-Region Floating Offshore Wind Farm Control for Secondary Frequency Regulation

Stephen Ampleman, Dennice F. Gayme

详情
英文摘要

This paper describes a multi-region control framework for floating offshore wind farms. Specifically, we propose a novel generator torque controller that regulates rotor speed in Region 2, corresponding to wind speeds between the cut-in and rated values. In Region 3 (wind speeds at or above rated but below cut-out speed) we employ a PI-LQR for collective blade pitch. Control blending across the transitional wind speeds (Region 2.5) employs a sigmoid weighting function applied to the control variables. Two modeling paradigms are proposed for farm-level power tracking with rotor speed regularization: a nonlinear model predictive controller (NL-MPC) with a dynamic wake model, and a reduced order model predictive controller based on linear parameter varying turbine models with a time delay representation of wake advection (LPVTD-MPC). These approaches are evaluated over three wind inlet conditions using the PJM ancillary service certification criteria for participation in a secondary frequency regulation market. Results show that both approaches achieve scores of at least 89.9\% for the three different testing scenarios, which are well above the qualification threshold of 75\%. However, the LPVTD-MPC approach solves the problem in under half the time versus NL-MPC but with slightly larger fluctuations in farm-level power output, highlighting the trade-off between performance and computational tractability. The control framework is among the first to address multi-region wind turbine dynamics together with market driven power tracking objectives for floating offshore wind farms. Such multi-region control becomes increasingly necessary in the floating turbine setting where large (region spanning) wind speed variations are common due to wave induced platform pitching.

2604.08226 2026-04-10 cs.AI cs.HC cs.SY eess.SY

Grounding Clinical AI Competency in Human Cognition Through the Clinical World Model and Skill-Mix Framework

Seyed Amir Ahmad Safavi-Naini, Elahe Meftah, Josh Mohess, Pooya Mohammadi Kazaj, Georgios Siontis, Zahra Atf, Peter R. Lewis, Mauricio Reyes, Girish Nadkarni, Roland Wiest, Stephan Windecker, Christoph Grani, Ali Soroush, Isaac Shiri

Comments Code, data (Clinical AI Skill-Mix dimension specifications), and an exploratory dashboard are available at https://github.com/Sdamirsa/Clinical-World-Model

详情
英文摘要

The competency of any intelligent agent is bounded by its formal account of the world in which it operates. Clinical AI lacks such an account. Existing frameworks address evaluation, regulation, or system design in isolation, without a shared model of the clinical world to connect them. We introduce the Clinical World Model, a framework that formalizes care as a tripartite interaction among Patient, Provider, and Ecosystem. To formalize how any agent, whether human or artificial, transforms information into clinical action, we develop parallel decision-making architectures for providers, patients, and AI agents, grounded in validated principles of clinical cognition. The Clinical AI Skill-Mix operationalizes competency through eight dimensions. Five define the clinical competency space (condition, phase, care setting, provider role, and task) and three specify how AI engages human reasoning (assigned authority, agent facing, and anchoring layer). The combinatorial product of these dimensions yields a space of billions of distinct competency coordinates. A central structural implication is that validation within one coordinate provides minimal evidence for performance in another, rendering the competency space irreducible. The framework supplies a common grammar through which clinical AI can be specified, evaluated, and bounded across stakeholders. By making this structure explicit, the Clinical World Model reframes the field's central question from whether AI works to in which competency coordinates reliability has been demonstrated, and for whom.

2604.06810 2026-04-10 eess.AS

EvoTSE: Evolving Enrollment for Target Speaker Extraction

Zikai Liu, Ziqian Wang, Xingchen Li, Yike Zhu, Shuai Wang, Longshuai Xiao, Lei Xie

详情
英文摘要

Target Speaker Extraction (TSE) aims to isolate a specific speaker's voice from a mixture, guided by a pre-recorded enrollment. While TSE bypasses the global permutation ambiguity of blind source separation, it remains vulnerable to speaker confusion, where models mistakenly extract the interfering speaker. Furthermore, conventional TSE relies on static inference pipeline, where performance is limited by the quality of the fixed enrollment. To overcome these limitations, we propose EvoTSE, an evolving TSE framework in which the enrollment is continuously updated through reliability-filtered retrieval over high-confidence historical estimates. This mechanism reduces speaker confusion and relaxes the quality requirements for pre-recorded enrollment without relying on additional annotated data. Experiments across multiple benchmarks demonstrate that EvoTSE achieves consistent improvements, especially when evaluated on out-of-domain (OOD) scenarios. Our code and checkpoints are available.

2604.05140 2026-04-10 math.OC cs.SY eess.SY

Constraint-Induced Redistribution of Social Influence in Nonlinear Opinion Dynamics

Vishnudatta Thota, Anastasia Bizyaeva

Comments 7 pages, 4 figures, Submitted to IEEE Conference on Decision and Control (CDC) 2026

详情
英文摘要

We study how intrinsic hard constraints on the decision dynamics of social agents shape collective decisions on multiple alternatives in a heterogeneous group. Such constraints may arise due to structural and behavioral limitations, such as adherence to belief systems in social networks or hardware limitations in autonomous networks. In this work, agent constraints are encoded as projections in a multi-alternative nonlinear opinion dynamics framework. We prove that projections induce an invariant subspace on which the constraints are always satisfied and study the dynamics of networked opinions on this subspace. We then show that heterogeneous pairwise alignments between individuals' constraint vectors generate an effective weighted social graph on the invariant subspace, even when agents exchange opinions over an unweighted communication graph in practice. With analysis and simulation studies, we illustrate how the effective constraint-induced weighted graph reshapes the centrality of agents in the decision process and the group's sensitivity to distributed inputs.

2603.24589 2026-04-10 eess.AS cs.SD

YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Chunbo Hao, Junjie Zheng, Guobin Ma, Yuepeng Jiang, Huakang Chen, Wenjie Tian, Gongyu Chen, Zihao Chen, Lei Xie

详情
英文摘要

Regenerating singing voices with altered lyrics while preserving melody consistency remains challenging, as existing methods either offer limited controllability or require laborious manual alignment. We propose YingMusic-Singer-Plus, a fully diffusion-based model enabling melody-controllable singing voice synthesis with flexible lyric manipulation. The model takes three inputs: an optional timbre reference, a melody-providing singing clip, and modified lyrics, without manual alignment. Trained with curriculum learning and Group Relative Policy Optimization, YingMusic-Singer-Plus achieves stronger melody preservation and lyric adherence than Vevo2, the most comparable baseline supporting melody control without manual alignment. We also introduce LyricEditBench, the first benchmark for melody-preserving lyric modification evaluation. The code, weights, benchmark, and demos are publicly available at https://github.com/ASLP-lab/YingMusic-Singer-Plus.

2603.01482 2026-04-10 eess.AS cs.AI cs.LG eess.SP

A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

Hashim Ali, Nithin Sai Adupa, Surya Subramani, Hafiz Malik

Comments Accepted at ICASSP

详情
英文摘要

Self-supervised learning (SSL) has transformed speech processing, with benchmarks such as SUPERB establishing fair comparisons across diverse downstream tasks. Despite it's security-critical importance, Audio deepfake detection has remained outside these efforts. In this work, we introduce Spoof-SUPERB, a benchmark for audio deepfake detection that systematically evaluates 20 SSL models spanning generative, discriminative, and spectrogram-based architectures. We evaluated these models on multiple in-domain and out-of-domain datasets. Our results reveal that large-scale discriminative models such as XLS-R, UniSpeech-SAT, and WavLM Large consistently outperform other models, benefiting from multilingual pretraining, speaker-aware objectives, and model scale. We further analyze the robustness of these models under acoustic degradations, showing that generative approaches degrade sharply, while discriminative models remain resilient. This benchmark establishes a reproducible baseline and provides practical insights into which SSL representations are most reliable for securing speech systems against audio deepfakes.

2511.09427 2026-04-10 math.OC cs.LG cs.SY eess.SY

Adversarially and Distributionally Robust Virtual Energy Storage Systems via the Scenario Approach

Georgios Pantazis, Nicola Mignoni, Raffaele Carli, Mariagrazia Dotoli, Sergio Grammatico

详情
英文摘要

We study virtual energy storage services based on the aggregation of EV batteries in parking lots under time-varying, uncertain EV departures and state-of-charge limits. We propose a convex data-driven scheduling framework in which a parking lot manager provides storage services to a prosumer community while interacting with a retailer. The framework yields finite-sample, distribution-free guarantees on constraint violations and allows the parking lot manager to explicitly tune the trade-off between economic performance and operational safety. To enhance reliability under imperfect data, we extend the formulation to adversarial perturbations of the training samples and Wasserstein distributional shifts, obtaining robustness certificates against both corrupted data and out-of-distribution uncertainty. Numerical studies confirm the predicted profit-risk trade-off and show consistency between the theoretical certificates and the observed violation levels.

2511.09106 2026-04-10 eess.SY cs.SY

Unifying Sequential Quadratic Programming and Linear-Parameter-Varying Algorithms for Real-Time Model Predictive Control

Kristóf Floch, Amon Lahr, Roland Tóth, Melanie N. Zeilinger

详情
英文摘要

This paper presents a unified framework that connects sequential quadratic programming (SQP) and the iterative linear-parameter-varying model predictive control (LPV-MPC) technique. Using the differential formulation of the LPV-MPC, we demonstrate how SQP and LPV-MPC can be unified through a specific choice of scheduling variable and the 2nd Fundamental Theorem of Calculus (FTC) embedding technique and compare their convergence properties. This enables the unification of the zero-order approach of SQP with the LPV-MPC scheduling technique to enhance the computational efficiency of robust and stochastic MPC problems. To demonstrate our findings, we compare the two schemes in a simulation example. Finally, we present real-time feasibility and performance of the zero-order LPV-MPC approach by applying it to Gaussian process (GP)-based MPC for autonomous racing with real-world experiments.

2511.08420 2026-04-10 eess.SY cs.SY math.OC

Computable Characterisations of Scaled Relative Graphs of Closed Operators

Talitha Nauta, Richard Pates

Comments 12 pages, 5 figures, accepted to the 2026 European Control Conference (ECC)

详情
英文摘要

The Scaled Relative Graph (SRG) is a promising tool for stability and robustness analysis of multi-input multi-output systems. In this paper, we provide tools for exact and computable constructions of the SRG for closed linear operators, based on maximum and minimum gain computations. The results are suitable for bounded and unbounded operators, and we specify how they can be used to draw SRGs for the typical operators that are used to model linear-time-invariant dynamical systems. Furthermore, for the special case of state-space models, we show how the Bounded Real Lemma can be used to construct the SRG.

2511.00941 2026-04-10 eess.SY cs.SY

Traffic-Aware Microgrid Planning for Dynamic Wireless Electric Vehicle Charging Roadways

Dipanjan Ghose, Junjie Qin, S Sivaranjani

Comments This version provides the updated formulation referenced in the withdrawal of the previous one

详情
英文摘要

Dynamic wireless charging (DWC) is an emerging technology that has the potential to reduce charging downtime and on-board battery size, particularly in heavy-duty electric vehicles (EVs). However, its spatiotemporal, dynamic, high-power demands pose challenges for power system operations. Since DWC demand depends on traffic characteristics such as speed, density, and dwell time, effective infrastructure planning must account for the coupling between traffic behavior and EV energy consumption. In this paper, we propose a novel traffic-aware microgrid planning framework for DWC. First, we use the macroscopic cell transmission model to estimate spatio-temporal EV charging demand along DWC corridors and integrate this demand into an AC optimal power flow formulation to design a supporting microgrid. Our framework explicitly links traffic patterns with energy demand and demonstrates that traffic-aware microgrid planning yields significantly lower system costs than worst-case traffic-based approaches. We demonstrate the performance of our model on a segment of I-210W in California under a wide range of traffic conditions.

2510.03484 2026-04-10 math.OC cs.SY eess.SY

Contingency-Aware Nodal Optimal Power Investments with High Temporal Resolution

Thomas Lee, Andy Sun

Comments This work has been submitted to the IEEE for possible publication

详情
英文摘要

We present CANOPI, a novel algorithmic framework, for solving the Contingency-Aware Nodal Power Investments problem, a large-scale nonlinear optimization problem that jointly optimizes investments in generation, storage, and transmission upgrades, including representations of unit commitment and long-duration storage. The underlying problem is nonlinear due to the impact of transmission upgrades on impedances, and the problem's large scale arises from the confluence of spatial and temporal resolutions. We propose algorithmic approaches to address these computational challenges. We pose a linear approximation of the overall nonlinear model, and develop a fixed-point algorithm to adjust for the nonlinear impedance feedback effect. We solve the large-scale linear expansion model with a specialized level-bundle method leveraging a novel interleaved approach to contingency constraint generation. We introduce a minimal cycle basis algorithm that improves the numerical sparsity of cycle-based DC power flow formulations, accelerating solve times for the operational subproblems. CANOPI is demonstrated on a 1493-bus Western Interconnection test system built from realistic-geography network data, with hourly operations spanning 52 week-long scenarios and a total possible set of 20 billion individual transmission contingency constraints. Numerical results quantify reliability and economic benefits of incorporating transmission contingencies in integrated planning models and highlight the computational advantages of the proposed methods.

2509.14447 2026-04-10 eess.SP

Dual-Timescale Hebbian Accumulators for Online Spiking Neural Network Decoding in Intracortical Brain Machine Interfaces

Sriram V. C. Nallani, Sahil Shah

Comments 9 pages, 7 figures. Replacement reason - corrections in metadata and PDF

详情
英文摘要

Intracortical brain-machine interfaces require decoders that adapt continuously to neural signal instability while operating within strict memory budgets. We introduce a dual-timescale Hebbian accumulator learning rule for spiking neural networks that enables per-timestep online supervised updates with training memory constant in sequence length, avoiding backpropagation through time. The rule combines synapse-specific fast and slow eligibility traces, error-modulated three-factor updates, and integer-friendly RMS homeostasis, operating without adaptive gradient optimizers (Adam, RMSProp) or replay buffers. On two primate intracortical datasets, the method achieves Pearson correlations of $R \geq 0.81$ on MC~Maze and $R \geq 0.63$ on Zenodo~Indy, with 63--86\% measured memory reduction versus BPTT at sequence length $T = 1000$. Closed-loop simulations demonstrate online adaptation to neural disruptions and learning from scratch without offline calibration.