arXivDaily arXiv每日学术速递 周一至周五更新
2601.20845 2026-01-29 cs.LG eess.SP

PatchFormer: A Patch-Based Time Series Foundation Model with Hierarchical Masked Reconstruction and Cross-Domain Transfer Learning for Zero-Shot Multi-Horizon Forecasting

Olaf Yunus Laitinen Imanov, Derya Umut Kulali, Taner Yilmaz

Comments 5 pages; 2 figures; 7 tables

详情
英文摘要

Time series forecasting is a fundamental problem with applications in climate, energy, healthcare, and finance. Many existing approaches require domain-specific feature engineering and substantial labeled data for each task. We introduce PatchFormer, a patch-based time series foundation model that uses hierarchical masked reconstruction for self-supervised pretraining and lightweight adapters for efficient transfer. PatchFormer segments time series into patches and learns multiscale temporal representations with learnable aggregation across temporal scales. Pretraining uses masked patch reconstruction with dynamic masking and objectives that encourage both local accuracy and global consistency, followed by cross-domain knowledge distillation. Experiments on 24 benchmark datasets spanning weather, energy, traffic, finance, and healthcare demonstrate state-of-the-art zero-shot multi-horizon forecasting, reducing mean squared error by 27.3 percent relative to strong baselines while requiring 94 percent less task-specific training data. The model exhibits near log-linear scaling with more pretraining data up to 100 billion points and processes length-512 sequences 3.8x faster than full-sequence transformers.

2601.20817 2026-01-29 eess.SP

Statistical Properties of Target Localization Using Passive Radar Systems

Mats Viberg, Daniele Gerosa, Tomas McKelvey, Thomas Eriksson

详情
英文摘要

Passive Radar Systems have received tremendous attention during the past few decades, due to their low cost and ability to remain covert during operation. Such systems do not transmit any energy themselves, but rely on a so-called Illuminator-of-Opportunity (IO), for example a commercial TV station. A network of Receiving Nodes (RN) receive the direct signal as well as reflections from possible targets. The RNs transmit information to a Central Node (CN), that performs the final target detection, localization and tracking. A large number of methods and algorithms for target detection and localization have been proposed in the literature. In the present contribution, the focus is on the seminal Extended Cancelation Algorithm (ECA), in which each RN estimates target parameters after canceling interference from the direct-path as well as clutter from unwanted stationary objects. This is done by exploiting a separate Reference Channel (RC), which captures the IO signal without interference apart from receiver noise. We derive the statistical properties of the ECA parameter estimates under the assumption of a high Signal-to-Noise Ratio (SNR), and we give a sufficient condition for the SNR in the RC to enable statistically efficient estimates. The theoretical results are corroborated through computer simulations, which show that the theory agrees well with empirical results above a certain SNR threshold. The results can be used to predict the performance of passive radar systems in given scenarios, which is useful for feasibility studies as well as system design.

2508.03461 2026-01-29 eess.IV cs.CV

Evaluating the Predictive Value of Preoperative MRI for Erectile Dysfunction Following Radical Prostatectomy

Gideon N. L. Rouwendaal, Daniël Boeke, Inge L. Cox, Henk G. van der Poel, Margriet C. van Dijk-de Haan, Regina G. H. Beets-Tan, Thierry N. Boellaard, Wilson Silva

Comments 13 pages, 5 figures, 2 tables. Accepted at PRedictive Intelligence in MEdicine workshop @ MICCAI 2025 (PRIME-MICCAI). This is the submitted manuscript with added link to github repo, funding acknowledgements and authors' names and affiliations. No further post submission improvements or corrections were integrated. Final version not published yet

详情
英文摘要

Accurate preoperative prediction of erectile dysfunction (ED) is important for counseling patients undergoing radical prostatectomy. While clinical features are established predictors, the added value of preoperative MRI remains underexplored. We investigate whether MRI provides additional predictive value for ED at 12 months post-surgery, evaluating four modeling strategies: (1) a clinical-only baseline, representing current state-of-the-art; (2) classical models using handcrafted anatomical features derived from MRI; (3) deep learning models trained directly on MRI slices; and (4) multimodal fusion of imaging and clinical inputs. Imaging-based models (maximum AUC 0.569) slightly outperformed handcrafted anatomical approaches (AUC 0.554) but fell short of the clinical baseline (AUC 0.663). Fusion models offered marginal gains (AUC 0.586) but did not exceed clinical-only performance. SHAP analysis confirmed that clinical features contributed most to predictive performance. Saliency maps from the best-performing imaging model suggested a predominant focus on anatomically plausible regions, such as the prostate and neurovascular bundles. While MRI-based models did not improve predictive performance over clinical features, our findings suggest that they try to capture patterns in relevant anatomical structures and may complement clinical predictors in future multimodal approaches.

2506.11136 2026-01-29 cs.CV eess.IV

JAFAR: Jack up Any Feature at Any Resolution

Paul Couairon, Loick Chambon, Louis Serrano, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome

Comments Code available at https://github.com/PaulCouairon/JAFAR

详情
英文摘要

Foundation Vision Encoders have become essential for a wide range of dense vision tasks. However, their low-resolution spatial feature outputs necessitate feature upsampling to produce the high-resolution modalities required for downstream tasks. In this work, we introduce JAFAR, a lightweight and flexible feature upsampler that enhances the spatial resolution of visual features from any Foundation Vision Encoder to an arbitrary target resolution. JAFAR employs an attention-based module designed to promote semantic alignment between high-resolution queries, derived from low-level image features, and semantically enriched low-resolution keys, using Spatial Feature Transform (SFT) modulation. Notably, despite the absence of high-resolution supervision, we demonstrate that learning at low upsampling ratios and resolutions generalizes remarkably well to significantly higher output scales. Extensive experiments show that JAFAR effectively recovers fine-grained spatial details and consistently outperforms existing feature upsampling methods across a diverse set of downstream tasks. Project page at https://jafar-upsampler.github.io

1704.08605 2026-01-29 eess.SY cs.SY

Failsafe Mechanism Design of Multicopters Based on Supervisory Control Theory

Quan Quan, Zhiyao Zhao, Liyong Lin, Peng Wang, Walter Murray Wonham, Kai-Yuan Cai

Comments 33 pages, 15 figures

Journal ref IET Cyber-Systems and Robotics, 2020, 2(1): 31 - 42

详情
英文摘要

In order to handle undesirable failures of a multicopter which occur in either the pre-flight process or the in-flight process, a failsafe mechanism design method based on supervisory control theory is proposed for the semi-autonomous control mode. Failsafe mechanism is a control logic that guides what subsequent actions the multicopter should take, by taking account of real-time information from guidance, attitude control, diagnosis, and other low-level subsystems. In order to design a failsafe mechanism for multicopters, safety issues of multicopters are introduced. Then, user requirements including functional requirements and safety requirements are textually described, where function requirements determine a general multicopter plant, and safety requirements cover the failsafe measures dealing with the presented safety issues. In order to model the user requirements by discrete-event systems, several multicopter modes and events are defined. On this basis, the multicopter plant and control specifications are modeled by automata. Then, a supervisor is synthesized by monolithic supervisory control theory. In addition, we present three examples to demonstrate the potential blocking phenomenon due to inappropriate design of control specifications. Also, we discuss the meaning of correctness and the properties of the obtained supervisor. This makes the failsafe mechanism convincingly correct and effective. Finally, based on the obtained supervisory controller generated by TCT software, an implementation method suitable for multicopters is presented, in which the supervisory controller is transformed into decision-making codes.

1306.3432 2026-01-29 eess.SY cs.SY math.OC

Supporting Lemmas for RISE-based Control Methods

Rushikesh Kamalapurkar, Joel A. Rosenfeld, Justin Klotz, Ryan J. Downey, Warren E. Dixon

详情
英文摘要

A class of continuous controllers termed Robust Integral of the Signum of the Error (RISE) have been published over the last decade as a means to yield asymptotic convergence of the tracking error for classes of nonlinear systems that are subject to exogenous disturbances and/or modeling uncertainties. The development of this class of controllers relies on a property related to the integral of the signum of an error signal. A proof for this property is not available in previous literature. The stability of some RISE controllers is analyzed using differential inclusions. Such results rely on the hypothesis that a set of points is Lebesgue negligible. This paper states and proves two lemmas related to the properties.

2601.20780 2026-01-29 eess.SP cs.SY eess.SY

Multi-Mode Pinching Antenna Systems Enabled Multi-User Communications

Xiaoxia Xu, Xidong Mu, Yuanwei Liu, Arumugam Nallanathan

Comments 13 pages, 10 figures. Submitted to IEEE

详情
英文摘要

This paper proposes a novel multi-mode pinching-antenna systems (PASS) framework. Multiple data streams can be transmitted within a single waveguide through multiple guided modes, thus facilitating efficient multi-user communications through the mode-domain multiplexing. A physic model is derived, which reveals the mode-selective power radiation feature of pinching antennas (PAs). A two-mode PASS enabled two-user downlink communication system is investigated. Considering the mode selectivity of PA power radiation, a practical PA grouping scheme is proposed, where each PA group matches with one specific guided mode and mainly radiates its signal sequentially. Depending on whether the guided mode leaks power to unmatched PAs or not, the proposed PA grouping scheme operates in either the non-leakage or weak-leakage regime. Based on this, the baseband beamforming and PA locations are jointly optimized for sum rate maximization, subject to each user's minimum rate requirement. 1) A simple two-PA case in non-leakage regime is first considered. To solve the formulated problem, a channel orthogonality based solution is proposed. The channel orthogonality is ensured by large-scale and wavelength-scale equality constraints on PA locations. Thus, the optimal beamforming reduces to maximum-ratio transmission (MRT). Moreover, the optimal PA locations are obtained via a Newton-based one-dimension search algorithm that enforces two-scale PA-location constraints by Newton's method. 2) A general multi-PA case in both non-leakage and weak-leakage regimes is further considered. A low-complexity particle-swarm optimization with zero-forcing beamforming (PSO-ZF) algorithm is developed, thus effectively tackling the high-oscillatory and strong-coupled problem. Simulation results demonstrate the superiority of the proposed multi-mode PASS over conventional single-mode PASS and fixed-antenna structures.

2601.20723 2026-01-29 eess.SY cs.GT cs.SI cs.SY

Distributed Learning over Noisy Communication Networks

Emrah Akyol, Marcos Vasconcelos

Comments draft, submitted to IEEE JSAC Special Issue on Distributed Optimization, Learning, and Inference over Communication-Constrained Networks

详情
英文摘要

We study binary coordination games over graphs under log-linear learning when neighbor actions are conveyed through explicit noisy communication links. Each edge is modeled as either a binary symmetric channel (BSC) or a binary erasure channel (BEC). We analyze two operational regimes. For binary symmetric and binary erasure channels, we provide a structural characterization of the induced learning dynamics. In a fast-communication regime, agents update using channel-averaged payoffs; the resulting learning dynamics coincide with a Gibbs sampler for a scaled coordination potential, where channel reliability enters only through a scalar attenuation coefficient. In a snapshot regime, agents update from a single noisy realization and ignore channel statistics; the induced Markov chain is generally nonreversible, but admits a high-temperature expansion whose drift matches that of the fast Gibbs sampler with the same attenuation. We further formalize a finite-$K$ communication budget, which interpolates between snapshot and fast behavior as the number of channel uses per update grows. This viewpoint yields a communication-theoretic interpretation in terms of retransmissions and repetition coding, and extends naturally to heterogeneous link reliabilities via effective edge weights. Numerical experiments illustrate the theory and quantify the tradeoff between communication resources and steady-state coordination quality.

2601.20721 2026-01-29 eess.SP

Sequential Processing Strategies in Fronthaul Constrained Cell-Free Massive MIMO Networks

Vida Ranjbar, Robbert Beerten, Marc Moonen, Sofie Pollin

Comments Accepted to be published in IEEE Wireless Communications Letters

详情
英文摘要

In a cell-free massive MIMO (CFmMIMO) network with a daisy-chain fronthaul, the amount of information that each access point (AP) needs to communicate with the next AP in the chain is determined by the location of the AP in the sequential fronthaul. Therefore, we propose two sequential processing strategies to combat the adverse effect of fronthaul compression on the sum of users' spectral efficiency (SE): 1) linearly increasing fronthaul capacity allocation among APs and 2) Two-Path users' signal estimation. The two strategies show superior performance in terms of sum SE compared to the equal fronthaul capacity allocation and Single-Path sequential signal estimation.

2601.20711 2026-01-29 eess.IV

Task-Based Adaptive Transmit Beamforming for Efficient Ultrasound Quantification

Oisín Nolan, Wessel L. van Nierop, Louis D. van Harten, Tristan S. W. Stevens, Ruud J. G. van Sloun

详情
英文摘要

Wireless and wearable ultrasound devices promise to enable continuous ultrasound monitoring, but power consumption and data throughput remain critical challenges. Reducing the number of transmit events per second directly impacts both. We propose a task-based adaptive transmit beamforming method, formulated as a Bayesian active perception problem, that adaptively chooses where to scan in order to gain information about downstream quantitative measurements, avoiding redundant transmit events. Our proposed Task-Based Information Gain (TBIG) strategy applies to any differentiable downstream task function. When applied to recovering ventricular dimensions from echocardiograms, TBIG recovers accurate results using fewer than 2% of scan lines typically used, showing potential for large reductions in the power usage and data rates necessary for monitoring. Code is available at https://github.com/tue-bmd/task-based-ulsa.

2601.20667 2026-01-29 eess.SP

Deep Learning based Three-stage Solution for ISAC Beamforming Optimization

Qian Gao, Ruikang Zhong, Yuanwei Liu

详情
英文摘要

In this paper, a general ISAC system where the base station (BS) communicates with multiple users and performs target detection is considered. Then, a sum communication rate maximization problem is formulated, subjected to the constraints of transmit power and the minimum sensing rates of users. To solve this problem, we develop a framework that leverages deep learning algorithms to provide a three-stage solution for ISAC beamforming. The three-stage beamforming optimization solution includes three modules: 1) an unsupervised learning based feature extraction algorithm is proposed to extract fixed-size latent features while keeping its essential information from the variable channel state information (CSI); 2) a reinforcement learning (RL) based beampattern optimization algorithm is proposed to search the desired beampattern according to the extracted features; 3) a supervised learning based beamforming reconstruction algorithm is proposed to reconstruct the beamforming vector from beampattern given by the RL agent. Simulation results demonstrate that the proposed three-stage solution outperforms the baseline RL algorithm by optimizing the intuitional beampattern rather than beamforming.

2601.20658 2026-01-29 eess.SP

Integrated Sensing and Communication for Segmented Waveguide-Enabled Pinching Antenna Systems

Qian Gao, Ruikang Zhong, Hyundong Shin, Yuanwei Liu

详情
英文摘要

In this paper, an integrated sensing and communication (ISAC) design for segmented waveguide-enabled pinching-antenna array (SWAN) systems is proposed to improve the performance of systems by leveraging the low in-waveguide propagation loss of segmented waveguides. The hybrid segment selection and multiplexing (HSSM) protocol is implemented to provide favorable performance with less hardware cost. To achieve this, a joint transmit beamforming optimization, segment selection, and pinching antenna positioning problem is formulated to maximize the sum communication rate with the constraints of sensing performance. To solve the maximization problem, we propose a segment hysteresis based reinforcement learning (SHRL) algorithm to learn segment selection and pinching antenna positions in different progress to explore better strategies. Simulation results demonstrate that 1) the proposed SWAN-ISAC scheme outperforms the other baseline schemes, and 2) the proposed HARL algorithm achieves better performance compared to conventional RL algorithms.

2601.20654 2026-01-29 eess.SP

RL based Beamforming Optimization for 3D Pinching Antenna assisted ISAC Systems

Qian Gao, Ruikang Zhong, Yue Liu, Hyundong Shin, Yuanwei Liu

详情
英文摘要

In this paper, a three-dimensional (3D) deployment scheme of pinching antenna array is proposed, aiming to enhances the performance of integrated sensing and communication (ISAC) systems. To fully realize the potential of 3D deployment, a joint antenna positioning, time allocation and transmit power optimization problem is formulated to maximize the sum communication rate with the constraints of target sensing rates and system energy. To solve the sum rate maximization problem, we propose a heterogeneous graph neural network based reinforcement learning (HGRL) algorithm. Simulation results prove that 3D deployment of pinching antenna array outperforms 1D and 2D counterparts in ISAC systems. Moreover, the proposed HGRL algorithm surpasses other baselines in both performance and convergence speed due to the advanced observation construction of the environment.

2601.20583 2026-01-29 math.NA cs.NA eess.SP

Quantitative synthetic aperture radar inversion

Liliana Borcea, Josselin Garnier, Alexander V. Mamonov, Jörn Zimmerling

Comments Submitted to IEEE Transactions on Antennas and Propagation, keywords: SAR, imaging, multiple scattering, inversion

详情
英文摘要

We study an inverse scattering problem for monostatic synthetic aperture radar (SAR): Estimate the wave speed in a heterogeneous, isotropic and nonmagnetic medium probed by waves emitted and measured by a moving antenna. The forward map, from the wave speed to the measurements, is derived from Maxwell's equations. It is a nonlinear map that accounts for multiple scattering and it is very oscillatory at high frequencies. This makes the standard, nonlinear least squares data fitting formulation of the inverse problem difficult to solve. We introduce an alternative, two-step approach: The first step computes the nonlinear map from the measurements to an approximation of the electric field inside the unknown medium aka, the internal wave. This is done for each antenna location in a non-iterative manner. The internal wave fits the data by construction, but it does not solve Maxwell's equations. The second step uses optimization to minimize the discrepancy between the internal wave and the solution of Maxwell's equations, for all antenna locations. The optimization is iterative. The first step defines an imaging function whose computational cost is comparable to that of standard SAR imaging, but it gives a better estimate of the support of targets. Further iterations improve the quantitative estimation of the wave speed. We assess the performance of the method with numerical simulations and compare the results with those of standard inversion.

2601.20575 2026-01-29 eess.IV cs.CV

SegRap2025: A Benchmark of Gross Tumor Volume and Lymph Node Clinical Target Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Jia Fu, Litingyu Wang, He Li, Zihao Luo, Huamin Wang, Chenyuan Bian, Zijun Gao, Chunbin Gu, Xin Weng, Jianghao Wu, Yicheng Wu, Jin Ye, Linhao Li, Yiwen Ye, Yong Xia, Elias Tappeiner, Fei He, Abdul qayyum, Moona Mazher, Steven A Niederer, Junqiang Chen, Chuanyi Huang, Lisheng Wang, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Shichuan Zhang, Shaoting Zhang, Wenjun Liao, Guotai Wang

详情
英文摘要

Accurate delineation of Gross Tumor Volume (GTV), Lymph Node Clinical Target Volume (LN CTV), and Organ-at-Risk (OAR) from Computed Tomography (CT) scans is essential for precise radiotherapy planning in Nasopharyngeal Carcinoma (NPC). Building upon SegRap2023, which focused on OAR and GTV segmentation using single-center paired non-contrast CT (ncCT) and contrast-enhanced CT (ceCT) scans, the SegRap2025 challenge aims to enhance the generalizability and robustness of segmentation models across imaging centers and modalities. SegRap2025 comprises two tasks: Task01 addresses GTV segmentation using paired CT from the SegRap2023 dataset, with an additional external testing set to evaluate cross-center generalization, and Task02 focuses on LN CTV segmentation using multi-center training data and an unseen external testing set, where each case contains paired CT scans or a single modality, emphasizing both cross-center and cross-modality robustness. This paper presents the challenge setup and provides a comprehensive analysis of the solutions submitted by ten participating teams. For GTV segmentation task, the top-performing models achieved average Dice Similarity Coefficient (DSC) of 74.61% and 56.79% on the internal and external testing cohorts, respectively. For LN CTV segmentation task, the highest average DSC values reached 60.24%, 60.50%, and 57.23% on paired CT, ceCT-only, and ncCT-only subsets, respectively. SegRap2025 establishes a large-scale multi-center, multi-modality benchmark for evaluating the generalization and robustness in radiotherapy target segmentation, providing valuable insights toward clinically applicable automated radiotherapy planning systems. The benchmark is available at: https://hilab-git.github.io/SegRap2025_Challenge.

2601.20555 2026-01-29 cs.RO eess.SP

Vibro-Sense: Robust Vibration-based Impulse Response Localization and Trajectory Tracking for Robotic Hands

Wadhah Zai El Amri, Nicolás Navarro-Guerrero

Comments Under Review: Springer Autonomous Robots Journal

详情
英文摘要

Rich contact perception is crucial for robotic manipulation, yet traditional tactile skins remain expensive and complex to integrate. This paper presents a scalable alternative: high-accuracy whole-body touch localization via vibro-acoustic sensing. By equipping a robotic hand with seven low-cost piezoelectric microphones and leveraging an Audio Spectrogram Transformer, we decode the vibrational signatures generated during physical interaction. Extensive evaluation across stationary and dynamic tasks reveals a localization error of under 5 mm in static conditions. Furthermore, our analysis highlights the distinct influence of material properties: stiff materials (e.g., metal) excel in impulse response localization due to sharp, high-bandwidth responses, whereas textured materials (e.g., wood) provide superior friction-based features for trajectory tracking. The system demonstrates robustness to the robot's own motion, maintaining effective tracking even during active operation. Our primary contribution is demonstrating that complex physical contact dynamics can be effectively decoded from simple vibrational signals, offering a viable pathway to widespread, affordable contact perception in robotics. To accelerate research, we provide our full datasets, models, and experimental setups as open-source resources.

2601.20547 2026-01-29 eess.SP

Vehicular Wireless Positioning -- A Survey

Sharief Saleh, Satyam Dwivedi, Russ Whiton, Peter Hammarberg, Musa Furkan Keskin, Julia Equi, Hui Chen, Florent Munier, Olof Eriksson, Fredrik Gunnarsson, Fredrik Tufvesson, Henk Wymeersch

Comments Under review at IEEE Communications Surveys & Tutorials

详情
英文摘要

The rapid advancement of connected and autonomous vehicles has driven a growing demand for precise and reliable positioning systems capable of operating in complex environments. Meeting these demands requires an integrated approach that combines multiple positioning technologies, including wireless-based systems, perception-based technologies, and motion-based sensors. This paper presents a comprehensive survey of wireless-based positioning for vehicular applications, with a focus on satellite-based positioning (such as global navigation satellite systems (GNSS) and low-Earth-orbit (LEO) satellites), cellular-based positioning (5G and beyond), and IEEE-based technologies (including Wi-Fi, ultrawideband (UWB), Bluetooth, and vehicle-to-vehicle (V2V) communications). First, the survey reviews a wide range of vehicular positioning use cases, outlining their specific performance requirements. Next, it explores the historical development, standardization, and evolution of each wireless positioning technology, providing an in-depth categorization of existing positioning solutions and algorithms, and identifying open challenges and contemporary trends. Finally, the paper examines sensor fusion techniques that integrate these wireless systems with onboard perception and motion sensors to enhance positioning accuracy and resilience in real-world conditions. This survey thus offers a holistic perspective on the historical foundations, current advancements, and future directions of wireless-based positioning for vehicular applications, addressing a critical gap in the literature.

2601.20510 2026-01-29 cs.SD cs.AI eess.AS

Audio Deepfake Detection in the Age of Advanced Text-to-Speech models

Robin Singh, Aditya Yogesh Nair, Fabio Palumbo, Florian Barbaro, Anna Dyka, Lohith Rachakonda

Comments This work was performed using HPC resources from GENCI-IDRIS (Grant 2025- AD011016076)

详情
英文摘要

Recent advances in Text-to-Speech (TTS) systems have substantially increased the realism of synthetic speech, raising new challenges for audio deepfake detection. This work presents a comparative evaluation of three state-of-the-art TTS models--Dia2, Maya1, and MeloTTS--representing streaming, LLM-based, and non-autoregressive architectures. A corpus of 12,000 synthetic audio samples was generated using the Daily-Dialog dataset and evaluated against four detection frameworks, including semantic, structural, and signal-level approaches. The results reveal significant variability in detector performance across generative mechanisms: models effective against one TTS architecture may fail against others, particularly LLM-based synthesis. In contrast, a multi-view detection approach combining complementary analysis levels demonstrates robust performance across all evaluated models. These findings highlight the limitations of single-paradigm detectors and emphasize the necessity of integrated detection strategies to address the evolving landscape of audio deepfake threats.

2601.20481 2026-01-29 eess.AS cs.SD

Erasing Your Voice Before It's Heard: Training-free Speaker Unlearning for Zero-shot Text-to-Speech

Myungjin Lee, Eunji Shin, Jiyoung Lee

Comments ICASSP'2026

详情
英文摘要

Modern zero-shot text-to-speech (TTS) models offer unprecedented expressivity but also pose serious crime risks, as they can synthesize voices of individuals who never consented. In this context, speaker unlearning aims to prevent the generation of specific speaker identities upon request. Existing approaches, reliant on retraining, are costly and limited to speakers seen in the training set. We present TruS, a training-free speaker unlearning framework that shifts the paradigm from data deletion to inference-time control. TruS steers identity-specific hidden activations to suppress target speakers while preserving other attributes (e.g., prosody and emotion). Experimental results show that TruS effectively prevents voice generation on both seen and unseen opt-out speakers, establishing a scalable safeguard for speech synthesis. The demo and code are available on http://mmai.ewha.ac.kr/trus.

2601.20445 2026-01-29 eess.SY cs.SY

A Timing-Anomaly Free Dynamic Scheduling on Heterogeneous Systems

Yixuan Zhu, Yinkang Gao, Lei Gong, Binze Jiang, Xiaohang Gong, Zihan Wang, Cheng Tang, Wenqi Lou, Teng Wang, Chao Wang, Xi Li, Xuehai Zhou

详情
英文摘要

Heterogeneous systems commonly adopt dynamic scheduling algorithms to improve resource utilization and enhance scheduling flexibility. However, such flexibility may introduce timing anomalies, wherein locally reduced execution times can lead to an increase in the overall system execution time. This phenomenon significantly complicates the analysis of Worst-Case Response Time (WCRT), rendering conventional analysis either overly pessimistic or unsafe, and often necessitating exhaustive state-space exploration to ensure correctness. To address this challenge, this paper presents the first timing-anomaly-free dynamic scheduling algorithm for heterogeneous systems, referred to as Deterministic Dynamic Execution. It achieves a safe and tight WCRT estimate through a single offline simulation execution. The core idea is to apply deterministic execution constraints, which partially restrict the resource allocation and execution order of tasks at runtime. Based on a formally defined execution progress model for heterogeneous system scheduling, we prove the correctness of the proposed execution constraints and their ability to eliminate timing anomalies. Furthermore, we propose two methods to generate execution constraints. The first method derives execution constraints directly from the execution traces produced by existing scheduling algorithms. The second method is a heuristic-based approach that constructs execution constraints, enabling further reduction of the WCRT. Experimental results on synthetically generated DAG task sets under various system configurations demonstrate that, compared to traditional dynamic scheduling algorithms, our approach not only eliminates timing anomalies but also effectively reduces both the WCRT and response time jitter.

2601.20427 2026-01-29 eess.SY cs.SY

Reducing End-to-End Latency of Cause-Effect Chains with Shared Cache Analysis

Yixuan Zhu, Yinkang Gao, Bo Zhang, Xiaohang Gong, Binze Jiang, Lei Gong, Wenqi Lou, Teng Wang, Chao Wang, Xi Li, Xuehai Zhou

详情
英文摘要

Cause-effect chains, as a widely used modeling method in real-time embedded systems, are extensively applied in various safety-critical domains. End-to-end latency, as a key real-time attribute of cause-effect chains, is crucial in many applications. But the analysis of end-to-end latency for cause-effect chains on multicore platforms with shared caches still presents an unresolved issue. Traditional methods typically assume that the worst-case execution time (WCET) of each task in the cause-effect chain is known. However, in the absence of scheduling information, these methods often assume that all shared cache accesses result in misses, leading to an overestimation of WCET and, consequently, affecting the accuracy of end-to-end latency. However, effectively integrating scheduling information into the WCET analysis process of the chains may introduce two challenges: first, how to leverage the structural characteristics of the chains to optimize shared cache analysis, and second, how to improve analysis accuracy while avoiding state space explosion. To address these issues, this paper proposes a novel end-to-end latency analysis framework designed for multi-chain systems on multicore platforms with shared caches. This framework extracts scheduling information and structural characteristics of cause-effect chains, constructing fine-grained and scalable inter-core memory access contexts at the basic block level for time-sensitive shared cache analysis. This results in more accurate WCET (TSC-WCET) estimates, which are then used to derive the end-to-end latency. Finally, we conduct experiments on dual-core and quad-core systems with various cache configurations, which show that under certain settings, the average maximum end-to-end latency of cause-effect chains is reduced by up to 34% and 26%.

2601.20367 2026-01-29 cs.LG cs.SY eess.SY

Unsupervised Anomaly Detection in Multi-Agent Trajectory Prediction via Transformer-Based Models

Qing Lyu, Zhe Fu, Alexandre Bayen

详情
英文摘要

Identifying safety-critical scenarios is essential for autonomous driving, but the rarity of such events makes supervised labeling impractical. Traditional rule-based metrics like Time-to-Collision are too simplistic to capture complex interaction risks, and existing methods lack a systematic way to verify whether statistical anomalies truly reflect physical danger. To address this gap, we propose an unsupervised anomaly detection framework based on a multi-agent Transformer that models normal driving and measures deviations through prediction residuals. A dual evaluation scheme has been proposed to assess both detection stability and physical alignment: Stability is measured using standard ranking metrics in which Kendall Rank Correlation Coefficient captures rank agreement and Jaccard index captures the consistency of the top-K selected items; Physical alignment is assessed through correlations with established Surrogate Safety Measures (SSM). Experiments on the NGSIM dataset demonstrate our framework's effectiveness: We show that the maximum residual aggregator achieves the highest physical alignment while maintaining stability. Furthermore, our framework identifies 388 unique anomalies missed by Time-to-Collision and statistical baselines, capturing subtle multi-agent risks like reactive braking under lateral drift. The detected anomalies are further clustered into four interpretable risk types, offering actionable insights for simulation and testing.

2601.20345 2026-01-29 math.OC eess.SP

Decentralized Stochastic Constrained Optimization via Prox-Linearization

Shivangi Dubey Sharma, Basil M. Idrees, Lavish Arora, Ketan Rajawat

Comments 18 pages, 6 figures

详情
英文摘要

This paper studies consensus-based decentralized stochastic optimization for minimizing possibly non-convex expected objectives with convex non-smooth regularizers and nonlinear functional inequality constraints. We reformulate the constrained problem using the exact-penalty model and develop two algorithms that require only local stochastic gradients and first-order constraint information. The first method, Decentralized Stochastic Momentum-based Prox-Linear Algorithm (D-SMPL), combines constraint linearization with a prox-linear step, resulting in a linearly constrained quadratic subproblem per iteration. Building on this approach, we propose a successive convex approximation (SCA) variant, Decentralized SCA Momentum-based Prox-Linear (D-SCAMPL), which handles additional objective structure through strongly convex surrogate subproblems while still allowing infeasible initialization. Both methods incorporate recursive momentum-based gradient estimators and a consensus mechanism requiring only two communication rounds per iteration. Under standard smoothness and regularity assumptions, both algorithms achieve an oracle complexity of $\mathcal{O}(ε^{-3/2})$, matching the optimal rate known for unconstrained centralized stochastic non-convex optimization. Numerical experiments on energy-optimal ocean trajectory planning corroborate the theory and demonstrate improved performance over existing decentralized baselines.

2601.20324 2026-01-29 eess.SY cs.SY

Neural Cooperative Reach-While-Avoid Certificates for Interconnected Systems

Jingyuan Zhou, Haoze Wu, Kaidi Yang

详情
英文摘要

Providing formal guarantees for neural network-based controllers in large-scale interconnected systems remains a fundamental challenge. In particular, using neural certificates to capture cooperative interactions and verifying these certificates at scale is crucial for the safe deployment of such controllers. However, existing approaches fall short on both fronts. To address these limitations, we propose neural cooperative reach-while-avoid certificates with Dynamic-Localized Vector Control Lyapunov and Barrier Functions, which capture cooperative dynamics through state-dependent neighborhood structures and provide decentralized certificates for global exponential stability and safety. Based on the certificates, we further develop a scalable training and verification framework that jointly synthesizes controllers and neural certificates via a constrained optimization objective, and leverages a sufficient condition to ensure formal guarantees considering modeling error. To improve scalability, we introduce a structural reuse mechanism to transfer controllers and certificates between substructure-isomorphic systems. The proposed methodology is validated with extensive experiments on multi-robot coordination and vehicle platoons. Results demonstrate that our framework ensures certified cooperative reach-while-avoid while maintaining strong control performance.

2601.20319 2026-01-29 eess.AS

ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy

Ya-Tse Wu, Chi-Chun Lee

Comments Accepted for publication at IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2025

详情
英文摘要

This work investigates how emotional speech and generative strategies affect ASR performance. We analyze speech synthesized from three emotional TTS models and find that substitution errors dominate, with emotional expressiveness varying across models. Based on these insights, we introduce two generative strategies: one using transcription correctness and another using emotional salience, to construct fine-tuning subsets. Results show consistent WER improvements on real emotional datasets without noticeable degradation on clean LibriSpeech utterances. The combined strategy achieves the strongest gains, particularly for expressive speech. These findings highlight the importance of targeted augmentation for building emotion-aware ASR systems.

2601.20314 2026-01-29 eess.SY cs.SY

Efficient Trajectory Design and Communication Scheduling for Dual-UAV Jamming-Aided Secure Communication Networks

Xinran Wang, Peng Wu, Xiaopeng Yuan, Yulin Hu, Anke Schmeink

详情
英文摘要

We study dual-unmanned aerial vehicle (UAV) jamming-aided secure communication networks, in which one UAV delivers confidential data to multiple ground users (GUs), while a cooperative UAV provides protective interference against a ground eavesdropper. To enforce fairness, we maximize the minimum secrecy throughput across GUs by jointly designing trajectories and communication scheduling. The key difficulty lies in the continuous-time nature of UAV trajectories and the tight space-time coupling between the transmitter and the jammer, which jointly render the problem infinite-dimensional and nonconvex. To address these challenges, we characterize, for the first time, the structure of the optimal trajectories and rigorously prove that they follow a collaborative successive hover-and-fly (co-SHF) structure, where the two UAVs visit a limited number of synchronized co-hovering point pairs, and during each flight segment at least one UAV moves at maximum speed. Leveraging this structure, we reformulate the problem into a finite-dimensional form, without loss of optimality, over hovering and turning points, hovering durations, and scheduling. For tractability, we adopt a minimum-distance approximation of continuous anti-collision constraints and employ concave lower bounds on secrecy throughput within a successive convex approximation (SCA) method, which converges and, thanks to the co-SHF reduction in optimization variables and constraints, achieves low computational complexity. Numerical results show that, compared with time-discretization and no-jamming benchmarks, the proposed co-SHF design improves the min-secrecy and user fairness while requiring significantly less runtime.

2601.20300 2026-01-29 cs.CL eess.AS

MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting

Jing Xu, Minglin Wu, Xueyuan Chen, Xixin Wu, Helen Meng

Comments Accepted by ICASSP2026

详情
英文摘要

Self-supervised learning (SSL) has greatly advanced speech representation learning, but multilingual SSL models remain constrained to languages encountered during pretraining. Retraining from scratch to incorporate new languages is computationally expensive, while sequential training without migitation strategies often leads to catastrophic forgetting. To address this, we propose MiLorE-SSL, a lightweight framework that combines LoRA modules with a soft mixture-of-experts (MoE) mechanism for efficient continual multilingual training. LoRA provides efficient low-rank adaptation, while soft MoE promotes flexible expert sharing across languages, reducing cross-lingual interference. To further mitigate forgetting, we introduce limited replay data from existing languages, avoiding reliance on large historical corpora. Experiments on ML-SUPERB demonstrate that MiLorE-SSL achieves strong performance in new languages and improves the ability in existing ones with only 2.14% trainable parameters.

2601.20298 2026-01-29 eess.SY cs.SY

A Data-Driven Krasovskii-Based Approach for Safety Controller Design of Time-Delayed Uncertain Polynomial Systems

Omid Akbarzadeh, MohammadHossein Ashoori, Amy Nejati, Abolfazl Lavaei

详情
英文摘要

We develop a data-driven framework for the synthesis of robust Krasovskii control barrier certificates (RK-CBC) and corresponding robust safety controllers (R-SC) for discrete-time input-affine uncertain polynomial systems with unknown dynamics, while explicitly accounting for unknown-but-bounded disturbances and time-invariant delays using only observed input-state data. Although control barrier certificates have been extensively studied for safety analysis of control systems, existing work on unknown systems with time delays, particularly in the presence of disturbances, remains limited. The challenge of safety synthesis for such systems stems from two main factors: first, the system's mathematical model is unavailable; and second, the safety conditions should explicitly incorporate the effects of time delays on system evolution during the synthesis process, while remaining robust to unknown disturbances. To address these challenges, we develop a data-driven framework based on Krasovskii control barrier certificates, extending the classical CBC formulation for delay-free systems to explicitly account for time delays by aggregating delayed components within the barrier construction. The proposed framework relies solely on input-state data collected over a finite time horizon, enabling the direct synthesis of RK-CBC and R-SC from observed trajectories without requiring an explicit system model. The synthesis is cast as a data-driven sum-of-squares (SOS) optimization program, yielding a structured design methodology. As a result, robust safety is guaranteed in the presence of unknown disturbances and time delays over an infinite time horizon. The effectiveness of the proposed method is demonstrated through three case studies, including two physical systems.

2601.20291 2026-01-29 cs.LG eess.SP physics.med-ph

A Learning-based Framework for Spatial Impulse Response Compensation in 3D Photoacoustic Computed Tomography

Kaiyi Yang, Seonyeong Park, Gangwon Jeong, Hsuan-Kai Huang, Alexander A. Oraevsky, Umberto Villa, Mark A. Anastasio

Comments Submitted to IEEE TMI

详情
英文摘要

Photoacoustic computed tomography (PACT) is a promising imaging modality that combines the advantages of optical contrast with ultrasound detection. Utilizing ultrasound transducers with larger surface areas can improve detection sensitivity. However, when computationally efficient analytic reconstruction methods that neglect the spatial impulse responses (SIRs) of the transducer are employed, the spatial resolution of the reconstructed images will be compromised. Although optimization-based reconstruction methods can explicitly account for SIR effects, their computational cost is generally high, particularly in three-dimensional (3D) applications. To address the need for accurate but rapid 3D PACT image reconstruction, this study presents a framework for establishing a learned SIR compensation method that operates in the data domain. The learned compensation method maps SIR-corrupted PACT measurement data to compensated data that would have been recorded by idealized point-like transducers. Subsequently, the compensated data can be used with a computationally efficient reconstruction method that neglects SIR effects. Two variants of the learned compensation model are investigated that employ a U-Net model and a specifically designed, physics-inspired model, referred to as Deconv-Net. A fast and analytical training data generation procedure is also a component of the presented framework. The framework is rigorously validated in virtual imaging studies, demonstrating resolution improvement and robustness to noise variations, object complexity, and sound speed heterogeneity. When applied to in-vivo breast imaging data, the learned compensation models revealed fine structures that had been obscured by SIR-induced artifacts. To our knowledge, this is the first demonstration of learned SIR compensation in 3D PACT imaging.

2601.20226 2026-01-29 cs.LG cs.SY eess.SY

Parametric and Generative Forecasts of Day-Ahead Market Curves for Storage Optimization

Julian Gutierrez, Redouane Silvente

Comments 46 pages, 41 figures

详情
英文摘要

We present two machine learning frameworks for forecasting aggregated curves and optimizing storage in the EPEX SPOT day-ahead market. First, a fast parametric model forecasts hourly demand and supply curves in a low-dimensional and grid-robust representation, with minimum and maximum volumes combined with a Chebyshev polynomial for the elastic segment. The model enables daily use with low error and clear interpretability. Second, for a more comprehensive analysis, though less suited to daily operation, we employ generative models that learn the joint distribution of 24-hour order-level submissions given weather and fuel variables. These models generate synthetic daily scenarios of individual buy and sell orders, which, once aggregated, yield hourly supply and demand curves. Based on these forecasts, we optimize a price-making storage strategy, quantify revenue distributions, and highlight the price-compression effect with lower peaks, higher off-peak levels, and diminishing returns as capacity expands.