arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.15240 2026-01-22 cs.SD eess.AS

WeDefense: A Toolkit to Defend Against Fake Audio

Lin Zhang, Johan Rohdin, Xin Wang, Junyi Peng, Tianchi Liu, You Zhang, Hieu-Thi Luong, Shuai Wang, Chengdong Liang, Anna Silnova, Nicholas Evans

Comments This is an ongoing work. v1 corresponds to the version completed by June 4, 2025 and previously submitted to ASRU 2025

Abstract

Advances in generative AI have enabled the creation of synthetic audio that is perceptually indistinguishable from real, genuine audio. Although this stellar progress enables many positive applications, it also raises risks of misuse, such as impersonation, disinformation and fraud. Despite a growing number of open-source fake audio detection codebases released through numerous challenges and initiatives, most are tailored to specific competitions, datasets or models. A standardized and unified toolkit that supports the fair benchmarking and comparison of competing solutions with not just common databases, protocols and metrics, but also a shared codebase, is missing. To address this, we propose WeDefense, the first open-source toolkit to support both fake audio detection and localization. Beyond model training, WeDefense emphasizes critical yet often overlooked components: flexible input and augmentation, calibration, score fusion, standardized evaluation metrics, and analysis tools for deeper understanding and interpretation. The toolkit is publicly available at https://github.com/zlin0/wedefense with interactive demos for fake audio detection and localization.

2601.14882 2026-01-22 math.OC cs.SY eess.SY

Practical prescribed-time prescribed performance control with asymptotic convergence -- A vanishing sigma-modification approach

Mehdi Golestani, Yongduan Song, Weizhen Liu, Guangren Duan, He Kong

Abstract

In this paper, we present a method capable of ensuring practical prescribed-time control with guaranteed performance for a class of nonlinear systems in the presence of time-varying parametric and dynamic uncertainties, and uncertain control coefficients. Our design consists of two key steps. First, we construct a performance-rate function that freezes at and after a user-specified time T, playing a crucial role in achieving desired precision within prescribed time T and dealing with unmodeled dynamics. Next, based on this function and a sigma-modification strategy in which the leakage term starts to vanish at t > T, we develop an adaptive dynamic surface control framework to reduce control complexity, deal with uncertainties, ensure prescribed performance, practical prescribed-time convergence to a specific region, and ultimately achieve asymptotic convergence. The effectiveness of the proposed control method is validated through numerical simulations.

2508.06405 2026-01-22 eess.AS eess.SP

Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models

Guilherme Zucatelli, Ricardo Barioni, Gabriela Dantas

Comments Accepted at ICASSP 2026

Abstract

Objective non-stationarity measures are resource intensive and impose critical limitations for real-time processing solutions. In this paper, a novel Hard Label Criteria (HLC) algorithm is proposed to generate a global non-stationarity label for acoustic signals, enabling supervised learning strategies to be trained as stationarity estimators. The HLC is first evaluated on state-of-the-art general-purpose acoustic models, demonstrating that these models capture stationarity information. Furthermore, the first-of-its-kind HLC-based Network for Acoustic Non-Stationarity Assessment (NANSA) is proposed. NANSA models outperform competing approaches, achieving up to 99% classification accuracy, while solving the computational infeasibility of traditional objective measures.

2502.06967 2026-01-22 cs.IT eess.SP math.IT

Downlink and Uplink ISAC in Continuous-Aperture Array (CAPA) Systems

Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Hyundong Shin, Yuanwei Liu

Comments 16 pages, 12 figures

Journal ref IEEE Trans. Wireless Commun., vol. 25, pp. 3592-3609, 2026

Abstract

A continuous-aperture array (CAPA)-based integrated sensing and communications (ISAC) framework is proposed for both downlink and uplink scenarios. Within this framework, continuous operator-based signal models are employed to describe the sensing and communication processes. The performance of communication and sensing is analyzed using two information-theoretic metrics: the communication rate (CR) and the sensing rate (SR). 1) For downlink ISAC, three continuous beamforming designs are proposed: i) the communications-centric (C-C) design that maximizes the CR, ii) the sensing-centric (S-C) design that maximizes the SR, and iii) the Pareto-optimal design that characterizes the Pareto boundary of the CR-SR region. A low-complexity signal subspace-based approach is proposed to derive the closed-form optimal beamformers for the considered designs. On this basis, closed-form expressions are derived for the achievable CRs and SRs, and the downlink rate region achieved by CAPAs is characterized. 2) For uplink ISAC, the C-C and S-C successive interference cancellation-based methods are proposed to manage inter-functionality interference. Using the subspace approach, closed-form expressions for the optimal detectors as well as the achievable CRs and SRs are derived. The uplink SR-CR region is characterized based on the time-sharing technique. Numerical results demonstrate that, for both downlink and uplink, CAPA-based ISAC achieves higher CRs and SRs as well as larger CR-SR regions compared to conventional spatially discrete array-based ISAC.

2406.15056 2026-01-22 cs.IT eess.SP math.IT

Continuous-Aperture Array (CAPA)-Based Wireless Communications: Capacity Characterization

Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Yuanwei Liu

Journal ref IEEE Trans. Wireless Commun., vol. 24, no. 12, pp. 10456-10473, Dec. 2025

Abstract

The capacity limits of continuous-aperture array (CAPA)-based wireless communications are characterized. To this end, an analytically tractable transmission framework is established for both uplink and downlink CAPA systems. Based on this framework, closed-form expressions for the single-user channel capacity are derived. The results are further extended to a multiuser case by characterizing the capacity limits of a two-user channel and proposing the associated capacity-achieving decoding and encoding schemes. In the uplink case, the capacity-achieving detectors and sum-rate capacity are derived, and the capacity region is characterized. In the downlink case, the uplink-downlink duality is established by deriving the uplink-to-downlink and downlink-to-uplink transformations under the same power constraint, based on which the optimal source current distributions and the achieved sum-rate capacity and capacity region are characterized. For comparison, the uplink and downlink sum-rates achieved by the linear zero-forcing scheme are also analyzed. To gain further insights, several case studies are presented by specializing the derived results into various array structures, including the planar CAPA, linear CAPA, and planar spatially discrete array (SPDA). Numerical results are provided to reveal that the channel capacity achieved by CAPAs converges towards a finite upper bound as the aperture size increases; and CAPAs offer superior capacity over the conventional SPDAs.

2601.15196 2026-01-22 eess.SY cs.SY

TTCBF: A Truncated Taylor Control Barrier Function for High-Order Safety Constraints

Jianye Xu, Bassam Alrifaee

Abstract

Control Barrier Functions (CBFs) enforce safety by rendering a prescribed safe set forward invariant. However, standard CBFs are limited to safety constraints with relative degree one, while High-Order CBF (HOCBF) methods address higher relative degree at the cost of introducing a chain of auxiliary functions and multiple class K functions whose tuning scales with the relative degree. In this paper, we introduce a Truncated Taylor Control Barrier Function (TTCBF), which generalizes standard discrete-time CBFs to consider high-order safety constraints and requires only one class K function, independent of the relative degree. We also propose an adaptive variant, adaptive TTCBF (aTTCBF), that optimizes an online gain on the class K function to improve adaptability, while requiring fewer control design parameters than existing adaptive HOCBF variants. Numerical experiments in a relative-degree-six spring-mass system and a cluttered corridor navigation validate the above theoretical findings.
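
For orientation, here is a minimal sketch of the standard relative-degree-one discrete-time CBF condition that the abstract takes as its starting point, h(x_{k+1}) >= (1 - gamma) * h(x_k) with 0 < gamma <= 1. It is not the TTCBF itself. The single-integrator model, the barrier h(x) = x_max - x, and all names and parameter values are illustrative assumptions.

```python
def safe_velocity(x, u_nom, x_max, gamma, dt):
    """Clip a nominal velocity so that h(x) = x_max - x obeys the discrete-time CBF condition.

    Assumed model (illustrative): x_next = x + u * dt.
    Condition: h(x_next) >= (1 - gamma) * h(x)  <=>  u <= gamma * h(x) / dt.
    """
    h = x_max - x
    return min(u_nom, gamma * h / dt)


def simulate(x0=0.0, u_nom=1.0, x_max=1.0, gamma=0.5, dt=0.1, steps=50):
    """Roll out the filtered single integrator and return the state trajectory."""
    xs = [x0]
    for _ in range(steps):
        u = safe_velocity(xs[-1], u_nom, x_max, gamma, dt)
        xs.append(xs[-1] + u * dt)
    return xs


trajectory = simulate()
print(max(trajectory))  # stays at or below x_max
```

Here gamma plays the role of a linear class K function. For higher relative degree, the control input does not appear in h(x_{k+1}) directly, which is the situation HOCBF and TTCBF methods address.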

2601.15135 2026-01-22 eess.SY cs.SY

Stochastic EMS for Optimal 24/7 Carbon-Free Energy Operations

Natanon Tongamrak, Kannapha Amaruchkul, Wijarn Wangdee, Jitkomut Songsiri

Comments 23 pages

Abstract

This paper proposes a two-stage stochastic optimization formulation to determine optimal operation and procurement plans for achieving 24/7 carbon-free energy (CFE) compliance at minimum cost. The system under consideration reflects the primary energy technologies in Thailand, including solar power, battery storage, and a diverse portfolio of renewable and carbon-based energy procurement sources. Unlike existing literature focused on long-term planning, this study addresses near real-time operations using a 15-minute resolution. A novel feature of the formulation is the explicit treatment of CFE compliance as a model parameter, enabling flexible targets such as a minimum percentage of hourly matching or a required number of carbon-free days within a multi-day horizon. The mixed-integer linear programming formulation accounts for uncertainties in load and solar generation by integrating deep learning-based forecasting within a receding horizon framework. By optimizing battery profiles and multi-source procurement simultaneously, the proposed system provides a feasible pathway for transitioning to carbon-free operations in emerging energy markets.
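
To make the "percentage of hourly matching" target concrete, the sketch below computes one common form of an hourly-matching score from 15-minute data: carbon-free supply counts only up to the load in the same hour. The abstract does not give the paper's exact compliance definition, so the formula, the function name, and the sample numbers are assumptions.

```python
import numpy as np


def hourly_matching_score(load_kw, cfe_kw, steps_per_hour=4):
    """Fraction of load matched by carbon-free supply within the same hour.

    Inputs are 15-minute average powers (kW); with equal step lengths the
    energy ratio equals the ratio of summed powers.
    """
    load = np.asarray(load_kw, dtype=float).reshape(-1, steps_per_hour).sum(axis=1)
    cfe = np.asarray(cfe_kw, dtype=float).reshape(-1, steps_per_hour).sum(axis=1)
    matched = np.minimum(load, cfe)  # surplus in one hour cannot cover another hour
    return matched.sum() / load.sum()


# Two hours of hypothetical data: surplus CFE in hour 1, deficit in hour 2.
load = [4, 4, 4, 4, 4, 4, 4, 4]
cfe = [6, 6, 6, 6, 2, 2, 2, 2]
score = hourly_matching_score(load, cfe)
print(score)  # 0.75
```

Total carbon-free supply equals total load in this example, so a volume-based (non-hourly) accounting would report full matching. The hourly score is lower because the surplus in the first hour is not carried over.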

2601.15126 2026-01-22 eess.SP

Sparse Sensor Arrays for Active Sensing: Models, Configurations and Applications

Robin Rajamäki, Visa Koivunen

Journal ref Chapter 9 of Amin, Moeness G., ed. "Sparse arrays for radar, sonar, and communications" John Wiley & Sons, 2024

Abstract

This chapter focuses on active sensing using sparse arrays. In active sensing applications, such as radar, sonar, wireless communications, and medical ultrasound, a collection of sensors probes the environment by emitting self-generated energy. A key benefit of such active multi-sensor arrays is their ability to focus and steer energy in desired directions by beamforming on transmit. Sparse sensor arrays offer several advantages over conventional uniform arrays, including improved resolution using fewer physical sensors and the capability to identify more scatterers than sensors. This is facilitated by the effective transmit-receive virtual array known as the sum co-array, which can have many more virtual sensors than the number of physical transmit or receive sensors. Herein, we focus on the design of low-redundancy sparse array configurations and on employing transmit-receive (Tx-Rx) beamforming using sparse arrays. We discuss the optimal, but computationally intractable Minimum-redundancy array, and a scalable symmetric array framework, which extends many well-known passive sparse array geometries to the active case. We also examine mitigating side lobes arising from spatial undersampling by a synthetic beamforming method known as image addition. We briefly present approaches for finding the physical beamforming weights synthesizing a desired Tx-Rx beampattern, and consider related spatio-temporal trade-offs. We conclude by discussing selected applications of sparse arrays in active sensing.
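
The sum co-array mentioned above is the set of pairwise sums of transmit and receive sensor positions. A small sketch, with hypothetical positions in units of half a wavelength:

```python
import numpy as np


def sum_coarray(tx_positions, rx_positions):
    """Return the sorted unique pairwise sums of Tx and Rx sensor positions."""
    tx = np.asarray(tx_positions)
    rx = np.asarray(rx_positions)
    return np.unique(tx[:, None] + rx[None, :])


# Hypothetical geometry: dense 3-element Tx, sparse 3-element Rx.
tx = [0, 1, 2]
rx = [0, 3, 6]
virtual = sum_coarray(tx, rx)
print(virtual.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Three transmitters and three receivers produce a contiguous virtual array of nine elements here. This geometry is chosen only for illustration and is not one of the chapter's designs.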

2601.15102 2026-01-22 cs.LG eess.IV

Field-Space Autoencoder for Scalable Climate Emulators

Johannes Meuer, Maximilian Witte, Étiénne Plésiat, Thomas Ludwig, Christopher Kadow

Abstract

Kilometer-scale Earth system models are essential for capturing local climate change. However, these models are computationally expensive and produce petabyte-scale outputs, which limits their utility for applications such as probabilistic risk assessment. Here, we present the Field-Space Autoencoder, a scalable climate emulation framework based on a spherical compression model that overcomes these challenges. By utilizing Field-Space Attention, the model efficiently operates on native climate model output and therefore avoids geometric distortions caused by forcing spherical data onto Euclidean grids. This approach preserves physical structures significantly better than convolutional baselines. By producing a structured compressed field, it serves as a good baseline for downstream generative emulation. In addition, the model can perform zero-shot super-resolution that maps low-resolution large ensembles and scarce high-resolution data into a shared representation. We train a generative diffusion model on these compressed fields. The model can simultaneously learn internal variability from abundant low-resolution data and fine-scale physics from sparse high-resolution data. Our work bridges the gap between the high volume of low-resolution ensemble statistics and the scarcity of high-resolution physical detail.

2601.15099 2026-01-22 eess.SY cs.SY eess.SP

Instantaneous Frequency in Power Systems using the Teager-Kaiser Energy Operator

A. Vaca, J. Gutierrez Florensa, F. Milano

Abstract

This paper develops an instantaneous-frequency (IF) local estimator calculated with the complex Teager-Kaiser energy operator (CTKEO) and the dynamic-signal identity. The contribution is a novel IF expression that makes the envelope-curvature terms explicit, thus correcting the bias that affects conventional estimators used in power systems. The estimator aligns with complex-frequency (CF) kinematics and admits a geometric interpretation (curvature/torsion) without phase unwrapping. Simulations and data-driven examples demonstrate the accuracy of the proposed approach.
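
For background, here is a minimal sketch of the classical real-valued discrete Teager-Kaiser energy operator, psi[x](n) = x(n)^2 - x(n-1) x(n+1), with the textbook DESA-2 frequency estimate built on it. This is not the paper's CTKEO-based estimator with explicit envelope-curvature terms. The function names, the 50 Hz test tone, and the 1 kHz sampling rate are illustrative assumptions.

```python
import numpy as np


def tkeo(x):
    """Discrete Teager-Kaiser energy operator, defined for interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2_frequency(x, fs):
    """DESA-2 instantaneous-frequency estimate in Hz.

    Uses y(n) = x(n+1) - x(n-1) and
    omega = 0.5 * arccos(1 - psi[y] / (2 * psi[x])), valid for 0 < omega < pi/2.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]        # symmetric difference, samples n = 1 .. N-2
    psi_x = tkeo(x)[1:-1]     # trimmed to samples n = 2 .. N-3
    psi_y = tkeo(y)           # samples n = 2 .. N-3
    arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    return 0.5 * np.arccos(arg) * fs / (2.0 * np.pi)


fs = 1000.0
t = np.arange(200) / fs
x = 1.5 * np.cos(2 * np.pi * 50.0 * t + 0.3)
print(desa2_frequency(x, fs)[:3])  # close to 50 Hz for a stationary tone
```

For a constant-amplitude sinusoid the estimate is exact up to rounding. The bias the abstract refers to arises when amplitude or frequency varies in time, which this sketch does not correct.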

2601.15097 2026-01-22 eess.SP cs.SD eess.AS

Neural Tracking of Sustained Attention, Attention Switching, and Natural Conversation in Audiovisual Environments using Mobile EEG

Johanna Wilroth, Oskar Keding, Martin A. Skoglund, Maria Sandsten, Martin Enqvist, Emina Alickovic

Comments Submitted to European Journal of Neuroscience

Abstract

Everyday communication is dynamic and multisensory, often involving shifting attention, overlapping speech and visual cues. Yet, most neural attention tracking studies are still limited to highly controlled lab settings, using clean, often audio-only stimuli and requiring sustained attention to a single talker. This work addresses that gap by introducing a novel dataset from 24 normal-hearing participants. We used a mobile electroencephalography (EEG) system (44 scalp electrodes and 20 cEEGrid electrodes) in an audiovisual (AV) paradigm with three conditions: sustained attention to a single talker in a two-talker environment, attention switching between two talkers, and unscripted two-talker conversations with a competing single talker. Analysis included temporal response function (TRF) modeling, optimal lag analysis, selective attention classification with decision windows ranging from 1.1 s to 35 s, and comparisons of TRFs for attention to AV conversations versus side audio-only talkers. Key findings show significant differences in the attention-related P2 peak between attended and ignored speech across conditions for scalp EEG. The absence of a significant change in performance between switching and sustained attention suggests robustness to attention switches. Optimal lag analysis revealed a narrower peak for conversation compared to single-talker AV stimuli, reflecting the additional complexity of multi-talker processing. Classification of selective attention was consistently above chance (55-70% accuracy) for scalp EEG, while cEEGrid data yielded lower correlations, highlighting the need for further methodological improvements. These results demonstrate that mobile EEG can reliably track selective attention in dynamic, multisensory listening scenarios and provide guidance for designing future AV paradigms and real-world attention tracking applications.

2601.15059 2026-01-22 cs.AI cs.SY eess.SY

The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

Oleg Romanchuk, Roman Bondar

Abstract

Modern CI/CD pipelines integrating agent-generated code exhibit a structural failure in responsibility attribution. Decisions are executed through formally correct approval processes, yet no entity possesses both the authority to approve those decisions and the epistemic capacity to meaningfully understand their basis. We define this condition as a responsibility vacuum: a state in which decisions occur, but responsibility cannot be attributed because authority and verification capacity do not coincide. We show that this is not a process deviation or technical defect, but a structural property of deployments where decision generation throughput exceeds bounded human verification capacity. We identify a scaling limit under standard deployment assumptions, including parallel agent generation, CI-based validation, and individualized human approval gates. Beyond a throughput threshold, verification ceases to function as a decision criterion and is replaced by ritualized approval based on proxy signals. Personalized responsibility becomes structurally unattainable in this regime. We further characterize a CI amplification dynamic, whereby increasing automated validation coverage raises proxy signal density without restoring human capacity. Under fixed time and attention constraints, this accelerates cognitive offloading in the broad sense and widens the gap between formal approval and epistemic understanding. Additional automation therefore amplifies, rather than mitigates, the responsibility vacuum. We conclude that unless organizations explicitly redesign decision boundaries or reassign responsibility away from individual decisions toward batch- or system-level ownership, the responsibility vacuum remains an invisible but persistent failure mode in scaled agent deployments.

2601.15024 2026-01-22 eess.SP stat.CO

Physical Layer Security in Massive MIMO: Challenges and Open Research Directions Against Passive Eavesdroppers

Nipun Agarwal

Abstract

Massive Multiple-Input Multiple-Output (MIMO) has become a crucial enabling technology for 5G and beyond, providing unprecedented increases in energy and spectral efficiency. Guaranteeing secure communication in these systems remains difficult, particularly against passive eavesdroppers whose channel state information is unknown to the base station. By taking advantage of the inherent randomness of wireless channels, Physical Layer Security (PLS) offers a promising paradigm; however, its efficacy in massive MIMO is heavily reliant on resource allocation and transmission strategies. In this work, the performance of secure transmission schemes, such as Maximum Ratio Transmission (MRT), Zero-Forcing (ZF), and Artificial Noise (AN)-aided beamforming, is examined when passive eavesdroppers are present. This work will use extensive Monte Carlo simulations to assess important performance metrics such as energy efficiency, secrecy outage probability, and secrecy sum rate under different system parameters (e.g., number of antennas, Signal-to-Noise Ratio (SNR), power allocation). The results aim to provide comparative insight into the strengths and limitations of different PLS strategies and to highlight open research directions to design scalable, energy-efficient, and robust secure transmission techniques in future 6G networks.
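
A Monte Carlo evaluation of this kind can be sketched for the simplest case: one legitimate user, one single-antenna passive eavesdropper, and MRT beamforming. The channel model (i.i.d. Rayleigh fading), equal noise power at both receivers, the default parameter values, and the function name are all assumptions made for illustration; the abstract does not specify the paper's simulation setup.

```python
import numpy as np


def mrt_secrecy_rate(n_antennas, snr_db=10.0, trials=2000, seed=0):
    """Average instantaneous secrecy rate (bits/s/Hz) under MRT beamforming."""
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    shape = (trials, n_antennas)
    # i.i.d. CN(0, 1) channels to the legitimate user (h) and the eavesdropper (g)
    h = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    g = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    # MRT needs only the legitimate channel: w = h / ||h||
    w = h / np.linalg.norm(h, axis=1, keepdims=True)
    snr_user = snr * np.abs(np.sum(h.conj() * w, axis=1)) ** 2
    snr_eve = snr * np.abs(np.sum(g.conj() * w, axis=1)) ** 2
    # Secrecy rate: [log2(1 + SNR_user) - log2(1 + SNR_eve)]^+
    rate = np.log2(1.0 + snr_user) - np.log2(1.0 + snr_eve)
    return np.maximum(rate, 0.0).mean()


for n in (4, 16, 64):
    print(n, mrt_secrecy_rate(n))
```

Under these assumptions the legitimate user's SNR grows with the number of antennas while the eavesdropper's does not, so the average secrecy rate increases with array size.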

2601.15004 2026-01-22 eess.SP

Alternative Shapes of Modulation Schemes: Detailed Exposition and Simulation Methodology

Nipun Agarwal

Abstract

Modulation constellation design is a core challenge in digital communications, especially under stringent demands on spectral efficiency, robustness, and energy consumption. Classical schemes like PSK and QAM, while analytically tractable, often lose optimality under realistic channels and nonlinear hardware constraints. This paper provides a unified study of constellation design from geometric, probabilistic, optimization, and machine learning perspectives, focusing on symbol error rate (SER), fading robustness, peak-to-average power ratio (PAPR), and energy efficiency. We evaluate classical, lattice-based, asymmetric, probabilistically shaped, Golden Angle, heuristic-optimized, and machine learning assisted constellations under AWGN and Rayleigh fading via large-scale Monte Carlo simulations. Incorporating PAPR-aware and power amplifier models reveals that SER-optimal designs are not always energy-optimal; small SER trade-offs can yield substantial energy savings. Machine learning approaches offer flexible joint optimization of reliability, robustness, and energy efficiency by embedding channel and hardware constraints into the learning objective.
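
As a small illustration of why constellation shape matters for power amplifiers, the sketch below computes the symbol-level peak-to-average power ratio of two classical constellations. It ignores pulse shaping and amplifier models, which the paper's PAPR analysis may include; the function name is an assumption.

```python
import numpy as np


def constellation_papr(points):
    """Peak symbol power divided by mean symbol power (equiprobable symbols)."""
    power = np.abs(np.asarray(points, dtype=complex)) ** 2
    return power.max() / power.mean()


levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()   # square 16-QAM
psk8 = np.exp(2j * np.pi * np.arange(8) / 8)                # 8-PSK on the unit circle

print(constellation_papr(qam16))  # 1.8, about 2.55 dB
print(constellation_papr(psk8))   # 1.0 up to rounding
```

Square 16-QAM has peak power 18 and mean power 10 on this grid, hence the ratio 1.8, whereas every PSK symbol has the same power.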

2601.14997 2026-01-22 eess.IV cs.CV

Filtered 2D Contour-Based Reconstruction of 3D STL Model from CT-DICOM Images

K. Punnam Chandar, Y. Ravi Kumar

Comments 8 pages, 18 figures

Abstract

Reconstructing a 3D stereolithography (STL) model from 2D contours of a scanned structure in Digital Imaging and Communications in Medicine (DICOM) images is crucial for understanding its geometry and deformity. Computed Tomography (CT) images are processed to enhance contrast and reduce noise, followed by smoothing. The processed CT images are segmented using a thresholding technique. 2D contour data points are extracted from the segmented CT images and used to construct 3D STL models. The 2D contour data points may contain outliers as a result of segmenting low-resolution images, causing the geometry of the constructed 3D structure to deviate from the actual geometry. To cope with imperfections in the segmentation process, in this work we propose using filtered 2D contour data points to reconstruct the 3D STL model. The filtered 2D contour points of each image are Delaunay-triangulated and joined layer by layer to reconstruct the 3D STL model. The 3D STL model reconstruction is verified on i) 2D data points of basic shapes and ii) a Region of Interest (ROI) of a human pelvic bone, which are presented as case studies. The 3D STL models constructed from 2D contour data points of the ROI of the segmented pelvic bone with and without filtering are presented. The 3D STL model reconstructed from filtered 2D data points shows improved geometry compared to the model reconstructed without filtering.

2601.14984 2026-01-22 eess.SY cs.SY

Stealthy bias injection attack detection based on Kullback-Leibler divergence in stochastic linear systems

Jingwei Dong, André M. H. Teixeira

Comments 26 pages, 10 figures

Abstract

This paper studies the design of detection observers against stealthy bias injection attacks in stochastic linear systems under Gaussian noise, considering adversaries that exploit noise and inject crafted bias signals into a subset of sensors in a slow and coordinated manner, thereby achieving malicious objectives while remaining stealthy. To address such attacks, we formulate the observer design as a max-min optimization problem to enhance the detectability of worst-case bias injection attacks (BIAs), which attain a prescribed attack impact with the least detectability evaluated via Kullback-Leibler divergence. To reduce the computational complexity of the derived non-convex design problem, we consider the detectability of worst-case BIAs at three specific time instants: attack onset, one step after attack occurrence, and the steady state. We prove that the Kalman filter is optimal for maximizing the BIA detectability at the attack onset, regardless of the subset of attacked sensors. For the one-step and steady-state cases, the observer design problems are approximated by bi-convex optimization problems, which can be efficiently solved using alternating optimization and alternating direction method of multipliers. Moreover, more tractable linear matrix inequality relaxations are developed. Finally, the effectiveness of the proposed stealth-aware detection framework is demonstrated through an application to a thermal system.
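
To illustrate how Kullback-Leibler divergence can quantify detectability under Gaussian noise, the sketch below evaluates the closed-form KL divergence between two multivariate Gaussians. It is applied to a hypothetical case in which an attack shifts the mean of a detection residual by a bias b without changing its covariance, where the divergence reduces to 0.5 * b^T Sigma^{-1} b. The residual model and all numbers are assumptions; the abstract does not state the paper's exact detectability expression.

```python
import numpy as np


def gaussian_kl(mu1, cov1, mu0, cov0):
    """KL( N(mu1, cov1) || N(mu0, cov0) ) for non-degenerate Gaussians."""
    mu1 = np.asarray(mu1, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov0 = np.asarray(cov0, dtype=float)
    k = mu1.size
    diff = mu0 - mu1
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    trace_term = np.trace(np.linalg.solve(cov0, cov1))
    quad_term = diff @ np.linalg.solve(cov0, diff)
    return 0.5 * (trace_term + quad_term - k + logdet0 - logdet1)


# Hypothetical residual: nominal N(0, Sigma), attacked N(b, Sigma).
sigma = np.diag([1.0, 4.0])
bias = np.array([0.2, 0.4])
print(gaussian_kl(bias, sigma, np.zeros(2), sigma))  # 0.5 * b^T Sigma^{-1} b = 0.04
```

A bias placed on the noisier channel contributes less to the divergence, which is one way an attacker can exploit noise to stay stealthy.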

2601.14960 2026-01-22 cs.SD eess.AS

VCNAC: A Variable-Channel Neural Audio Codec for Mono, Stereo, and Surround Sound

Florian Grötschla, Arunasish Sen, Alessandro Lombardi, Guillermo Cámbara, Andreas Schwarz

Comments Submitted to EUSIPCO 2026

Abstract

We present VCNAC, a variable channel neural audio codec. Our approach features a single encoder and decoder parametrization that enables native inference for different channel setups, from mono speech to cinematic 5.1 channel surround audio. Channel compatibility objectives ensure that multi-channel content maintains perceptual quality when decoded to fewer channels. The shared representation enables training of generative language models on a single set of codebooks while supporting inference-time scalability across modalities and channel configurations. Evaluation using objective spatial audio metrics and subjective listening tests demonstrates that our unified approach maintains high reconstruction quality across mono, stereo, and surround audio configurations.

2601.14942 2026-01-22 cs.LG eess.SP

Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning

Hang Zhao, Hongru Li, Dongfang Xu, Shenghui Song, Khaled B. Letaief

Abstract

Semantic communication is emerging as a key enabler for distributed edge intelligence due to its capability to convey task-relevant meaning. However, achieving communication-efficient training and robust inference over wireless links remains challenging. This challenge is further exacerbated for multi-modal edge inference (MMEI) by two factors: 1) prohibitive communication overhead for distributed learning over bandwidth-limited wireless links, due to the multi-modal nature of the system; and 2) limited robustness under varying channels and noisy multi-modal inputs. In this paper, we propose a three-stage communication-aware distributed learning framework to improve training and inference efficiency while maintaining robustness over wireless channels. In Stage I, devices perform local multi-modal self-supervised learning to obtain shared and modality-specific encoders without device-server exchange, thereby reducing the communication cost. In Stage II, distributed fine-tuning with centralized evidential fusion calibrates per-modality uncertainty and reliably aggregates features distorted by noise or channel fading. In Stage III, an uncertainty-guided feedback mechanism selectively requests additional features for uncertain samples, optimizing the communication-accuracy tradeoff in the distributed setting. Experiments on RGB-depth indoor scene classification show that the proposed framework attains higher accuracy with far fewer training communication rounds and remains robust to modality degradation or channel variation, outperforming existing self-supervised and fully supervised baselines.

2601.14933 2026-01-22 math.NA cs.NA cs.SY eess.SY

Rank-one Riemannian Subspace Descent for Nonlinear Matrix Equations

Yogesh Darmwal, Ketan Rajawat

Abstract

We propose a rank-one Riemannian subspace descent algorithm for computing symmetric positive definite (SPD) solutions to nonlinear matrix equations arising in control theory, dynamic programming, and stochastic filtering. For solution matrices of size $n\times n$, standard approaches for dense matrix equations typically incur $\mathcal{O}(n^3)$ per-iteration cost, while the efficient $\mathcal{O}(n^2)$ methods either rely on sparsity or low-rank solutions, or have iteration counts that scale poorly. The proposed method entails updating along the dominant eigen-component of a transformed Riemannian gradient, identified using at most $\mathcal{O}(\log(n))$ power iterations. The update structure also enables exact step-size selection in many cases at minimal additional cost. For objectives defined as compositions of standard matrix operations, each iteration can be implemented using only matrix-vector products, yielding $\mathcal{O}(n^2)$ arithmetic cost. We prove an $\mathcal{O}(n)$ iteration bound under standard smoothness assumptions, with improved bounds under geodesic strong convexity. Numerical experiments on large-scale CARE, DARE, and other nonlinear matrix equations show that the proposed algorithm solves instances (up to $n=10{,}000$ in our tests) for which the compared solvers, including MATLAB's icare, structure-preserving doubling, and subspace-descent baselines, fail to return a solution. These results demonstrate that rank-one manifold updates provide a practical approach for high-dimensional and dense SPD-constrained matrix equations. A MATLAB implementation is publicly available on GitHub: https://github.com/yogeshd-iitk/nonlinear_matrix_equation_R1RSD
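
The building block named in the abstract, finding a dominant eigen-component from matrix-vector products alone, can be sketched with plain power iteration. This is only that generic subroutine, not the authors' Riemannian algorithm. The fixed iteration count, the random start, and the function name are assumptions.

```python
import numpy as np


def dominant_eigenpair(matvec, n, iters=50, seed=0):
    """Estimate the largest-magnitude eigenpair of a symmetric operator.

    matvec: callable returning G @ v, so each iteration costs one
    matrix-vector product (O(n^2) for a dense n-by-n matrix).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)
        v = w / np.linalg.norm(w)
    eigenvalue = v @ matvec(v)  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical symmetric test matrix with known spectrum {5, 2, 1, -0.5}.
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
g = q @ np.diag([5.0, 2.0, 1.0, -0.5]) @ q.T
lam, vec = dominant_eigenpair(lambda x: g @ x, 4)
print(lam)  # close to 5
```

A rank-one update direction can then be formed from the returned vector as an outer product, which is what keeps the per-iteration cost at the level of matrix-vector products.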

2601.14881 2026-01-22 eess.SP

Analysis of Sensing in OFDM-based ISAC under the Influence of Sampling Jitter

Lucas Giroto, Ândrei Camponogara, Yueheng Li, Jiayi Chen, Lukas Sigg, Thomas Zwick, Benjamin Nuss

Abstract

To enable integrated sensing and communication (ISAC) in cellular networks, a wide range of additional requirements and challenges are either imposed or become more critical. One such impairment is sampling jitter (SJ), which arises due to imperfections in the sampling instants of the clocks of digital-to-analog converters (DACs) and analog-to-digital converters (ADCs). While SJ is already well studied for communication systems based on orthogonal frequency-division multiplexing (OFDM), which is expected to be the waveform of choice for most sixth-generation (6G) scenarios where ISAC could be possible, the implications of SJ on the OFDM-based radar sensing must still be thoroughly analyzed. Considering that phase-locked loop (PLL)-based oscillators are used to derive sampling clocks, which leads to colored SJ, i.e., SJ with non-flat power spectral density, this article analyzes the resulting distortion of the adopted digital constellation modulation and sensing performance in OFDM-based ISAC for both baseband (BB) and bandpass (BP) sampling strategies and different oversampling factors. For BB sampling, it is seen that SJ induces intercarrier interference (ICI), while for BP sampling, it causes carrier phase error and more severe ICI due to a phase noise-like effect at the digital intermediate frequency. Obtained results for a single-input single-output OFDM-based ISAC system with various OFDM signal parameterizations demonstrate that SJ-induced degradation becomes non-negligible for both BB and BP sampling only for root mean square (RMS) SJ values above $10^{-11}$ s at both DAC and ADC, which corresponds to $0.5 \times 10^{-2}$ times the considered critical sampling period without oversampling. Based on the achieved results, it can be concluded that state-of-the-art hardware enables sufficient communication and sensing robustness against SJ, as RMS SJ values in the femtosecond range can be achieved.

2601.14880 2026-01-22 eess.SY cs.SY

Contingency Planning for Safety-Critical Autonomous Vehicles: A Review and Perspectives

Lei Zheng, Luyao Zhang, Peiqi Yu, Yifan Sun, Sergio Grammatico, Jun Ma, Changliu Liu

Comments 23 pages, 6 figures

Abstract

Contingency planning is the architectural capability that enables autonomous vehicles (AVs) to anticipate and mitigate discrete, high-impact hazards, such as sensor outages and adversarial interactions. This paper presents a comprehensive survey of the field, synthesizing fragmented literature into a unified logic-conditioned hybrid control framework. Within this formalism, we categorize approaches into two distinct paradigms: Reactive Safety, which responds to realized hazards by enforcing safety constraints or executing fail-safe maneuvers; and Proactive Safety, which optimizes for future recourse by branching over potential modal transitions. In addition, we propose a fine-grained taxonomy that partitions the landscape into external contingencies (environmental and interactive hazards) and internal contingencies (system faults). Through a critical comparative analysis, we reveal a fundamental structural divergence: internal faults are predominantly addressed via reactive fail-safe mechanisms, whereas external interaction uncertainties increasingly require proactive branching strategies. Furthermore, we identify a critical methodological divergence: whereas physical hazards are typically managed with formal guarantees, semantic and out-of-distribution anomalies currently rely heavily on empirical validation. We conclude by identifying the open challenges in bridging the gap between theoretical guarantees and practical validation, advocating for hybrid architectures and standardized benchmarking to transition contingency planning from formulation to certifiable real-world deployment.

2601.14820 2026-01-22 eess.SP

Absorption mode broadband 2D MS for proteomics and metabolomics

Maria A van Agthoven, Marek Polák, Jan Fiala, Claude Nelcy Ounounou, Petr Halada, Michael Palasser, Anne Briot-Dietsch, Alan Kádek, Kathrin Breuker, Petr Novák, Carlos Afonso, Marc-André Delsuc

English abstract

Two-dimensional mass spectrometry (2D MS) is a method for tandem mass spectrometry that enables the correlation between precursor and fragment ions without the need for ion isolation. On a Fourier transform ion cyclotron resonance mass spectrometer, the phase correction functions for absorption mode data processing were found to be linear in the precursor ion dimension and quadratic in the fragment ion dimension. Absorption mode data processing on limited data sets has previously shown improvements in signal-to-noise ratio and resolving power by a factor of 2. Here, we have expanded absorption mode data processing to 2D mass spectra regardless of size and frequency range. We have applied absorption mode 2D MS to top-down analysis of variously oxidized ubiquitin proteoforms generated by fast photochemical oxidation of proteins (FPOP) and to an extract of ergot alkaloids. We show that absorption mode data processing significantly improves both the signal-to-noise ratio and the resolving power of the 2D mass spectrum compared to standard magnitude mode in terms of sequence coverage in top-down proteomics, as well as the accuracy of precursor-fragment correlation in metabolomics.

2601.14793 2026-01-22 eess.IV

LiNUS: Lightweight Automatic Segmentation of Deep Brain Nuclei for Real-Time DBS Surgery

Shuo Zhang, Zihua Wang, Changgeng He, Chunhua Hu

Comments 6 pages, 9 figures

English abstract

This paper proposes LiNUS, a lightweight deep learning framework for the automatic segmentation of the Subthalamic Nucleus (STN) in Deep Brain Stimulation (DBS) surgery. Addressing the challenges of small target volume and class imbalance in MRI data, LiNUS improves upon the U-Net architecture by introducing spectral normalization constraints, bilinear interpolation upsampling, and a multi-scale feature fusion mechanism. Experimental results on the Tsinghua DBS dataset (TT14) demonstrate that LiNUS achieves a Dice coefficient of 0.679 with an inference time of only 0.05 seconds per subject, significantly outperforming traditional manual and registration-based methods. Further validation on high-resolution data confirms the model's robustness, achieving a Dice score of 0.89. A dedicated Graphical User Interface (GUI) was also developed to facilitate real-time clinical application.
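
For readers unfamiliar with the metric reported above, the Dice coefficient between a predicted and a reference binary mask is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch on flat 0/1 lists; the function name and the toy masks are illustrative assumptions and are not taken from the LiNUS code.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (hypothetical): 3 foreground voxels each, 2 of them shared.
pred_mask = [0, 1, 1, 1, 0, 0]
ref_mask = [0, 0, 1, 1, 1, 0]
print(round(dice_coefficient(pred_mask, ref_mask), 3))
```

In the toy case the overlap is 2 voxels out of 3 + 3, giving a Dice value of 2·2/6 ≈ 0.667.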

2601.14783 2026-01-22 eess.SP

Integrated Sensing, Communication and Control enabled Agile UAV Swarm

Zhiqing Wei, Yucong Du, Zhiyong Feng, Haotian Liu, Yanpeng Cui, Tao Zhang, Ying Zhou, Huici Wu

English abstract

Uncrewed aerial vehicle (UAV) swarms are pivotal in applications such as disaster relief, aerial base stations (BSs) and logistics transportation. These scenarios require capabilities in accurate sensing, efficient communication and flexible control for real-time and reliable task execution. However, sensing, communication and control are studied independently in traditional research, which limits the overall performance of UAV swarms. To overcome this limitation, we propose a deeply coupled scheme of integrated sensing, communication and control (ISCC) for UAV swarms, a systemic paradigm that transcends traditional isolated designs by establishing a tightly coupled closed loop through the co-optimization of sensing, communication and control. In this article, we first analyze the scenario requirements and key performance metrics. Subsequently, the enabling technologies are proposed, including communication-and-control-enhanced sensing, sensing-and-control-enhanced communication, and sensing-and-communication-enhanced control. Simulation results validate the performance of the proposed ISCC framework, demonstrating its potential for future applications.

2601.14763 2026-01-22 quant-ph cs.SY eess.SY

Blended Dynamics and Emergence in Open Quantum Networks

Qinghao Wen, Zihao Ren, Lei Wang, Hyungbo Shim, Guodong Shi

English abstract

In this paper, we develop a blended dynamics framework for open quantum networks with diffusive couplings. The network consists of qubits interconnected through Hamiltonian couplings, environmental dissipation, and consensus-like diffusive interactions. Such networks commonly arise in spontaneous emission processes and non-Hermitian quantum computing, and their evolution follows a Lindblad master equation. Blended dynamics theory is well established in the classical setting as a tool for analyzing emergent behaviors in heterogeneous networks with diffusive couplings. Its key insight is to blend the local dynamics rather than the trajectories of individual nodes. Perturbation analysis then shows that, under sufficiently strong coupling, all node trajectories tend to stay close to those of the blended system over time. We first show that this theory extends naturally to the reduced-state dynamics of quantum networks, revealing classical-like clustering phenomena in which qubits converge to a shared equilibrium or a common trajectory determined by the quantum blended reduced-state dynamics. We then extend the analysis to qubit coherent states using quantum Laplacians and induced graphs, proving orbit attraction of the network density operator toward the quantum blended coherent dynamics, establishing the emergence of intrinsically quantum and dynamically clustering behaviors. Finally, numerical examples validate the theoretical results.
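
For reference, the Lindblad master equation mentioned above has the following generic form. This is the standard textbook expression (with ħ = 1), not the specific network model of the paper; the symbols H (Hamiltonian), L_k (jump operators) and γ_k (non-negative rates) are the usual generic ones and are assumptions here.

```latex
\dot{\rho} \;=\; -\mathrm{i}\,[H,\rho]
\;+\; \sum_{k} \gamma_k \left( L_k \,\rho\, L_k^{\dagger}
\;-\; \tfrac{1}{2}\,\bigl\{ L_k^{\dagger} L_k ,\, \rho \bigr\} \right)
```

The commutator term generates the coherent evolution, while the sum describes dissipation into the environment.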

2601.14744 2026-01-22 cs.SD eess.AS

Unlocking Large Audio-Language Models for Interactive Language Learning

Hongfu Liu, Zhouying Cui, Xiangming Gu, Ye Wang

Comments Accepted to the Findings of EACL 2026

English abstract

Achieving pronunciation proficiency in a second language (L2) remains a challenge, despite the development of Computer-Assisted Pronunciation Training (CAPT) systems. Traditional CAPT systems often provide unintuitive feedback that lacks actionable guidance, limiting their effectiveness. Recent advancements in audio-language models (ALMs) offer the potential to enhance these systems by providing more user-friendly feedback. In this work, we investigate ALMs for chat-based pronunciation training by introducing L2-Arctic-plus, an English dataset with detailed error explanations and actionable suggestions for improvement. We benchmark cascaded ASR+LLMs and existing ALMs on this dataset, specifically in detecting mispronunciation and generating actionable feedback. To improve performance, we further propose to instruction-tune ALMs on L2-Arctic-plus. Experimental results demonstrate that our instruction-tuned models significantly outperform existing baselines on mispronunciation detection and suggestion generation in terms of both objective and human evaluation, highlighting the value of the proposed dataset.

2601.14728 2026-01-22 eess.AS cs.AI cs.CL cs.LG cs.SD

AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering

Chun-Yi Kuan, Kai-Wei Chang, Hung-yi Lee

Comments Manuscript in progress

English abstract

Although text-to-audio generation has made remarkable progress in realism and diversity, the development of evaluation metrics has not kept pace. Widely adopted approaches, typically based on embedding similarity such as CLAPScore, effectively measure general relevance but remain limited in fine-grained semantic alignment and compositional reasoning. To address this, we introduce AQAScore, a backbone-agnostic evaluation framework that leverages the reasoning capabilities of audio-aware large language models (ALLMs). AQAScore reformulates assessment as a probabilistic semantic verification task; rather than relying on open-ended text generation, it estimates alignment by computing the exact log-probability of a "Yes" answer to targeted semantic queries. We evaluate AQAScore across multiple benchmarks, including human-rated relevance, pairwise comparison, and compositional reasoning tasks. Experimental results show that AQAScore consistently achieves higher correlation with human judgments than similarity-based metrics and generative prompting baselines, showing its effectiveness in capturing subtle semantic inconsistencies and scaling with the capability of underlying ALLMs.
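
The core idea of scoring by the log-probability of a "Yes" answer can be illustrated with a log-softmax over next-token logits. The sketch below is a toy under stated assumptions, not the authors' implementation: a real ALLM would produce logits over its full vocabulary, whereas the `toy_logits` dictionary, its three tokens and the function name `yes_log_prob` are all hypothetical.

```python
import math


def yes_log_prob(logits, yes_token="Yes"):
    """Log-softmax of the `yes_token` entry in a mapping token -> logit."""
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return logits[yes_token] - log_z


# Hypothetical logits for the first answer token of a yes/no query.
toy_logits = {"Yes": 2.0, "No": 0.5, "Maybe": -1.0}
score = yes_log_prob(toy_logits)
print(f"log P(Yes) = {score:.4f}, P(Yes) = {math.exp(score):.4f}")
```

A higher (less negative) value means the model assigns more probability to "Yes" for that query, which is the quantity the abstract describes as the alignment estimate.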

2601.14725 2026-01-22 eess.SY cs.SY

Differential Privacy on Affine Manifolds: Geometrically Confined Privacy in Linear Dynamical Systems

Zihao Ren, Lei Wang, Deming Yuan, Guodong Shi

English abstract

In this paper, we present a comprehensive framework for differential privacy over affine manifolds and validate its usefulness in the contexts of differentially private cloud-based control and average consensus. We consider differential privacy mechanisms for linear queries when the input data are constrained to lie on affine manifolds, a structural property that is assumed to be available as prior knowledge to adversaries. In this setting, the definition of neighborhood adjacency must be formulated with respect to the intrinsic geometry of the manifolds. We demonstrate that such affine-manifold constraints can fundamentally alter the attainable privacy levels relative to the unconstrained case. In particular, we derive necessary and sufficient conditions under which differential privacy can be realized via structured noise injection mechanisms, wherein correlated Gaussian or Laplace noise distributions, rather than i.i.d. perturbations, are calibrated to the dataset. Based on these characterizations, we develop explicit noise calibration procedures that guarantee the tight realization of any prescribed privacy budget with a matching noise magnitude. Finally, we show that the proposed framework admits direct applications to linear dynamical systems ranging from differentially private cloud-based control to privacy-preserving average consensus, all of which naturally involve affine-manifold constraints. The established theoretical results are illustrated through numerical examples.

2601.14721 2026-01-22 eess.AS

NLP-Based Review for Toxic Comment Detection Tailored to the Chinese Cyberspace

Ruixing Ren, Junhui Zhao, Xiaoke Sun, Qiuping Li

Comments 20 pages, 6 figures. This review focuses on toxic comment detection in Chinese cyberspace

English abstract

With the in-depth integration of mobile Internet and widespread adoption of social platforms, user-generated content in the Chinese cyberspace has witnessed explosive growth. Among this content, the proliferation of toxic comments poses severe challenges to individual mental health, community atmosphere and social trust. Owing to the strong context dependence, cultural specificity and rapid evolution of Chinese cyber language, toxic expressions are often conveyed through complex forms such as homophones and metaphors, imposing notable limitations on traditional detection methods. To address this issue, this review focuses on the core topic of natural language processing based toxic comment detection in the Chinese cyberspace, systematically collating and critically analyzing the research progress and key challenges in this field. This review first defines the connotation and characteristics of Chinese toxic comments, and analyzes the platform ecology and transmission mechanisms they rely on. It then comprehensively reviews the construction methods and limitations of existing public datasets, and proposes a novel fine-grained and scalable framework for toxic comment definition and classification, along with corresponding data annotation and quality assessment strategies. We systematically summarize the evolutionary path of detection models from traditional methods to deep learning, with special emphasis on the importance of interpretability in model design. Finally, we thoroughly discuss the open challenges faced by current research and provide forward-looking suggestions for future research directions.

2601.14704 2026-01-22 eess.SY cs.SY

Hierarchical Optimization Based Multi-objective Dynamic Regulation Scheme for VANET Topology

Ruixing Ren, Minqi Tao, Junhui Zhao, Xiaoke Sun, Qiuping Li

Comments 10 pages, 6 figures. A topology optimization strategy is proposed in this paper to optimize the latency, average path length, and throughput in Vehicular Ad Hoc Networks (VANETs)

English abstract

As a core technology of intelligent transportation systems, vehicular ad-hoc networks support latency-sensitive services such as safety warning and cooperative perception via vehicle-to-everything communications. However, their highly dynamic topology increases average path length, raises latency, and reduces throughput, severely limiting communication performance. Existing topology optimization methods lack capabilities in multi-objective coordination, dynamic adaptation, and global-local synergy. To address this, this paper proposes a two-layer dynamic topology regulation scheme combining local feature aggregation and global adjustment. The scheme constructs a dynamic multi-objective optimization model integrating average path length, end-to-end latency, and network throughput, and achieves multi-index coordination via link adaptability metrics and a dynamic normalization mechanism. It quickly responds to local link changes via feature fusion of local node feature extraction and dynamic neighborhood sensing, and balances optimization accuracy and real-time performance using a dual-mode adaptive solving strategy for global topology adjustment. It reduces network oscillation risks by introducing a performance improvement threshold and a topology validity verification mechanism. Simulation results on real urban road networks via the SUMO platform show that the proposed scheme outperforms traditional methods in average path length (stabilizing at ~4 hops), end-to-end latency (remaining ~0.01 s), and network throughput.
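
The role of a performance improvement threshold in damping oscillation can be illustrated with a simple hysteresis rule: a candidate topology replaces the current one only if it improves a scalar score by more than a set margin. The sketch below is an assumption-laden illustration, not the paper's scheme; the function name, the "higher is better" score and the 5% relative threshold are all hypothetical.

```python
def should_switch(current_score, candidate_score, threshold=0.05):
    """Accept a candidate topology only if its relative improvement exceeds `threshold`.

    Scores are assumed to be scalar and "higher is better".
    """
    if current_score <= 0:
        # Relative improvement is undefined; fall back to a plain comparison.
        return candidate_score > current_score
    return (candidate_score - current_score) / current_score > threshold


# A 2% gain is ignored, which avoids flip-flopping between near-equal topologies;
# a 10% gain triggers the switch.
print(should_switch(1.0, 1.02))
print(should_switch(1.0, 1.10))
```

Small fluctuations in the score therefore leave the topology unchanged, and only clear gains trigger reconfiguration.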