arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.20165 2026-03-23 cs.SD eess.AS

Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio

Candice R. Gerstner

Abstract

With the advancements in AI speech synthesis, it is easier than ever before to generate realistic audio in a target voice. One only needs a few seconds of reference audio from the target, quite literally putting words in the target person's mouth. This imposes a new set of forensics-related challenges on speech-based authentication systems, videoconferencing, and audio-visual broadcasting platforms, where we want to detect synthetic speech. At the same time, leveraging AI speech synthesis can enhance the different modes of communication through features such as low-bandwidth communication and audio enhancements, leading to ever-increasing legitimate use cases of synthetic audio. In this case, we want to verify whether the synthesized voice is actually spoken by the user. This requires a mechanism to verify whether a given synthetic audio is driven by an authorized identity. We term this task audio avatar fingerprinting. As a step towards audio forensics in these new and emerging situations, we analyze and extend an off-the-shelf speaker verification model, developed outside of a forensics context, for the tasks of fake speech detection and audio avatar fingerprinting, the first experimentation of its kind. Furthermore, we observe that no existing dataset allows for the novel task of verifying the authorized use of synthetic audio, a limitation which we address by introducing a new speech forensics dataset for this task.

2603.20152 2026-03-23 eess.SY cs.SY

Robust Linear Quadratic Optimal Control of Cementitious Material Extrusion

Mandana Mohammadi Looey, Amrita Basak, Satadru Dey

Abstract

Extrusion-based 3D printing of cementitious materials enables fabrication of complex structures; however, it is highly sensitive to disturbances, material property variations, and process uncertainties that decrease flow stability and dimensional fidelity. To address these challenges, this study proposes a robust linear quadratic optimal control framework for regulating material extrusion in cementitious direct ink writing systems. The printer is modeled using two coupled subsystems: an actuation system representing nozzle flow dynamics and a printing system describing the printed strand flow on the build plate. A hybrid control architecture combining sliding mode control for disturbance rejection with linear quadratic optimal feedback for energy-efficient tracking is developed to ensure robustness and optimality. In simulation case studies, the control architecture guarantees acceptable convergence of nozzle and strand flow tracking errors under bounded disturbances.
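
Illustrative sketch (not from the paper): the linear quadratic part of such a controller can be shown on a scalar discrete-time plant x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2. The plant values, the function names, and the constant disturbance below are all hypothetical; the paper's two-subsystem printer model and its sliding mode component are not reproduced here.

```python
def scalar_dlqr(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati recursion to a fixed point.

    Returns the feedback gain k (control law u = -k * x) and the cost weight p.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p


def simulate(a, b, k, x0, steps, disturbance=0.0):
    """Roll out the closed loop under u = -k * x with a constant additive disturbance."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = -k * x
        x = a * x + b * u + disturbance
        trajectory.append(x)
    return trajectory


# Hypothetical open-loop-unstable plant (a > 1) used purely for illustration.
gain, cost_weight = scalar_dlqr(a=1.1, b=1.0, q=1.0, r=1.0)
errors = simulate(a=1.1, b=1.0, k=gain, x0=1.0, steps=50)
```

With zero disturbance the tracking error decays geometrically at rate |a - b*k|; with a bounded constant disturbance it settles to a bounded offset, which is the kind of residual a robustifying term would be meant to address.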

2603.20151 2026-03-23 cs.CE cs.AI cs.SY eess.SY

Design-OS: A Specification-Driven Framework for Engineering System Design with a Control-Systems Design Case

H. Sinan Bank, Daniel R. Herber, Thomas H. Bradley

Comments 2 figures, 11 pages, Submitted to ASME IDETC 2026 - DAC-09

Abstract

Engineering system design, whether mechatronic, control, or embedded, often proceeds in an ad hoc manner, with requirements left implicit and traceability from intent to parameters largely absent. Existing specification-driven and systematic design methods mostly target software, and AI-assisted tools tend to enter the workflow at solution generation rather than at problem framing. Human-AI collaboration in the design of physical systems remains underexplored. This paper presents Design-OS, a lightweight, specification-driven workflow for engineering system design organized in five stages: concept definition, literature survey, conceptual design, requirements definition, and design definition. Specifications serve as the shared contract between human designers and AI agents; each stage produces structured artifacts that maintain traceability and support agent-augmented execution. We position Design-OS relative to requirements-driven design, systematic design frameworks, and AI-assisted design pipelines, and demonstrate it on a control systems design case using two rotary inverted pendulum platforms (an open-source SimpleFOC reaction wheel and a commercial Quanser Furuta pendulum), showing how the same specification-driven workflow accommodates fundamentally different implementations. A blank template and the full design-case artifacts are shared in a public repository to support reproducibility and reuse. The workflow makes the design process visible and auditable, and extends specification-driven orchestration of AI from software to physical engineering system design.

2603.20144 2026-03-23 eess.SY cs.SY

Distributed State Estimation for Discrete-time LTI Systems: the Design Trilemma and a Novel Framework

Ruixuan Zhao, Guitao Yang, James Fleming, Boli Chen

Abstract

With the advancement of IoT technologies and the rapid expansion of cyber-physical systems, there is increasing interest in distributed state estimation, where multiple sensors collaboratively monitor large-scale dynamic systems. Compared with its continuous-time counterpart, a discrete-time distributed observer faces greater challenges, as it cannot exploit high-gain mechanisms or instantaneous communication. Existing approaches depend on three tightly coupled factors: (i) system observability, (ii) communication frequency and dimension of the exchanged information, and (iii) network connectivity. However, the interdependence among these factors remains underexplored. This paper identifies a fundamental trilemma among these factors and introduces a general design framework that balances them through an iterative semidefinite programming approach. As such, the proposed method mitigates the restrictive assumptions present in existing works. The effectiveness and generality of the proposed approach are demonstrated through a simulation example.

2603.20118 2026-03-23 eess.AS cs.SD

BioDCASE 2026 Challenge Baseline for Cross-Domain Mosquito Species Classification

Yuanbo Hou, Vanja Zdravkovic, Marianne Sinka, Yunpeng Li, Wenwu Wang, Mark D. Plumbley, Kathy Willis, Stephen Roberts

Comments BioDCASE 2026 CD-MSC Baseline, source code and models: https://github.com/Yuanbo2020/CD-MSC

Abstract

Mosquito-borne diseases affect more than one billion people each year and cause close to one million deaths. Traditional surveillance methods rely on traps and manual identification that are slow, labor-intensive, and difficult to scale. Audio-based mosquito monitoring offers a non-destructive, lower-cost, and more scalable complement to trap-based surveillance, but reliable species classification remains difficult under real-world recording conditions. Mosquito flight tones are narrow-band, often low in signal-to-noise ratio, and easily masked by background noise, and recordings for several epidemiologically relevant species remain limited, creating pronounced class imbalance. Variation across devices, environments, and collection protocols further increases the difficulty of robust classification. Such variation can cause models to rely on domain-specific recording artefacts rather than species-relevant acoustic cues, which makes transfer to new acquisition settings difficult. The BioDCASE 2026 Cross-Domain Mosquito Species Classification (CD-MSC) challenge is designed around this deployment problem by evaluating performance on both seen and unseen domains. This paper presents the official baseline system and evaluation pipeline as a simple, fully reproducible reference for the CD-MSC challenge task. The baseline uses log-mel features and a multitemporal resolution convolutional neural network (MTRCNN) with species and auxiliary domain outputs, together with complete training and test scripts. The baseline system performs strongly on seen domains but degrades markedly on unseen domains, showing that cross-domain generalisation, rather than within-domain recognition, is the central challenge for practical mosquito species classification from multi-source bioacoustic recordings.

2603.20077 2026-03-23 cs.CV cs.RO eess.IV

A Unified Platform and Quality Assurance Framework for 3D Ultrasound Reconstruction with Robotic, Optical, and Electromagnetic Tracking

Lewis Howell, Manisha Waterston, Tze Min Wah, James H. Chandler, James R. McLaughlan

Comments This work has been submitted to the IEEE for possible publication

Abstract

Three-dimensional (3D) Ultrasound (US) can facilitate diagnosis, treatment planning, and image-guided therapy. However, current studies rarely provide a comprehensive evaluation of volumetric accuracy and reproducibility, highlighting the need for robust Quality Assurance (QA) frameworks, particularly for tracked 3D US reconstruction using freehand or robotic acquisition. This study presents a QA framework for 3D US reconstruction and a flexible open-source platform for tracked US research. A custom phantom containing geometric inclusions with varying symmetry properties enables straightforward evaluation of optical, electromagnetic, and robotic kinematic tracking for 3D US at different scanning speeds and insonation angles. A standardised pipeline performs real-time segmentation and 3D reconstruction of geometric targets (DSC = 0.97, FPS = 46) without GPU acceleration, followed by automated registration and comparison with ground-truth geometries. Applying this framework showed that our robotic 3D US achieves state-of-the-art reconstruction performance (DSC-3D = 0.94 ± 0.01, HD95 = 1.17 ± 0.12), approaching the spatial resolution limit imposed by the transducer. This work establishes a flexible experimental platform and a reproducible validation methodology for 3D US reconstruction. The proposed framework enables robust cross-platform comparisons and improved reporting practices, supporting the safe and effective clinical translation of 3D ultrasound in diagnostic and image-guided therapy applications.
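
Illustrative sketch (not from the paper): the DSC figures quoted above are Dice similarity coefficients, DSC = 2|A ∩ B| / (|A| + |B|), between a predicted mask A and a ground-truth mask B. The helper below assumes flattened binary masks of equal length; the function name and the toy masks are hypothetical.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy flattened masks: one voxel overlaps, three voxels are marked in total.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

For a 3D volume, the same function applies once the voxel grid is flattened into a single sequence.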

2603.20072 2026-03-23 quant-ph cs.LG eess.SP

Antenna Array Beamforming Based on a Hybrid Quantum Optimization Framework

Shuai Zeng

Abstract

This paper proposes a hybrid quantum optimization framework for large-scale antenna-array beamforming with jointly optimized discrete phases and continuous amplitudes. The method combines quantum-inspired search with classical gradient refinement to handle mixed discrete-continuous variables efficiently. For phase optimization, a Gray-code and odd-combination encoding scheme is introduced to improve robustness and avoid the complexity explosion of higher-order Ising models. For amplitude optimization, a geometric spin-combination encoding and a two-stage strategy are developed, using quantum-inspired optimization for coarse search and gradient optimization for fine refinement. To enhance solution diversity and quality, a rainbow quantum-inspired algorithm integrates multiple optimizers for parallel exploration, followed by hierarchical-clustering-based candidate refinement. In addition, a double outer-product method and an augmented version are proposed to construct the coupling matrix and bias vector efficiently, improving numerical precision and implementation efficiency. Under the scoring rules of the 7th National Quantum Computing Hackathon, simulations on a 32-element antenna array show that the proposed method achieves a score of 461.58 under constraints on near-main-lobe sidelobes, wide-angle sidelobes, beamwidth, and optimization time, nearly doubling the baseline score. The proposed framework provides an effective reference for beamforming optimization in future wireless communication systems.
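
Illustrative sketch (not from the paper): a standard binary-reflected Gray code maps discrete phase indices to bit strings so that neighbouring phase levels differ in exactly one bit. The mapping of codes to phases and all function names below are assumptions; the paper's odd-combination encoding and Ising construction are not reproduced.

```python
import math


def to_gray(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def from_gray(code):
    """Invert to_gray by XOR-folding the shifted code."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def gray_to_phase(code, n_bits):
    """Phase in radians for a Gray-coded level, assuming 2**n_bits uniform levels."""
    return 2.0 * math.pi * from_gray(code) / (1 << n_bits)


# Codes for a hypothetical 3-bit phase shifter (8 levels).
codes = [format(to_gray(i), "03b") for i in range(8)]
```

Because adjacent levels are one bit flip apart, a single spin flip in a search procedure moves the phase by one quantization step rather than by an arbitrary jump.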

2603.20067 2026-03-23 eess.SY cs.SY

Grid-Constrained Smart Charging of Large EV Fleets: Comparative Study of Sequential DP and a Full Fleet Solver

Ipek Kuvvetli, Christofer Sundström, Sogol Kharrazi, Erik Frisk

Abstract

This paper presents a comparative optimization framework for smart charging of electrified vehicle fleets. Using heuristic sequential dynamic programming (SeqDP), the framework minimizes electricity costs while adhering to constraints related to the power grid, charging infrastructure, vehicle availability, and simple considerations of battery aging. Based on real-world operational data, the model incorporates discrete energy states, time-varying tariffs, and state-of-charge (SoC) targets to deliver a scalable and cost-effective solution. The classical DP approach suffers from exponential computational complexity as the problem size increases. This becomes particularly problematic when conducting monthly-scale analyses aimed at minimizing peak power demand across all vehicles. The extended time horizon, coupled with multi-state decision-making, renders exact optimization impractical at larger scales. To address this, a heuristic method is employed to enable systematic aggregation and tractable computation for the Non-Linear Programming (NLP) problem. Rather than seeking a globally optimal solution, this study focuses on a time-efficient smart charging strategy that aims to minimize energy cost while flattening the overall power profile. In this context, a sequential heuristic DP approach is proposed. Its performance is evaluated against a full-fleet solver using Gurobi, a widely used commercial solver in both academia and industry. The proposed algorithm reduces the overall cost and peak power by more than 90% compared to uncontrolled schedules. Its relative cost remains within 9% of the optimal values obtained from the full-fleet solver, and its relative peak-power deviation stays below 15% for larger fleets.
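
Illustrative sketch (not from the paper): for a single vehicle, a backward dynamic program over discrete energy states and time-varying tariffs can be written as below. The tariff values, the unit-based energy discretization, and the function name are hypothetical; grid limits, battery aging, and the sequential fleet-level heuristic described in the abstract are not modelled.

```python
def cheapest_charging_plan(prices, target_units, max_units_per_slot):
    """Backward DP: minimum cost to charge exactly target_units over len(prices) slots."""
    inf = float("inf")
    horizon = len(prices)
    # cost_to_go[t][e]: minimum remaining cost at slot t with e units already charged.
    cost_to_go = [[inf] * (target_units + 1) for _ in range(horizon + 1)]
    cost_to_go[horizon][target_units] = 0.0
    choice = [[0] * (target_units + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for e in range(target_units + 1):
            for u in range(min(max_units_per_slot, target_units - e) + 1):
                cost = prices[t] * u + cost_to_go[t + 1][e + u]
                if cost < cost_to_go[t][e]:
                    cost_to_go[t][e] = cost
                    choice[t][e] = u
    if cost_to_go[0][0] == inf:
        raise ValueError("target cannot be reached within the horizon")
    # Forward pass: read the optimal decisions back out of the table.
    plan, e = [], 0
    for t in range(horizon):
        u = choice[t][e]
        plan.append(u)
        e += u
    return cost_to_go[0][0], plan


# Hypothetical tariffs for four slots; charge 3 units, at most 2 per slot.
total_cost, plan = cheapest_charging_plan([3.0, 1.0, 2.0, 5.0], 3, 2)
```

The table has (slots × energy states) entries per vehicle, which is cheap for one vehicle; a joint table over many vehicles grows exponentially with fleet size, which is the scaling problem the abstract attributes to classical DP.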

2603.20048 2026-03-23 eess.SP cs.LG

Structured Latent Dynamics in Wireless CSI via Homomorphic World Models

Salmane Naoumi, Mehdi Bennis, Marwa Chafii

Comments ACCEPTED FOR PUBLICATION IN IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC) 2026

Abstract

We introduce a self-supervised framework for learning predictive and structured representations of wireless channels by modeling the temporal evolution of channel state information (CSI) in a compact latent space. Our method casts the problem as a world modeling task and leverages the Joint Embedding Predictive Architecture (JEPA) to learn action-conditioned latent dynamics from CSI trajectories. To promote geometric consistency and compositionality, we parameterize transitions using homomorphic updates derived from Lie algebra, yielding a structured latent space that reflects spatial layout and user motion. Evaluations on the DICHASUS dataset show that our approach outperforms strong baselines in preserving topology and forecasting future embeddings across unseen environments. The resulting latent space enables metrically faithful channel charts, offering a scalable foundation for downstream applications such as mobility-aware scheduling, localization, and wireless scene understanding.

2603.20045 2026-03-23 eess.IV cs.CV

Investigating a Policy-Based Formulation for Endoscopic Camera Pose Recovery

Jan Emily Mangulabnan, Akshat Chauhan, Laura Fleig, Lalithkumar Seenivasan, Roger D. Soberanis-Mukul, S. Swaroop Vedula, Russell H. Taylor, Masaru Ishii, Gregory D. Hager, Mathias Unberath

Abstract

In endoscopic surgery, surgeons continuously locate the endoscopic view relative to the anatomy by interpreting the evolving visual appearance of the intraoperative scene in the context of their prior knowledge. Vision-based navigation systems seek to replicate this capability by recovering camera pose directly from endoscopic video, but most approaches do not embody the same principles of reasoning about new frames that make surgeons successful. Instead, they remain grounded in feature matching and geometric optimization over keyframes, an approach that has been shown to degrade under the challenging conditions of endoscopic imaging, such as low texture and rapid illumination changes. Here, we pursue an alternative approach and investigate a policy-based formulation of endoscopic camera pose recovery that seeks to imitate experts in estimating trajectories conditioned on the previous camera state. Our approach directly predicts short-horizon relative motions without maintaining an explicit geometric representation at inference time. It thus addresses, by design, some of the notorious challenges of geometry-based approaches, such as brittle correspondence matching, instability in texture-sparse regions, and limited pose coverage due to reconstruction failure. We evaluate the proposed formulation on cadaveric sinus endoscopy. Under oracle state conditioning, we compare short-horizon motion prediction quality against geometric baselines and achieve the lowest mean translation error and competitive rotational accuracy. We analyze robustness by grouping prediction windows according to texture richness and illumination change; the analysis indicates reduced sensitivity to low-texture conditions. These findings suggest that a learned motion policy offers a viable alternative formulation for endoscopic camera pose recovery.

2603.20027 2026-03-23 eess.SY cs.SY

Predictor-Feedback Stabilization of Linear Switched Systems with State-Dependent Switching and Input Delay

Andreas Katsanikakis, Nikolaos Bekiaris-Liberis, Delphine Bresch-Pietri

Comments 6 pages, 3 figures, submitted to European Control Conference 2026 (ECC)

Abstract

We develop a predictor-feedback control design for a class of linear systems with state-dependent switching. The main ingredient of our design is a novel construction of an exact predictor state. Such a construction is possible because, for a given state-dependent switching rule, an implementable formula for the predictor state can be derived in a way analogous to the case of nonlinear systems with input delay. We establish uniform exponential stability of the corresponding closed-loop system via a novel construction of multiple Lyapunov functionals, relying on a backstepping transformation that we introduce. We validate our design in simulation, considering a switching rule motivated by communication networks.

2603.20011 2026-03-23 eess.SP

Performance Analysis and Optimization of FAS-ARIS Communications for 6G: System Modeling and Analytical Insights

Hong-Bae Jeon, Kai-Kit Wong, Chan-Byoung Chae

Abstract

This paper introduces a unified analytical and optimization framework for fluid antenna system-active reconfigurable intelligent surface (FAS-ARIS) communications in 6G. By combining the port reconfigurability of FAS with the signal amplification of ARIS, the proposed design enables more flexible control of the propagation environment and enhanced link reliability beyond what passive solutions can offer. We first derive the optimal ARIS amplification gain under a reflection power constraint to maximize the user's signal-to-noise ratio (SNR). Using a block-diagonal matrix approximation, we obtain a tractable outage expression and a tight independent-antenna equivalent upper bound. Building on this, we establish the monotonic relationship between outage and effective channel gain, which enables a closed-form solution for ARIS phase optimization under limited channel state information (CSI). To further improve spectral efficiency, we propose a region-partitioned throughput optimization framework that achieves near-optimal performance without exhaustive search, which keeps its computational complexity low. Extensive simulations confirm the accuracy of the analysis and demonstrate consistent gains in outage and throughput compared to baselines.

2603.19999 2026-03-23 eess.SP cs.IT math.IT

NCR vs. Passive/Active RIS: How Much NCR Amplification is Required to Beat RIS?

Özlem Tuğfe Demir, Ozan Alp Topal, Cicek Cavdar, Emil Björnson

Comments 13 pages, 10 figures, submitted to IEEE journal

Abstract

This paper investigates the fundamental tradeoff between reconfigurable intelligent surfaces (RISs) and network-controlled repeaters (NCRs) in terms of achievable signal-to-noise ratio (SNR). Considering an uplink system with a multi-antenna base station (BS) and a single-antenna user equipment (UE), we derive closed-form SNR expressions for passive RIS-, active RIS-, and NCR-assisted communication under line-of-sight propagation between the BS-RIS/NCR and RIS/NCR-UE. Both narrowband and wideband transmissions are analyzed, with and without the presence of a direct BS-UE link. Our analysis reveals a key structural difference: while the SNR achieved with RISs grows unboundedly with the number of RIS elements, the SNR provided by an NCR is fundamentally limited by the UE-repeater channel due to noise amplification. Nevertheless, we show that NCRs can outperform both passive and active RISs when deployed close to the UE, provided that sufficient amplification is available. Numerical results based on realistic path loss models quantify the amplification levels required for NCRs to outperform RISs across different deployment geometries and system dimensions. These findings provide clear design guidelines for the practical integration of RISs and NCRs in future wireless networks.

2603.19994 2026-03-23 cs.CV cs.LG eess.IV eess.SP

Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts

John Turnbull, Shivam Grover, Amin Jalali, Ali Etemad

Comments Accepted at ICASSP 2026

Abstract

Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models during inference without labeled source data. We present the first evaluation of TTA methods for facial expression recognition (FER) under natural domain shifts, performing cross-dataset experiments with widely used FER datasets. This moves beyond synthetic corruptions to examine real-world shifts caused by differing collection protocols, annotation standards, and demographics. Results show TTA can boost FER performance under natural shifts by up to 11.34%. Entropy minimization methods such as TENT and SAR perform best when the target distribution is clean. In contrast, prototype adjustment methods like T3A excel under larger distributional distance scenarios. Finally, feature alignment methods such as SHOT deliver the largest gains when the target distribution is noisier than our source. Our cross-dataset analysis shows that TTA effectiveness is governed by the distributional distance and the severity of the natural shift across domains.
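
Illustrative sketch (not from the paper): entropy minimization methods adapt a model at test time by lowering the Shannon entropy of its own softmax predictions on unlabeled target data. The snippet below only computes that objective for one sample; the logits are made up, and the parameter update step that a method such as TENT would apply is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over classes."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


# Hypothetical logits for a 4-class problem.
uncertain = prediction_entropy([0.0, 0.0, 0.0, 0.0])
confident = prediction_entropy([8.0, 0.0, 0.0, 0.0])
```

A uniform prediction attains the maximum entropy, log(number of classes); driving the objective down pushes the model toward confident predictions, which helps only if those confident predictions tend to be correct on the target data.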

2603.19955 2026-03-23 math.OC cs.LG cs.SI cs.SY eess.SY

Structural Controllability of Large-Scale Hypergraphs

Joshua Pickard, Xin Mao, Can Chen

Comments 14 pages, 4 figures, 1 table

Abstract

Controlling real-world networked systems, including ecological, biomedical, and engineered networks that exhibit higher-order interactions, remains challenging due to inherent nonlinearities and large system scales. Despite extensive studies on graph controllability, the controllability properties of hypergraphs remain largely underdeveloped. Existing results focus primarily on exact controllability, which is often impractical for large-scale hypergraphs. In this article, we develop a structural controllability framework for hypergraphs by modeling hypergraph dynamics as polynomial dynamical systems. In particular, we extend classical notions of accessibility and dilation from linear graph-based systems to polynomial hypergraph dynamics and establish a hypergraph-based criterion under which the topology guarantees satisfaction of classical Lie-algebraic and Kalman-type rank conditions for almost all parameter choices. We further derive a topology-based lower bound on the minimum number of driver nodes required for structural controllability and leverage this bound to design a scalable driver node selection algorithm combining dilation-aware initialization via maximum matching with greedy accessibility expansion. We demonstrate the effectiveness and scalability of the proposed framework through numerical experiments on hypergraphs with tens to thousands of nodes and higher-order interactions.

2603.19952 2026-03-23 eess.SY cs.SY

On the Capacity of Future Lane-Free Urban Infrastructure

Patrick Malcolm, Klaus Bogenberger

Comments 9 pages, 8 figures, submitted to IEEE Transactions on Intelligent Transportation Systems

Abstract

In this paper, the potential capacity and spatial efficiency of future autonomous lane-free traffic in urban environments are explored using a combination of analytical and simulation-based approaches. For lane-free roadways, a simple analytical approach is employed, which shows not only that lane-free traffic offers a higher capacity than lane-based traffic for the same street width, but also that the relationship between capacity and street width is continuous under lane-free traffic. To test the potential capacity and properties of lane-free signal-free intersections (automated intersection management), two approaches were simulated and compared, including a novel approach which we call OptWULF. This approach uses a multi-agent conflict-based search approach with a low-level planner that uses a combination of optimization and simple window-based reservation. With these simulations, we confirm the continuous relationship between capacity and street width for intersection scenarios. We also show that OptWULF results in an even utilization of the entire drivable area of the street and intersection area. Furthermore, we show that OptWULF is capable of handling asymmetric demand patterns without any substantial loss in capacity compared to symmetric demand patterns.

2603.19950 2026-03-23 eess.SP

Reduced-Overhead Channel Estimation and Iterative Detection of FTN Signaling Based on Pilot Superimposition and Spectral Interference Alignment

Yuchen Wu, Shinya Sugiura

Comments 6 pages, 3 figures, IEEE Global Communications Conference (GLOBECOM), Taipei, Taiwan, 8-12 Dec. 2025, pp. 5820-5825

Abstract

This paper proposes low-overhead and low-complexity channel estimation (CE) for faster-than-Nyquist (FTN) signaling aided by frequency-domain equalization. In the proposed CE scheme, the concept of pilot superimposition is employed, where the FTN block is designed to superimpose pilot symbols on information symbols; thus, neither dedicated time and frequency resources nor guard bands are required, resulting in a 50% reduction of the overhead. Furthermore, interference induced by the pilot superimposition is eliminated by invoking a novel scheme, referred to as spectral interference alignment, where a data-dependent sequence is subtracted from transmitted information symbols. The theoretical mean-square error (MSE) of the proposed CE is derived, which verifies that the MSE is no longer affected by interference due to the pilot superimposition.

2603.19925 2026-03-23 eess.IV cs.CV

ReconMIL: Synergizing Latent Space Reconstruction with Bi-Stream Mamba for Whole Slide Image Analysis

Lubin Gan, Jing Zhang, Heng Zhang, Xin Di, Zhifeng Wang, Wenke Huang, Xiaoyan Sun

Abstract

Whole slide image (WSI) analysis heavily relies on multiple instance learning (MIL). While recent methods benefit from large-scale foundation models and advanced sequence modeling to capture long-range dependencies, they still struggle with two critical issues. First, directly applying frozen, task-agnostic features often leads to suboptimal separability due to the domain gap with specific histological tasks. Second, relying solely on global aggregators can cause over-smoothing, where sparse but critical diagnostic signals are overshadowed by the dominant background context. In this paper, we present ReconMIL, a novel framework designed to bridge this domain gap and balance global-local feature aggregation. Our approach introduces a Latent Space Reconstruction module that adaptively projects generic features into a compact, task-specific manifold, improving boundary delineation. To prevent information dilution, we develop a bi-stream architecture combining a Mamba-based global stream for contextual priors and a CNN-based local stream to preserve subtle morphological anomalies. A scale-adaptive selection mechanism dynamically fuses these two streams, determining when to rely on overall architecture versus local saliency. Evaluations across multiple diagnostic and survival prediction benchmarks show that ReconMIL consistently outperforms current state-of-the-art methods, effectively localizing fine-grained diagnostic regions while suppressing background noise. Visualization results confirm the model's superior ability to localize diagnostic regions by effectively balancing global structure and local granularity.

2603.19903 2026-03-23 cs.IT eess.SP math.IT

Repeater-Aided Over-the-Air Phase Synchronization in Distributed MIMO

Unnikrishnan Kunnath Ganesan, Sai Subramanyam Thoota, Erik G. Larsson

Comments Accepted for presentation at IEEE VTC2026-Spring; 5 pages; 2 figures

Abstract

Phase synchronization of access points (APs) in a distributed multiple-input multiple-output (D-MIMO) system is critical to leverage the performance benefits of D-MIMO. Existing over-the-air phase synchronization methods assume that APs can communicate directly to perform necessary measurements. However, this assumption might not hold in scenarios where inter-AP signaling is too weak for effective communication. To address this, in this paper, we propose a novel over-the-air calibration scheme that uses repeater nodes to facilitate phase synchronization when direct AP signaling is infeasible. We give the steps of the algorithm for phase calibration in closed form, and show how it enables coherent joint transmission (CJT) by the APs. The framework expands the applicability of D-MIMO systems to challenging environments, where existing over-the-air synchronization techniques fall short.

2603.19846 2026-03-23 eess.SP

Supervised Contrastive Learning Framework for Electroencephalography-based Air-writing Recognition

Anant Jain, Ayush Tripathi

Abstract

Electroencephalography (EEG)-based air-writing recognition offers a human-computer interaction paradigm by decoding neural activity associated with handwriting movements. Despite its potential, reliable EEG-based air-writing recognition remains challenging due to low signal-to-noise ratio and pronounced inter-subject variability. In this study, we examine the use of supervised contrastive learning to improve representation learning for EEG-based air-writing recognition. The analysis is conducted on preprocessed EEG signals and independent component analysis (ICA)-derived neural components obtained from five participants, with trials segmented from -1 to 2 s relative to movement onset. EEGNet and DeepConvNet architectures are evaluated under both conventional cross-entropy training and a supervised contrastive learning framework using a subject-dependent five-fold cross-validation scheme. The results indicate that supervised contrastive learning consistently improves classification accuracy across architectures and feature representations. For preprocessed EEG signals, the mean accuracy increases from 33.45% to 43.77% and from 29.14% to 38.06% with EEGNet and DeepConvNet, respectively. Using ICA components, higher mean accuracies of 49.21% and 43.32% are achieved with EEGNet and DeepConvNet, respectively. These results suggest that the supervised contrastive learning framework offers an efficient extension to existing EEG-based air-writing recognition approaches.
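
Illustrative sketch (not from the paper): a common form of the supervised contrastive loss pulls together L2-normalized embeddings that share a label and pushes apart all others, averaging over each anchor's positives. The abstract does not state which variant, temperature, or encoder output is used, so the formulation, the temperature of 0.1, and the toy 2-D embeddings below are assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings forming two tight clusters.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(points, [0, 0, 1, 1])   # labels match the clusters
shuffled = supcon_loss(points, [0, 1, 0, 1])  # labels cut across the clusters
```

The loss is low when same-label embeddings already sit close together and high when labels cut across clusters, which is the signal that shapes the representation before a classifier is trained on top.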

2603.19831 2026-03-23 eess.AS cs.AI cs.MM

Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech?

Lokesh Kumar, Nirmesh Shah, Ashishkumar P. Gudmalwar, Pankaj Wasnik

Comments Accepted at The 2nd International Workshop on Bodily Expressed Emotion Understanding (BEEU) at AAAI 2026 [non-archival]

Abstract

Human communication seamlessly integrates speech and bodily motion, where hand gestures naturally complement vocal prosody to express intent, emotion, and emphasis. While recent text-to-speech (TTS) systems have begun incorporating multimodal cues such as facial expressions or lip movements, the role of hand gestures in shaping prosody remains largely underexplored. We propose a novel multimodal TTS framework, Gesture2Speech, that leverages visual gesture cues to modulate prosody in synthesized speech. Motivated by the observation that confident and expressive speakers coordinate gestures with vocal prosody, we introduce a multimodal Mixture-of-Experts (MoE) architecture that dynamically fuses linguistic content and gesture features within a dedicated style extraction module. The fused representation conditions an LLM-based speech decoder, enabling prosodic modulation that is temporally aligned with hand movements. We further design a gesture-speech alignment loss that explicitly models their temporal correspondence to ensure fine-grained synchrony between gestures and prosodic contours. Evaluations on the PATS dataset show that Gesture2Speech outperforms state-of-the-art baselines in both speech naturalness and gesture-speech synchrony. To the best of our knowledge, this is the first work to utilize hand gesture cues for prosody control in neural speech synthesis. Demo samples are available at https://research.sri-media-analysis.com/aaai26-beeu-gesture2speech/

2603.19821 2026-03-23 eess.SP

Outlier-Resistant Fusion for Multi-static Positioning using 5G NR Signals

Maximiliano Rivera Figueroa, Jannis Held, Pradyumna Kumar Bishoyi, Marina Petrova

Comments 6 pages, 4 figures. Accepted for Publication in the IEEE ICC 2026 Conference

Details
English abstract

Indoor positioning faces ongoing challenges due to complex propagation conditions, such as multipath propagation, signal blockages, and intrinsic target characteristics that substantially impact measurement reliability and positioning accuracy. Existing methods, in particular Least Squares (LS), frequently struggle to maintain robustness when confronted with unreliable observations caused by multipath interactions and extended targets. In this work, we propose an outlier-resistant algorithm designed to mitigate the impact of outlier measurements and accurately estimate the position of an extended target in multipath-rich environments. We develop a two-step algorithm in which an initial coarse position estimate is obtained using the angle-of-arrival (AoA) and subsequently refined using the Cauchy loss function to suppress outliers. The numerical results confirm that the proposed algorithm improves robustness and accuracy, outperforming existing benchmark methods, such as Iterative Reweighted Least Squares (IRLS), LS, and the Huber loss function, and achieving a positioning error of less than 70 cm in 90% of cases. Its effectiveness in mitigating multipath effects is further assessed by comparing tracking performance in cluttered and empty room scenarios.
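
To illustrate why a Cauchy loss suppresses outliers where a squared loss does not, here is a minimal sketch under strong simplifying assumptions: a 2-D point target, plain range measurements to known anchors (not the paper's multi-static 5G NR model), a hand-picked coarse starting point standing in for the AoA step, and plain gradient descent as the solver. The anchor layout, the 5 m outlier bias, the scale of 1.0, and the function name `refine` are all hypothetical.

```python
import math

def refine(anchors, ranges, start, scale=None, step=0.2, iters=500):
    """Gradient descent on the summed loss of range residuals.

    scale=None uses the squared loss r^2 / 2; otherwise the Cauchy loss
    (scale^2 / 2) * log(1 + (r / scale)^2), whose derivative is
    r / (1 + (r / scale)^2), so large residuals are down-weighted.
    """
    x, y = start
    for _ in range(iters):
        gx = gy = 0.0
        for (ax, ay), d in zip(anchors, ranges):
            dist = math.hypot(x - ax, y - ay)
            r = dist - d  # residual of this range measurement
            w = 1.0 if scale is None else 1.0 / (1.0 + (r / scale) ** 2)
            gx += w * r * (x - ax) / dist
            gy += w * r * (y - ay) / dist
        x -= step * gx
        y -= step * gy
    return x, y


# Hypothetical scene: true target at (2, 3); the last anchor's range carries a
# +5 m bias, mimicking a multipath-lengthened measurement.
truth = (2.0, 3.0)
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (5.0, 5.0), (10.0, 10.0)]
ranges = [math.hypot(truth[0] - ax, truth[1] - ay) for ax, ay in anchors]
ranges[-1] += 5.0

coarse = (3.0, 4.0)  # stand-in for the coarse AoA-based estimate
ls_est = refine(anchors, ranges, coarse)
cauchy_est = refine(anchors, ranges, coarse, scale=1.0)
ls_err = math.hypot(ls_est[0] - truth[0], ls_est[1] - truth[1])
cauchy_err = math.hypot(cauchy_est[0] - truth[0], cauchy_est[1] - truth[1])
print(f"squared-loss error: {ls_err:.2f} m, Cauchy-loss error: {cauchy_err:.2f} m")
```

Because the Cauchy loss is non-convex, the refinement depends on a reasonable starting point, which is consistent with the two-step structure the abstract describes.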

2603.19813 2026-03-23 eess.SY cs.SY math.OC

A Spectral Perspective on Stochastic Control Barrier Functions

Inkyu Jang, Chams E. Mballo, Claire J. Tomlin, H. Jin Kim

Comments 16 pages, 7 figures. This work has been submitted to the IEEE for possible publication

Details
English abstract

Stochastic control barrier functions (SCBFs) provide a safety-critical control framework for systems subject to stochastic disturbances by bounding the probability of remaining within a safe set. However, synthesizing a valid SCBF that explicitly reflects the true safety probability of the system, which is the most natural measure of safety, remains a challenge. This paper addresses this issue by adopting a spectral perspective, utilizing the linear operator that governs the evolution of the closed-loop system's safety probability. We find that the dominant eigenpair of this Koopman-like operator encodes fundamental safety information of the stochastic system. The dominant eigenfunction is a natural and valid SCBF, with values that explicitly quantify the relative long-term safety of the state, while the dominant eigenvalue indicates the global rate at which the safety probability decays. A practical synthesis algorithm is proposed, termed power-policy iteration, which jointly computes the dominant eigenpair and an optimized backup policy. The method is validated using simulation experiments on safety-critical dynamics models.
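
The role of the dominant eigenpair can be seen in a finite-state analogue. The sketch below assumes a discrete-time Markov chain whose transition matrix is restricted to the safe states, so each row sums to less than one and the deficit is the per-step probability of leaving the safe set. Ordinary power iteration then recovers the decay rate and a vector ranking states by long-term safety. The matrix `Q` is a made-up example, and this is not the paper's power-policy iteration, which additionally optimizes a backup policy.

```python
def dominant_eigenpair(Q, iters=500):
    """Power iteration for the dominant eigenpair of a nonnegative matrix Q."""
    n = len(Q)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)              # v is kept scaled so that max(v) == 1
        v = [x / lam for x in w]
    return lam, v


def survival(Q, steps):
    """Probability of staying inside the safe set for `steps` steps, per start state."""
    n = len(Q)
    s = [1.0] * n
    for _ in range(steps):
        s = [sum(Q[i][j] * s[j] for j in range(n)) for i in range(n)]
    return s


# Hypothetical safe-set transition matrix: state 2 is the most likely to exit.
Q = [[0.90, 0.05, 0.00],
     [0.10, 0.80, 0.05],
     [0.00, 0.20, 0.60]]

lam, v = dominant_eigenpair(Q)
ratio = survival(Q, 201)[0] / survival(Q, 200)[0]
print(f"dominant eigenvalue: {lam:.4f}, late-time survival ratio: {ratio:.4f}")
print("eigenvector (relative long-term safety):", [round(x, 3) for x in v])
```

In this toy chain the survival probability eventually shrinks by the factor `lam` per step regardless of the start state, while the eigenvector entries order the states from safest to least safe.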

2603.19746 2026-03-23 eess.SP

Codebook-Based Self-Sustainable RIS: Optimal Splitting Schemes and Power Allocation

Friedemann Laue, Sebastian Lotter, Nikita Shani, Robert Schober

Details
English abstract

This paper studies the codebook-based configuration of a reconfigurable intelligent surface (RIS) that extends the coverage of a base station (BS) while utilizing energy harvesting to facilitate self-sustainable operation. For a given coverage area, we design a RIS codebook and propose a mathematical framework for analyzing the efficiency of three common energy harvesting schemes: power splitting (PS), element splitting (ES), and time splitting (TS). To this end, we use a tile-based architecture at the RIS to exploit the advantages of both radio-frequency (RF) combining and direct-current (DC) combining. Moreover, we account for deterministic and random transmit signals for beam training and data transmission, respectively, and show their impact on the RF-DC conversion efficiencies at the rectifiers. Our main objective is to minimize the average transmit power at the BS by jointly optimizing the splitting ratio for the incident signal at the RIS and the power allocated to each RIS codeword. While the optimal power allocation is derived analytically, we show that the optimal splitting ratio can be determined by performing a grid search over a single optimization variable. Our performance evaluation reveals that the efficiency of the optimized splitting schemes depends on the adopted power consumption model and the number of tiles at the RIS. In particular, our results show that, depending on the system parameters, a different splitting scheme achieves the lowest transmit power at the BS.

2603.19707 2026-03-23 eess.SP

LSTM-Based Power Delay Profile Predictions for Intra-Bus Wireless Propagation

Rajeev Shukla, Atharva Verma, Aniruddha Chandra, Ondrej Zeleny, Radek Zavorka, Jiri Blumenstein, Ales Prokes, Jaroslaw Wojtun, Jan M. Kelner, Cezary Ziolkowski, Domenico Ciuonzo

Comments 5 pages, 5 figures, 1 table

Details
Journal ref
2025 35th International Conference Radioelektronika (RADIOELEKTRONIKA), Hnanice, Czech Republic, 12-14 May 2025
English abstract

Long short-term memory (LSTM) is a deep learning model that can capture long-term dependencies of wireless channel models and is highly adaptable to short-term changes in a wireless environment. This paper proposes a simple LSTM model to predict the channel transfer function (CTF) for a given transmitter-receiver location inside a bus for the 60 GHz millimetre wave band. The average error of the derived power delay profile (PDP) taps, obtained from the predicted CTFs, was less than 10% compared to the ground truth.
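
The relation between a CTF and the PDP taps derived from it can be sketched with the textbook definition: an inverse discrete Fourier transform of the frequency-domain samples gives the channel impulse response, and its squared magnitude gives the power at each delay tap. The two-path channel, the 32 frequency points, and the function name `ctf_to_pdp` below are assumptions for illustration; the abstract does not describe the authors' exact processing (for example, any windowing or averaging).

```python
import cmath

def ctf_to_pdp(ctf):
    """Inverse DFT of the CTF samples, then squared magnitude per delay tap."""
    n = len(ctf)
    pdp = []
    for tap in range(n):
        h = sum(ctf[k] * cmath.exp(2j * cmath.pi * k * tap / n) for k in range(n)) / n
        pdp.append(abs(h) ** 2)
    return pdp


# Hypothetical two-path channel given as (delay tap, amplitude) pairs.
N = 32
paths = [(3, 1.0), (10, 0.5)]
ctf = [sum(a * cmath.exp(-2j * cmath.pi * k * d / N) for d, a in paths)
       for k in range(N)]

pdp = ctf_to_pdp(ctf)
strongest = sorted(range(N), key=lambda t: pdp[t], reverse=True)[:2]
print(strongest, [round(pdp[t], 3) for t in strongest])
```

A predicted CTF can be passed through the same transform, and the resulting taps compared against those from the measured CTF to obtain a tap-wise error.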

2603.19706 2026-03-23 eess.SP

A Deep Learning Approach to Multipath Component Detection in Power Delay Profiles

Ondrej Zeleny, Radek Zavorka, Ales Prokes, Tomas Fryza, Jaroslaw Wojtun, Jan M. Kelner, Cezary Ziolkowski, Aniruddha Chandra

Comments 5 pages, 4 figures, 2 tables

Details
Journal ref
2025 35th International Conference Radioelektronika (RADIOELEKTRONIKA), Hnanice, Czech Republic, 12-14 May 2025
English abstract

The power delay profile (PDP) plays a crucial role in wireless communications, providing information on multipath propagation and signal strength variations over time. Accurate detection of peaks within the PDP is essential to identify dominant signal paths, which are critical for tasks such as channel estimation, localization, and interference management. Traditional approaches to PDP analysis often struggle with noise, low resolution, and the inherent complexity of wireless environments. In this paper, we evaluate the application of traditional and modern deep learning neural networks to reconstruction-based anomaly detection to detect multipath components within the PDP. To further refine detection and robustness, a framework is proposed that combines autoencoders and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering. To compare the performance of individual models, a relaxed F1 score strategy is defined. The experimental results show that the proposed framework with a transformer-based autoencoder achieves superior performance in terms of both reconstruction and anomaly detection.

2603.19655 2026-03-23 cs.RO cs.SY eess.SY

Accurate Open-Loop Control of a Soft Continuum Robot Through Visually Learned Latent Representations

Henrik Krauss, Johann Licher, Naoya Takeishi, Annika Raatz, Takehisa Yairi

Details
English abstract

This work addresses open-loop control of a soft continuum robot (SCR) from video-learned latent dynamics. Visual Oscillator Networks (VONs) from previous work are used, which provide mechanistically interpretable 2D oscillator latents through an attention broadcast decoder (ABCD). Open-loop, single-shooting optimal control is performed in latent space to track image-specified waypoints without camera feedback. An interactive SCR live simulator enables design of static, dynamic, and extrapolated targets and maps them to model-specific latent waypoints. On a two-segment pneumatic SCR, Koopman, MLP, and oscillator dynamics, each with and without ABCD, are evaluated on setpoint and dynamic trajectories. ABCD-based models consistently reduce image-space tracking error. The VON and ABCD-based Koopman models attain the lowest MSEs. Using an ablation study, we demonstrate that several architecture choices and training settings contribute to the open-loop control performance. Simulation stress tests further confirm static holding, stable extrapolated equilibria, and plausible relaxation to the rest state. To the best of our knowledge, this is the first demonstration that interpretable, video-learned latent dynamics enable reliable long-horizon open-loop control of an SCR.

2603.19648 2026-03-23 cs.LG cs.SY eess.SY math.OC stat.ML

Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis

Siddharth Chandak, Anuj Yadav, Ayfer Ozgur, Nicholas Bambos

Comments Submitted to IEEE Transactions on Automatic Control

Details
English abstract

Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.

2603.19641 2026-03-23 physics.soc-ph cs.MA cs.SY eess.SY

On the existence of fair zero-determinant strategies in the periodic prisoner's dilemma game

Ken Nakamura, Masahiko Ueda

Comments 25 pages

Details
English abstract

Repeated games are a framework for investigating long-term interdependence of multi-agent systems. In repeated games, zero-determinant (ZD) strategies attract much attention in evolutionary game theory, since they can unilaterally control payoffs. In particular, fair ZD strategies unilaterally equalize the payoff of the focal player and the average payoff of the opponents, and they have been found in several games, including social dilemma games. Although the existence condition of ZD strategies in repeated games has been specified, its extension to stochastic games remains largely unclear. Stochastic games are an extension of repeated games in which the environment has a state, and the state changes according to the action profile of the players. Because of these transitions of the environmental state, the existence condition of ZD strategies in stochastic games is more complicated than that in repeated games. Here, we investigate the existence condition of fair ZD strategies in the periodic prisoner's dilemma game, which is one of the simplest stochastic games. We show that fair ZD strategies do not necessarily exist in the periodic prisoner's dilemma game, in contrast to the repeated prisoner's dilemma game. Furthermore, we also prove that the Tit-for-Tat strategy, which imitates the opponent's action, is not necessarily a fair ZD strategy in the periodic prisoner's dilemma game, whereas the Tit-for-Tat strategy is always a fair ZD strategy in the repeated prisoner's dilemma game. Our results highlight the differences between ZD strategies in the periodic prisoner's dilemma game and those in the standard repeated prisoner's dilemma game.
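
The fairness property mentioned for the standard repeated game can be checked numerically. The sketch below covers only the ordinary repeated prisoner's dilemma without discounting, not the periodic game studied in the paper. It computes the stationary distribution of the Markov chain induced by two memory-one strategies and compares long-run average payoffs. The payoff values (R, S, T, P) = (3, 0, 5, 1), the opponent's strategy, and the function names are assumptions chosen for illustration.

```python
def stationary(p, q, iters=5000):
    """Stationary distribution over the states (CC, CD, DC, DD).

    The first letter is X's previous move, the second is Y's. p and q give each
    player's probability of cooperating after each state, listed from that
    player's own viewpoint (own move first).
    """
    # From X's viewpoint the states CD and DC are swapped for player Y.
    q_as_seen_by_x = (q[0], q[2], q[1], q[3])
    dist = [0.25] * 4
    for _ in range(iters):
        new = [0.0] * 4
        for s in range(4):
            px, py = p[s], q_as_seen_by_x[s]
            new[0] += dist[s] * px * py
            new[1] += dist[s] * px * (1.0 - py)
            new[2] += dist[s] * (1.0 - px) * py
            new[3] += dist[s] * (1.0 - px) * (1.0 - py)
        dist = new
    return dist


def payoffs(p, q, R=3.0, S=0.0, T=5.0, P=1.0):
    """Long-run average payoffs (X, Y) under the stationary distribution."""
    cc, cd, dc, dd = stationary(p, q)
    return R * cc + S * cd + T * dc + P * dd, R * cc + T * cd + S * dc + P * dd


tit_for_tat = (1.0, 0.0, 1.0, 0.0)       # copy the opponent's previous move
always_cooperate = (1.0, 1.0, 1.0, 1.0)
opponent = (0.7, 0.2, 0.9, 0.4)          # arbitrary hypothetical memory-one strategy

tft_x, tft_y = payoffs(tit_for_tat, opponent)
allc_x, allc_y = payoffs(always_cooperate, opponent)
print(f"Tit-for-Tat:      X = {tft_x:.4f}, Y = {tft_y:.4f}")
print(f"Always cooperate: X = {allc_x:.4f}, Y = {allc_y:.4f}")
```

Tit-for-Tat ends up with the same average payoff as its opponent, whereas unconditional cooperation does not, which is the sense in which Tit-for-Tat is fair in the repeated game.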

2603.19632 2026-03-23 cs.RO cs.SY eess.SY

ContractionPPO: Certified Reinforcement Learning via Differentiable Contraction Layers

Vrushabh Zinage, Narek Harutyunyan, Eric Verheyden, Fred Y. Hadaegh, Soon-Jo Chung

Comments Accepted to RA-L journal

Details
English abstract

Legged locomotion in unstructured environments demands not only high-performance control policies but also formal guarantees to ensure robustness under perturbations. Control methods often require carefully designed reference trajectories, which are challenging to construct in high-dimensional, contact-rich systems such as quadruped robots. In contrast, Reinforcement Learning (RL) directly learns policies that implicitly generate motion, and uniquely benefits from access to privileged information, such as full state and dynamics during training, that is not available at deployment. We present ContractionPPO, a framework for certified robust planning and control of legged robots by augmenting Proximal Policy Optimization (PPO) RL with a state-dependent contraction metric layer. This approach enables the policy to maximize performance while simultaneously producing a contraction metric that certifies incremental exponential stability of the simulated closed-loop system. The metric is parameterized as a Lipschitz neural network and trained jointly with the policy, either in parallel or as an auxiliary head of the PPO backbone. While the contraction metric is not deployed during real-world execution, we derive upper bounds on the worst-case contraction rate and show that these bounds ensure the learned contraction metric generalizes from simulation to real-world deployment. Our hardware experiments on quadruped locomotion demonstrate that ContractionPPO enables robust, certifiably stable control even under strong external perturbations.