arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.03280 2026-03-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, Jitendra Malik

Comments Project page can be found at https://toruowo.github.io/peel

Abstract

Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks are characterized not only by contact-rich, force-sensitive dynamics, but also by their "implicit" success criteria: unlike pick-and-place, task quality in these domains is continuous and subjective (e.g. how well a potato is peeled), making quantitative evaluation and reward engineering difficult. We present a learning framework for such tasks, using peeling with a knife as a representative example. Our approach follows a two-stage pipeline: first, we learn a robust initial policy via force-aware data collection and imitation learning, enabling generalization across object variations; second, we refine the policy through preference-based finetuning using a learned reward model that combines quantitative task metrics with qualitative human feedback, aligning policy behavior with human notions of task quality. Using only 50-200 peeling trajectories, our system achieves over 90% average success rates on challenging produce including cucumbers, apples, and potatoes, with performance improving by up to 40% through preference-based finetuning. Remarkably, policies trained on a single produce category exhibit strong zero-shot generalization to unseen in-category instances and to out-of-distribution produce from different categories while maintaining over 90% success rates.

2603.03196 2026-03-04 math.NA cs.IT cs.LG cs.NA eess.SP math.IT math.PR

Infinite dimensional generative sensing

Paolo Angella, Vito Paolo Pastore, Matteo Santacesaria

Abstract

Deep generative models have become a standard for modeling priors for inverse problems, going beyond classical sparsity-based methods. However, existing theoretical guarantees are mostly confined to finite-dimensional vector spaces, creating a gap when the physical signals are modeled as functions in Hilbert spaces. This work presents a rigorous framework for generative compressed sensing in Hilbert spaces. We extend the notion of local coherence to an infinite-dimensional setting in order to derive optimal, resolution-independent sampling distributions. Thanks to a generalization of the Restricted Isometry Property, we show that stable recovery holds when the number of measurements is proportional to the prior's intrinsic dimension (up to logarithmic factors), independent of the ambient dimension. Finally, numerical experiments on the Darcy flow equation validate our theoretical findings and demonstrate that in severely undersampled regimes, employing lower-resolution generators acts as an implicit regularizer, improving reconstruction stability.

2603.03184 2026-03-04 eess.SP

Continuous-Aperture Array-Based ISAC Over Fading Channels

Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Yuanwei Liu

Abstract

A framework of continuous-aperture array (CAPA)-based integrated sensing and communications (ISAC) under a fading communication channel is proposed. A continuous operator-based signal model is developed, and the statistics of the communication channel gain are characterized via Landau's eigenvalue theorem. On this basis, the performance of the CAPA-based ISAC system is analyzed by considering three continuous beamforming designs: i) the sensing-centric (S-C) design that optimizes sensing performance, ii) the communication-centric (C-C) design that optimizes communication performance, and iii) the Pareto-optimal design that balances the sensing-communication trade-off. For the S-C and C-C designs, closed-form expressions for the sensing rate (SR), ergodic communication rate (CR), and outage probability are derived, and high-signal-to-noise ratio asymptotic analysis is conducted to obtain the multiplexing and diversity gains. For the Pareto-optimal design, the Pareto-optimal beamformer achieving the Pareto boundary is derived, and the achievable SR-CR region is characterized. Numerical results demonstrate that the proposed CAPA-ISAC scheme outperforms both conventional spatially discrete array-based ISAC and CAPA-based frequency-division sensing and communications.

2603.03173 2026-03-04 eess.SY cs.SY

Can a Learner Regret Using a No-Regret Algorithm? A Control-Theoretic Study of Performance Dominance

Hassan Abdelraouf, Jeff S. Shamma

Abstract

No-regret learning dynamics ensure that a learner asymptotically achieves an average reward no worse than that of any fixed strategy. This no-regret guarantee does not determine the value of the asymptotic average reward. Indeed, it is possible for different no-regret learning dynamics to exhibit different asymptotic average rewards when facing the same environment while both assure the no-regret guarantee. This paper asks whether a "free-lunch" phenomenon can arise among no-regret algorithms. Namely, is it possible for one no-regret learning rule to uniformly outperform another no-regret learning rule across all payoff environments? Stated differently, can a learner regret not using a particular no-regret algorithm? We consider generalized replicator dynamics (RD) as a cascade interconnection between a linear time-invariant (LTI) system and the softmax nonlinearity. Varying this LTI system leads to different realizations of replicator dynamics, including so-called anticipatory RD, exponential RD, and other forms of higher-order RD. Setting the LTI system to be an integrator realizes standard RD, which is known to satisfy the no-regret property. Within this framework, we analyze and compare various realizations of generalized RD by varying the LTI system. We first formulate performance comparison as a passivity property of an associated comparison system and establish "local" dominance results, i.e., comparing the asymptotic performance near an equilibrium payoff vector. We then cast performance comparison between a form of anticipatory RD and standard RD as an optimal-control problem. We show that the minimal achievable cumulative reward gap is zero, thereby establishing global dominance of anticipatory RD across all payoff environments and establishing a "free lunch" among no-regret learning dynamics.
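
The cascade structure described in this abstract (an integrator feeding a softmax) can be sketched numerically. The following is only an illustrative forward-Euler discretization of standard RD; the function names, the step size, and the constant payoff environment are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def simulate_standard_rd(payoff_fn, n_actions, steps=2000, dt=0.01):
    """Standard RD as an integrator (the LTI block) followed by softmax.

    payoff_fn maps the current mixed strategy x to a payoff vector.
    Forward-Euler discretization; dt and steps are illustrative choices.
    """
    z = np.zeros(n_actions)      # integrator state: accumulated payoffs
    x = softmax(z)               # initial strategy is uniform
    for _ in range(steps):
        p = payoff_fn(x)         # payoff vector from the environment
        z = z + dt * p           # LTI block: pure integrator
        x = softmax(z)           # static nonlinearity
    return x

# Hypothetical environment: constant payoffs, action 0 is best.
x_final = simulate_standard_rd(lambda x: np.array([1.0, 0.5, 0.0]), 3)
```

Replacing the integrator update with a different linear system (for example, one that adds a payoff-derivative term) would give other members of the family mentioned in the abstract; the exact forms analyzed in the paper are not reproduced here.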

2603.03127 2026-03-04 eess.SY cs.SY math.DS

Deep Q-Learning-Based Gain Scheduling for Nonlinear Quadcopter Dynamics

Hossein Rastgoftar, Muhammad J. H. Zahed

Abstract

This paper presents a deep Q-network (DQN)-based gain-scheduling framework for safety-critical quadcopter trajectory tracking. Instead of directly learning control inputs, the proposed approach selects from a finite set of pre-certified stabilizing gain vectors, enabling reinforcement learning to operate within a structured and stability-preserving control architecture. By exploiting the isotropic structure of the translational dynamics, feedback gains are shared across spatial axes to reduce dimensionality while preserving performance. The learned policy adapts feedback aggressiveness in real time, applying high authority during large transients and reducing gains near convergence to limit control effort. Simulation results using a high-fidelity nonlinear quadcopter model demonstrate accurate trajectory tracking, bounded attitude excursions, smooth transition to hover after the final time, and consistent reward improvement, validating the effectiveness and robustness of the proposed learning-based gain scheduling strategy.

2603.03102 2026-03-04 eess.SP

KA band mobile antenna for satellite communication

Sidra Tul. Muntaha, Ahmad Arfeen, Kashan Raza

Abstract

This research focuses on the design of Ka-band mobile antennas for satellite communication operating at 29 GHz. Starting from a single element and progressing to an 8x8 array, the antennas achieved a gain of up to 21 dB and return losses as low as -30 dB. The design process involves mathematical calculations and software implementation using CST, utilizing parameters like patch dimensions, substrate properties, and effective permittivity. The chosen Ka-band frequency range, known for higher data transfer rates, addresses the demand for swift communication. Challenges in Ka-band mobile antenna design, including signal attenuation, directional accuracy, circular polarization, and impedance matching, are addressed through various configurations, including phased-array and electronically steerable antennas. Integration of machine learning techniques aids in optimization. In conclusion, this research advances high-frequency transmission technology, meeting the demands of modern satellite-based communication systems for applications like high-speed internet access and multimedia streaming. Keywords: Ka-band, mobile antennas, satellite communication, 29 GHz, antenna design, CST, high-speed internet access, multimedia streaming
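
The abstract mentions computing patch dimensions from substrate properties and effective permittivity. A minimal sketch of that kind of calculation, using the standard transmission-line model for a rectangular microstrip patch, is given below. The substrate values (relative permittivity 2.2, thickness 0.254 mm) are assumptions chosen for illustration; the abstract does not state the substrate or the formulas the authors actually used.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def patch_dimensions(f_hz, eps_r, h_m):
    """Rectangular patch width, length, and effective permittivity
    from the textbook transmission-line model."""
    # Patch width for efficient radiation
    w = C / (2 * f_hz) * math.sqrt(2 / (eps_r + 1))
    # Effective permittivity seen by the fringing fields
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h_m / w)
    # Length extension caused by fringing at each radiating edge
    dl = (0.412 * h_m * (eps_eff + 0.3) * (w / h_m + 0.264)
          / ((eps_eff - 0.258) * (w / h_m + 0.8)))
    # Physical length: half a guided wavelength minus both extensions
    l = C / (2 * f_hz * math.sqrt(eps_eff)) - 2 * dl
    return w, l, eps_eff

# Assumed substrate, not taken from the abstract: eps_r = 2.2, h = 0.254 mm
w, l, eps_eff = patch_dimensions(29e9, 2.2, 0.254e-3)
print(f"W = {w * 1e3:.2f} mm, L = {l * 1e3:.2f} mm, eps_eff = {eps_eff:.3f}")
```

These closed-form values are usually only a starting point that is then tuned in a full-wave simulator such as the CST tool named in the abstract.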

2603.03082 2026-03-04 eess.SY cs.LG cs.SY math.DS math.OC

Safe and Robust Domains of Attraction for Discrete-Time Systems: A Set-Based Characterization and Certifiable Neural Network Estimation

Mohamed Serry, Maxwell Fitzsimmons, Jun Liu

Abstract

Analyzing nonlinear systems with attracting robust invariant sets (RISs) requires estimating their domains of attraction (DOAs). Despite extensive research, accurately characterizing DOAs for general nonlinear systems remains challenging due to both theoretical and computational limitations, particularly in the presence of uncertainties and state constraints. In this paper, we propose a novel framework for the accurate estimation of safe (state-constrained) and robust DOAs for discrete-time nonlinear uncertain systems with continuous dynamics, open safe sets, compact disturbance sets, and uniformly locally $\ell_p$-stable compact RISs. The notion of uniform $\ell_p$ stability is quite general and encompasses, as special cases, uniform exponential and polynomial stability. The DOAs are characterized via newly introduced value functions defined on metric spaces of compact sets. We establish their fundamental mathematical properties and derive the associated Bellman-type (Zubov-type) functional equations. Building on this characterization, we develop a physics-informed neural network (NN) framework to learn the corresponding value functions by embedding the derived Bellman-type equations directly into the training process. To obtain certifiable estimates of the safe robust DOAs from the learned neural approximations, we further introduce a verification procedure that leverages existing formal verification tools. The effectiveness and applicability of the proposed methodology are demonstrated through four numerical examples involving nonlinear uncertain systems subject to state constraints, and its performance is compared with existing methods from the literature.

2603.03060 2026-03-04 eess.IV eess.AS

DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming

Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao

Comments 14 pages, 13 figures, 6 tables, 7 algorithms, 16 references, submitted to ACM/IEEE International Conference on Systems and Software Engineering

Abstract

We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent window architecture for independent rendering of danmaku (scrolling text), gift and like particle effects, and VIP entrance animations, built around an event-driven WebView2 capture pipeline and a thread-safe event bus. On top of this foundation we contribute an LLM broadcast automation framework comprising: (1) a per-song four-segment prompt scheduling system (T1 opening/transition, T2 empathy, T3 era story/production notes, T4 closing) that generates emotionally coherent radio-style commentary from lyric metadata; (2) a JSON-serializable RadioPersonaConfig schema supporting hot-swap multi-persona broadcasting; (3) a real-time danmaku quick-reaction engine with keyword routing to static urgent speech or LLM-generated empathetic responses; and (4) the Suwan Li AI singer-songwriter persona case study -- over 100 AI-generated songs produced with Suno. A 36-hour stress test demonstrates: zero danmaku overlap, zero deadlock crashes, gift effect P95 latency <= 180 ms, LLM-to-TTS segment P95 latency <= 2.1 s, and TTS integrated loudness gain of 9.5 LUFS. Keywords: live streaming; danmaku; large language model; prompt engineering; virtual persona; WebView2; WINMM; TTS; Suno; loudness normalization; real-time scheduling

2603.02998 2026-03-04 cs.IT eess.SP math.IT

An Optimization-Based User Scheduling Framework for Multiuser MIMO Systems

Victoria Palhares, Christoph Studer

Comments Submitted to a journal

Abstract

Resource allocation is a key factor in multiuser (MU) multiple-input multiple-output (MIMO) wireless systems to provide high quality of service to all user equipments (UEs). In congested scenarios, UE scheduling enables UEs to be distributed over time, frequency, or space in order to mitigate inter-UE interference. Many existing UE scheduling methods rely on greedy algorithms, which fail to treat the resource-allocation problem globally. In this work, we propose a UE scheduling framework for MU-MIMO wireless systems that approximately solves a nonconvex optimization problem that treats scheduling globally. Our UE scheduling framework determines subsets of UEs that should transmit simultaneously in a given resource slot and is flexible in the sense that it (i) supports a variety of objective functions (e.g., post-equalization mean squared error, capacity, and achievable sum rate) and (ii) enables precise control over the minimum and maximum number of resources the UEs should occupy. We demonstrate the efficacy of our UE scheduling framework for millimeter-wave massive MU-MIMO and sub-6-GHz cell-free massive MU-MIMO systems, and we show that it outperforms existing scheduling algorithms while approaching the performance of an exhaustive search.

2603.02975 2026-03-04 eess.SY cs.SY

Grid-Forming Control with Assignable Voltage Regulation Guarantees and Safety-Critical Current Limiting

Bhathiya Rathnayake, Sijia Geng

Abstract

This paper develops a nonlinear grid-forming (GFM) controller with provable voltage-formation guarantees, with over-current limiting enforced via a control-barrier-function (CBF)-based safety filter. The nominal controller follows a droop-based inner-outer architecture, in which voltage references and frequency are generated by droop laws, an outer-loop voltage controller produces current references using backstepping (BS), and an inner-loop current controller synthesizes the terminal voltage. The grid voltage is treated as an unknown bounded disturbance, without requiring knowledge of its bound, and the controller design does not rely on any network parameters beyond the point of common coupling (PCC). To robustify voltage formation against the grid voltage, a deadzone-adapted disturbance suppression (DADS) framework is incorporated, yielding practical voltage regulation characterized by asymptotic convergence of the PCC voltage errors to an assignably small and known residual set. Furthermore, the closed-loop system is proven to be globally well posed, with all physical and adaptive states bounded and voltage error transients (due to initial conditions) decaying exponentially at an assignable rate. On top of the nominal controller, hard over-current protection is achieved through a minimally invasive CBF-based safety filter that enforces strict current limits via a single-constraint quadratic program. The safety filter is compatible with any locally Lipschitz nominal controller. Rigorous analysis establishes forward invariance of the safe-current set and boundedness of all states under current limiting. Numerical results demonstrate improved transient performance and faster recovery during current-limiting events when the proposed DADS-BS controller is used as the nominal control law, compared with conventional PI-based GFM control.
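
The "minimally invasive" safety filter with a single-constraint quadratic program admits a closed-form solution, which the sketch below illustrates in generic form: minimize the squared distance to the nominal input subject to one affine constraint `a @ u <= b`. How `a` and `b` are built from the current-limit barrier function is specific to the paper and is not reproduced here; the names and the toy numbers are assumptions.

```python
import numpy as np

def safety_filter(u_nom, a, b):
    """Solve  min ||u - u_nom||^2  s.t.  a @ u <= b  in closed form.

    If the nominal input already satisfies the constraint it is returned
    unchanged; otherwise it is projected onto the constraint boundary.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)
    if not np.any(a):
        raise ValueError("constraint vector a must be nonzero")
    violation = a @ u_nom - b
    if violation <= 0.0:
        return u_nom                      # filter is inactive
    return u_nom - (violation / (a @ a)) * a  # orthogonal projection

# Toy example (hypothetical numbers): limit the first input component to 1.
u_safe = safety_filter([2.0, 0.0], a=[1.0, 0.0], b=1.0)
u_same = safety_filter([0.5, 3.0], a=[1.0, 0.0], b=1.0)
```

Because the filter only acts when the constraint is violated, the nominal control law is left untouched whenever it is already safe, which matches the "minimally invasive" description in the abstract.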

2603.02937 2026-03-04 eess.AS cs.LG

Bias and Fairness in Self-Supervised Acoustic Representations for Cognitive Impairment Detection

Kashaf Gulzar, Korbinian Riedhammer, Elmar Nöth, Andreas K. Maier, Paula Andrea Pérez-Toro

Comments 12 pages, 4 figures, 6 tables, Journal paper

Abstract

Speech-based detection of cognitive impairment (CI) offers a promising non-invasive approach for early diagnosis, yet performance disparities across demographic and clinical subgroups remain underexplored, raising concerns around fairness and generalizability. This study presents a systematic bias analysis of acoustic-based CI and depression classification using the DementiaBank Pitt Corpus. We compare traditional acoustic features (MFCCs, eGeMAPS) with contextualized speech embeddings from Wav2Vec 2.0 (W2V2), and evaluate classification performance across gender, age, and depression-status subgroups. For CI detection, higher-layer W2V2 embeddings outperform baseline features (UAR up to 80.6%), but exhibit performance disparities; specifically, females and younger participants demonstrate lower discriminative power (AUC: 0.769 and 0.746, respectively) and substantial specificity disparities ($\Delta_{\text{spec}}$ up to 18% and 15%, respectively), leading to a higher risk of misclassifications than their counterparts. These disparities reflect representational biases, defined as systematic differences in model performance across demographic or clinical subgroups. Depression detection within CI subjects yields lower overall performance, with mild improvements from low and mid-level W2V2 layers. Cross-task generalization between CI and depression classification is limited, indicating that each task depends on distinct representations. These findings emphasize the need for fairness-aware model evaluation and subgroup-specific analysis in clinical speech applications, particularly in light of demographic and clinical heterogeneity in real-world applications.
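
The metrics quoted in this abstract can be made concrete with a small sketch. UAR (unweighted average recall) is the mean of the per-class recalls, and a specificity gap is the difference in true-negative rate between two subgroups. The labels, predictions, and group tags below are made-up toy data, and the helper names are hypothetical; the paper's exact definition of the specificity disparity may differ.

```python
def recall_for(y_true, y_pred, label):
    """Recall of one class: correct predictions among samples of that class."""
    idx = [i for i, t in enumerate(y_true) if t == label]
    return sum(y_pred[i] == label for i in idx) / len(idx)

def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls."""
    labels = sorted(set(y_true))
    return sum(recall_for(y_true, y_pred, c) for c in labels) / len(labels)

def specificity(y_true, y_pred, negative=0):
    """True-negative rate, i.e. recall of the negative class."""
    return recall_for(y_true, y_pred, negative)

def specificity_gap(y_true, y_pred, groups, g_a, g_b):
    """Absolute specificity difference between subgroups g_a and g_b."""
    def subset(g):
        idx = [i for i, v in enumerate(groups) if v == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]
    return abs(specificity(*subset(g_a)) - specificity(*subset(g_b)))

# Toy data (invented for illustration): 1 = impaired, 0 = control
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]

overall_uar = uar(y_true, y_pred)
gap = specificity_gap(y_true, y_pred, groups, "f", "m")
```

A model can have a reasonable overall UAR while one subgroup carries most of the false positives, which is the kind of disparity the study reports.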

2603.02914 2026-03-04 eess.AS

Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?

Xin Wang, Ge Wanying, Junichi Yamagishi

Comments Submitted to Interspeech 2026; put on arxiv based on requirement of paper open-access rule; quote from Interspeech: "Interspeech no longer enforces an anonymity period for submissions. While uploading a version online is permitted, your official submission to Interspeech must not contain any author-identifying information"

Abstract

Building speech deepfake detection models that are generalizable to unseen attacks remains a challenging problem. Although the field has shifted toward a pre-training and fine-tuning paradigm using speech foundation models, most approaches rely solely on supervised fine-tuning (SFT). Inspired by the field of large language models, wherein reinforcement learning (RL) is used for model fine-tuning, we investigate the impact of RL, specifically Group Relative Policy Optimization (GRPO). The results from experiments using multiple detectors and test sets indicate that pure GRPO-based fine-tuning improves performance on out-of-domain test sets while maintaining performance on target-domain test data. This approach outperforms both SFT-only and hybrid setups. Our ablation studies further suggest that the negative reward in GRPO may be a key factor in this improvement.
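
For orientation, the group-relative advantage at the core of GRPO, as it is commonly described, normalizes each sampled response's reward by the mean and standard deviation of its group. The sketch below shows only that normalization step; how rewards are defined for a binary deepfake detector and how the policy loss is built are details of the paper that are not reproduced here, and the reward values are invented.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical group of four sampled decisions for one utterance:
# reward 1 for a correct decision, 0 for an incorrect one.
adv = group_relative_advantages([1.0, 1.0, 0.0, 1.0])
```

With this normalization, below-average samples in a group receive a negative advantage and are pushed down, while above-average samples are pushed up.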

2603.02877 2026-03-04 eess.AS

DBMIF: a deep balanced multimodal iterative fusion framework for air- and bone-conduction speech enhancement

Yilei Wu, Changyan Zheng, Xingyu Zhang, Yakun Zhang, Chengshi Zheng, Shuang Yang, Ye Yan, Erwei Yin

Comments 10 pages, 7 figures, Applied Intelligence

Abstract

The performance of conventional speech enhancement systems degrades sharply in extremely low signal-to-noise ratio (SNR) environments where air-conduction (AC) microphones are overwhelmed by ambient noise. Although bone-conduction (BC) sensors offer complementary, noise-tolerant information, existing fusion approaches struggle to maintain consistent performance across a wide range of SNR conditions. To address this limitation, we propose the Deep Balanced Multimodal Iterative Fusion Framework (DBMIF), a three-branch architecture designed to reconstruct high-fidelity speech through rigorous cross-modal interaction. Specifically, grounded in a multi-scale interactive encoder-decoder backbone, the framework orchestrates an iterative attention module and a cross-branch gated module to facilitate adaptive weighting and bidirectional exchange. To complement this dynamic interaction, a balanced-interaction bottleneck is further integrated to learn a compact, stable fused representation. Extensive experiments demonstrate that DBMIF achieves competitive performance compared with recent unimodal and multimodal baselines in both speech quality and intelligibility across diverse noise types. In downstream ASR tasks, the proposed method reduces the character error rate by at least 2.5 percent compared to competing approaches. These results confirm that DBMIF effectively harnesses the robustness of BC speech while preserving the naturalness of AC speech, ensuring reliability in real-world scenarios. The source code is publicly available at github.com/wyl516w/dbmif.

2603.02832 2026-03-04 eess.SP

Exploiting Double-Bounce Paths in Snapshot Radio SLAM: Bounds, Algorithms and Experiments

Xi Zhang, Yu Ge, Ossi Kaltiokallio, Musa Furkan Keskin, Henk Wymeersch, Mikko Valkama

Abstract

Radio-based simultaneous localization and mapping (SLAM) has the potential to provide precise user equipment (UE) localization and environmental sensing capabilities by exploiting radio signals. Most existing approaches leverage only line-of-sight (LoS) and single-bounce non-line-of-sight (NLoS) paths, while higher-order NLoS paths are treated as disturbance. In this paper, we investigate the benefits of leveraging double-bounce NLoS paths for solving the bistatic snapshot radio SLAM problem. We derive the Cramér-Rao bound (CRB) for joint estimation of the UE state and landmark positions when double-bounce NLoS paths are present. In addition, we propose an algorithm to identify double-bounce NLoS paths and incorporate them into joint UE and landmark estimation. The derived bounds are validated through simulated data, and the proposed algorithms are evaluated using experimental millimeter wave (mmWave) measurements harnessing beamformed 5G cellular reference signals. The numerical and experimental results demonstrate that the double-bounce NLoS paths which share at least one incidence point (IP) with the single-bounce NLoS paths improve the estimation accuracy of the UE state and existing IPs of single-bounce NLoS paths. Importantly, exploiting double-bounce NLoS paths enhances environmental mapping capabilities by revealing landmarks that are unobservable with single-bounce NLoS paths alone.

2603.02794 2026-03-04 cs.SD cs.AI cs.LG eess.AS

Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising

Riccardo Rota, Kiril Ratmanski, Jozef Coldenhoff, Milos Cernak

Comments Submitted to Interspeech 2026

Abstract

We present TVF (Time-Varying Filtering), a low-latency speech enhancement model with 1 million parameters. Combining the interpretability of Digital Signal Processing (DSP) with the adaptability of deep learning, TVF bridges the gap between traditional filtering and modern neural speech modeling. The model utilizes a lightweight neural network backbone to predict the coefficients of a differentiable 35-band IIR filter cascade in real time, allowing it to adapt dynamically to non-stationary noise. Unlike "black-box" deep learning approaches, TVF offers a completely interpretable processing chain, where spectral modifications are explicit and adjustable. We demonstrate the efficacy of this approach on a speech denoising task using the Valentini-Botinhao dataset and compare the results to a static DDSP approach and a fully deep-learning-based solution, showing that TVF achieves effective adaptation to changing noise conditions.
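
To illustrate what "time-varying IIR filtering" means at the sample level, the sketch below runs a single second-order section (biquad) whose coefficients may change at every sample. It is a plain NumPy reference loop, not the paper's model: in TVF the coefficients are predicted by a neural network and the filter is differentiable, which would require an autodiff framework. The array layout and names are assumptions.

```python
import numpy as np

def time_varying_biquad(x, b, a):
    """Direct-form I biquad with per-sample coefficients.

    x: (T,) input signal
    b: (T, 3) feedforward coefficients b0, b1, b2 at each sample
    a: (T, 2) feedback coefficients a1, a2 at each sample (a0 = 1)
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0          # filter memory
    for n in range(len(x)):
        y[n] = (b[n, 0] * x[n] + b[n, 1] * x1 + b[n, 2] * x2
                - a[n, 0] * y1 - a[n, 1] * y2)
        x2, x1 = x1, x[n]            # shift input history
        y2, y1 = y1, y[n]            # shift output history
    return y

# Toy check: a one-pole smoother (alpha = 0.5) applied to a unit step.
T = 4
step = np.ones(T)
b = np.tile([0.5, 0.0, 0.0], (T, 1))
a = np.tile([-0.5, 0.0], (T, 1))
y = time_varying_biquad(step, b, a)
```

A cascade is obtained by feeding the output of one such section into the next, each with its own coefficient trajectory.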

2603.02768 2026-03-04 eess.SP

Enhancing AAV-Enabled Secure Communications via Synthetic Aperture Beamforming

Bin Qiu, Wenchi Cheng, Hongxiang He, Jiangzhou Wang

Abstract

In this paper, we consider a synthetic aperture secure beamforming approach for a virtual multiple-input multiple-output (MIMO) broadcast channel in the presence of hybrid wiretapping environments. Our goal is to design the flight node deployment constructed by a single-antenna mobile autonomous aerial vehicle (AAV), the corresponding transmission symbol strategy, transmit precoding, and received beamforming to maximize the system channel capacity. Leveraging the synthetic aperture beamforming, we aim to provide spatial gain along a predefined angle in free space while reducing it in others and thus enhance physical layer (PHY) security. To this end, we analyze the expression of the asymptotic channel eigenvalues to optimize the AAV flight node deployment. For the optimal precoding design, an energy-efficient method that minimizes the transmit power consumption is studied based on the given virtual MIMO channel, while meeting the quality of service (QoS) for the base station (BS), leakage tolerance of eavesdroppers (Eves), and per-node power constraints. The power minimization problem is a nonconvex program, which is then reformulated into a tractable form after some mathematical manipulations. Moreover, we design the received beamforming by applying the linearly constrained minimum variance (LCMV) method such that the jamming can be effectively suppressed. Numerical results demonstrate the superiority of the proposed method in promoting capacity.

2603.00728 2026-03-04 cs.LO cs.SE cs.SY eess.SY

Quantitative Monitoring of Signal First-Order Logic

Marek Chalupa, Thomas A. Henzinger, N. Ege Saraç, Emily Yu

Comments Full version of the FM 2026 paper

Abstract

Runtime monitoring checks, during execution, whether a partial signal produced by a hybrid system satisfies its specification. Signal First-Order Logic (SFO) offers expressive real-time specifications over such signals, but currently comes only with Boolean semantics and has no tool support. We provide the first robustness-based quantitative semantics for SFO, enabling the expression and evaluation of rich real-time properties beyond the scope of existing formalisms such as Signal Temporal Logic. To enable online monitoring, we identify a past-time fragment of SFO and give a pastification procedure that transforms bounded-response SFO formulas into equisatisfiable formulas in this fragment. We then develop an efficient runtime monitoring algorithm for this past-time fragment and evaluate its performance on a set of benchmarks, demonstrating the practicality and effectiveness of our approach. To the best of our knowledge, this is the first publicly available prototype for online quantitative monitoring of full SFO.

2603.00572 2026-03-04 physics.optics cs.SY eess.SY

Depth-adapted adaptive optics for three-photon microscopy

Qi Hu, Jingyu Wang, Huriye Atilgan, Armin Lak, Martin J. Booth

Abstract

Three-photon (3-P) fluorescence microscopy enables deep in vivo imaging with subcellular resolution, but its performance is fundamentally constrained by the maximum permissible laser power required to avoid tissue heating and photodamage. Under these power-limited conditions, fluorescence signal generation, image contrast, and achievable imaging depth are strongly affected by the illumination beam profile and aberration correction strategy. In this paper, we showed that using a fixed illumination beam size was suboptimal across different imaging depths. We further showed that conventional Zernike-based adaptive optics (AO) correction degrades under reduced Gaussian illumination beam sizes due to loss of modal orthogonality. This degradation results in slow convergence, unintended focal and field-of-view shifts, and excessive wavefront deformations. To overcome these limitations, we introduced a depth-adapted AO framework in which both the illumination beam profile and the aberration correction basis were dynamically matched to the imaging conditions. By combining depth-optimised beam underfilling with a bespoke set of illumination-matched aberration modes, we achieved faster and more stable AO convergence, enhanced fluorescence signal and image quality during deep in vivo multi-channel neuroimaging. Together, these results established a practical and robust AO-enabled three-photon microscopy strategy that maximised imaging performance under realistic power constraints.

2512.22901 2026-03-04 eess.SY cs.AI cs.LG cs.SY eess.SP

A Neural Network-Based Real-time Casing Collar Recognition System for Downhole Instruments

Si-Yu Xiao, Xin-Di Zhao, Xiang-Zhan Wang, Tian-Hao Mao, Ying-Kai Liao, Xing-Yu Liao, Yu-Qiao Chen, Jun-Jie Wang, Shuang Liu, Tu-Pei Chen, Yang Liu

Abstract

Casing collar locator (CCL) measurements are widely used as reliable depth markers for positioning downhole instruments in cased-hole operations, enabling accurate depth control for operations such as perforation. However, autonomous collar recognition in downhole environments remains challenging because CCL signals are often corrupted by toolstring- or casing-induced magnetic interference, while stringent size and power budgets limit the use of computationally intensive algorithms and specific operations require real-time, in-situ processing. To address these constraints, we propose Collar Recognition Nets (CRNs), a family of domain-specific lightweight 1-D convolutional neural networks for collar signature recognition from streaming CCL waveforms. With depthwise separable convolutions and input pooling, CRNs optimize efficiency without sacrificing accuracy. Our most compact model achieves an F1-score of 0.972 on field data with only 1,985 parameters and 8,208 MACs. Deployed on an ARM Cortex-M7-based embedded system using the TensorFlow Lite for Microcontrollers (TFLM) library, the model demonstrates a throughput of 1,000 inferences per second and a latency of 343.2 μs, confirming the feasibility of robust, autonomous, and real-time collar recognition under stringent downhole constraints.
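
The parameter savings from depthwise separable convolutions, which the abstract credits for the small model size, follow from simple counting. The sketch below compares a standard 1-D convolution with its depthwise separable counterpart. The layer sizes are arbitrary illustration values and do not describe the CRN architecture, whose layer configuration is not given in the abstract.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Parameters of a standard 1-D convolution layer."""
    return c_in * c_out * k + (c_out if bias else 0)

def depthwise_separable_conv1d_params(c_in, c_out, k, bias=True):
    """Depthwise (one k-tap filter per input channel) plus 1x1 pointwise."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

# Arbitrary example layer: 16 -> 32 channels, kernel size 9
standard = conv1d_params(16, 32, 9)                       # 16*32*9 + 32
separable = depthwise_separable_conv1d_params(16, 32, 9)  # (144+16) + (512+32)
print(standard, separable)
```

For this example layer the separable form needs 704 parameters instead of 4,640, a reduction of roughly 6.6 times; multiply-accumulate counts shrink by a similar ratio.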

2511.14065 2026-03-04 q-bio.NC cs.SY eess.SY

Intrinsic Resonance depends on Network Size of Coupled-Delayed Interacting Oscillators

Felipe A. Torres, Alejandro Weinstein, Jesus M. Cortes, Wael El-Deredy

Comments 16 pages, 3 figures: 2 figures in the main text and 1 figure in the appendix

Abstract

The collective frequency that emerges from synchronized neuronal populations, the network resonance, shows a systematic relationship with brain size: large whole-brain networks oscillate slowly, whereas finer parcellations of fixed volume exhibit faster rhythms. This resonance-size scaling has been reported in delayed neural mass models and human neuroimaging, yet the physical mechanism remained unresolved. Here we show that size-dependent resonance follows directly from propagation delays in delay-coupled phase oscillators. Starting from a Kuramoto model with heterogeneous delays, we linearize around the near-synchronous solution and obtain a closed-form approximation linking the resonance $\Omega$ to the mean delay and the effective coupling field. The analysis predicts a generic scaling law: $\Omega \approx (\sum_j c_{ij} \tau)^{-1}$, so resonance is delay-limited and therefore depends systematically on geometric size or parcellation density. We evaluate four growth scenarios (expanding geometry, fixed-volume parcellation, constant geometry, and an unphysical reference case) and show that only geometry-consistent scaling satisfies the analytical prediction. Numerical simulations with heterogeneous delays validate the law and quantify its error as a function of delay dispersion. These results identify a minimal physical mechanism for size-dependent cortical resonance and provide an analytical framework that unifies numeric simulation outputs.

2510.17270 2026-03-04 cs.RO cs.SY eess.SY

Floating-Base Deep Lagrangian Networks

Lucas Schulze, Juliano Decico Negri, Victor Barasuol, Vivian Suzano Medeiros, Marcelo Becker, Jan Peters, Oleg Arenz

Details
Abstract

Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve better overall performance on both simulated and real robots, while providing greater physical interpretability.
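
As a small illustration of the triangle inequality mentioned above, the sketch below checks whether three principal moments of inertia could belong to a physical rigid body: the two smallest must sum to at least the largest. The helper names and the cuboid test body are assumptions made for this example and are not taken from FeLaN.

```python
def cuboid_principal_moments(mass, a, b, c):
    """Principal moments of a uniform solid cuboid with side lengths a, b, c."""
    k = mass / 12.0
    return (k * (b * b + c * c), k * (a * a + c * c), k * (a * a + b * b))


def satisfies_triangle_inequality(moments, tol=1e-12):
    """True if the moments are positive and the two smallest sum to at least the largest."""
    i1, i2, i3 = sorted(moments)
    return i1 > 0 and i1 + i2 >= i3 - tol


box = cuboid_principal_moments(mass=12.0, a=1.0, b=2.0, c=3.0)
print(box, satisfies_triangle_inequality(box))
print(satisfies_triangle_inequality((1.0, 1.0, 3.0)))
```

A positive definite matrix with eigenvalues (1, 1, 3) would pass a plain positive-definiteness check yet fail this test, which is why a parameterization that enforces the inequality is stricter than one that only enforces positive definiteness.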

2510.16953 2026-03-04 eess.SY cs.RO cs.SY

Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach

Ersin Das, William A. Welch, Patrick Spieler, Keenan Albee, Aurelio Noca, Jeffrey Edlund, Jonathan Becktor, Thomas Touma, Jessica Todd, Sriramya Bhamidipati, Stella Kombo, Maira Saboia, Anna Sabel, Grace Lim, Rohan Thakker, Amir Rahmani, Joel W. Burdick

Details
Abstract

Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.

2510.00256 2026-03-04 eess.AS cs.SD

Subjective quality evaluation of personalized own voice reconstruction systems

Mattes Ohlenbusch, Christian Rollwage, Simon Doclo, Jan Rennies

Comments Submitted to Acta Acustica

Details
Abstract

Own voice pickup technology for hearable devices facilitates communication in noisy environments. Own voice reconstruction (OVR) systems enhance the quality and intelligibility of the recorded noisy own voice signals. Since disturbances affecting the recorded own voice signals depend on individual factors, personalized OVR systems have the potential to outperform generic OVR systems. In this paper, we propose personalizing OVR systems through data augmentation and fine-tuning, comparing them to their generic counterparts. We investigate the influence of personalization on speech quality assessed by objective metrics and conduct a subjective listening test to evaluate quality under various conditions. In addition, we assess the prediction accuracy of the objective metrics by comparing predicted quality with subjectively measured quality. Our findings suggest that personalized OVR provides benefits over generic OVR for some talkers only. Our results also indicate that performance comparisons between systems are not always accurately predicted by objective metrics. In particular, certain disturbances lead to a consistent overestimation of quality compared to actual subjective ratings.

2507.16733 2026-03-04 eess.SP

Generative Diffusion Models for Wireless Networks: Fundamental, Architecture, and State-of-the-Art

Dayu Fan, Rui Meng, Xiaodong Xu, Yiming Liu, Guoshun Nan, Chenyuan Feng, Shujun Han, Song Gao, Bingxuan Xu, Dusit Niyato, Tony Q. S. Quek, Ping Zhang

Comments 46 pages, 10 figures

Details
Abstract

With the rapid development of Generative Artificial Intelligence (GAI) technology, Generative Diffusion Models (GDMs) have shown significant potential in wireless networks due to advantages such as noise resistance, training stability, controllability, and multimodal generation. Although multiple studies have focused on GDMs for wireless networks, there is still a lack of comprehensive reviews on their technological evolution. Motivated by this, we systematically explore the application of GDMs in wireless networks. First, we identify the core challenges of wireless networks and argue why GDMs are uniquely suited to address them. We then introduce the mathematical principles of GDMs and representative models. Furthermore, we organize our comprehensive review through a structured taxonomy that categorizes GDM-based schemes into sensing, transmission, and applications, complemented by a security plane. For each representative scheme, we analyze its innovative points, the role of GDMs, strengths, and weaknesses. Finally, we extract key challenges and provide potential solutions, with the aim of providing directional guidance for future research in this field.

2506.23569 2026-03-04 quant-ph cs.SY eess.SY

Alleviating CoD in Renewable Energy Profile Clustering Using an Optical Quantum Computer

Chengjun Liu, Yijun Xu, Wei Gu, Bo Sun, Kai Wen, Shuai Lu, Lamine Mili

Details
Journal ref
published by CSEE Journal of Power & Energy Systems, 2026
Abstract

The traditional clustering problem of renewable energy profiles is typically formulated as a combinatorial optimization that suffers from the Curse of Dimensionality (CoD) on classical computers. To address this issue, this paper first proposes a kernel-based quantum clustering method. More specifically, the kernel-based similarity between profiles with minimal intra-group distance is encoded into the ground state of the Hamiltonian in the form of an Ising model. Then, this NP-hard problem can be reformulated into a Quadratic Unconstrained Binary Optimization (QUBO) problem, which a Coherent Ising Machine (CIM) can naturally solve with significant improvement over classical computers. The test results from a real optical quantum computer verify the validity of the proposed method. They also demonstrate its ability to address CoD in an NP-hard clustering problem.
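
The link between QUBO and Ising forms mentioned above is a standard change of variables, x_i = (1 + s_i) / 2 with x_i in {0, 1} and s_i in {-1, +1}. The sketch below applies it to a small hypothetical matrix Q and checks by brute force that both energies agree. It does not reproduce the paper's kernel-based encoding, and all names are illustrative.

```python
from itertools import product


def qubo_energy(Q, x):
    """E(x) = sum_ij Q[i][j] * x_i * x_j for binary x."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def qubo_to_ising(Q):
    """Return (J, h, offset) such that, with x_i = (1 + s_i) / 2,
    E = sum_{i != j} J[i][j] s_i s_j + sum_i h[i] s_i + offset."""
    n = len(Q)
    J = [[Q[i][j] / 4 if i != j else 0.0 for j in range(n)] for i in range(n)]
    h = [sum(Q[i][j] + Q[j][i] for j in range(n)) / 4 for i in range(n)]
    offset = (sum(Q[i][j] for i in range(n) for j in range(n))
              + sum(Q[i][i] for i in range(n))) / 4
    return J, h, offset


def ising_energy(J, h, offset, s):
    """Ising energy for spins s_i in {-1, +1}."""
    n = len(h)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return pair + sum(h[i] * s[i] for i in range(n)) + offset


# Hypothetical 3-variable QUBO, for illustration only.
Q = [[-1, 2, 0], [0, -1, 2], [0, 0, -1]]
J, h, offset = qubo_to_ising(Q)
max_gap = max(
    abs(qubo_energy(Q, x) - ising_energy(J, h, offset, [2 * xi - 1 for xi in x]))
    for x in product([0, 1], repeat=3)
)
print(max_gap)
```

Because the mapping preserves every energy value, the minimizing assignment of one form is the ground state of the other, which is what allows an Ising-type machine to be used as a QUBO solver.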

2412.09646 2026-03-04 eess.IV cs.CV cs.GR cs.LG

RealOSR: Latent Guidance Boosts Diffusion-based Real-world Omnidirectional Image Super-Resolutions

Xuhan Sheng, Runyi Li, Bin Chen, Weiqi Li, Xu Jiang, Jian Zhang

Details
Abstract

Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR), catering to the growing demand for detailed visual content across a $180^{\circ}\times 360^{\circ}$ viewport. Existing ODISR methods are limited by simplified degradation assumptions (e.g., bicubic downsampling), failing to model and exploit the real-world degradation information. Recent latent-based diffusion approaches using condition guidance suffer from slow inference due to their hundreds of updating steps and frequent use of VAE. To tackle these challenges, we propose RealOSR, a diffusion-based framework tailored for real-world ODISR, featuring efficient latent-based condition guidance within a one-step denoising paradigm. Central to efficient latent-based condition guidance is the proposed Latent Gradient Alignment Routing (LaGAR), a lightweight module that enables effective pixel-latent space interactions and simulates gradient descent directly in the latent space, thereby leveraging the semantic richness and multi-scale features captured by the denoising UNet. Compared to the recent diffusion-based ODISR method, OmniSSR, RealOSR achieves significant improvements in visual quality and over 200$\times$ inference acceleration. Our code and models will be released upon acceptance.

2401.01255 2026-03-04 eess.AS cs.AI cs.MM eess.SP

On the Parameter Estimation of Sinusoidal Models for Speech and Audio Signals

George P. Kafentzis

Details
Abstract

In this paper, we examine the parameter estimation performance of three well-known sinusoidal models for speech and audio. The first is the standard Sinusoidal Model (SM), which is based on the Fast Fourier Transform (FFT). The second is the Exponentially Damped Sinusoidal Model (EDSM), which was proposed in the last decade and uses a subspace method for parameter estimation. The third is the extended adaptive Quasi-Harmonic Model (eaQHM), which was recently proposed for AM-FM decomposition and estimates the signal parameters using Least Squares on a set of basis functions that are adaptive to the local characteristics of the signal. The parameter estimation of each model is briefly described and its performance is compared to the others in terms of signal reconstruction accuracy versus window size on a variety of synthetic signals and versus the number of sinusoids on real signals. The latter include highly non-stationary signals, such as singing voices and guitar solos. The advantages and disadvantages of each model are presented via synthetic signals and then the application on real signals is discussed. In conclusion, eaQHM outperforms EDSM in medium-to-large window size analysis, whereas EDSM yields higher reconstruction values for smaller analysis window sizes. Thus, a future research direction appears to be merging the adaptivity of the eaQHM with the parameter estimation robustness of the EDSM in a new paradigm for high-quality analysis and resynthesis of general audio signals.

2307.04842 2026-03-04 eess.AS cs.AI

Predicting Tuberculosis from Real-World Cough Audio Recordings and Metadata

George P. Kafentzis, Stephane Tetsing, Joe Brew, Lola Jover, Mindaugas Galvosas, Carlos Chaccour, Peter M. Small

Details
Abstract

Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis and primarily affects the lungs, as well as other body parts. TB is spread through the air when an infected person coughs, sneezes, or talks. Medical doctors diagnose TB in patients via clinical examinations and specialized tests. However, coughing is a common symptom of respiratory diseases such as TB. Literature suggests that cough sounds coming from different respiratory diseases can be distinguished by both medical doctors and computer algorithms. Therefore, cough recordings associated with patients with and without TB seem to be a reasonable avenue of investigation. In this work, we utilize a very large dataset of TB and non-TB cough audio recordings obtained from the south-east of Africa, India, and the south-east of Asia using a fully automated phone-based application (Hyfe), without manual annotation. We fit statistical classifiers based on spectral and time domain features with and without clinical metadata. A stratified grouped cross-validation approach shows that an average Area Under Curve (AUC) of approximately 0.70 $\pm$ 0.05 both for a cough-level and a participant-level classification can be achieved using cough sounds alone. The addition of demographic and clinical factors increases performance, resulting in an average AUC of approximately 0.81 $\pm$ 0.05. Our results suggest mobile phone-based applications that integrate clinical symptoms and cough sound analysis could help community health workers and, most importantly, health service programs to improve TB case-finding efforts while reducing costs, which could substantially improve public health.

2110.03427 2026-03-04 cs.LG cs.CL cs.SD eess.AS eess.SP

Is Attention always needed? A Case Study on Language Identification from Speech

Atanu Mandal, Santanu Pal, Indranil Dutta, Mahidas Bhattacharya, Sudip Kumar Naskar

Comments Accepted for publication in Natural Language Engineering

Details
Journal ref
Nat. lang. process. 31 (2025) 250-276
Abstract

Language Identification (LID) is a crucial preliminary process in the field of Automatic Speech Recognition (ASR) that involves the identification of a spoken language from audio samples. Contemporary systems that can process speech in multiple languages require users to expressly designate one or more languages prior to utilization. The LID task assumes a significant role in scenarios where ASR systems are unable to comprehend the spoken language in multilingual settings, leading to unsuccessful speech recognition outcomes. The present study introduces convolutional recurrent neural network (CRNN) based LID, designed to operate on the Mel-frequency Cepstral Coefficient (MFCC) characteristics of audio samples. Furthermore, we replicate certain state-of-the-art methodologies, specifically the Convolutional Neural Network (CNN) and Attention-based Convolutional Recurrent Neural Network (CRNN with attention), and conduct a comparative analysis with our CRNN-based approach. We conducted comprehensive evaluations on thirteen distinct Indian languages and our model resulted in over 98% classification accuracy. The LID model exhibits high-performance levels ranging from 97% to 100% for languages that are linguistically similar. The proposed LID model exhibits a high degree of extensibility to additional languages and demonstrates a strong resistance to noise, achieving 91.2% accuracy in a noisy setting when applied to a European Language (EU) dataset.

2603.02721 2026-03-04 eess.SP

Doppler Shift Keying Modulation for Uplink Multiple Access over Doubly-Dispersive Channels

Xuehan Wang, Jintao Wang, Hai Lin, Jinhong Yuan, Xu Shi, Hengyu Zhang, Jian Song

Comments This paper has been accepted by IEEE Transactions on Vehicular Technology (TVT)

Details
Abstract

Delay-Doppler (DD) domain modulation has been regarded as one of the most competitive candidates to support wireless communications for emerging high-mobility applications in sixth-generation mobile networks. Unfortunately, most of the existing designs for DD domain modulation suffer from high peak-to-average power ratio (PAPR) and prohibitive detection complexity under uplink transmission, since large time duration and bandwidth are required to guarantee high DD resolutions. To address these issues, the Doppler shift keying (DSK) modulation based on the orthogonal delay Doppler division multiplexing modulator is proposed in this paper, where the input-output characterization in the DD domain is fully exploited. The principle of the DSK transceiver is first established with the one-hot mapper and low-complexity iterative successive interference cancellation-maximum ratio combining detector for point-to-point scenarios. The proposed scheme is then generalized to the zero auto-correlation sequence-based implementation, which benefits the extension of multi-user (MU) uplink DSK frameworks. For uplink DSK transmission, Zadoff-Chu (ZC) sequences are adopted as the basis sequences. We optimize the assignment of ZC roots to different user equipments (UEs) by minimizing the maximum inter-user interference. This optimization process, which analyzes the root allocation, directly assigns a specific ZC sequence to each UE. The PAPR and bit error rate performance of the proposed DSK modulation with the low-complexity detector is finally verified by extensive simulation results under doubly-dispersive channels, which demonstrate the superiority of DSK modulation, especially for uplink multiple access over doubly-dispersive channels.
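
To make the role of Zadoff-Chu sequences concrete, the sketch below generates odd-length ZC sequences using the standard definition and checks two well-known properties: zero cyclic autocorrelation at nonzero shifts, and a cross-correlation magnitude of sqrt(N) between different roots when N is prime. The length 13 and roots 1 and 2 are hypothetical choices; the paper's root-assignment optimization is not reproduced here.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Odd-length Zadoff-Chu sequence: x[n] = exp(-j*pi*root*n*(n+1)/length)."""
    if length % 2 == 0 or math.gcd(root, length) != 1:
        raise ValueError("length must be odd and coprime with root")
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]


def cyclic_correlation(a, b, shift):
    """Cyclic correlation sum_k a[(k + shift) mod N] * conj(b[k])."""
    n = len(a)
    return sum(a[(k + shift) % n] * b[k].conjugate() for k in range(n))


N = 13  # hypothetical prime length
zc1 = zadoff_chu(1, N)
zc2 = zadoff_chu(2, N)

# Largest autocorrelation magnitude over all nonzero shifts (ideally zero).
auto_sidelobe = max(abs(cyclic_correlation(zc1, zc1, s)) for s in range(1, N))
# Cross-correlation magnitudes between the two roots at every shift.
cross = [abs(cyclic_correlation(zc1, zc2, s)) for s in range(N)]
print(auto_sidelobe, min(cross), max(cross), math.sqrt(N))
```

The flat, low cross-correlation between roots is what makes the choice of root per user matter for inter-user interference.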