arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.06548 2026-03-09 cs.RO cs.SY eess.SY

Uncertainty-Aware Adaptive Dynamics For Underwater Vehicle-Manipulator Robots

Edward Morgan, Nenyi K Dadson, Corina Barbalata

Abstract

Accurate and adaptive dynamic models are critical for underwater vehicle-manipulator systems where hydrodynamic effects induce time-varying parameters. This paper introduces a novel uncertainty-aware adaptive dynamics model framework that remains linear in lumped vehicle and manipulator parameters, and embeds convex physical consistency constraints during online estimation. Moving horizon estimation is used to stack horizon regressors, enforce realizable inertia, damping, friction, and hydrostatics, and quantify uncertainty from parameter evolution. Experiments on a BlueROV2 Heavy with a 4-DOF manipulator demonstrate rapid convergence and calibrated predictions. Manipulator fits achieve $R^2$ = 0.88 to 0.98 with slopes near unity, while vehicle surge, heave, and roll are reproduced with good fidelity under stronger coupling and noise. Median solver time is approximately 0.023 s per update, confirming online feasibility. A comparison against a fixed parameter model shows consistent reductions in MAE and RMSE across degrees of freedom. Results indicate physically plausible parameters and confidence intervals with near 100% coverage, enabling reliable feedforward control and simulation in underwater environments.
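The idea of a model that is linear in its parameters, refit over a sliding horizon with physical constraints, can be sketched as follows. This is only an illustration under strong assumptions, not the authors' formulation: it uses a hypothetical 1-DOF model tau = m*a + d*v, and the only "physical consistency" constraints are m >= 0 and d >= 0. All names and numbers are made up for the example.

```python
import math
from collections import deque

def fit_window(rows):
    """Least-squares fit of tau = m*a + d*v subject to m >= 0 and d >= 0.

    rows holds (a, v, tau) samples. With two parameters the constrained
    optimum is the best of: the unconstrained solution (if feasible),
    the two single-parameter fits, and the origin.
    """
    saa = sum(a * a for a, v, t in rows)
    svv = sum(v * v for a, v, t in rows)
    sav = sum(a * v for a, v, t in rows)
    sat = sum(a * t for a, v, t in rows)
    svt = sum(v * t for a, v, t in rows)

    def cost(m, d):
        return sum((t - m * a - d * v) ** 2 for a, v, t in rows)

    candidates = [(0.0, 0.0)]
    if saa > 0.0:
        candidates.append((max(sat / saa, 0.0), 0.0))   # damping pinned at 0
    if svv > 0.0:
        candidates.append((0.0, max(svt / svv, 0.0)))   # inertia pinned at 0
    det = saa * svv - sav * sav
    if abs(det) > 1e-12:
        m = (sat * svv - sav * svt) / det               # normal equations
        d = (saa * svt - sav * sat) / det
        if m >= 0.0 and d >= 0.0:
            candidates.append((m, d))
    return min(candidates, key=lambda p: cost(*p))

def estimate_online(samples, horizon=20):
    """Refit the parameters each time a new sample enters the window."""
    window = deque(maxlen=horizon)    # oldest rows drop out automatically
    history = []
    for row in samples:
        window.append(row)
        history.append(fit_window(window))
    return history

# Synthetic, noise-free data from hypothetical true values m = 12, d = 5.
samples = [(math.sin(0.3 * k), math.cos(0.17 * k),
            12.0 * math.sin(0.3 * k) + 5.0 * math.cos(0.17 * k))
           for k in range(50)]
m_hat, d_hat = estimate_online(samples)[-1]
```

The spread of the entries in the returned history gives a crude picture of how the estimate evolves over time; the abstract's uncertainty quantification is more elaborate than this.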

2603.06541 2026-03-09 eess.SP cs.SY eess.SY

Codebook Design and Baseband Precoding for Pragmatic Array-Fed RIS Hybrid Multiuser MIMO

Krishan Kumar Tiwari, Giuseppe Caire

Abstract

In our previous work [2], we introduced a hardware- and power-efficient architecture for hybrid digital-analog (HDA) multiuser MIMO (MU-MIMO) based on stacking identical basic modules. Each module consists of a small active multi-antenna feeder (AMAF) placed in the near field of a larger reflective intelligent surface (RIS). Each AMAF is driven by one RF chain and conveys one spatial stream, achieving a multiplexing gain of $K$ with $K$ stacked modules. While [2] focused on module design and efficiency compared to active arrays, performance was evaluated only under pure line-of-sight (LOS) conditions. This work extends our approach in several ways. First, we propose a simple, pragmatic method for designing phase-only flat-top beams for the AMAF-RIS module, enabling wide angular coverage with low ripple and sidelobes. This design supports hierarchical beamforming codebooks for efficient beam acquisition. Second, we evaluate MU-MIMO performance under realistic mmWave multipath channels including both LOS and non-LOS (NLOS) components modeled using a 3D von Mises-Fisher distribution. We propose a low-complexity HDA MU-MIMO framework with: user-beam association via standard beam acquisition; dynamic user grouping (one user per beam); effective baseband MIMO channel estimation using 3GPP-compliant pilots; and downlink transmission with zero-forcing precoding under per-antenna power constraints. Results show high spectral efficiency and multiplexing gain while preserving hardware simplicity and power efficiency. Crucially, the approach is fully compliant with 3GPP 5GNR beam acquisition and sounding reference signaling mechanisms.
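Zero-forcing precoding under a per-antenna power limit can be illustrated on a toy 2-user, 2-port channel. The sketch below is an assumption-laden example, not the paper's pipeline: the channel values are invented, the "antennas" stand in for the effective baseband ports, and a single common scale factor is used to respect the power limit.

```python
def zero_forcing_2x2(H):
    """Return W = H^{-1} for a 2x2 complex channel (rows: users, cols: ports)."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel is singular; the two users cannot be separated")
    return [[d / det, -b / det], [-c / det, a / det]]

def scale_per_antenna(W, p_max):
    """Scale W by one common factor so that no port exceeds power p_max.

    Row i of W holds the weights that port i applies to each stream, so the
    port power is the squared norm of that row (unit-power symbols assumed).
    """
    row_power = [sum(abs(w) ** 2 for w in row) for row in W]
    g = (p_max / max(row_power)) ** 0.5
    return [[g * w for w in row] for row in W], g

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical effective channel after beam selection.
H = [[1 + 1j, 0.5], [0.2j, 1 - 0.5j]]
W, g = scale_per_antenna(zero_forcing_2x2(H), p_max=1.0)
effective = matmul(H, W)   # equals g times the identity: no inter-user interference
```

A common scale factor keeps the effective channel diagonal, at the cost of leaving some ports below their power limit.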

2603.06536 2026-03-09 eess.SY cs.SY

Adaptive Data-Driven Min-Max MPC for Linear Time-Varying Systems

Yifan Xie, Julian Berberich, Frank Allgöwer

Abstract

In this paper, we propose an adaptive data-driven min-max model predictive control (MPC) scheme for discrete-time linear time-varying (LTV) systems. We assume that prior knowledge of the system dynamics and bounds on the variations are known, and that the states are measured online. Starting from an initial state-feedback gain derived from prior knowledge, the algorithm updates the state-feedback gain using online input-state data. To this end, a semidefinite program (SDP) is solved to minimize an upper bound on the infinite-horizon optimal cost and to derive a corresponding state-feedback gain. We prove that the resulting closed-loop system is exponentially stabilized and satisfies the constraints. Further, we extend the proposed scheme to LTV systems with process noise. The resulting closed-loop system is shown to be robustly stabilized to a robust positive invariant (RPI) set. Finally, the proposed methods are demonstrated by numerical simulations.

2603.06515 2026-03-09 eess.SP

A Unified Multicarrier Waveform Framework for Next-generation Wireless Networks: Principles, Performance, and Challenges

Xingyao Zhang, Haoran Yin, Yanqun Tang, Yao Ge, Yong Zeng, Miaowen Wen, Zilong Liu, Yong Liang Guan, Hüseyin Arslan, Giuseppe Caire

Comments This paper has been accepted by the IEEE Communications Surveys & Tutorials

Abstract

Next-generation wireless networks require enhanced flexibility, efficiency, and reliability in physical layer waveform design to address the challenges posed by heterogeneous channel conditions and stringent quality-of-service demands. To this end, this paper proposes a unified multicarrier waveform framework that provides a systematic characterization and practical implementation guidelines to facilitate waveform selection for the sixth-generation (6G) mobile networks and beyond. We commence by examining the design principles of the state-of-the-art waveforms, which are categorized into one-dimensional modulation waveforms (e.g., orthogonal frequency division multiplexing (OFDM) and affine frequency division multiplexing (AFDM)) and two-dimensional modulation waveforms (e.g., orthogonal time frequency space (OTFS)). Their inherent resilience against various channel-induced interference is further studied, revealing their distinct suitability in diverse channel conditions. Furthermore, an in-depth performance analysis is presented by comparing their key performance indicators (KPIs), followed by an extensive exploration of these advanced waveforms in various applications. Consequently, this work aims to serve as a pivotal reference for waveform adoption in future 6G standardization and network deployment.
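For readers less familiar with the baseline waveform in this comparison, OFDM can be sketched in a few lines: symbols are mapped to subcarriers with an inverse DFT, a cyclic prefix is prepended, and the receiver undoes both. The example below is a minimal sketch with an invented two-tap channel; it does not cover AFDM or OTFS.

```python
import cmath

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ofdm_modulate(symbols, cp_len):
    """Inverse DFT, then prepend the last cp_len samples as a cyclic prefix."""
    x = idft(symbols)
    return x[len(x) - cp_len:] + x

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix, then return to the subcarrier domain."""
    return dft(samples[cp_len:])

def apply_channel(samples, taps):
    """Linear convolution with the channel taps, truncated to the input length."""
    return [sum(taps[l] * samples[n - l] for l in range(len(taps)) if n - l >= 0)
            for n in range(len(samples))]

# QPSK symbols on 8 subcarriers and a hypothetical two-tap channel.
tx = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
taps = [1.0, 0.5]
rx = ofdm_demodulate(apply_channel(ofdm_modulate(tx, cp_len=2), taps), cp_len=2)

# With a prefix at least as long as the channel memory, each subcarrier sees a
# single complex gain, so one division per subcarrier equalizes the channel.
gains = dft(taps + [0.0] * (len(tx) - len(taps)))
equalized = [y / h for y, h in zip(rx, gains)]
```

The one-tap equalizer works because the cyclic prefix turns the linear channel convolution into a circular one over the useful part of the block.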

2603.06494 2026-03-09 cs.RO cs.SY eess.SY

Control Barrier Corridors: From Safety Functions to Safe Sets

Ömür Arslan, Nikolay Atanasov

Comments 12 pages, 6 figures, an extended preprint version of a conference paper

Abstract

Safe autonomy is a critical requirement and a key enabler for robots to operate safely in unstructured complex environments. Control barrier functions and safe motion corridors are two widely used but technically distinct safety methods, functional and geometric, respectively, for safe motion planning and control. Control barrier functions are applied to the safety filtering of control inputs to limit the decay rate of system safety, whereas safe motion corridors are geometrically constructed to define a local safe zone around the system state for use in motion optimization and reference-governor design. This paper introduces a new notion of control barrier corridors, which unifies these two approaches by converting control barrier functions into local safe goal regions for reference goal selection in feedback control systems. We show, with examples on fully actuated systems, kinematic unicycles, and linear output regulation systems, that individual state safety can be extended locally over control barrier corridors for convex barrier functions, provided the control convergence rate matches the barrier decay rate, highlighting a trade-off between safety and reactiveness. Such safe control barrier corridors enable safely reachable persistent goal selection over continuously changing barrier corridors during system motion, which we demonstrate for verifiably safe and persistent path following in autonomous exploration of unknown environments.
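The safety filtering role of a control barrier function mentioned in the abstract can be sketched for the simplest case. The example assumes a single-integrator robot (state derivative equals the control input) and a circular obstacle with barrier h(x) = ||x - c||^2 - r^2; it shows the standard min-norm filter only, not the control barrier corridor construction introduced in the paper.

```python
def cbf_filter(x, u_ref, center, radius, alpha):
    """Return the control closest to u_ref that satisfies grad_h(x) . u >= -alpha * h(x).

    Assumes single-integrator dynamics x_dot = u and the barrier
    h(x) = ||x - center||^2 - radius^2 (positive outside the obstacle).
    """
    h = sum((xi - ci) ** 2 for xi, ci in zip(x, center)) - radius ** 2
    grad = [2.0 * (xi - ci) for xi, ci in zip(x, center)]
    lhs = sum(g * u for g, u in zip(grad, u_ref))
    rhs = -alpha * h
    if lhs >= rhs:
        return list(u_ref)            # reference input is already safe
    gg = sum(g * g for g in grad)
    if gg == 0.0:
        raise ValueError("gradient vanishes at the obstacle center")
    # Project u_ref onto the half-space boundary along the gradient direction.
    scale = (rhs - lhs) / gg
    return [u + scale * g for u, g in zip(u_ref, grad)]

# Heading straight at a unit-radius obstacle from x = (2, 0): the filter slows the approach.
u_safe = cbf_filter([2.0, 0.0], [-1.0, 0.0], [0.0, 0.0], 1.0, alpha=1.0)
```

Larger values of alpha allow faster approach to the boundary, which is the decay-rate knob the abstract refers to.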

2512.15994 2026-03-09 cs.RO cs.SY eess.SY

SORS: A Modular, High-Fidelity Simulator for Soft Robots

Manuel Mekkattu, Mike Y. Michelis, Robert K. Katzschmann

Comments This work has been submitted to the IEEE for possible publication. Code and data are available at github.com/srl-ethz/sors

Abstract

The deployment of complex soft robots in multiphysics environments requires advanced simulation frameworks that not only capture interactions between different types of material, but also translate accurately to real-world performance. Soft robots pose unique modeling challenges due to their large nonlinear deformations, material incompressibility, and contact interactions, which complicate both numerical stability and physical accuracy. Despite recent progress, robotic simulators often struggle with modeling such phenomena in a scalable and application-relevant manner. We present SORS (Soft Over Rigid Simulator), a versatile, high-fidelity simulator designed to handle these complexities for soft robot applications. Our energy-based framework, built on the finite element method, allows modular extensions, enabling the inclusion of custom-designed material and actuation models. To ensure physically consistent contact handling, we integrate a constrained nonlinear optimization based on sequential quadratic programming, allowing for stable and accurate modeling of contact phenomena. We validate our simulator through a diverse set of real-world experiments, which include cantilever deflection, pressure-actuation of a soft robotic arm, and contact interactions from the PokeFlex dataset. In addition, we showcase the potential of our framework for control optimization of a soft robotic leg. These tests confirm that our simulator can capture both fundamental material behavior and complex actuation dynamics with high physical fidelity. By bridging the sim-to-real gap in these challenging domains, our approach provides a validated tool for prototyping next-generation soft robots, filling the gap of extensibility, fidelity, and usability in the soft robotic ecosystem.

2511.07349 2026-03-09 physics.flu-dyn cs.SY eess.SY nlin.CD

Modeling Unsteady Aircraft Aerodynamics Using Lorenz Attractor: A Reduced-Order Approach for Wing Rock

Marcel Menner, Eugene Lavretsky

Journal ref
AIAA SCITECH 2026 Forum
Abstract

This paper presents a novel modeling approach for unsteady aircraft airflow, leveraging the Lorenz attractor framework. The proposed model is based on the force distribution exerted by a lift-generating wing on the surrounding fluid. It distinguishes between turbulent and nominal components of the force distribution, with the nominal force distribution modeled to peak at the wing and decay linearly into the free stream. This separation allows the turbulent component to be represented by a transport equation that is influenced by flight conditions, specifically dynamic pressure and angle of attack. Consequently, the Navier-Stokes equations, along with the turbulence transport equation, can be transformed into a reduced-order model characterized by three scalar ordinary differential equations - similar to the Lorenz attractor. This resulting system effectively captures chaotic behavior, facilitating the exploration of complex dynamics without the computational demands of solving the full Navier-Stokes equations. A simulation trade study is conducted that models wing rock phenomena at high angles of attack, demonstrating the effectiveness of the proposed approach in capturing the intricate dynamics of unsteady aircraft aerodynamics.
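As a point of reference for the "three scalar ordinary differential equations" mentioned above, the classical Lorenz system can be integrated with a few lines of code. The sketch uses the textbook parameters (sigma = 10, rho = 28, beta = 8/3), not the aerodynamic coefficients of the paper's reduced-order model, which the abstract does not give.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classical Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
    k3 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
    k4 = f(tuple(si + dt * ki for si, ki in zip(s, k3)))
    return tuple(si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def simulate(s0, dt=0.01, steps=4000):
    """Return the list of states visited, starting from s0."""
    traj = [tuple(s0)]
    for _ in range(steps):
        traj.append(rk4_step(lorenz, traj[-1], dt))
    return traj

# Two runs whose initial conditions differ by 1e-6 drift apart over time,
# which is the sensitivity that makes the system chaotic.
run_a = simulate((1.0, 1.0, 1.0))
run_b = simulate((1.0 + 1e-6, 1.0, 1.0))
max_gap = max(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
              for a, b in zip(run_a, run_b))
```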

2511.07335 2026-03-09 eess.SY cs.SY

Robust Linear Design for Flight Control Systems with Operational Constraints

Marcel Menner, Eugene Lavretsky

Journal ref
AIAA SCITECH 2026 Forum
Abstract

This paper presents a systematic approach for designing robust linear proportional-integral (PI) servo-controllers that effectively manage control input and output constraints in flight control systems. The control design leverages the Nagumo Theorem and the Comparison Lemma to prove constraint satisfaction, while employing min-norm optimal controllers in a manner akin to Control Barrier Functions. This results in a continuous piecewise-linear state feedback policy that maintains the analyzability of the closed-loop system through the principles of linear systems theory. Additionally, we derive multi-input multi-output (MIMO) robustness margins, demonstrating that our approach enables robust tracking of external commands even in the presence of operational constraints. Moreover, the proposed control design offers a systematic approach for anti-windup protection. Through flight control trade studies, we illustrate the applicability of the proposed framework to real-world safety-critical aircraft control scenarios. Notably, MIMO margin analysis with active constraints reveals that our method preserves gain and phase margins comparable to those of the unconstrained case, in contrast to controllers that rely on hard saturation heuristics, which suffer significant performance degradation under active constraints. Simulation results using a nonlinear six-degree-of-freedom rigid body aircraft model further validate the effectiveness of our method in achieving constraint satisfaction, robustness, and effective anti-windup protection.

2510.17798 2026-03-09 eess.SY cs.SY stat.AP

Admittance Matrix Concentration Inequalities for Understanding Uncertain Power Networks

Samuel Talkington, Cameron Khanpour, Rahul K. Gupta, Sergio A. Dorado-Rojas, Daniel Turizo, Hyeongon Park, Dmitrii M. Ostrovskii, Daniel K. Molzahn

Comments 9 pages, 2 figures

Abstract

This paper presents conservative probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under uncertain network parameters; for example, probabilistic line contingencies. Our proposed approach imports tools from probability theory, such as concentration inequalities for random matrices. This provides a theoretical framework for understanding error bounds of common approximations of the AC power flow equations under parameter uncertainty, including the DC and LinDistFlow approximations. Additionally, we show that the upper bounds scale as functions of nodal criticality. This network-theoretic quantity captures how uncertainty concentrates at critical nodes for use in contingency analysis. We validate these bounds on IEEE test networks, demonstrating that they correctly capture the scaling behavior of spectral perturbations up to conservative constants.

2510.13378 2026-03-09 quant-ph cs.NA cs.SY eess.SY math.NA

Performance Comparison of Gate-Based and Adiabatic Quantum Computing for AC Power Flow Problem

Zeynab Kaseb, Matthias Moller, Peter Palensky, Pedro P. Vergara

Comments 12 pages, 2 figures, 4 tables

Abstract

We present the first direct comparison between gate-based quantum computing (GQC) and adiabatic quantum computing (AQC) paradigms for solving the AC power flow (PF) equations. The PF problem is reformulated as a combinatorial optimization problem. For the GQC approach, the Quantum Approximate Optimization Algorithm (QAOA) is employed, while for the AQC approach, the problem is formulated as an Ising model. Numerical experiments on a 4-bus test system evaluate solution accuracy and computational performance. Results obtained using QAOA are benchmarked against those produced by D-Wave's Advantage system and Fujitsu's latest-generation Digital Annealer, implemented through the Quantum-Inspired Integrated Optimization (QIIO) software. The findings provide quantitative insights into the performance trade-offs, scalability, and practical viability of GQC and AQC paradigms for PF analysis, highlighting the potential of quantum optimization algorithms to address the computational challenges associated with the operation of modern electricity grids in the fault-tolerant era.
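The Ising formulation targeted by annealers can be made concrete with a tiny brute-force solver. The abstract does not say how the power flow equations map to couplings and fields, so the coefficients below are invented purely to show the form of the objective; exhaustive search is a classical stand-in for what QAOA or an annealer would approximate.

```python
from itertools import product

def ising_energy(spins, couplings, fields):
    """E(s) = sum over (i, j) of J_ij * s_i * s_j + sum over i of h_i * s_i."""
    pair = sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())
    local = sum(h * s for h, s in zip(fields, spins))
    return pair + local

def ground_state(couplings, fields):
    """Enumerate all 2^n spin assignments and return (best_spins, best_energy)."""
    n = len(fields)
    best = min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, couplings, fields))
    return best, ising_energy(best, couplings, fields)

# Hypothetical 3-spin chain with antiferromagnetic couplings and one biased spin.
couplings = {(0, 1): 1.0, (1, 2): 1.0}
fields = [0.5, 0.0, 0.0]
spins, energy = ground_state(couplings, fields)
```

Enumeration scales as 2^n, which is why it only serves as a check on very small instances such as the 4-bus system mentioned in the abstract.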

2510.01041 2026-03-09 cs.RO cs.SY eess.SY

ROSplane 2.0: A Fixed-Wing Autopilot for Research

Ian Reid, Joseph Ritchie, Jacob Moore, Brandon Sutherland, Gabe Snow, Phillip Tokumaru, Tim McLain

Comments Submitted to the 2026 International Conference on Unmanned Aerial Systems

Abstract

Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Built around ROS 2, ROSplane allows for rapid integration of low or high-level control, path planning, or estimation algorithms. A focus on lean, easily-understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring costly system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.

2510.00995 2026-03-09 cs.RO cs.SY eess.SY

ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial Vehicles

Jacob Moore, Phil Tokumaru, Ian Reid, Brandon Sutherland, Joseph Ritchie, Gabe Snow, Tim McLain

Comments Submitted to the 2026 International Conference on Unmanned Aerial Systems

Abstract

ROSflight is a lean, open-source autopilot ecosystem for unmanned aerial vehicles (UAVs). Designed by researchers for researchers, it is built to lower the barrier to entry to UAV research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full-featured), well-documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low-level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced-air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer.

2508.06490 2026-03-09 eess.IV cs.CV cs.LG eess.SP

Multivariate Fields of Experts for Convergent Image Reconstruction

Stanislas Ducotterd, Michael Unser

Abstract

We introduce the multivariate fields of experts, a new framework for the learning of image priors. Our model generalizes existing fields of experts methods by incorporating multivariate potential functions constructed via Moreau envelopes of the $\ell_\infty$-norm. We demonstrate the effectiveness of our proposal across a range of inverse problems that include image denoising, deblurring, compressed-sensing magnetic-resonance imaging, and computed tomography. The proposed approach outperforms comparable univariate models and achieves performance close to that of deep-learning-based regularizers while being significantly faster, requiring fewer parameters, and being trained on substantially fewer data. In addition, our model retains a high level of interpretability due to its structured design. It is supported by theoretical convergence guarantees which ensure reliability in sensitive reconstruction tasks.
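The Moreau envelope of the $\ell_\infty$-norm has a closed-form evaluation through the proximal operator, which by Moreau's decomposition equals the input minus its projection onto an $\ell_1$ ball. The sketch below assumes the standard definition M(x) = min over u of ||u||_inf + ||x - u||^2 / (2 mu); the paper's exact parametrization of its potentials may differ.

```python
import math

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius (sort-based)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - radius) / k
        if uk > t:
            theta = t                 # keep the threshold of the last valid k
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

def prox_linf(x, mu):
    """Proximal operator of mu * ||.||_inf via Moreau decomposition."""
    proj = project_l1_ball(x, mu)
    return [xi - pi for xi, pi in zip(x, proj)]

def moreau_envelope_linf(x, mu):
    """Value of the Moreau envelope of the l-infinity norm at x."""
    p = prox_linf(x, mu)
    quad = sum((xi - pi) ** 2 for xi, pi in zip(x, p)) / (2.0 * mu)
    return max(abs(pi) for pi in p) + quad
```

In one dimension this reduces to the Huber function: quadratic near zero and linear further out, which gives a smooth multivariate potential.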

2508.03381 2026-03-09 cs.IT eess.SP math.IT

Unequal Error Protection for Digital Semantic Communication with Channel Coding

Seonjung Kim, Yongjeong Oh, Yongjune Kim, Namyoon Lee, Yo-Seb Jeon

Abstract

This paper investigates unequal error protection (UEP) in digital semantic communication, where semantically important bits require substantially higher reliability than less critical ones. To characterize this heterogeneity, we introduce a novel perspective that treats learned bit-flip probabilities of semantic bits as target error protection levels, thereby directly linking semantic importance to bit-level reliability. This formulation reveals that the required protection levels of the semantic bits may differ by several orders of magnitude, making short-block coding more advantageous than conventional long-block designs. Motivated by this, we develop two UEP frameworks that minimize total blocklength under heterogeneous reliability constraints. First, we propose a bit-level UEP framework based on repetition coding, providing an analytically tractable solution that precisely meets per-bit protection requirements. Second, to improve energy and blocklength efficiency, we design a block-level UEP framework in which the semantic bits are partitioned into short blocks with similar protection levels. Guided by finite blocklength capacity analysis, we derive a closed-form threshold condition for beneficial partitioning and develop a systematic algorithm for integrating modern channel codes. Simulation results on image transmission tasks demonstrate substantial gains in both task performance and transmission efficiency compared with conventional equal-protection schemes.
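The bit-level idea, giving each bit just enough repetitions to meet its own target error rate, can be sketched under a simplifying assumption. The example assumes a binary symmetric channel with crossover probability p and majority-vote decoding; the paper's channel model and optimization are not specified in the abstract, and the target values below are invented.

```python
from math import comb

def majority_error(p, n):
    """Bit error probability of an n-fold repetition code (n odd) with
    majority voting over a binary symmetric channel with crossover p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range((n + 1) // 2, n + 1))

def min_repetitions(p, target, n_max=199):
    """Smallest odd n whose majority-vote error is at most target."""
    for n in range(1, n_max + 1, 2):
        if majority_error(p, n) <= target:
            return n
    raise ValueError("target not reachable within n_max repetitions")

# Hypothetical per-bit targets spanning several orders of magnitude.
targets = [0.2, 0.05, 1e-3]
plan = [min_repetitions(0.1, t) for t in targets]
total_blocklength = sum(plan)
```

Protecting every bit at the strictest level would cost 9 repetitions per bit here, so the per-bit plan shortens the total length considerably.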

2505.19146 2026-03-09 physics.med-ph eess.SP

Design of a Wearable Parallel Electrical Impedance Imaging System for Healthcare

Bowen Li, Zekun Chen, Xuefei Chen, Luhao Zhang, Shili Liang

Abstract

A wireless wearable Electrical Impedance Tomography (EIT) system has been developed utilizing the AD5933 chip to achieve real-time imaging of lung respiration. The system employs a voltage excitation method tailored to human impedance characteristics, injecting current by applying a known voltage and measuring the resulting current through the body. Additionally, specific measures have been implemented to effectively suppress signal oscillations and leakage currents caused by parasitic capacitances. To enhance data acquisition speed, the system employs five parallel AD5933 units, with multiple techniques implemented to ensure high synchronization during simultaneous measurements. Performance testing shows that the system achieves a signal-to-noise ratio greater than 50 dB, a relative standard deviation below 0.3%, and a reciprocity error under 0.8%. Imaging experiments using a water tank phantom, human lungs during breathing, and a resting human calf further demonstrate that this portable EIT system can accurately measure biological tissues with high precision and low cost.

2503.11787 2026-03-09 cs.CV eess.IV

ECLARE: Efficient cross-planar learning for anisotropic resolution enhancement

Samuel W. Remedios, Shuwen Wei, Shuo Han, Jinwei Zhang, Aaron Carass, Kurt G. Schilling, Dzung L. Pham, Jerry L. Prince, Blake E. Dewey

Abstract

In clinical imaging, magnetic resonance (MR) image volumes are often acquired as stacks of 2D slices with decreased scan times, improved signal-to-noise ratio, and image contrasts unique to 2D MR pulse sequences. While this is sufficient for clinical evaluation, automated algorithms designed for 3D analysis perform poorly on multi-slice 2D MR volumes, especially those with thick slices and gaps between slices. Super-resolution (SR) methods aim to address this problem, but previous methods do not address all of the following: slice profile shape estimation, slice gap, domain shift, and non-integer or arbitrary upsampling factors. In this paper, we propose ECLARE (Efficient Cross-planar Learning for Anisotropic Resolution Enhancement), a self-SR method that addresses each of these factors. ECLARE uses a slice profile estimated from the multi-slice 2D MR volume, trains a network to learn the mapping from low-resolution to high-resolution in-plane patches from the same volume, and performs SR with anti-aliasing. We compared ECLARE to cubic B-spline interpolation, SMORE, and other contemporary SR methods. We used realistic and representative simulations so that quantitative performance against ground truth can be computed, and ECLARE outperformed all other methods in both signal recovery and downstream tasks. Importantly, as ECLARE does not use external training data it cannot suffer from domain shift between training and testing. Our code is open-source and available at https://www.github.com/sremedios/eclare.

2503.04613 2026-03-09 cs.RO cs.SY eess.SY

Whole-Body Model-Predictive Control of Legged Robots with MuJoCo

John Z. Zhang, Taylor A. Howell, Zeji Yi, Chaoyi Pan, Guanya Shi, Guannan Qu, Tom Erez, Yuval Tassa, Zachary Manchester

Comments to appear at ICRA 2026

Abstract

We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at: https://johnzhang3.github.io/mujoco_ilqr
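The finite-difference derivatives that iLQR needs can be sketched without a physics engine. In the example below a hand-written pendulum step function stands in for a simulator step (it is not MuJoCo and uses no MuJoCo API), and central differences recover the linearization matrices A and B around a state and control.

```python
import math

def step(x, u, dt=0.01, g=9.81, length=1.0):
    """One explicit-Euler step of a torque-driven pendulum (toy stand-in for a simulator)."""
    theta, omega = x
    accel = -(g / length) * math.sin(theta) + u[0]
    return [theta + dt * omega, omega + dt * accel]

def fd_jacobian(f, z, eps=1e-6):
    """Central-difference Jacobian of f at z, built one column at a time."""
    rows = len(f(z))
    J = [[0.0] * len(z) for _ in range(rows)]
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = f(zp), f(zm)
        for i in range(rows):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J

def linearize(x, u):
    """A = df/dx and B = df/du around (x, u), as used in the iLQR backward pass."""
    A = fd_jacobian(lambda xx: step(xx, u), x)
    B = fd_jacobian(lambda uu: step(x, uu), u)
    return A, B

A, B = linearize([0.3, 0.0], [0.0])
```

Each Jacobian column costs two extra simulator steps, so the per-iteration cost grows with the state and control dimensions.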

2603.06401 2026-03-09 eess.SP cs.LG

U6G XL-MIMO Radiomap Prediction: Multi-Config Dataset and Beam Map Approach

Xiaojie Li, Yu Han, Zhizheng Lu, Shi Jin, Chao-Kai Wen

Comments This work has been submitted to the IEEE for possible publication

Abstract

The upper 6 GHz (U6G) band with XL-MIMO is a key enabler for sixth-generation wireless systems, yet intelligent radiomap prediction for such systems remains challenging. Existing datasets support only small-scale arrays (up to 8x8) with predominantly isotropic antennas, far from the 1024-element directional arrays envisioned for 6G. Moreover, current methods encode array configurations as scalar parameters, forcing neural networks to extrapolate array-specific radiation patterns, which fails when predicting radiomaps for configurations absent from training data. To jointly address data scarcity and generalization limitations, this paper advances XL-MIMO radiomap prediction from three aspects. To overcome data limitations, we construct the first XL-MIMO radiomap dataset containing 78,400 radiomaps across 800 urban scenes, five frequency bands (1.8-6.7 GHz), and nine array configurations up to 32x32 uniform planar arrays with directional elements. To enable systematic evaluation, we establish a comprehensive benchmark framework covering practical scenarios from coverage estimation without field measurements to generalization across unseen configurations and environments. To enable generalization to arbitrary beam configurations without retraining, we propose the beam map, a physics-informed spatial feature that analytically computes array-specific coverage patterns. By decoupling deterministic array radiation from data-learned multipath propagation, beam maps shift generalization from neural network extrapolation to physics-based computation. Integrating beam maps into existing architectures reduces mean absolute error by up to 60.0% when generalizing to unseen configurations and up to 50.5% when transferring to unseen environments. The complete dataset and code are publicly available at https://lxj321.github.io/MulticonfigRadiomapDataset/.
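The notion of computing an array's radiation pattern analytically, rather than asking a network to learn it, can be illustrated with the array factor of a uniform linear array. This is a 1-D sketch with isotropic elements and half-wavelength spacing; the paper's beam maps are built for planar arrays with directional elements over a spatial grid, which is not reproduced here.

```python
import cmath
import math

def steering_weights(n, theta0, spacing=0.5):
    """Phase-only weights that point an n-element array toward angle theta0 (radians).

    spacing is the element separation in wavelengths.
    """
    return [cmath.exp(-2j * math.pi * spacing * k * math.sin(theta0)) / n
            for k in range(n)]

def array_factor(weights, theta, spacing=0.5):
    """Complex array response in direction theta for the given weights."""
    return sum(w * cmath.exp(2j * math.pi * spacing * k * math.sin(theta))
               for k, w in enumerate(weights))

def beam_pattern(n, theta0, angles, spacing=0.5):
    """Normalized power pattern |AF|^2 sampled at the given angles."""
    w = steering_weights(n, theta0, spacing)
    return [abs(array_factor(w, a, spacing)) ** 2 for a in angles]

# Pattern of an 8-element array steered to 0.3 rad, sampled from -90 to +90 degrees.
angles = [math.radians(d) for d in range(-90, 91)]
pattern = beam_pattern(8, 0.3, angles)
```

Because the pattern follows from geometry alone, it can be recomputed for an unseen array size or steering angle without any training data.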

2603.06373 2026-03-09 eess.AS

Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction

Séverin Baroudi, Yanis Labrak, Shashi Kumar, Joonas Kalda, Sergio Burdisso, Pawel Cyrta, Juan Ignacio Alvarez-Trejos, Petr Motlicek, Hervé Bredin, Ricard Marxer

Comments Submitted for review at Interspeech 2026

Abstract

Extracting patient medical conditions from code-switched clinical spoken dialogues is challenging due to rapid turn-taking and highly overlapped speech. We present a robust system evaluated on the DISPLACE-M dataset of real-world Hinglish medical conversations. We propose an End-to-End Neural Diarization with Vector Clustering approach (EEND-VC) to accurately resolve dense speaker overlaps in Doctor-Patient Conversations (DoPaCo). For transcription, we adapt a Qwen3 ASR model via domain-specific fine-tuning, Devanagari script normalization, and dialogue-level LLM error correction, achieving an 18.59% tcpWER. We benchmark open and proprietary LLMs on medical condition extraction, comparing our text-based cascade system against a multimodal End-to-End (E2E) audio framework. While proprietary E2E models set the performance ceiling, our open cascaded architecture is highly competitive, as it achieved first place out of 25 participants in the DISPLACE-M challenge. All implementations are publicly released.
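The error rate quoted above builds on the ordinary word error rate, which is a word-level edit distance divided by the reference length. The sketch below computes plain WER only; tcpWER, as commonly defined, additionally accounts for speaker assignment and timing, and that part is not reproduced here. The example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            cur.append(min(substitution, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1] / len(reference)

ref = "the patient has fever".split()
hyp = "the patient had a fever".split()
wer = word_error_rate(ref, hyp)   # one substitution and one insertion over 4 words
```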

2603.06361 2026-03-09 cs.LG cs.AI cs.SY eess.SY

CLAIRE: Compressed Latent Autoencoder for Industrial Representation and Evaluation -- A Deep Learning Framework for Smart Manufacturing

Mohammadhossein Ghahramani, Mengchu Zhou

Comments 13 pages. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2026

Abstract

Accurate fault detection in high-dimensional industrial environments remains a major challenge due to the inherent complexity, noise, and redundancy in sensor data. This paper introduces CLAIRE, a hybrid end-to-end learning framework that integrates unsupervised deep representation learning with supervised classification for intelligent quality control in smart manufacturing systems. It employs an optimized deep autoencoder to transform raw input into a compact latent space, effectively capturing the intrinsic data structure while suppressing irrelevant or noisy features. The learned representations are then fed into a downstream classifier to perform binary fault prediction. Experimental results on a high-dimensional dataset demonstrate that CLAIRE significantly outperforms conventional classifiers trained directly on raw features. Moreover, the framework incorporates a post hoc phase, using a game-theory-based interpretability technique, to analyze the latent space and identify the most informative input features contributing to fault predictions. The proposed framework highlights the potential of integrating explainable AI with feature-aware regularization for robust fault detection. The modular and interpretable nature of the proposed framework makes it highly adaptable, offering promising applications in other domains characterized by complex, high-dimensional data, such as healthcare, finance, and environmental monitoring.

2603.06338 2026-03-09 eess.IV cs.AI cs.LG cs.SY eess.SY physics.med-ph

AI End-to-End Radiation Treatment Planning Under One Second

Simon Arberet, Riqiang Gao, Martin Kraus, Florin C. Ghesu, Wilko Verbakel, Mamadou Diallo, Anthony Magliari, Venkatesan Karuppusamy, Sushil Beriwal, REQUITE Consortium, Ali Kamen, Dorin Comaniciu

Abstract

Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end deep-learning framework that directly infers deliverable treatment plans from CT images and structure contours. AIRT generates single-arc VMAT prostate plans, from imaging and anatomical inputs to leaf sequencing, in under one second on a single Nvidia A100 GPU. The framework includes a differentiable dose feedback, an adversarial fluence map shaping, and a plan generation augmentation to improve plan quality and robustness. The model was trained on more than 10,000 intact prostate cases. Non-inferiority to RapidPlan Eclipse was demonstrated across target coverage and OAR sparing metrics. Target homogeneity (HI = 0.10 $\pm$ 0.01) and OAR sparing were similar to reference plans when evaluated using AcurosXB. These results represent a significant step toward ultra-fast standardized RT planning and a streamlined clinical workflow.

2603.06332 2026-03-09 eess.AS

Cross-linguistic Prosodic Analysis of Autistic and Non-autistic Child Speech in Finnish, French and Slovak

Ida-Lotta Myllylä, Sofoklis Kakouros

Comments Accepted to Speech Prosody 2026

English abstract

Prosodic differences in autism are well documented, but cross-linguistic evidence remains limited. This study investigates prosody in autism across a multilingual corpus of Finnish, French, and Slovak speakers. A total of 88 acoustic features were extracted from over 5,000 inter-pausal units, reduced via Principal Component Analysis (PCA), and analyzed using Linear Mixed-Effects Models (LMMs). Cross-linguistically, autistic speakers exhibited increased general intensity variability and a clearer, less breathy voice quality (higher Harmonics-to-Noise Ratio and alpha ratio), alongside reduced temporal intensity dynamics and lower central f0. Monolingual analyses revealed language-specific nuances: Slovak results aligned with cross-linguistic f0 patterns but diverged on voice quality, while Finnish results mirrored the broader voice quality findings. These results emphasize including voice quality and intensity dynamics in the study of possible language-independent markers of autism, alongside traditional pitch measures. The findings challenge deficiency-based models, suggesting instead a complex, acoustically distinct prosodic profile across languages.

2603.06327 2026-03-09 eess.AS

Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak

Sofoklis Kakouros, Ida-Lotta Myllylä

Comments Accepted to Speech Prosody 2026

English abstract

We present a cross-linguistic study of speech in autistic and non-autistic children speaking Finnish, French, and Slovak. We combine supervised classification with within-language and cross-corpus transfer experiments to evaluate classification performance within and across languages and to probe which acoustic cues are language-specific versus language-general. Using a large set of acoustic-prosodic features, we implement speaker-level classification benchmarks as an analytical tool rather than to seek state-of-the-art performance. Within-language models, evaluated with speaker-level cross-validation, yielded heterogeneous results. The Finnish model performed best (Accuracy 0.84, F1 0.88), followed by Slovak (Accuracy 0.63, F1 0.68) and French (Accuracy 0.68, F1 0.56). We then tested cross-language generalization. A model trained on all pooled corpora reached an overall Accuracy of 0.61 and F1 0.68. Leave-one-corpus-out experiments, which test transfer to an unseen language, showed moderate success when testing on Slovak (F1 0.70) and Finnish (F1 0.78), but poor transfer to French (F1 0.42). Feature-importance analyses across languages highlighted partially shared, but not fully language-invariant, acoustic markers of autism. These findings suggest that some autism-related speech cues generalize across typologically distinct languages, but robust cross-linguistic classifiers will likely require language-aware modeling and more homogeneous recording conditions.
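The two evaluation protocols named in this abstract, speaker-level cross-validation and leave-one-corpus-out transfer, both come down to splitting samples by group rather than by utterance. The following is a minimal sketch of such a split, not the authors' code: the function name `grouped_folds`, the round-robin fold assignment, and the toy speaker and corpus labels are all hypothetical.

```python
def grouped_folds(groups, k=None):
    """Yield (train_idx, test_idx) pairs in which no group straddles the split.

    groups: one group label per sample (e.g. a speaker ID or a corpus name).
    k: number of folds; None means leave-one-group-out.
    """
    unique = sorted(set(groups))
    k = len(unique) if k is None else min(k, len(unique))
    # Assign whole groups (not individual samples) to folds, round-robin.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Toy labels (invented for illustration): one entry per utterance.
speakers = ["fi01", "fi01", "fi02", "fr01", "fr02", "fr02", "sk01", "sk01"]
corpora = ["fi", "fi", "fi", "fr", "fr", "fr", "sk", "sk"]

# Speaker-level cross-validation: every utterance of a speaker lands on one side.
speaker_splits = list(grouped_folds(speakers, k=2))

# Leave-one-corpus-out: train on two languages, test on the unseen third.
corpus_splits = list(grouped_folds(corpora))
```

Splitting by speaker keeps a model from scoring well merely by recognizing a voice it saw during training, which is why utterance-level shuffling would overstate accuracy in a study like this one.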

2603.06279 2026-03-09 cs.CV cs.RO eess.IV

Can we Trust Unreliable Voxels? Exploring 3D Semantic Occupancy Prediction under Label Noise

Wenxin Li, Kunyu Peng, Di Wen, Junwei Zheng, Jiale Wei, Mengfei Duan, Yuheng Zhang, Rui Fan, Kailun Yang

Comments The benchmark and source code will be made publicly available at https://github.com/mylwx/OccNL

English abstract

3D semantic occupancy prediction is a cornerstone of robotic perception, yet real-world voxel annotations are inherently corrupted by structural artifacts and dynamic trailing effects. This raises a critical but underexplored question: can autonomous systems safely rely on such unreliable occupancy supervision? To systematically investigate this issue, we establish OccNL, the first benchmark dedicated to 3D occupancy under occupancy-asymmetric and dynamic trailing noise. Our analysis reveals a fundamental domain gap: state-of-the-art 2D label noise learning strategies collapse catastrophically in sparse 3D voxel spaces, exposing a critical vulnerability in existing paradigms. To address this challenge, we propose DPR-Occ, a principled label noise-robust framework that constructs reliable supervision through dual-source partial label reasoning. By synergizing temporal model memory with representation-level structural affinity, DPR-Occ dynamically expands and prunes candidate label sets to preserve true semantics while suppressing noise propagation. Extensive experiments on SemanticKITTI demonstrate that DPR-Occ prevents geometric and semantic collapse under extreme corruption. Notably, even at 90% label noise, our method achieves significant performance gains (up to 2.57% mIoU and 13.91% IoU) over existing label noise learning baselines adapted to the 3D occupancy prediction task. By bridging label noise learning and 3D perception, OccNL and DPR-Occ provide a reliable foundation for safety-critical robotic perception in dynamic environments. The benchmark and source code will be made publicly available at https://github.com/mylwx/OccNL.

2603.06254 2026-03-09 cs.CV cs.RO eess.IV

NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving

Kai Luo, Xu Wang, Rui Fan, Kailun Yang

Comments Code will be available at https://github.com/xifen523/NOVA

English abstract

Generalizing across unknown targets is critical for open-world perception, yet existing 3D Multi-Object Tracking (3D MOT) pipelines remain limited by closed-set assumptions and "semantic-blind" heuristics. To address this, we propose Next-step Open-Vocabulary Autoregression (NOVA), an innovative paradigm that shifts 3D tracking from traditional fragmented distance-based matching toward generative spatio-temporal semantic modeling. NOVA reformulates 3D trajectories as structured spatio-temporal semantic sequences, enabling the simultaneous encoding of physical motion continuity and deep linguistic priors. By leveraging the autoregressive capabilities of Large Language Models (LLMs), we transform the tracking task into a principled process of next-step sequence completion. This mechanism allows the model to explicitly utilize the hierarchical structure of language space to resolve fine-grained semantic ambiguities and maintain identity consistency across complex long-range sequences through high-level commonsense reasoning. Extensive experiments on nuScenes, V2X-Seq-SPD, and KITTI demonstrate the superior performance of NOVA. Notably, on the nuScenes dataset, NOVA achieves an AMOTA of 22.41% for Novel categories, yielding a significant 20.21% absolute improvement over the baseline. These gains are realized through a compact 0.5B autoregressive model. Code will be available at https://github.com/xifen523/NOVA.

2603.06247 2026-03-09 astro-ph.EP astro-ph.IM cs.SY eess.SY

Star-based Navigation in the Outer Solar System

Vittorio Franzese

Comments Accepted for publication in the Journal of Guidance, Control, and Dynamics. This is the author's accepted manuscript. The final version of record will be published by AIAA and will be available at the JGCD website

English abstract

This paper investigates an autonomous navigation method for spacecraft operating in the outer solar system, up to 250 AU from the Sun, using the parallactic shifts of nearby stars. These measurements enable estimation of the spacecraft trajectory while distant stars provide attitude information through conventional star-pattern matching. Stellar observation models are developed, accounting for delta light-time, parallax, and aberration effects. Navigation performance is assessed using two approaches: (1) a least-squares estimator using simultaneous multi-star measurements, and (2) a Kalman filter processing sequential single-star observations along deep-space trajectories. Monte Carlo simulations on trajectories representative of Voyager 1, Voyager 2, Pioneer 10, Pioneer 11, and New Horizons missions show sub-AU position accuracies at 250 AU, and velocity accuracies better than 0.00004 AU/day, under realistic spacecraft and instrumentation uncertainties. These values correspond to relative errors below 0.4% in position and velocity with respect to the reference trajectories. Although less precise than radiometric tracking, this performance can support navigation in the outer solar system without reliance on Earth. When ground-based navigation remains necessary, this approach can be employed during long cruising phases, lowering the number of ground contacts. The method additionally shows potential for future missions venturing farther from the Sun.
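The least-squares estimator with simultaneous multi-star measurements can be illustrated with a simplified geometric sketch. It assumes the star positions are known exactly in a Sun-centered frame, the spacecraft attitude is already resolved, and light-time and aberration corrections (which the paper does model) are ignored. The function name `estimate_position` and the star coordinates are hypothetical, and this is plain line-of-sight triangulation rather than the paper's estimator.

```python
import numpy as np


def estimate_position(star_positions, los_directions):
    """Least-squares spacecraft position from line-of-sight directions.

    Each star i at known position s_i is observed along direction u_i, so the
    spacecraft lies on the line through s_i with direction u_i. Minimizing the
    summed squared distance to all such lines gives the normal equations
        (sum_i P_i) r = sum_i P_i s_i,   with   P_i = I - u_i u_i^T.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for s, u in zip(star_positions, los_directions):
        u = u / np.linalg.norm(u)           # normalize the measured direction
        P = np.eye(3) - np.outer(u, u)      # projector orthogonal to u
        A += P
        b += P @ s
    return np.linalg.solve(A, b)            # needs >= 2 non-parallel directions


# Invented star positions in AU (nearby stars lie a few 1e5 AU away).
stars = np.array([
    [2.7e5, 0.0, 0.0],
    [0.0, 3.8e5, 1.0e5],
    [-1.0e5, 2.0e5, 5.0e5],
])
r_true = np.array([150.0, -200.0, 50.0])    # hypothetical spacecraft position, AU

# Noise-free synthetic measurements: directions from spacecraft to each star.
los = stars - r_true
r_est = estimate_position(stars, los)
```

With noise-free directions the estimate reproduces `r_true` up to floating-point error. In practice the parallactic shift is tiny relative to the stellar distance, so angular measurement noise translates into large position errors, which is consistent with the sub-AU rather than kilometer-level accuracy reported in the abstract.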

2603.06221 2026-03-09 eess.SP

Set-Prediction-Based J-Peak Detection for Pillow-Based Ballistocardiography

Shengwei Guo, Guobing Sun

English abstract

J-peak detection in ballistocardiography (BCG) is a key component of unobtrusive heart rate monitoring during sleep. Most existing approaches formulate this task as a dense time-point segmentation problem and rely on heuristic post-processing to convert continuous responses into discrete peak events, resulting in redundant model structures and sensitivity to parameter settings. In this work, we construct and publicly release a pillow-based BCG--ECG dataset consisting of multi-subject, multi-night natural sleep recordings with manually annotated BCG J-peaks. Based on this dataset, we propose a set-prediction-based J-peak detection framework that directly models peaks as discrete temporal events, eliminating the need for high-resolution segmentation heads and explicit peak suppression. Experimental results show that, under a shared convolutional backbone, the proposed method achieves superior detection performance compared to a U-Net-based segmentation baseline, while substantially reducing model parameters and computational complexity. These results indicate that event-level set prediction provides a concise and efficient modeling paradigm for BCG J-peak detection in sleep monitoring.

2603.06217 2026-03-09 cs.AI cs.MA cs.SY eess.SY

Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI

Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl, Hans Auer

Comments 6 pages, 2 figures. Code available at: https://github.com/RedaElMakroum/cdr

English abstract

Residential demand response depends on sustained prosumer participation, yet existing coordination is either fully automated, or limited to one-way dispatch signals and price alerts that offer little possibility for informed decision-making. This paper introduces Conversational Demand Response (CDR), a coordination mechanism where aggregators and prosumers interact through bidirectional natural language, enabled through agentic AI. A two-tier multi-agent architecture is developed in which an aggregator agent dispatches flexibility requests and a prosumer Home Energy Management System (HEMS) assesses deliverability and cost-benefit by calling an optimization-based tool. CDR also enables prosumer-initiated upstream communication, where changes in preferences can reach the aggregator directly. Proof-of-concept evaluation shows that interactions complete in under 12 seconds. The architecture illustrates how agentic AI can bridge the aggregator-prosumer coordination gap, providing the scalability of automated DR while preserving the transparency, explainability, and user agency necessary for sustained prosumer participation. All system components, including agent prompts, orchestration logic, and simulation interfaces, are released as open source to enable reproducibility and further development.

2603.06206 2026-03-09 eess.SP

MAD: A Multimodal and Multi-perspective Affective Dataset with Hierarchical Annotations

Shengwei Guo, Yunqing Qiao, Wenzhan Zhang, Bo Liu, Yong Wang, Guobing Sun

English abstract

This work presents MAD (Multimodal Affection Dataset), a multimodal emotion dataset designed for affective computing and neurophysiological modeling. MAD is built upon synchronous collection of diverse physiological signals (EEG, ECG, EOG, EMG, PPG, and BCG) together with tri-view RGB-D facial videos, enabling the observation of emotional dynamics from neural, physiological, and behavioral perspectives. The dataset consists of synchronized recordings from 18 participants and introduces two key contributions. First, it provides temporally aligned multimodal data that jointly capture central neural activity, peripheral physiological responses, and overt facial expressions. Second, it incorporates a three-level emotion annotation framework spanning stimulus elicitation, subjective cognition, and behavioral expression, supporting joint modeling of the full emotion process. To validate the dataset, we conduct systematic benchmark experiments covering intra-subject EEG emotion recognition, cross-subject EEG transfer learning, consistency analysis and emotion classification with cardiac-related signals, multimodal physiological fusion, and multi-view facial emotion recognition. The experimental results demonstrate that MAD supports consistent and comparable performance across both unimodal and multimodal settings, establishing it as a reliable benchmark for emotion recognition and cross-modal affective analysis, and as a valuable resource for studying emotion mechanisms across multiple levels.

2603.06193 2026-03-09 cs.SD cs.AI eess.AS

Whisper-CD: Accurate Long-Form Speech Recognition using Multi-Negative Contrastive Decoding

Hoseong Ahn, Jeongyun Chae, Yoonji Park, Kyuhong Shim

Comments Submitted to Interspeech 2026

English abstract

Long-form speech recognition with large encoder-decoder models such as Whisper often exhibits hallucinations, repetition loops, and content omissions. These errors can accumulate and be further amplified when the previous segment's transcription is used as decoding context. We propose Whisper-CD, a training-free contrastive decoding framework that contrasts clean-audio logits against negative logits computed from three acoustically motivated perturbations: Gaussian noise injection, silence signal, and audio temporal shift. We aggregate these negatives via the log-sum-exp operator, building a unified multi-negative objective for token-by-token decoding. Across five English long-form benchmarks, Whisper-CD reduces WER by up to 24.3pp on CORAAL and shows 48% higher token generation throughput than beam search. Because Whisper-CD operates purely at inference time, it can be applied as a drop-in replacement in already-deployed Whisper systems without retraining.
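A single decoding step of a multi-negative contrast of this kind might be sketched as follows. The abstract does not state the exact scoring rule, so the `(1 + alpha) * clean - alpha * negative` form, the default `alpha`, and the function names are assumptions borrowed from common contrastive-decoding formulations, not the authors' implementation. Only the log-sum-exp aggregation over the negatives is taken from the text.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def contrastive_scores(clean_logits, negative_logits, alpha=0.5):
    """Score next-token candidates by contrasting clean and perturbed audio.

    clean_logits:    shape (V,), decoder logits for the clean audio.
    negative_logits: shape (N, V), logits for N perturbed inputs
                     (e.g. added noise, silence, time shift).
    alpha:           contrast strength (assumed hyperparameter).
    """
    clean_lp = log_softmax(clean_logits)
    neg_lp = np.stack([log_softmax(row) for row in negative_logits])
    # Aggregate the negatives with a stable log-sum-exp over the N axis.
    m = neg_lp.max(axis=0)
    neg_agg = m + np.log(np.exp(neg_lp - m).sum(axis=0))
    # Assumed contrast rule: reward tokens the clean audio supports and
    # penalize tokens that stay likely even when the audio is corrupted.
    return (1.0 + alpha) * clean_lp - alpha * neg_agg


# Toy 3-token vocabulary: token 1 is slightly preferred on clean audio but is
# also strongly preferred under every perturbation, as a hallucination would be.
clean = np.array([1.9, 2.0, 0.0])
negatives = np.array([[0.0, 3.0, 0.0]] * 3)

greedy_token = int(np.argmax(clean))
contrastive_token = int(np.argmax(contrastive_scores(clean, negatives)))
```

In this toy case plain greedy decoding picks token 1, while the contrastive score picks token 0, because token 1 remains likely regardless of the audio content and is therefore treated as evidence of a language-model prior rather than of the acoustics.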