arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday
2603.24578 2026-03-26 cs.CV eess.IV

Vision-Language Models vs Human: Perceptual Image Quality Assessment

Imran Mehmood, Imad Ali Shah, Ming Ronnier Luo, Brian Deegan

Abstract

Psychophysical experiments remain the most reliable approach for perceptual image quality assessment (IQA), yet their cost and limited scalability encourage automated approaches. We investigate whether Vision-Language Models (VLMs) can approximate human perceptual judgments across three image quality scales: contrast, colorfulness, and overall preference. Six VLMs (four proprietary and two open-weight models) are benchmarked against psychophysical data, providing a systematic benchmark of VLMs for perceptual IQA through comparison with human judgments. The results reveal strong attribute-dependent variability: models with high human alignment for colorfulness (ρ up to 0.93) underperform on contrast, and vice versa. Attribute weighting analysis further shows that, when evaluating overall preference, most VLMs assign a higher weight to colorfulness than to contrast, similar to the psychophysical data. Intra-model consistency analysis reveals a counterintuitive trade-off: the most self-consistent models are not necessarily the most human-aligned, suggesting that response variability reflects sensitivity to scene-dependent perceptual cues. Furthermore, human-VLM agreement increases with perceptual separability, indicating that VLMs are more reliable when stimulus differences are clearly expressed.
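
The human-alignment figures above (ρ up to 0.93) are Spearman rank correlations between model and human scores. A minimal sketch of that statistic follows, assuming tie-free scores; the score lists are hypothetical, not data from the paper.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation between two score lists (assumes no tied values)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("need two 1-D score arrays of equal length >= 2")
    # argsort of argsort yields 0-based ranks when there are no ties
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    n = x.size
    d = rank_x - rank_y
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Hypothetical per-image colorfulness scores (not data from the paper)
human = [2.1, 3.4, 1.0, 4.8, 3.9]
vlm = [2.0, 3.0, 1.5, 5.0, 4.5]
rho = spearman_rho(human, vlm)
```

With tied ratings, average ranks are needed, for example via `scipy.stats.spearmanr`.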

2603.24566 2026-03-26 eess.SY cs.SY

Integral Control Barrier Functions with Input Delay: Prediction, Feasibility, and Robustness

Adam K. Kiss, Ersin Das, Tamas G. Molnar, Aaron D. Ames

Abstract

Time delays in feedback control loops can cause controllers to respond too late and with excessively large corrective actions, leading to unsafe behavior (violation of state constraints) and controller infeasibility (violation of input constraints). To address this problem, we develop a safety-critical control framework for nonlinear systems with input delay using dynamically defined (integral) controllers. Building on the concept of Integral Control Barrier Functions (ICBFs), we concurrently address two fundamental challenges: compensating for the effect of delays and ensuring feasibility when state and input constraints are imposed jointly. To this end, we embed predictor feedback into a dynamically defined control law to compensate for delays, with the predicted state evolving according to delay-free dynamics. Then, utilizing ICBFs, we formulate a quadratic program for safe control design. For systems subject to simultaneous state and input constraints, we derive a closed-form feasibility condition for the resulting controller, yielding a compatible ICBF pair that guarantees forward invariance under delay. We also address robustness to prediction errors (e.g., caused by delay uncertainty) using tunable robust ICBFs. Our approach is validated on an adaptive cruise control example with actuation delay.
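
Predictor feedback, as described above, evaluates the controller at the predicted state rather than the measured one. The sketch below assumes a sampled implementation with zero-order-hold inputs and forward-Euler integration of the delay-free dynamics; the double-integrator model (a common cruise-control abstraction) and all names are illustrative, not the authors' implementation, and the ICBF quadratic program is not shown.

```python
import numpy as np

def predict_state(x_now, u_buffer, f, dt):
    """Predict x(t + tau) by integrating delay-free dynamics over the input buffer.

    x_now    : measured state x(t)
    u_buffer : inputs issued during [t - tau, t), oldest first; because of the
               input delay they act on the plant during [t, t + tau)
    f        : delay-free dynamics, xdot = f(x, u)
    dt       : sample time, so that tau = len(u_buffer) * dt
    """
    x = np.array(x_now, dtype=float)
    for u in u_buffer:  # forward Euler, zero-order hold on each input
        x = x + dt * np.asarray(f(x, u), dtype=float)
    return x

# Hypothetical double integrator: state = [position, velocity], input = acceleration
def double_integrator(x, u):
    return np.array([x[1], u])

x_pred = predict_state([0.0, 0.0], [1.0, 1.0], double_integrator, dt=1.0)
```

A delay-compensating controller would then compute its safe input from `x_pred` instead of the measured state.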

2603.24540 2026-03-26 eess.SY cs.SY

A Modular Platooning and Vehicle Coordination Simulator for Research and Education

Kevin Jamsahar, Adrian Wiltz, Maria Charitidou, Dimos V. Dimarogonas

Comments 6 pages

Abstract

This work presents a modular, Python-based simulator that simplifies the evaluation of novel vehicle control and coordination algorithms in complex traffic scenarios while keeping the implementation overhead low. It allows researchers to focus primarily on developing the control and coordination strategies themselves, while the simulator manages the setup of complex road networks, vehicle configuration, execution of the simulation and the generation of video visualizations of the results. It is thereby also well-suited to support control education by allowing instructors to create interactive exercises providing students with direct visual feedback. Thanks to its modular architecture, the simulator remains easily customizable and extensible, lowering the barrier for conducting advanced simulation studies in vehicle and traffic control research.

2603.24509 2026-03-26 eess.SY cs.SY

Communication-Aware Dissipative Output Feedback Control

Ingyu Jang, Leila J. Bridgeman

Comments 6 pages, 2 figures, Submitted to IEEE Control Systems Letters (LCSS)

Abstract

Communication-aware control is essential to reduce costs and complexity in large-scale networks. This work proposes a method to design dissipativity-augmented output feedback controllers with reduced online communication. The contributions of this work are threefold: a generalized well-posedness condition for the controller network, a convex relaxation of the constraints that infer network stability from the dissipativity of its agents, and a synthesis algorithm integrating the Network Dissipativity Theorem, the alternating direction method of multipliers, and iterative convex overbounding. The proposed approach yields a sparsely interconnected controller that is both robust and applicable to networks with heterogeneous nonlinear agents. The efficiency of these methods is demonstrated on heterogeneous networks with uncertain and unstable agents and compared with standard $\mathcal{H}_\infty$ control.

2603.24505 2026-03-26 eess.SP

JSSAnet: Theory-Guided Subchannel Partitioning and Joint Spatial Attention for Near-Field Channel Estimation

Zhiming Zhu, Shu Xu, Chunguo Li, Yongming Huang, Luxi Yang

Abstract

The deployment of extremely large-scale antenna arrays (ELAAs) in sixth-generation (6G) communication systems introduces unique challenges for efficient near-field channel estimation. To tackle these challenges, this paper presents a theory-guided approach that incorporates angular information into an attention-based estimation framework. A piecewise Fourier representation is proposed to implicitly encode the inherent nonlinearity of the near-field channel, enabling the entire channel to be segmented into multiple subchannels, each mapped to the angular domain via the discrete Fourier transform (DFT). We then develop a joint subchannel-spatial-attention network (JSSAnet) to extract spatial features both within and across subchannels. To theoretically guide the design of the joint attention mechanism, we derive upper and lower bounds based on approximation criteria and DFT quantization loss mitigation, respectively. Guided by both bounds, the JSSA layer of an attention block is constructed to assign independent and adaptive spatial attention weights to each subchannel in parallel. Subsequently, a feed-forward network (FFN) in the attention block further captures and refines the residual nonlinear dependencies across subchannels. Moreover, the proposed JSSA map is computed linearly via an element-wise product combining large-kernel convolutions (DLKC), maintaining strong contextual learning capability. Numerical results verify the effectiveness of embedding sparsity information into the attention network and demonstrate that JSSAnet achieves superior estimation performance compared with existing methods.
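
A minimal sketch of the subchannel partition and per-subchannel DFT described above, assuming a uniform linear array split into equal contiguous blocks and a unitary DFT; the array geometry and user position are hypothetical, and the attention network itself is not shown. The intuition is that within a short block the spherical wavefront is closer to planar, so each block's angular spectrum tends to be more compact than that of the full array.

```python
import numpy as np

def subchannel_dft(h, num_sub):
    """Split an N-element channel vector into num_sub contiguous subchannels and map
    each subchannel to the angular domain with a unitary DFT.

    Returns an array of shape (num_sub, N // num_sub).
    """
    h = np.asarray(h, dtype=complex)
    if h.ndim != 1 or h.size % num_sub != 0:
        raise ValueError("h must be 1-D with length divisible by num_sub")
    blocks = h.reshape(num_sub, h.size // num_sub)   # one row per subchannel
    return np.fft.fft(blocks, axis=1, norm="ortho")  # per-row angular spectrum

# Hypothetical near-field ULA: 256 half-wavelength-spaced elements,
# user at 5 m and 30 degrees from broadside (spherical-wavefront model)
wavelength = 0.01
pos = (np.arange(256) - 127.5) * wavelength / 2
r, theta = 5.0, np.deg2rad(30.0)
dist = np.sqrt(r**2 + pos**2 - 2 * r * pos * np.sin(theta))
h = np.exp(-2j * np.pi * (dist - r) / wavelength)
H = subchannel_dft(h, num_sub=8)
```

Because the DFT is unitary, the partition preserves channel energy, so no information is lost by working per subchannel.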

2603.24503 2026-03-26 cs.LG cs.RO cs.SY eess.SY

Towards Safe Learning-Based Non-Linear Model Predictive Control through Recurrent Neural Network Modeling

Mihaela-Larisa Clement, Mónika Farsang, Agnes Poks, Johannes Edelmann, Manfred Plöchl, Radu Grosu, Ezio Bartocci

Abstract

The practical deployment of nonlinear model predictive control (NMPC) is often limited by online computation: solving a nonlinear program at high control rates can be expensive on embedded hardware, especially when models are complex or horizons are long. Learning-based NMPC approximations shift this computation offline but typically demand large expert datasets and costly training. We propose Sequential-AMPC, a sequential neural policy that generates MPC candidate control sequences by sharing parameters across the prediction horizon. For deployment, we wrap the policy in a safety-augmented online evaluation and fallback mechanism, yielding Safe Sequential-AMPC. Compared to a naive feedforward policy baseline across several benchmarks, Sequential-AMPC requires substantially fewer expert MPC rollouts and yields candidate sequences with higher feasibility rates and improved closed-loop safety. On high-dimensional systems, it also exhibits better learning dynamics and performance in fewer epochs while maintaining stable validation improvement where the feedforward baseline can stagnate.
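
One way to read the "safety-augmented online evaluation and fallback mechanism" is sketched below: roll the network's candidate sequence through a model, check constraints, and otherwise reuse the last accepted sequence shifted by one step. This is a hypothetical sketch under that assumption (scalar input, box constraints, toy integrator model), not the authors' Safe Sequential-AMPC.

```python
import numpy as np

def rollout_is_safe(x0, u_seq, step, x_max, u_max):
    """Simulate a candidate input sequence with a model and check box constraints."""
    u_seq = np.asarray(u_seq, dtype=float)
    if np.any(np.abs(u_seq) > u_max):
        return False
    x = np.asarray(x0, dtype=float)
    for u in u_seq:
        x = step(x, u)
        if np.any(np.abs(x) > x_max):
            return False
    return True

def select_sequence(x0, candidate, prev_safe, step, x_max, u_max, u_fallback=0.0):
    """Return (sequence, used_fallback). The candidate is used if it passes the check;
    otherwise the previously accepted sequence is shifted by one step and padded with
    u_fallback (hypothetical rule; the shifted sequence is not re-checked here)."""
    if rollout_is_safe(x0, candidate, step, x_max, u_max):
        return np.asarray(candidate, dtype=float), False
    shifted = np.append(np.asarray(prev_safe, dtype=float)[1:], u_fallback)
    return shifted, True

# Hypothetical scalar integrator x[k+1] = x[k] + u[k] with |x| <= 2 and |u| <= 1
def integrator_step(x, u):
    return x + u

seq, used_fallback = select_sequence(np.array([0.0]), [1.5, 0.0], [0.2, 0.3],
                                     integrator_step, x_max=2.0, u_max=1.0)
```

In constructions of this kind, the shifted fallback is usually kept safe by a terminal controller and terminal set, which this sketch does not model.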

2603.24475 2026-03-26 cs.LG cs.SY eess.SY

Conformalized Transfer Learning for Li-ion Battery State of Health Forecasting under Manufacturing and Usage Variability

Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Marcello Canova

Comments Submitted to the 2026 American Control Conference (ACC)

Abstract

Accurate forecasting of state-of-health (SOH) is essential for ensuring safe and reliable operation of lithium-ion cells. However, existing models calibrated on laboratory tests at specific conditions often fail to generalize to new cells that differ due to small manufacturing variations or operate under different conditions. To address this challenge, an uncertainty-aware transfer learning framework is proposed, combining a Long Short-Term Memory (LSTM) model with domain adaptation via Maximum Mean Discrepancy (MMD) and uncertainty quantification through Conformal Prediction (CP). The LSTM model is trained on a virtual battery dataset designed to capture real-world variability in electrode manufacturing and operating conditions. MMD aligns latent feature distributions between simulated and target domains to mitigate domain shift, while CP provides calibrated, distribution-free prediction intervals. This framework improves both the generalization and trustworthiness of SOH forecasts across heterogeneous cells.
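
The two generic ingredients named above can be sketched as follows: a biased squared-MMD estimate with a Gaussian kernel, which could serve as a domain-alignment penalty between simulated and target features, and split conformal intervals built from absolute residuals. The kernel parameter `gamma`, the score function, and all names are assumptions; the LSTM and the authors' exact losses are not shown.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between feature samples x and y."""
    return (rbf_kernel(x, x, gamma).mean() + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal interval with absolute-residual scores.

    Marginal coverage >= 1 - alpha holds if calibration and test points are exchangeable.
    """
    scores = np.abs(np.asarray(cal_true, float) - np.asarray(cal_pred, float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the finite-sample quantile
    q = np.inf if k > n else np.sort(scores)[k - 1]
    test_pred = np.asarray(test_pred, float)
    return test_pred - q, test_pred + q
```

When the calibration set is too small for the requested level, the interval is unbounded, which is the honest finite-sample answer rather than an error.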

2603.24419 2026-03-26 eess.SY cs.SY

Robust Optimal Operation of Virtual Power Plants Under Decision-Dependent Uncertainty of Price Elasticity

Tao Tan, Rui Xie, Meng Yang, Yue Chen

Comments 9 pages, 9 figures

Abstract

The rapid deployment of distributed energy resources (DERs) is one of the essential efforts to mitigate global climate change. However, a vast number of small-scale DERs are difficult to manage individually, motivating the introduction of virtual power plants (VPPs). A VPP operator coordinates a group of DERs by setting suitable prices, and aggregates them for interaction with the power grid. In this context, optimal pricing plays a critical role in VPP operation. This paper proposes a robust optimal operation model for VPPs that considers uncertainty in the price elasticity of demand. Specifically, the demand elasticity is found to be influenced by the pricing decision, giving rise to decision-dependent uncertainty (DDU). An improved column-and-constraint generation (C&CG) algorithm, together with tailored transformation and reformulation techniques, is developed to solve the robust model with DDU efficiently. Case studies based on actual electricity consumption data of London households demonstrate the effectiveness of the proposed model and algorithm.

2603.24385 2026-03-26 eess.AS

ArrayDPS-Refine: Generative Refinement of Discriminative Multi-Channel Speech Enhancement

Zhongweiyang Xu, Ashutosh Pandey, Juan Azcarreta, Zhaoheng Ni, Sanjeel Parekh, Buye Xu

Comments Accepted to ICASSP 2026

Abstract

Multi-channel speech enhancement aims to recover clean speech from noisy multi-channel recordings. Most deep learning methods employ discriminative training, which can lead to non-linear distortions from regression-based objectives, especially under challenging environmental noise conditions. Inspired by ArrayDPS for unsupervised multi-channel source separation, we introduce ArrayDPS-Refine, a method designed to enhance the outputs of discriminative models using a clean speech diffusion prior. ArrayDPS-Refine is training-free, generative, and array-agnostic. It first estimates the noise spatial covariance matrix (SCM) from the enhanced speech produced by a discriminative model, then uses this estimated noise SCM for diffusion posterior sampling. This approach allows direct refinement of any discriminative model's output without retraining. Our results show that ArrayDPS-Refine consistently improves the performance of various discriminative models, including state-of-the-art waveform and STFT domain models. Audio demos are provided at https://xzwy.github.io/ArrayDPSRefineDemo/.
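
The noise-SCM estimate described above can be sketched as a per-frequency average of outer products of the residual between the mixture and the enhanced signal. This assumes the discriminative model provides a multichannel speech estimate aligned with the mixture (the abstract does not specify this); the diffusion posterior sampling step is not shown.

```python
import numpy as np

def estimate_noise_scm(mix_stft, enh_stft):
    """Estimate per-frequency noise spatial covariance matrices (SCMs).

    mix_stft, enh_stft : complex arrays of shape (M, F, T) = (mics, bins, frames);
    enh_stft is assumed to be a multichannel speech estimate aligned with the mixture.
    Returns an array of shape (F, M, M) with Phi[f] = mean over t of n_ft n_ft^H.
    """
    noise = mix_stft - enh_stft  # residual treated as noise
    num_frames = noise.shape[2]
    # Phi[f, m, n] = sum_t noise[m, f, t] * conj(noise[n, f, t]) / T
    return np.einsum("mft,nft->fmn", noise, noise.conj()) / num_frames
```

Each `Phi[f]` is Hermitian and positive semidefinite by construction, which is what a Gaussian noise model in the posterior step would require.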

2603.24381 2026-03-26 physics.soc-ph cs.SY eess.SY

On a Co-evolving Opinion-Leadership Model in Social Networks

Martina Alutto, Lorenzo Zino, Karl H. Johansson, Angela Fontan

Comments 8 pages, 6 figures

Abstract

Leadership in social groups is often a dynamic characteristic that emerges from interactions and opinion exchange. Empirical evidence suggests that individuals with strong opinions tend to gain influence, yet maintaining alignment with the social context is crucial for sustained leadership. Motivated by the social psychology literature that supports these empirical observations, we propose a novel dynamical system in which opinions and leadership co-evolve within a social network. Our model extends the Friedkin-Johnsen framework by making susceptibility to peer influence time-dependent, turning it into the leadership variable. Leadership strengthens when an agent holds strong yet socially aligned opinions and declines when such alignment is lost, capturing the trade-off between conviction and social acceptance. After illustrating the emergent behavior of this complex system, we formally analyze the coupled dynamics, establish sufficient conditions for convergence to a non-trivial equilibrium, and examine two time-scale separation regimes reflecting scenarios in which opinion and leadership evolve at different speeds.
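
For reference, the classic Friedkin-Johnsen update that the model extends is sketched below with fixed susceptibilities `lam` (weight on peers; `1 - lam` on the initial opinion). In the paper the susceptibility becomes a time-varying leadership variable whose dynamics are not given in the abstract, so they are not modeled here; the network and opinions are hypothetical.

```python
import numpy as np

def fj_simulate(W, lam, x0, steps=200):
    """Classic Friedkin-Johnsen dynamics: x(t+1) = Lam W x(t) + (I - Lam) x(0)."""
    W = np.asarray(W, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    Lam = np.diag(lam)
    anchor = (np.eye(len(x0)) - Lam) @ x0  # stubbornness toward initial opinions
    x = x0.copy()
    for _ in range(steps):
        x = Lam @ W @ x + anchor
    return x

def fj_equilibrium(W, lam, x0):
    """Closed-form fixed point (I - Lam W)^{-1} (I - Lam) x(0)."""
    Lam = np.diag(lam)
    n = len(x0)
    return np.linalg.solve(np.eye(n) - Lam @ np.asarray(W, float),
                           (np.eye(n) - Lam) @ np.asarray(x0, float))

# Hypothetical 3-agent network with row-stochastic influence weights
W = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
lam = [0.9, 0.5, 0.2]
x0 = [1.0, 0.0, -1.0]
x_final = fj_simulate(W, lam, x0, steps=500)
```

When every susceptibility is below 1 and `W` is row-stochastic, the iteration converges to the closed-form equilibrium.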

2603.24328 2026-03-26 eess.SP

Towards Semantic-based Agent Communication Networks: Vision, Technologies, and Challenges

Ping Zhang, Rui Meng, Xiaodong Xu, Yaheng Wang, Zixuan Huang, Yiming Liu, Ruichen Zhang, Yinqiu Liu, Haonan Tong, Huishi Song, Gang Wu, Zhaoming Lu, Jiawen Kang, Geng Sun, Qinghe Du, Zhaohui Yang, Jingxuan Zhang, Han Meng, Lexi Xu, Haitao Zhao, Zesong Fei, Yiqing Zhou, Pei Xiao, Meixia Tao, Qinyu Zhang, Shuguang Cui, Rahim Tafazolli

Comments 46 pages, 15 figures

Abstract

The International Telecommunication Union (ITU) identifies "Artificial Intelligence (AI) and Communication" as one of six key usage scenarios for 6G. Agentic AI, characterized by its capabilities in multi-modal environmental sensing, complex task coordination, and continuous self-optimization, is anticipated to drive the evolution toward agent-based communication networks. Semantic communication (SemCom), in turn, has emerged as a transformative paradigm that offers task-oriented efficiency, enhanced reliability in complex environments, and dynamic adaptation in resource allocation. However, comprehensive reviews that trace their technological evolution in the contexts of agent communications remain scarce. Addressing this gap, this paper systematically explores the role of semantics in agent communication networks. We first propose a novel architecture for semantic-based agent communication networks, structured into three layers, four entities, and four stages. Three wireless agent network layers define the logical structure and organization of entity interactions: the intention extraction and understanding layer, the semantic encoding and processing layer, and the distributed autonomy and collaboration layer. Across these layers, four AI agent entities, namely embodied agents, communication agents, network agents, and application agents, coexist and perform distinct tasks. Furthermore, four operational stages of semantic-enhanced agentic AI systems, namely perception, memory, reasoning, and action, form a cognitive cycle guiding agent behavior. Based on the proposed architecture, we provide a comprehensive review of the state-of-the-art on how semantics enhance agent communication networks. Finally, we identify key challenges and present potential solutions to offer directional guidance for future research in this emerging field.

2603.24268 2026-03-26 eess.SP

Incremental Learning-Based Open-Set Classification of Unknown UAVs via RF Signal Semantics

Julie Liu, Irshad A. Meer, Cicek Cavdar, Mustafa Ozger

Comments Accepted in ICC 2026

Abstract

The proliferation of civilian and commercial unmanned aerial vehicles (UAVs) has heightened the demand for reliable radio frequency (RF)-based drone identification systems that can operate under dynamic and uncertain airspace conditions. Most existing RF-based recognition methods adopt a closed-set assumption, where all UAV types are known during training. Such an assumption becomes unrealistic in practical deployments, as new or unknown UAVs frequently emerge, leading to overconfident misclassifications and inefficient retraining cycles. To address these challenges, this paper proposes a unified incremental open-set learning framework for RF-based UAV recognition that enables both novel class discovery and incremental adaptation. The framework first performs open-set recognition to separate unknown signals from known classes in the semantic feature space, followed by an unsupervised clustering module that discovers new UAV categories by selecting between K-Means and Gaussian Mixture Models (GMM) based on composite validity scores. Subsequently, a lightweight incremental learning module integrates the newly discovered classes through a memory-bounded replay mechanism that mitigates catastrophic forgetting. Experiments on a real-world UAV RF dataset comprising 24 classes (18 known and 6 unknown) show effective open-set detection, promising clustering performance under the evaluated noise settings, and stable incremental adaptation with minimal storage cost, supporting the potential of the proposed framework for open-world UAV recognition.
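
The abstract does not state the open-set scoring rule, so the sketch below uses a simple stand-in: nearest-centroid classification in feature space, with samples farther than a threshold from every known centroid labeled -1 (unknown). Those rejected samples are what the clustering module would then group; the features and threshold are hypothetical.

```python
import numpy as np

def fit_centroids(features, labels):
    """Mean feature vector per known class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def open_set_predict(features, classes, centroids, threshold):
    """Nearest-centroid labels, with -1 for samples farther than threshold from
    every known class centroid (flagged as unknown)."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[np.argmin(dists, axis=1)].copy()
    preds[dists.min(axis=1) > threshold] = -1
    return preds

# Hypothetical 2-D features for two known UAV classes
train_x = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
train_y = np.array([0, 0, 1, 1])
classes, cents = fit_centroids(train_x, train_y)
queries = np.array([[0.1, 0.0], [5.0, 5.2], [20.0, -20.0]])
preds = open_set_predict(queries, classes, cents, threshold=1.0)
```

The threshold trades missed unknowns against false rejections of known classes and would normally be tuned on held-out known data.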

2603.24251 2026-03-26 eess.SY cs.SY

Spatial Correlation, Non-Stationarity, and Degrees of Freedom of Holographic Curvature-Reconfigurable Apertures

Liuxun Xue, Shu Sun, Ruifeng Gao, Xiaoqian Yi

Comments 16 pages, 14 figures

Abstract

Low-altitude wireless platforms increasingly require lightweight, conformal, and densely sampled antenna array apertures with high array gain and spatial selectivity. However, when deployed on nonplanar surfaces, curvature alters the array manifold, local visibility, and propagation support, potentially invalidating spatial-stationarity assumptions. In this paper, we investigate a holographic curvature-reconfigurable aperture (HoloCuRA), modeled as a curvature-controllable holographic surface, and develop a visibility-aware spatial characterization framework for its low-altitude applications. Specifically, the framework jointly quantifies array-domain spatial non-stationarity (SnS) and spatial degrees of freedom (DoF) in line-of-sight, 3GPP non-line-of-sight, and isotropic-scattering propagation environments. For SnS, a novel Power-balanced, Visibility-aware Correlation-Matrix Distance (PoVi-CMD) and a two-stage subarray-screening procedure are introduced. For DoF, the Rényi-2 effective rank is adopted, and tractable spatial-correlation expressions under isotropic scattering are developed for efficient DoF analysis. Furthermore, a realizable antenna port mode is introduced to connect SnS with DoF. Numerical results reveal that curvature and propagation support are the primary determinants of both SnS and DoF in HoloCuRA: array-domain SnS determines whether subarray statistics can be treated as locally consistent, whereas DoF limits the global spatial modes. The findings provide useful guidance for low-altitude antenna-system design.
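
Two of the quantities above have standard forms that can be sketched directly: the Rényi-2 effective rank, (tr R)^2 / tr(R^2), which is the exponential of the order-2 Rényi entropy of the normalized eigenvalues, and the classic correlation matrix distance on which PoVi-CMD builds. The power balancing and visibility weighting of PoVi-CMD are not specified in the abstract and are not implemented; the exponential correlation model is a hypothetical stand-in for the paper's isotropic-scattering expressions.

```python
import numpy as np

def renyi2_effective_rank(R):
    """Renyi-2 effective rank (sum lambda)^2 / sum(lambda^2) of a Hermitian PSD matrix."""
    R = np.asarray(R)
    tr = np.real(np.trace(R))
    fro2 = np.sum(np.abs(R) ** 2)  # equals sum(lambda^2) for Hermitian R
    return tr ** 2 / fro2

def correlation_matrix_distance(R1, R2):
    """Classic correlation matrix distance: 1 - tr(R1 R2) / (||R1||_F ||R2||_F)."""
    num = np.real(np.trace(R1 @ R2))
    return 1.0 - num / (np.linalg.norm(R1, "fro") * np.linalg.norm(R2, "fro"))

def exp_corr(n, rho):
    """Hypothetical exponential correlation model R[i, j] = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

R = exp_corr(8, 0.7)
dof = renyi2_effective_rank(R)  # lies between 1 (rank one) and 8 (white)
```

The effective rank equals the array size for uncorrelated elements and 1 for a rank-one (fully coherent) correlation matrix, which is why it serves as a soft DoF count.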

2603.24241 2026-03-26 eess.SY cs.LG cs.SY

C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents

Guihlerme Daubt, Adrian Redder

Abstract

Safe navigation in complex environments remains a central challenge for reinforcement learning (RL) in robotics. This paper introduces Continuous Space-Time Empowerment for Physics-informed (C-STEP) safe RL, a novel measure of agent-centric safety tailored to deterministic, continuous domains. This measure can be used to design physics-informed intrinsic rewards by augmenting positive navigation reward functions. The reward incorporates the agent's internal states (e.g., initial velocity) and forward dynamics to differentiate safe from risky behavior. By integrating C-STEP with navigation rewards, we obtain an intrinsic reward function that jointly optimizes task completion and collision avoidance. Numerical results demonstrate fewer collisions, reduced proximity to obstacles, and only marginal increases in travel time. Overall, C-STEP offers an interpretable, physics-informed approach to reward shaping in RL, contributing to safety for agentic mobile robotic systems.

2603.24180 2026-03-26 cs.IT eess.SP math.IT

RIS-Assisted D-MIMO for Energy-Efficient 6G Indoor Networks

Akshay Vayal Parambath, Jose Flordelis, Venkatesh Tentu, Charitha Madapatha, Fredrik Rusek, Erik Bengtsson, Tommy Svensson

Comments 6 pages, 5 figures, Accepted to the IEEE International Conference on Communications (ICC) 2026

Abstract

We propose an alternating optimization framework for maximizing energy efficiency (EE) in reconfigurable intelligent surface (RIS) assisted distributed MIMO (D-MIMO) systems under both coherent and non-coherent reception modes. The framework jointly optimizes access point (AP) power allocation and RIS phase configurations to improve EE under per-AP power and signal-to-interference-plus-noise ratio (SINR) constraints. Using majorization-minimization for power allocation together with per-element RIS adaptation, the framework achieves tractable optimization of this non-convex problem. Simulation results for indoor deployments with realistic power-consumption models show that the proposed scheme outperforms equal-power and random-scatterer baselines, with clear EE gains. We evaluate the performance of both reception modes and quantify the impact of RIS phase-shift optimization, RIS controller architectures (centralized vs. per-RIS control), and RIS size, providing design insights for practical RIS-assisted D-MIMO deployments in future 6G networks.
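
For a single AP antenna and a single user, per-element RIS adaptation has a well-known closed form: rotate each reflected path so it adds in phase with the direct path. The sketch below shows that special case together with a simple energy-efficiency metric (rate divided by transmit plus static power). It is not the paper's multi-AP alternating optimization with majorization-minimization power allocation, and the power-model parameters are hypothetical.

```python
import numpy as np

def align_ris_phases(h_direct, h_ap_ris, h_ris_ue):
    """Per-element phases that co-phase each RIS path with the direct path
    (single-antenna AP, single-antenna user)."""
    return np.angle(h_direct) - np.angle(h_ap_ris * h_ris_ue)

def effective_channel(h_direct, h_ap_ris, h_ris_ue, theta):
    """Direct path plus the sum of phase-shifted cascaded RIS paths."""
    return h_direct + np.sum(h_ap_ris * np.exp(1j * theta) * h_ris_ue)

def energy_efficiency(p_tx, gain, noise_power, bandwidth, pa_efficiency, p_static):
    """Bits per joule: Shannon rate over transmit-plus-static power (hypothetical model)."""
    rate = bandwidth * np.log2(1.0 + p_tx * np.abs(gain) ** 2 / noise_power)
    return rate / (p_tx / pa_efficiency + p_static)
```

With aligned phases the effective gain magnitude equals the direct-path magnitude plus the sum of cascaded-path magnitudes, which by the triangle inequality no other phase choice can exceed in this single-link case.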

2603.22554 2026-03-26 eess.SY cs.SY

A Model Predictive Control Approach to Dual-Axis Agrivoltaic Panel Tracking

Anna Stuhlmacher, Panupong Srisuthankul, Johanna L. Mathieu, Peter Seiler

Comments 10 pages

Abstract

Agrivoltaic systems--photovoltaic (PV) panels installed above agricultural land--have emerged as a promising dual-use solution to address competing land demands for food and energy production. In this paper, we propose a model predictive control (MPC) approach to dual-axis agrivoltaic panel tracking control that dynamically adjusts panel positions in real time to maximize power production and crop yield given solar irradiance and ambient temperature measurements. We apply convex relaxations and shading factor approximations to reformulate the MPC optimization problem as a convex second-order cone program that determines the PV panel position adjustments away from the sun-tracking trajectory. Through case studies, we demonstrate our approach, exploring the Pareto front between i) an approach that maximizes power production without considering crop needs and ii) crop yield with no agrivoltaics. We also conduct a case study exploring the impact of forecast error on MPC performance. We find that dynamically adjusting agrivoltaic panel position helps us actively manage the trade-offs between power production and crop yield, and that active panel control enables the agrivoltaic system to achieve land equivalent ratio values of up to 1.897.
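
The land equivalent ratio sums the crop yield and the energy yield of the agrivoltaic system, each relative to a single-use reference on the same land area, so values above 1 indicate a land-use advantage. A worked sketch with hypothetical yields:

```python
def land_equivalent_ratio(crop_agrivoltaic, crop_open_field, energy_agrivoltaic, energy_pv_only):
    """LER = relative crop yield + relative energy yield, each versus its
    single-use reference on the same land area."""
    return crop_agrivoltaic / crop_open_field + energy_agrivoltaic / energy_pv_only

# Hypothetical numbers: 80% of open-field crop yield and 90% of PV-only energy
ler = land_equivalent_ratio(0.8, 1.0, 90.0, 100.0)  # about 1.7
```

Under this definition, an LER of 1.897 means that producing the same crop and energy output on separate single-use plots would need about 1.9 times as much land.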

2603.19995 2026-03-26 eess.IV

Goal-Oriented Framework for Optical Flow-based Multi-User Multi-Task Video Transmission

Yujie Xu, Shutong Chen, Nan Li, Yansha Deng, Jinhong Yuan, Robert Schober

Abstract

Efficient multi-user multi-task video transmission is an important research topic within the realm of current wireless communication systems. To reduce the transmission burden and save communication resources, we propose a goal-oriented semantic communication framework for optical flow-based multi-user multi-task video transmission (OF-GSC). At the transmitter, we design a semantic encoder that consists of a motion extractor and a patch-level optical flow-based semantic representation extractor to effectively identify and select important semantic representations. At the receiver, we design a transformer-based semantic decoder for high-quality video reconstruction and video classification tasks. To minimize the communication time, we develop a deep deterministic policy gradient (DDPG)-based bandwidth allocation algorithm for multi-user transmission. For video reconstruction tasks, our OF-GSC framework achieves a significant improvement in the received video quality, as evidenced by a 13.47% increase in the structural similarity index measure (SSIM) score in comparison to DeepJSCC. For video classification tasks, OF-GSC achieves a Top-1 accuracy slightly higher than that of VideoMAE while requiring only 25% of the data under the same mask ratio of 0.3. For bandwidth allocation optimization, our DDPG-based algorithm reduces the maximum transmission time by 25.97% compared with the baseline equal-bandwidth allocation scheme.
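
As a reference point for the bandwidth-allocation result, the sketch below gives the closed-form min-max allocation when each user's spectral efficiency is fixed: transmission time is bits divided by bandwidth times spectral efficiency, and the maximum time is minimized by giving each user bandwidth in proportion to its load so that all users finish together. Payloads and efficiencies are hypothetical; this is not the paper's DDPG-based allocator.

```python
import numpy as np

def minmax_bandwidth(bits, spectral_eff, total_bw):
    """Split total_bw (Hz) so that all users finish at the same time, which
    minimizes the maximum transmission time when spectral efficiencies are fixed."""
    load = np.asarray(bits, float) / np.asarray(spectral_eff, float)  # Hz*s per user
    bw = total_bw * load / load.sum()
    return bw, load / bw  # (bandwidth in Hz, time in s)

def equal_bandwidth_times(bits, spectral_eff, total_bw):
    """Transmission times under the equal-bandwidth baseline."""
    load = np.asarray(bits, float) / np.asarray(spectral_eff, float)
    return load / (total_bw / load.size)

bits = [1e6, 2e6, 4e6]  # hypothetical semantic payloads (bits)
eff = [2.0, 2.0, 4.0]   # hypothetical spectral efficiencies (bit/s/Hz)
bw, t_opt = minmax_bandwidth(bits, eff, total_bw=1e6)
t_eq = equal_bandwidth_times(bits, eff, total_bw=1e6)
```

Here every user finishes in 2.5 s under the proportional split, versus a worst case of 3 s with equal bandwidth.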

2603.15934 2026-03-26 math.OC cs.MS cs.SY eess.SY

Fast Relax-and-Round Unit Commitment with Economic Horizons

Shaked Regev, Eve Tsybina, Slaven Peles

Comments 6 pages (journal limit), 6 figures

Abstract

We extend our novel computational method for unit commitment (UC) to include long-horizon planning. We introduce a fast, novel algorithm to commit hydro generators with provable accuracy. We solve problems with thousands of generators at 5-minute market intervals. We show that our method can solve interconnect-size UC problems in approximately 1 minute on commodity hardware and that an increased planning horizon leads to sizable operational cost savings (our objective). This scale is infeasible for current state-of-the-art tools. We attain this runtime improvement by introducing a heuristic tailored to UC problems. Our method can be implemented using existing continuous optimization solvers and adapted for different applications. Combined, the two algorithms would allow operators of large systems with hydro units to make horizon-aware economic decisions.

2602.11842 2026-03-26 eess.SY cs.SY

A day-ahead market model for power systems: benchmarking and security implications

Andrej Stankovski, Blazhe Gjorgiev, James Ciyu Qin, Giovanni Sansavini

Abstract

Power system security assessments, e.g. via cascading outage models, often use operational set-points based on optimal power flow (OPF) dispatch. However, driven by cost minimization, OPF provides an ideal, albeit unrealistic, clearing of the generating units that disregards the complex interactions among market participants. In addition, existing market modeling tools often utilize economic dispatch and unit commitment to minimize total system costs, disregarding the profit-driven behavior of market participants. The security of the system may therefore be overestimated. To address this gap, we introduce a social-welfare-based day-ahead market-clearing model. The security implications are analyzed using Cascades, a model for cascading failure analysis. We apply this model to the IEEE-118 bus system with three independent control zones. The results show that market dispatch leads to demand not served (DNS) up to 80% higher than under OPF dispatch, highlighting a significant security overestimation. This is especially pronounced in large-scale cascading events with DNS above 100 MW. A key driver is the increased dispatch of storage and gas units, which can place the system in critical operating conditions. Operators can use this information to properly estimate the impact of the market on system security and plan efficient expansion strategies.
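
A single-bus sketch of social-welfare market clearing with step bids is shown below: offers are stacked in ascending price order, bids in descending order, and energy is traded while the bid price is at least the offer price. It assumes no network constraints, storage, or unit commitment, and sets the price at the last accepted offer (one of several conventions); the bids are hypothetical, and the paper's IEEE-118 model is not reproduced.

```python
def clear_market(supply, demand, tol=1e-9):
    """Greedy merit-order clearing of step bids at a single node.

    supply, demand : lists of (quantity_MW, price_per_MWh)
    Returns (traded_MW, clearing_price, social_welfare).
    """
    sup = [list(b) for b in sorted(supply, key=lambda b: b[1])]               # cheapest first
    dem = [list(b) for b in sorted(demand, key=lambda b: b[1], reverse=True)]  # highest value first
    i = j = 0
    traded, welfare, price = 0.0, 0.0, None
    while i < len(sup) and j < len(dem) and dem[j][1] >= sup[i][1]:
        q = min(sup[i][0], dem[j][0])
        traded += q
        welfare += q * (dem[j][1] - sup[i][1])  # consumer plus producer surplus
        price = sup[i][1]                        # marginal accepted offer
        sup[i][0] -= q
        dem[j][0] -= q
        if sup[i][0] <= tol:
            i += 1
        if dem[j][0] <= tol:
            j += 1
    return traded, price, welfare

# Hypothetical offers and bids (MW, $/MWh)
offers = [(50, 10), (50, 30), (50, 60)]
bids = [(80, 100), (40, 40), (30, 20)]
traded, price, welfare = clear_market(offers, bids)
```

For these bids, 100 MW clears at 30 $/MWh; the most expensive offer stays out of the market because no remaining bid values energy at 60 $/MWh.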

2601.15368 2026-03-26 cs.CV eess.IV

Aligned Stable Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang, Junqiu Yu, Chenjie Cao, Xiangyang Xue, Yanwei Fu

Comments Extension of our CVPR 2025 highlight paper: arXiv:2312.04831. The paper was submitted to cs.CV but was classified under eess.IV. The authors made an appeal but have not received a response in one month. Therefore, we update the comment to clarify the category.

Abstract

Generative image inpainting can produce realistic, high-fidelity results even with large, irregular masks. However, existing methods still face key issues that make inpainted images look unnatural. In this paper, we identify two main problems: (1) Unwanted object insertion: generative models may hallucinate arbitrary objects in the masked region that do not match the surrounding context. (2) Color inconsistency: inpainted regions often exhibit noticeable color shifts, leading to smeared textures and degraded image quality. We analyze the underlying causes of these issues and propose efficient post-hoc solutions for pre-trained inpainting models. Specifically, we introduce the principled framework of Aligned Stable inpainting with UnKnown Areas prior (ASUKA). To reduce unwanted object insertion, we use reconstruction-based priors to guide the generative model, suppressing hallucinated objects while preserving generative flexibility. To address color inconsistency, we design a specialized VAE decoder that formulates latent-to-image decoding as a local harmonization task. This design significantly reduces color shifts and produces more color-consistent results. We implement ASUKA on two representative inpainting architectures: a U-Net-based model and a DiT-based model. We analyze and propose lightweight injection strategies that minimize interference with the model's original generation capacity while ensuring the mitigation of the two issues. We evaluate ASUKA using the Places2 dataset and MISATO, our proposed diverse benchmark. Experiments show that ASUKA effectively suppresses object hallucination and improves color consistency, outperforming standard diffusion, rectified flow models, and other inpainting methods. The dataset, models, and code will be released on GitHub.

2510.12684 2026-03-26 cs.RO cs.SY eess.SY

Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning

Alvaro Belmonte-Baeza, Miguel Cazorla, Gabriel J. García, Carlos J. Pérez-Del-Pulgar, Jorge Pomares

Comments This is the authors' version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after the conference proceedings are published.

Journal ref 2025 International Conference on Space Robotics (iSpaRo)

Abstract

Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in precise 6D task-space end-effector pose tracking, with an average positional accuracy of 4 cm and an orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
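
The abstract does not say how the constraints are enforced during training; Lagrangian relaxation with projected dual ascent on the multiplier is one common approach in constrained RL and is sketched below. The toy relation between the multiplier and the average cost stands in for actual policy learning and is purely hypothetical.

```python
def update_multiplier(lmbda, avg_episode_cost, cost_limit, lr):
    """Projected dual ascent on the Lagrange multiplier of the constraint
    E[cost] <= cost_limit; the multiplier is kept non-negative."""
    return max(0.0, lmbda + lr * (avg_episode_cost - cost_limit))

def shaped_reward(reward, cost, lmbda):
    """Lagrangian reward the policy maximizes for a fixed multiplier."""
    return reward - lmbda * cost

# Toy stand-in for policy learning: assume (hypothetically) that the average
# constraint cost falls as the penalty weight grows, cost = 2 / (1 + lambda).
lmbda = 0.0
for _ in range(100):
    avg_cost = 2.0 / (1.0 + lmbda)
    lmbda = update_multiplier(lmbda, avg_cost, cost_limit=1.0, lr=0.5)
```

The multiplier rises while the constraint is violated and settles where the average cost meets the limit (here at 1.0); a satisfied constraint drives it back toward zero.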

2510.08161 2026-03-26 eess.SP

Attitude and Heading Estimation in Symmetrical Inertial Arrays

Yaakov Libero, Itzik Klein

Details
Abstract

Attitude and heading reference systems (AHRS) play a central role in autonomous navigation systems on land, air and maritime platforms. AHRS utilize inertial sensor measurements to estimate platform orientation. In recent years, there has been increasing interest in multiple inertial measurement unit (MIMU) arrays to improve navigation accuracy and robustness. A particularly challenging MIMU implementation is the gyro-free (GF) configuration, in which angular velocity is derived solely from accelerometer measurements. While GF configurations have multiple benefits, including outlier detection and angular acceleration measurement, their main drawbacks are inherent instability and an increased divergence rate. To address these shortcomings, we introduce a novel symmetrical MIMU formulation, in which the IMUs are arranged in symmetric diagonal pairs to decouple linear and rotational acceleration components. To this end, we derive the theoretical foundations for the symmetrical MIMU formulation of the GF equations, develop a nonlinear least-squares estimation process, and integrate statistical hypothesis testing into an AHRS error-state extended Kalman filter. We validate our approach using real-world datasets containing 85 minutes of navigation data recorded on both airborne and land platforms. Our results demonstrate a 30% average reduction in attitude estimation errors, an improvement in rotation detection accuracy exceeding 95%, and significantly improved stability compared to a standard GF implementation. These results enable reliable GF navigation in applications where gyroscopes are unavailable, unreliable, or energy-constrained. Common examples include miniature platforms, computationally constrained platforms, and long-endurance marine platforms.
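Why symmetric diagonal pairs decouple the components can be seen from the standard rigid-body accelerometer model, f(r) = a + α × r + ω × (ω × r): the rotational terms are linear in the lever arm r, so they flip sign between sensors at +r and -r. The sketch below assumes ideal, noise-free sensors sharing one body frame; the numeric values are illustrative only and not taken from the paper.

```python
import numpy as np


def specific_force(a, omega, alpha, r):
    """Rigid-body accelerometer model at lever arm r (body frame):
    f = a + alpha x r + omega x (omega x r), where a is the specific force
    at the array center (linear acceleration minus gravity)."""
    return a + np.cross(alpha, r) + np.cross(omega, np.cross(omega, r))


def decouple_pair(f_plus, f_minus):
    """Sensors at +r and -r see rotational terms of opposite sign, so the
    half-sum isolates the linear part and the half-difference the rotational part."""
    linear = 0.5 * (f_plus + f_minus)
    rotational = 0.5 * (f_plus - f_minus)
    return linear, rotational


a = np.array([0.1, -0.2, 9.81])      # m/s^2, illustrative values
omega = np.array([0.3, 0.1, -0.5])   # rad/s
alpha = np.array([0.02, 0.0, 0.1])   # rad/s^2
r = np.array([0.05, 0.05, 0.0])      # m, one diagonal pair at +r / -r

lin, rot = decouple_pair(specific_force(a, omega, alpha, r),
                         specific_force(a, omega, alpha, -r))
print(np.allclose(lin, a))  # True: rotational terms cancel in the half-sum
```

In a real array the half-differences from several pairs would then feed an estimator for ω and α (the abstract mentions nonlinear least squares); that step is not sketched here.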

2506.20334 2026-03-26 eess.SY cs.LG cs.SY

Recurrent neural network-based robust control systems with regional properties and application to MPC design

Daniele Ravasio, Alessio La Bella, Marcello Farina, Andrea Ballarino

Comments 27 pages, 5 figures

Details
Abstract

This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark.
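The paper derives LMI-based incremental ISS conditions for its RNN class. As a much cruder illustration of what an incremental bound states, the sketch below uses a hypothetical single-layer RNN x⁺ = tanh(A x + B u): because tanh is 1-Lipschitz, the distance between two trajectories after one step is bounded by ‖A‖₂ times the state gap plus ‖B‖₂ times the input gap. This norm condition is a simplified sufficient condition chosen for illustration, not the paper's LMI test.

```python
import numpy as np


def rnn_step(A, B, x, u):
    """Hypothetical single-layer RNN state update x+ = tanh(A x + B u)."""
    return np.tanh(A @ x + B @ u)


def incremental_bound(A, B, dx, du):
    """tanh is 1-Lipschitz elementwise, so two trajectories satisfy
    ||x1+ - x2+|| <= ||A||_2 ||x1 - x2|| + ||B||_2 ||u1 - u2||."""
    return np.linalg.norm(A, 2) * np.linalg.norm(dx) + np.linalg.norm(B, 2) * np.linalg.norm(du)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.8 / np.linalg.norm(A, 2)   # enforce spectral norm 0.8 < 1 (contraction in x)
B = rng.standard_normal((4, 2))

x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
u1, u2 = rng.standard_normal(2), rng.standard_normal(2)
gap = np.linalg.norm(rnn_step(A, B, x1, u1) - rnn_step(A, B, x2, u2))
print(gap, incremental_bound(A, B, x1 - x2, u1 - u2))
```

With ‖A‖₂ < 1, iterating the bound shows that initial-condition differences decay geometrically while input differences have a bounded effect, which is the flavor of property the observer and controller design relies on.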

2406.03138 2026-03-26 cs.SD eess.AS

An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech

Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Comments 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

Details
Abstract

Speech-based depression detection tools could aid early screening. Here, we propose an interpretable speech foundation model approach to enhance the clinical applicability of such tools. We introduce a speech-level Audio Spectrogram Transformer (AST) to detect depression using long-duration speech instead of short segments, along with a novel interpretation method that reveals prediction-relevant acoustic features for clinician interpretation. Our experiments show the proposed model outperforms a segment-level AST, highlighting the impact of segment-level labelling noise and the advantage of leveraging longer speech duration for more reliable depression detection. Through interpretation, we observe our model identifies reduced loudness and F0 as relevant depression signals, aligning with documented clinical findings. This interpretability supports a responsible AI approach for speech-based depression detection, rendering such tools more clinically applicable.

2312.00357 2026-03-26 eess.IV cs.CV cs.LG

A Generalizable Deep Learning System for Cardiac MRI

Rohan Shad, Cyril Zakka, Dhamanpreet Kaur, Mrudang Mathur, Robyn Fong, Joseph Cho, Ross Warren Filice, John Mongan, Kimberly Kalianos, Nishith Khandwala, David Eng, Matthew Leipzig, Walter R. Witschey, Alejandro de Feria, Victor A. Ferrari, Euan A. Ashley, Michael A. Acker, Curtis Langlotz, William Hiesinger

Comments Published in Nature Biomedical Engineering; Supplementary Appendix available on publisher website. Code: https://github.com/rohanshad/cmr_transformer

Details
Journal ref
Nat. Biomed. Eng. (2026)
Abstract

Cardiac MRI allows for a comprehensive assessment of myocardial structure, function and tissue characteristics. Here we describe a foundational vision system for cardiac MRI, capable of representing the breadth of human cardiovascular disease and health. Our deep-learning model is trained via self-supervised contrastive learning, in which visual concepts in cine-sequence cardiac MRI scans are learned from the raw text of the accompanying radiology reports. We train and evaluate our model on data from four large academic clinical institutions in the United States. We additionally showcase the performance of our models on the UK BioBank and two additional publicly available external datasets. We explore emergent capabilities of our system and demonstrate remarkable performance across a range of tasks, including the problem of left-ventricular ejection fraction regression and the diagnosis of 39 different conditions such as cardiac amyloidosis and hypertrophic cardiomyopathy. We show that our deep-learning system is capable of not only contextualizing the staggering complexity of human cardiovascular disease but can be directed towards clinical problems of interest, yielding impressive, clinical-grade diagnostic accuracy with a fraction of the training data typically required for such tasks.
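Learning visual concepts from paired report text is typically done with a symmetric contrastive (InfoNCE, CLIP-style) loss: matched scan/report embeddings in a batch are pulled together and all other pairings act as negatives. The sketch below assumes that loss form and a temperature of 0.07; the paper's encoders, projection heads and exact objective are not described in the abstract, and the random embeddings stand in for encoder outputs.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def symmetric_info_nce(scan_emb, report_emb, temperature=0.07):
    """Row i of each matrix is a matched (scan, report) pair; every other
    row in the batch serves as a negative."""
    s, t = l2_normalize(scan_emb), l2_normalize(report_emb)
    logits = s @ t.T / temperature
    idx = np.arange(len(s))
    scan_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_scan = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (scan_to_text + text_to_scan)


rng = np.random.default_rng(0)
scans = rng.standard_normal((8, 64))
aligned = symmetric_info_nce(scans, scans)                          # matched pairs
mismatched = symmetric_info_nce(scans, np.roll(scans, 1, axis=0))   # every pair wrong
print(aligned < mismatched)
```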

2306.17466 2026-03-26 eess.IV cs.CV

MedAugment: Universal Automatic Data Augmentation Plug-in for Medical Image Analysis

Zhaoshan Liu, Qiujie Lv, Yifan Li, Ziduo Yang, Lei Shen

Comments Accepted by Knowledge-Based Systems

Details
Abstract

Data augmentation (DA) has been widely leveraged in computer vision to alleviate data shortage, while its application in medical imaging faces multiple challenges. The prevalent DA approaches in medical image analysis encompass conventional DA, synthetic DA, and automatic DA. However, these approaches may result in experience-driven design and intensive computation costs. Here, we propose a suitable yet general automatic DA method for medical images termed MedAugment. We propose pixel and spatial augmentation spaces and exclude the operations that can break medical details and features. We also propose a sampling strategy that draws a limited number of operations from the two spaces. Moreover, we present a hyperparameter mapping relationship to produce a rational augmentation level and make MedAugment fully controllable using a single hyperparameter. These configurations account for the differences between natural and medical images. Extensive experimental results on four classification and four segmentation datasets demonstrate the superiority of MedAugment. Compared with existing approaches, the proposed MedAugment avoids producing color distortions or structural alterations while involving negligible computational overhead. Our method can serve as a plugin without an extra training stage, offering significant benefits to the community and to medical experts without a deep learning background. The code is available at https://github.com/NUS-Tim/MedAugment.
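The idea of drawing a small number of operations from two separate spaces, with every magnitude set by one hyperparameter, can be illustrated with the toy sampler below. Everything in it is hypothetical: the operation names, the two-operation budget and the linear level-to-magnitude map are assumptions for illustration, and the actual spaces and mapping are defined in the paper and the MedAugment repository.

```python
import random

# Hypothetical operation names standing in for the pixel and spatial
# augmentation spaces (the real lists are defined by MedAugment itself).
PIXEL_OPS = ["brightness", "contrast", "gamma", "gaussian_noise"]
SPATIAL_OPS = ["rotate", "translate_x", "translate_y", "scale"]


def sample_policy(level, max_level=5, n_ops=2, rng=None):
    """Draw n_ops operations split between the two spaces; a single
    hyperparameter (level) sets every magnitude via an assumed linear map."""
    if not 1 <= level <= max_level:
        raise ValueError(f"level must be in [1, {max_level}]")
    rng = rng or random.Random()
    magnitude = level / max_level                 # assumed mapping, not the paper's
    n_pixel = rng.randint(0, n_ops)               # how many come from the pixel space
    ops = rng.sample(PIXEL_OPS, n_pixel) + rng.sample(SPATIAL_OPS, n_ops - n_pixel)
    return [(name, magnitude) for name in ops]


print(sample_policy(level=3, rng=random.Random(42)))
```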

2603.24144 2026-03-26 cs.SD eess.AS

Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model

Kangxiang Xia, Bingshen Mu, Xian Shi, Jin Xu, Lei Xie

Comments Accepted by ICME 2026

Details
Abstract

Achieving natural full-duplex interaction in spoken dialogue systems (SDS) remains a challenge due to the difficulty of accurately detecting user interruptions. Current solutions are polarized between "trigger-happy" VAD-based methods that misinterpret backchannels and robust end-to-end models that exhibit unacceptable response delays. Moreover, the absence of real-world benchmarks and holistic metrics hinders progress in the field. This paper presents a comprehensive framework to overcome these limitations. We first introduce SID-Bench, the first benchmark for semantic-aware interruption detection built entirely from real-world human dialogues. To provide a rigorous assessment of the responsiveness-robustness trade-off, we propose the Average Penalty Time (APT) metric, which assigns a temporal cost to both false alarms and late responses. Building on this framework, we design an LLM-based detection model optimized through a novel training paradigm to capture subtle semantic cues of intent. Experimental results show that our model significantly outperforms mainstream baselines, achieving a nearly threefold reduction in APT. By successfully resolving the long-standing tension between speed and stability, our work establishes a new state-of-the-art for intelligent interruption handling in SDS. To facilitate future research, SID-Bench and the associated code are available at: https://github.com/xkx-hub/SID-bench.
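The abstract only states that APT assigns a temporal cost to both false alarms and late responses. The sketch below is a hypothetical metric in that spirit, not the paper's definition: a detected interruption costs its detection delay (capped), a missed interruption costs the cap, and firing on a backchannel costs a fixed false-alarm penalty. The `Event` type and both cost constants are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    is_interruption: bool       # ground truth: genuine interruption vs. backchannel
    onset: float                # seconds, when the user started speaking
    detection: Optional[float]  # seconds, when the model fired (None = never fired)


def average_penalty_time(events, false_alarm_cost=1.0, miss_cost=2.0):
    """Hypothetical APT-style metric: mean temporal penalty per event (seconds)."""
    total = 0.0
    for e in events:
        if e.is_interruption:
            if e.detection is None:
                total += miss_cost                    # never responded
            else:
                delay = max(0.0, e.detection - e.onset)
                total += min(delay, miss_cost)        # late response, capped
        elif e.detection is not None:
            total += false_alarm_cost                 # fired on a backchannel
    return total / len(events)


events = [
    Event(True, 1.0, 1.3),     # detected 0.3 s late
    Event(False, 2.0, 2.1),    # false alarm on a backchannel
    Event(False, 3.0, None),   # backchannel correctly ignored
    Event(True, 4.0, None),    # missed interruption
]
print(average_penalty_time(events))
```

Putting both error types on a single time axis is what lets one number capture the responsiveness-robustness trade-off described above.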

2603.24138 2026-03-26 cs.LG cs.SY eess.SY

Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models

Lukas Theiner, Maik Pfefferkorn, Yongpeng Zhao, Sebastian Hirt, Rolf Findeisen

Comments 8 pages, 4 figures, accepted for ECC 2026

Details
Abstract

Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with both hierarchical, autoregressive and non-hierarchical, coregionalization-based structures, enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the need for experiments involving the human decision maker while effectively adapting driving style to individual preferences.
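Preferential Bayesian optimization commonly links a latent utility f to pairwise comparisons through a probit likelihood, P(a ≻ b) = Φ((f(a) − f(b)) / (√2 σ)). The sketch below assumes that standard likelihood; whether the paper uses exactly this form, and how it is coupled with the numerical-data surrogate, is not stated in the abstract. The utilities and duels are made-up toy values.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def preference_probability(f_a, f_b, noise_std=1.0):
    """P(a preferred over b) under a probit model with Gaussian noise
    of standard deviation noise_std on each latent utility."""
    return normal_cdf((f_a - f_b) / (math.sqrt(2.0) * noise_std))


def preference_log_likelihood(utilities, duels, noise_std=1.0):
    """duels: list of (winner_index, loser_index) pairs from the human."""
    return sum(math.log(preference_probability(utilities[w], utilities[l], noise_std))
               for w, l in duels)


duels = [(2, 1), (1, 0)]   # the human preferred candidate 2 over 1, and 1 over 0
consistent = preference_log_likelihood([0.0, 1.0, 2.0], duels)
reversed_ = preference_log_likelihood([2.0, 1.0, 0.0], duels)
print(consistent > reversed_)
```

In a GP surrogate this likelihood replaces the Gaussian observation model, which is why preference data alone carries less information per query than a numerical evaluation.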

2603.24130 2026-03-26 cs.RO cs.SY eess.SY

Equivariant Filter Transformations for Consistent and Efficient Visual–Inertial Navigation

Chungeng Tian, Fenghua He, Ning Hao

Comments 28 pages, 11 figures

Details
Abstract

This paper presents an equivariant filter (EqF) transformation approach for visual–inertial navigation. By establishing analytical links between EqFs with different symmetries, the proposed approach enables systematic consistency design and efficient implementation. First, we formalize the mapping from the global system state to the local error-state and prove that it induces a nonsingular linear transformation between the error-states of any two EqFs. Second, we derive transformation laws for the associated linearized error-state systems and unobservable subspaces. These results yield a general consistency design principle: for any unobservable system, a consistent EqF with a state-independent unobservable subspace can be synthesized by transforming the local coordinate chart, thereby avoiding ad hoc symmetry analysis. Third, to mitigate the computational burden arising from the non-block-diagonal Jacobians required for consistency, we propose two efficient implementation strategies. These strategies exploit the Jacobians of a simpler EqF with block-diagonal structure to accelerate covariance operations while preserving consistency. Extensive Monte Carlo simulations and real-world experiments validate the proposed approach in terms of both accuracy and runtime.
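For a nonsingular linear change of error coordinates ξ′ = T ξ, the linearized quantities transform in the familiar way: F′ = T F T⁻¹, H′ = H T⁻¹, P′ = T P Tᵀ, and an unobservable-subspace basis maps as N′ = T N. The toy check below assumes a constant T and a small made-up discrete-time model; in the paper the transformation between EqF error-states can depend on the state, which this sketch does not capture.

```python
import numpy as np


def transform_error_system(T, F, H, P, N):
    """Change of error coordinates xi' = T xi for a constant, nonsingular T."""
    T_inv = np.linalg.inv(T)
    return T @ F @ T_inv, H @ T_inv, T @ P @ T.T, T @ N


def observability_matrix(F, H):
    n = F.shape[0]
    return np.vstack([H @ np.linalg.matrix_power(F, k) for k in range(n)])


# Toy error-state model whose third component is unobservable.
F = np.array([[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
P = np.diag([0.1, 0.2, 0.3])
N = np.array([[0.0], [0.0], [1.0]])   # basis of the unobservable subspace
T = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 1.0]])       # det(T) = 1.03, so T is invertible

F2, H2, P2, N2 = transform_error_system(T, F, H, P, N)
# O' N' = H F^k T^-1 T N = H F^k N = 0: the subspace is carried along by T.
print(np.allclose(observability_matrix(F2, H2) @ N2, 0.0))
```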

2603.24109 2026-03-26 eess.IV cs.AI cs.CV

Comparative analysis of dual-form networks for live land monitoring using multi-modal satellite image time series

Iris Dumeur, Jérémy Anger, Gabriele Facciolo

Details
Abstract

Multi-modal Satellite Image Time Series (SITS) analysis faces significant computational challenges for live land monitoring applications. While Transformer architectures excel at capturing temporal dependencies and fusing multi-modal data, their quadratic computational complexity and the need to reprocess entire sequences for each new acquisition limit their deployment for regular, large-area monitoring. This paper studies various dual-form attention mechanisms for efficient multi-modal SITS analysis, which enable parallel training while supporting recurrent inference for incremental processing. We compare linear attention and retention mechanisms within a multi-modal spectro-temporal encoder. To address SITS-specific challenges of temporal irregularity and misalignment, we develop temporal adaptations of dual-form mechanisms that compute token distances based on actual acquisition dates rather than sequence indices. Our approach is evaluated on two tasks using Sentinel-1 and Sentinel-2 data: multi-modal SITS forecasting as a proxy task, and real-world solar panel construction monitoring. Experimental results demonstrate that dual-form mechanisms achieve performance comparable to standard Transformers while enabling efficient recurrent inference. The multi-modal framework consistently outperforms mono-modal approaches across both tasks, demonstrating the effectiveness of dual-form mechanisms for sensor fusion. The results presented in this work open new opportunities for operational land monitoring systems requiring regular updates over large geographic areas.
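The "dual form" property can be checked on a retention-style layer whose decay depends on elapsed days instead of sequence positions: the parallel form applies a causal mask γ^(tᵢ − tⱼ), while the recurrent form keeps a fixed-size state that is decayed by γ^(Δt) and updated once per new acquisition. The two agree because the per-step decays multiply to γ^(tᵢ − tⱼ). This is a single-head sketch under assumed exponential date-based decay, with no normalization, feature map or multi-modal fusion; the paper's exact parameterization may differ. Setting γ = 1 reduces it to unnormalized causal linear attention.

```python
import numpy as np


def retention_parallel(q, k, v, t, gamma):
    """Training form: all outputs at once with a causal, date-based decay mask."""
    dt = t[:, None] - t[None, :]                     # days between acquisitions i and j
    causal = np.tril(np.ones((len(t), len(t)), dtype=bool))
    decay = np.where(causal, gamma ** np.where(causal, dt, 0.0), 0.0)
    return (q @ k.T * decay) @ v


def retention_recurrent(q, k, v, t, gamma):
    """Inference form: a fixed-size state is updated once per new acquisition."""
    state = np.zeros((k.shape[1], v.shape[1]))
    outputs, prev_t = [], t[0]
    for q_i, k_i, v_i, t_i in zip(q, k, v, t):
        state = gamma ** (t_i - prev_t) * state + np.outer(k_i, v_i)
        outputs.append(q_i @ state)
        prev_t = t_i
    return np.stack(outputs)


rng = np.random.default_rng(0)
t = np.array([0.0, 5.0, 6.0, 17.0, 29.0, 30.0])   # irregular acquisition days (illustrative)
q, k, v = (rng.standard_normal((len(t), d)) for d in (4, 4, 3))
print(np.allclose(retention_parallel(q, k, v, t, 0.95),
                  retention_recurrent(q, k, v, t, 0.95)))
```

The recurrent form is what makes live monitoring cheap: a new acquisition costs one state update rather than reprocessing the whole series.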