arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.13204 2026-03-16 eess.AS eess.IV

Bounds on Agreement between Subjective and Objective Measurements

Jaden Pieper, Stephen D. Voran

Comments Currently under review at IEEE Transactions on Multimedia. Submitted 5 November 2025, revised 3 March 2026

Abstract

Objective estimators of multimedia quality are often judged by comparing estimates with subjective "truth data," most often via Pearson correlation coefficient (PCC) or mean-squared error (MSE). But subjective test results contain noise, so striving for a PCC of 1.0 or an MSE of 0.0 is neither realistic nor repeatable. Numerous efforts have been made to acknowledge and appropriately accommodate subjective test noise in objective-subjective comparisons, typically resulting in new analysis frameworks and figures-of-merit. We take a different approach. By making only basic assumptions, we derive bounds on PCC and MSE that can be expected for a subjective test. Consistent with intuition, these bounds are functions of subjective vote variance. When a subjective test includes vote variance information, the calculation of the bounds is easy, and in this case we say the resulting bounds are "fully data-driven." We provide two options for calculating bounds in cases where vote variance information is not available. One option is to use vote variance information from other subjective tests that do provide such information, and the second option is to use a model for subjective votes. Thus we introduce a binomial-based model for subjective votes (BinoVotes) that naturally leads to a mean opinion score (MOS) model, named BinoMOS, with multiple unique desirable properties. BinoMOS reproduces the discrete nature of MOS values and its dependence on the number of votes per file. This modeling provides vote variance information required by the PCC and MSE bounds and we compare this modeling with data from 18 subjective tests. The modeling yields PCC and MSE bounds that agree very well with those found from the data directly. These results allow one to set expectations for the PCC and MSE that might be achieved for any subjective test, even those where vote variance information is not available.
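The Python sketch below illustrates the idea. It assumes a 5-point scale where each vote is `1 + Binomial(4, p)` for a per-file parameter `p`. The function names and the limit expressions are illustrative assumptions: they describe an ideal estimator that outputs each file's true mean, not the paper's BinoVotes/BinoMOS definitions or its derived bounds. The sketch shows why a MOS computed from N votes is discrete in steps of 1/N, and why the best achievable MSE shrinks as N grows.

```python
import random
import statistics


def bino_votes(p, n_votes, rng, scale_points=5):
    """Assumed BinoVotes-style sampler: each vote is 1 + Binomial(scale_points - 1, p)."""
    trials = scale_points - 1
    return [1 + sum(rng.random() < p for _ in range(trials)) for _ in range(n_votes)]


def bino_mos(p, n_votes, rng, scale_points=5):
    """MOS of one file; with N votes it can only take values on a grid of step 1/N."""
    return statistics.fmean(bino_votes(p, n_votes, rng, scale_points))


def ideal_estimator_limits(ps, n_votes, scale_points=5):
    """Expected MSE and approximate PCC for an estimator that outputs each true mean 1 + (K-1)p.

    A single vote has variance (K-1)p(1-p), so the MOS noise variance is (K-1)p(1-p)/N.
    """
    trials = scale_points - 1
    true_means = [1 + trials * p for p in ps]
    noise_var = statistics.fmean(trials * p * (1 - p) / n_votes for p in ps)
    signal_var = statistics.pvariance(true_means)
    mse = noise_var
    pcc = (signal_var / (signal_var + noise_var)) ** 0.5
    return mse, pcc
```

Doubling the number of votes per file halves the noise term, which lowers the MSE limit and raises the PCC limit.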

2603.13162 2026-03-16 eess.IV cs.CV

DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression

Junqi Shi, Ming Lu, Xingchen Li, Anle Ke, Ruiqi Zhang, Zhan Ma

Abstract

Diffusion-based image compression has recently shown outstanding perceptual fidelity, yet its practicality is hindered by prohibitive sampling overhead and high memory usage. Most existing diffusion codecs employ U-Net architectures, where hierarchical downsampling forces diffusion to operate in shallow latent spaces (typically with only 8x spatial downscaling), resulting in excessive computation. In contrast, conventional VAE-based codecs work in much deeper latent domains (16x - 64x downscaled), motivating a key question: Can diffusion operate effectively in such compact latent spaces without compromising reconstruction quality? To address this, we introduce DiT-IC, an Aligned Diffusion Transformer for Image Compression, which replaces the U-Net with a Diffusion Transformer capable of performing diffusion in latent space entirely at 32x downscaled resolution. DiT-IC adapts a pretrained text-to-image multi-step DiT into a single-step reconstruction model through three key alignment mechanisms: (1) a variance-guided reconstruction flow that adapts denoising strength to latent uncertainty for efficient reconstruction; (2) a self-distillation alignment that enforces consistency with encoder-defined latent geometry to enable one-step diffusion; and (3) a latent-conditioned guidance that replaces text prompts with semantically aligned latent conditions, enabling text-free inference. With these designs, DiT-IC achieves state-of-the-art perceptual quality while offering up to 30x faster decoding and drastically lower memory usage than existing diffusion-based codecs. Remarkably, it can reconstruct 2048x2048 images on a 16 GB laptop GPU.
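The Python sketch below gives a back-of-the-envelope count of why the downscaling factor matters. It assumes one latent position per downscaled pixel and global self-attention over all positions. Real DiT backbones typically also patchify the latent, which changes the absolute counts. The function names are hypothetical.

```python
def latent_positions(height, width, downscale):
    """Number of spatial positions in a latent map downscaled by `downscale` in each dimension."""
    if height % downscale or width % downscale:
        raise ValueError("image size must be divisible by the downscale factor")
    return (height // downscale) * (width // downscale)


def attention_cost_ratio(height, width, shallow, deep):
    """How many times larger a full attention map is at `shallow` than at `deep` downscaling."""
    return (latent_positions(height, width, shallow) / latent_positions(height, width, deep)) ** 2


if __name__ == "__main__":
    for factor in (8, 16, 32, 64):
        print(factor, latent_positions(2048, 2048, factor))
    print(attention_cost_ratio(2048, 2048, 8, 32))
```

For a 2048x2048 image, 8x downscaling leaves 65,536 positions, while 32x leaves 4,096. That is a 16x shorter sequence and, under the stated assumption, a 256x smaller attention map.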

2603.13136 2026-03-16 eess.SY cs.SY math.OC

Unifying Decision Making and Trajectory Planning in Automated Driving through Time-Varying Potential Fields

David Costa, Francesco Cerrito, Massimo Canale, Carlo Novara

Abstract

This paper proposes a unified decision-making and local trajectory planning framework based on Time-Varying Artificial Potential Fields (TVAPFs). The TVAPF explicitly models the predicted motion of dynamic obstacles, with bounded uncertainty, over the planning horizon, using information from perception and V2X sources when available. TVAPFs are embedded into a finite-horizon optimal control problem that jointly selects the driving maneuver and computes a feasible, collision-free trajectory. The effectiveness and real-time suitability of the approach are demonstrated through a simulation of a multi-actor scenario with real road topology, highlighting the advantages of the unified TVAPF-based formulation.
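The Python sketch below is a simplified take on the time-varying potential idea. The obstacle is predicted with constant velocity, and its uncertainty radius grows linearly over the horizon. Two candidate maneuvers are then scored by summing a Gaussian repulsive potential along each trajectory. Every piece of this is an assumption for illustration: the prediction model, the potential shape, the parameter values, and the enumeration of two fixed candidate trajectories. The paper instead embeds TVAPFs in a finite-horizon optimal control problem.

```python
import math


def predicted_obstacle(state, k, dt):
    """Constant-velocity prediction with a linearly growing uncertainty radius (assumed model)."""
    x0, y0, vx, vy, r0, growth = state
    t = k * dt
    return x0 + vx * t, y0 + vy * t, r0 + growth * t


def repulsive_potential(px, py, ox, oy, radius, amplitude=10.0, sigma=1.5):
    """Gaussian repulsion that saturates at `amplitude` inside the uncertainty radius."""
    gap = max(math.hypot(px - ox, py - oy) - radius, 0.0)
    return amplitude * math.exp(-gap ** 2 / (2 * sigma ** 2))


def trajectory_cost(trajectory, obstacle_state, dt):
    """Sum the time-varying potential along a trajectory sampled every `dt` seconds."""
    cost = 0.0
    for k, (px, py) in enumerate(trajectory):
        ox, oy, radius = predicted_obstacle(obstacle_state, k, dt)
        cost += repulsive_potential(px, py, ox, oy, radius)
    return cost


# Ego vehicle at 10 m/s; slower obstacle 20 m ahead at 5 m/s in the same lane.
DT, STEPS = 0.5, 10
OBSTACLE = (20.0, 0.0, 5.0, 0.0, 0.5, 0.2)  # x0, y0, vx, vy, r0, radius growth per second
keep_lane = [(10.0 * k * DT, 0.0) for k in range(STEPS)]
# Simplification: the lane-change candidate sits in the adjacent lane (3.5 m) for the whole horizon.
change_lane = [(10.0 * k * DT, 3.5) for k in range(STEPS)]
costs = {
    "keep_lane": trajectory_cost(keep_lane, OBSTACLE, DT),
    "change_lane": trajectory_cost(change_lane, OBSTACLE, DT),
}
best_maneuver = min(costs, key=costs.get)
```

Because the potential is evaluated against the obstacle's predicted position at each step, the keep-lane candidate is penalized only where it would actually close in on the obstacle.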

2603.13112 2026-03-16 eess.SP

AirGuard: UAV and Bird Recognition Scheme for Integrated Sensing and Communications System

Hongliang Luo, Zhonghua Chu, Tengyu Zhang, Chuanbin Zhao, Bo Lin, Feifei Gao

Abstract

In this paper, we propose an unmanned aerial vehicle (UAV) and bird recognition scheme for integrated sensing and communications (ISAC) systems that combines signal processing and deep learning. We first describe the basic low-altitude target monitoring scenario and formulate the motion equations and echo signals for UAVs and birds. Next, we extract the centralized micro-Doppler (cmD) spectrum and the high-resolution range profile (HRRP) of the low-altitude target from the echo signals. We then design a dual-feature-fusion low-altitude target recognition network based on a convolutional neural network (CNN), which takes both the cmD spectrum and HRRP images as inputs to jointly distinguish between UAVs and birds. We generate 237,600 cmD and HRRP image samples to train, validate, and evaluate the network. The proposed scheme, termed AirGuard, is shown to be effective through simulation results.

2603.13108 2026-03-16 cs.RO cs.CV eess.IV

Panoramic Multimodal Semantic Occupancy Prediction for Quadruped Robots

Guoqiang Zhao, Zhe Yang, Sheng Wu, Fei Teng, Mengfei Duan, Yuanfan Zheng, Kai Luo, Kailun Yang

Comments The dataset and code will be publicly released at https://github.com/SXDR/PanoMMOcc

Abstract

Panoramic imagery provides holistic 360° visual coverage for perception in quadruped robots. However, existing occupancy prediction methods are mainly designed for wheeled autonomous driving and rely heavily on RGB cues, limiting their robustness in complex environments. To bridge this gap, (1) we present PanoMMOcc, the first real-world panoramic multimodal occupancy dataset for quadruped robots, featuring four sensing modalities across diverse scenes. (2) We propose a panoramic multimodal occupancy perception framework, VoxelHound, tailored for legged mobility and spherical imaging. Specifically, we design (i) a Vertical Jitter Compensation (VJC) module to mitigate severe viewpoint perturbations caused by body pitch and roll during mobility, enabling more consistent spatial reasoning, and (ii) an effective Multimodal Information Prompt Fusion (MIPF) module that jointly leverages panoramic visual cues and auxiliary modalities to enhance volumetric occupancy prediction. (3) We establish a benchmark based on PanoMMOcc and provide detailed data analysis to enable systematic evaluation of perception methods under challenging embodied scenarios. Extensive experiments demonstrate that VoxelHound achieves state-of-the-art performance on PanoMMOcc (+4.16% in mIoU). The dataset and code will be publicly released to facilitate future research on panoramic multimodal 3D perception for embodied robotic systems at https://github.com/SXDR/PanoMMOcc, along with the calibration tools released at https://github.com/losehu/CameraLiDAR-Calib.
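The Python sketch below shows how mean intersection-over-union (mIoU) is computed over voxel labels. It assumes flat lists of integer class labels and averages over the classes that occur in either the prediction or the ground truth. The benchmark's exact conventions may differ, for example in whether empty or unknown voxels are excluded.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mislabeled voxel enlarges the union of both the predicted and the true class.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages IoU 1/2 for class 0 and 2/3 for class 1.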

2603.13082 2026-03-16 cs.CV cs.RO eess.IV

InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing

Yebin Yang, Di Wen, Lei Qi, Weitong Kong, Junwei Zheng, Ruiping Liu, Yufan Chen, Chengzhi Wu, Kailun Yang, Yuqian Fu, Danda Pani Paudel, Luc Van Gool, Kunyu Peng

Comments The dataset and code will be released at https://github.com/YNG916/InterEdit

Abstract

Text-guided 3D motion editing has seen success in single-person scenarios, but its extension to multi-person settings is less explored due to limited paired data and the complexity of inter-person interactions. We introduce the task of multi-person 3D motion editing, where a target motion is generated from a source and a text instruction. To support this, we propose InterEdit3D, a new dataset with manual two-person motion change annotations, and a Text-guided Multi-human Motion Editing (TMME) benchmark. We present InterEdit, a synchronized classifier-free conditional diffusion model for TMME. It introduces Semantic-Aware Plan Token Alignment with learnable tokens to capture high-level interaction cues and an Interaction-Aware Frequency Token Alignment strategy using DCT and energy pooling to model periodic motion dynamics. Experiments show that InterEdit improves text-to-motion consistency and edit fidelity, achieving state-of-the-art TMME performance. The dataset and code will be released at https://github.com/YNG916/InterEdit.
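The Python sketch below shows the two signal-processing ingredients named above, applied to a single joint coordinate over time. The first is a DCT-II, and the second pools the squared coefficients into frequency bands. How InterEdit turns these energies into tokens is not described here. The band layout and function names are assumptions.

```python
import math


def dct_ii(signal):
    """Unnormalized DCT-II: X_k = sum_n x_n cos(pi/N (n + 1/2) k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def band_energies(coeffs, num_bands):
    """Pool squared coefficients into contiguous, roughly equal-width frequency bands."""
    n = len(coeffs)
    edges = [round(b * n / num_bands) for b in range(num_bands + 1)]
    return [sum(c * c for c in coeffs[edges[b]:edges[b + 1]]) for b in range(num_bands)]


# A joint oscillating at DCT frequency index 12 over a 16-frame clip.
frames = 16
trajectory = [math.cos(math.pi / frames * (i + 0.5) * 12) for i in range(frames)]
energies = band_energies(dct_ii(trajectory), num_bands=4)
```

Because the DCT basis is orthogonal, a purely periodic trajectory concentrates its energy in one band. That is what makes band energies a compact descriptor of periodic motion.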

2603.13050 2026-03-16 eess.SY cs.SY

EMT and RMS Modeling of Thyristor Rectifiers for Stability Analysis of Converter-Based Systems

Ognjen Stanojev, Pol Jane Soneira, Gösta Stomberg, Mario Schweizer

Abstract

Thyristor rectifiers are a well-established and cost-effective solution for controlled high-power rectification, commonly used for hydrogen electrolysis and HVDC transmission. However, small-signal modeling and analysis of thyristor rectifiers remain challenging due to their line-commutated operation and nonlinear switching dynamics. This paper first revisits conventional RMS-based modeling of thyristor rectifiers and subsequently proposes a novel nonlinear state-space EMT model in the dq domain that can be linearized for small-signal analysis. The proposed model accurately captures all the relevant dynamic phenomena, including PLL dynamics, the commutation process, and switching delays. It is derived in polar coordinates, offering novel insights into the impact of the PLL and commutation angle on the thyristor rectifier dynamics. We verify the RMS and EMT models against a detailed switching model and demonstrate their applicability through small-signal stability analysis of a modified IEEE 39-bus test system that incorporates thyristor rectifier-interfaced hydrogen electrolyzers, synchronous generators, and grid-forming converters.
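The Python sketch below gives background on the conventional steady-state (RMS/averaged) view, using the textbook six-pulse bridge relations with commutation inductance. The average DC voltage is V_d = (3√2/π) V_LL cos α − (3/π) ω L_c I_d. The overlap angle μ follows from cos(α + μ) = cos α − 2ω L_c I_d / (√2 V_LL). These are standard idealized relations that assume a balanced sinusoidal supply and constant DC current. They are not the paper's dq-domain EMT model, and the function names and parameter values are illustrative.

```python
import math


def six_pulse_dc_voltage(v_ll_rms, alpha_deg, i_dc, l_c, f=50.0):
    """Average DC voltage of a six-pulse thyristor bridge including commutation voltage drop."""
    omega = 2 * math.pi * f
    v_d0 = 3 * math.sqrt(2) / math.pi * v_ll_rms  # ideal no-load DC voltage at alpha = 0
    return v_d0 * math.cos(math.radians(alpha_deg)) - 3 / math.pi * omega * l_c * i_dc


def overlap_angle_deg(v_ll_rms, alpha_deg, i_dc, l_c, f=50.0):
    """Commutation overlap angle mu from cos(alpha + mu) = cos(alpha) - 2 w L_c I_d / (sqrt(2) V_LL)."""
    omega = 2 * math.pi * f
    c = math.cos(math.radians(alpha_deg)) - 2 * omega * l_c * i_dc / (math.sqrt(2) * v_ll_rms)
    if c < -1.0:
        raise ValueError("commutation fails: no valid overlap angle")
    return math.degrees(math.acos(c)) - alpha_deg


# Example: 400 V line-to-line, alpha = 30 deg, 1 kA DC current, 0.1 mH commutation inductance.
v_dc = six_pulse_dc_voltage(400.0, 30.0, 1000.0, 1e-4)
mu = overlap_angle_deg(400.0, 30.0, 1000.0, 1e-4)
```

Both quantities depend on the DC current, which is one reason the commutation process couples the AC and DC sides in small-signal models.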

2603.13035 2026-03-16 eess.SP cs.LG

Association-Aware GNN for Precoder Learning in Cell-Free Systems

Mingyu Deng, Shengqian Han

Abstract

Deep learning has been widely recognized as a promising approach for optimizing multi-user multi-antenna precoders in traditional cellular systems. However, a critical distinction between cell-free and cellular systems lies in the flexibility of user equipment (UE)-access point (AP) associations. Consequently, the optimal precoder depends not only on channel state information but also on the dynamic UE-AP association status. In this paper, we propose an association-aware graph neural network (AAGNN) that explicitly incorporates association status into the precoding design. We leverage the permutation equivariance properties of the cell-free precoding policy to reduce the training complexity of AAGNN and employ an attention mechanism to enhance its generalization performance. Simulation results demonstrate that the proposed AAGNN outperforms baseline learning methods in both learning performance and generalization capabilities while maintaining low training and inference complexity.
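The toy Python sketch below illustrates permutation equivariance for a layer acting on an AP x UE link-feature matrix with an association mask. Relabeling the UEs (or the APs) relabels the outputs in the same way, so one set of weights serves any ordering. The layer is a hypothetical illustration, not the AAGNN architecture. It combines a self term with masked mean-pooling over each AP's and each UE's associated links, uses fixed scalar weights, and omits the attention mechanism.

```python
import math


def masked_mean(values):
    """Order-independent mean (fsum), or 0.0 when no links are associated."""
    values = list(values)
    return math.fsum(values) / len(values) if values else 0.0


def association_aware_layer(h, mask, w_self=0.8, w_ap=0.3, w_ue=0.2):
    """One hypothetical message-passing layer over an AP x UE feature matrix.

    Only associated links (mask[a][u] == 1) enter the pooled terms, and
    unassociated links are zeroed at the output.
    """
    n_ap, n_ue = len(h), len(h[0])
    ap_mean = [masked_mean(h[a][u] for u in range(n_ue) if mask[a][u]) for a in range(n_ap)]
    ue_mean = [masked_mean(h[a][u] for a in range(n_ap) if mask[a][u]) for u in range(n_ue)]
    return [
        [max(0.0, w_self * h[a][u] + w_ap * ap_mean[a] + w_ue * ue_mean[u]) * mask[a][u]
         for u in range(n_ue)]
        for a in range(n_ap)
    ]
```

Because the pooled terms are means over sets of links, the same layer also applies unchanged to a different number of APs or UEs.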

2603.13024 2026-03-16 cs.CV cs.AI cs.LG eess.IV

SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation

Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider, Roger Soberanis-Mukul, Yufan He, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Pengfei Guo, Daguang Xu, Mathias Unberath

Comments The manuscript is under review

Abstract

A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW) -- a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, tissue affordance mask, and 2D tool-tip trajectories. We design a conditional video diffusion approach that reformulates video-to-video diffusion into trajectory-conditioned surgical action synthesis. The backbone diffusion model is fine-tuned on a custom-curated dataset of 12,044 laparoscopic clips with lightweight spatiotemporal conditioning signals, leveraging a depth consistency loss to enforce geometric plausibility without requiring depth at inference. SAW achieves state-of-the-art temporal consistency (CD-FVD: 199.19 vs. 546.82) and strong visual quality on held-out test data. Furthermore, we demonstrate its downstream utility for (a) surgical AI, where augmenting rare actions with SAW-generated videos improves action recognition (clipping F1-score: 20.93% to 43.14%; cutting: 0.00% to 8.33%) on real test data, and (b) surgical simulation, where rendering tool-tissue interaction videos from simulator-derived trajectory points toward a visually faithful simulation engine.

2603.13007 2026-03-16 eess.IV cs.CV cs.LG physics.med-ph

Accelerating Stroke MRI with Diffusion Probabilistic Models through Large-Scale Pre-training and Target-Specific Fine-Tuning

Yamin Arefeen, Sidharth Kumar, Steven Warach, Hamidreza Saber, Jonathan Tamir

Abstract

Purpose: To develop a data-efficient strategy for accelerated MRI reconstruction with Diffusion Probabilistic Generative Models (DPMs) that enables faster scan times in clinical stroke MRI when only limited fully-sampled data samples are available. Methods: Our simple training strategy, inspired by the foundation model paradigm, first trains a DPM on a large, diverse collection of publicly available brain MRI data in fastMRI and then fine-tunes on a small dataset from the target application using carefully selected learning rates and fine-tuning durations. The approach is evaluated on controlled fastMRI experiments and on clinical stroke MRI data with a blinded clinical reader study. Results: DPMs pre-trained on approximately 4000 subjects with non-FLAIR contrasts and fine-tuned on FLAIR data from only 20 target subjects achieve reconstruction performance comparable to models trained with substantially more target-domain FLAIR data across multiple acceleration factors. Experiments reveal that moderate fine-tuning with a reduced learning rate yields improved performance, while insufficient or excessive fine-tuning degrades reconstruction quality. When applied to clinical stroke MRI, a blinded reader study involving two neuroradiologists indicates that images reconstructed using the proposed approach from $2 \times$ accelerated data are non-inferior to standard-of-care in terms of image quality and structural delineation. Conclusion: Large-scale pre-training combined with targeted fine-tuning enables DPM-based MRI reconstruction in data-constrained, accelerated clinical stroke MRI. The proposed approach substantially reduces the need for large application-specific datasets while maintaining clinically acceptable image quality, supporting the use of foundation-inspired diffusion models for accelerated MRI in targeted applications.

2603.12951 2026-03-16 eess.IV cs.CV

Reinforcing the Weakest Links: Modernizing SIENA with Targeted Deep Learning Integration

Riccardo Raciti, Lemuel Puglisi, Francesco Guarnera, Daniele Ravì, Sebastiano Battiato

Abstract

Percentage Brain Volume Change (PBVC) derived from Magnetic Resonance Imaging (MRI) is a widely used biomarker of brain atrophy, with SIENA among the most established methods for its estimation. However, SIENA relies on classical image processing steps, particularly skull stripping and tissue segmentation, whose failures can propagate through the pipeline and bias atrophy estimates. In this work, we examine whether targeted deep learning substitutions can improve SIENA while preserving its established and interpretable framework. To this end, we integrate SynthStrip and SynthSeg into SIENA and evaluate three pipeline variants on the ADNI and PPMI longitudinal cohorts. Performance is assessed using three complementary criteria: correlation with longitudinal clinical and structural decline, scan-order consistency, and end-to-end runtime. Replacing the skull-stripping module yields the most consistent gains: in ADNI, it substantially strengthens associations between PBVC and multiple measures of disease progression relative to the standard SIENA pipeline, while across both datasets it markedly improves robustness under scan reversal. The fully integrated pipeline achieves the strongest scan-order consistency, reducing the error by up to 99.1%. In addition, GPU-enabled variants reduce execution time by up to 46% while maintaining CPU runtimes comparable to standard SIENA. Overall, these findings show that deep learning can meaningfully strengthen established longitudinal atrophy pipelines when used to reinforce their weakest image processing steps. More broadly, this study highlights the value of modularly modernizing clinically trusted neuroimaging tools without sacrificing their interpretability. Code is publicly available at https://github.com/Raciti/Enhanced-SIENA.git.
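The Python sketch below shows one way to compute a scan-order consistency check. It rests on two assumptions that the abstract does not state. First, the error is taken as the absolute sum of the forward (A to B) and reversed (B to A) PBVC estimates, which is zero for a perfectly order-symmetric pipeline. Second, the reported reduction is taken as relative to the baseline error. The PBVC values in the example are made up.

```python
def scan_order_error(pbvc_forward, pbvc_reverse):
    """Asymmetry between A->B and B->A estimates (assumed definition); 0 means order-consistent."""
    return abs(pbvc_forward + pbvc_reverse)


def relative_reduction_pct(baseline_error, new_error):
    """Percentage reduction of an error relative to a baseline pipeline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical numbers: baseline gives -1.20 / +0.95, modified pipeline gives -1.10 / +1.09.
baseline = scan_order_error(-1.20, 0.95)
modified = scan_order_error(-1.10, 1.09)
reduction = relative_reduction_pct(baseline, modified)
```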

2603.12949 2026-03-16 eess.IV cs.CR cs.MM

Editing Away the Evidence: Diffusion-Based Image Manipulation and the Failure Modes of Robust Watermarking

Qian Qi, Jiangyun Tang, Jim Lee, Emily Davis, Finn Carter

Comments Preprint

Abstract

Robust invisible watermarks are widely used to support copyright protection, content provenance, and accountability by embedding hidden signals designed to survive common post-processing operations. However, diffusion-based image editing introduces a fundamentally different class of transformations: it injects noise and reconstructs images through a powerful generative prior, often altering semantic content while preserving photorealism. In this paper, we provide a unified theoretical and empirical analysis showing that non-adversarial diffusion editing can unintentionally degrade or remove robust watermarks. We model diffusion editing as a stochastic transformation that progressively contracts off-manifold perturbations, causing the low-amplitude signals used by many watermarking schemes to decay. Our analysis derives bounds on watermark signal-to-noise ratio and mutual information along diffusion trajectories, yielding conditions under which reliable recovery becomes information-theoretically impossible. We further evaluate representative watermarking systems under a range of diffusion-based editing scenarios and strengths. The results indicate that even routine semantic edits can significantly reduce watermark recoverability. Finally, we discuss the implications for content provenance and outline principles for designing watermarking approaches that remain robust under generative image editing.
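The Python sketch below is a deliberately simplified view of one mechanism behind this effect. In a DDPM-style forward process x_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε, an additive watermark of power P is scaled by ᾱ_t, while the injected noise has power 1 − ᾱ_t. The watermark's SNR, ᾱ_t P / (1 − ᾱ_t), therefore falls monotonically with t. The linear β schedule and unit-variance noise are assumptions. The sketch also ignores the reconstruction (denoising) half, which is where the paper's contraction of off-manifold perturbations acts, so it is not the paper's bound.

```python
import math


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (DDPM-style)."""
    alpha_bar, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def watermark_snr_db(alpha_bar_t, watermark_power):
    """SNR of an additive watermark against the injected unit-variance Gaussian noise, in dB."""
    return 10 * math.log10(alpha_bar_t * watermark_power / (1.0 - alpha_bar_t))
```

An editing strength that starts denoising from a larger t therefore begins from a lower watermark SNR before any reconstruction happens.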

2603.12948 2026-03-16 eess.SP

Identification and Visualization of Correlation Structures in Large-Scale Power Quality Data

Max Domagk, Jan Meyer, Marco Lindner

Comments 5 pages, 10 figures, submitted to IEEE conferences

Abstract

Large-scale power quality (PQ) measurement campaigns generate vast amounts of multivariate data, in which systematic dependencies are difficult to identify using conventional analysis techniques. This paper presents a methodology for the automated analysis and visualization of correlation structures in large PQ datasets. Building on an existing framework, the approach is adapted for shorter observation periods and enhanced with aggregation and distance-based visualization techniques. Daily Spearman correlation coefficients are averaged via Fisher's z-transformation and aggregated across phases, parameters, and sites. The resulting correlation structures are visualized using hierarchical clustering and multidimensional scaling to reveal consistent and recurring relationships. The methodology is demonstrated using data from 85 measurement sites within the German transmission system.
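The Python sketch below implements the averaging step described above. It computes a Spearman coefficient per day (Pearson correlation of average ranks, so ties are handled), maps each coefficient to z = atanh(r), averages the z values, and maps back with tanh. The clipping constant and function names are assumptions. Aggregation across phases, parameters, and sites could reuse `fisher_average` the same way.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def fisher_average(correlations, clip=0.999999):
    """Average correlations in the z domain: z = atanh(r), mean, then tanh back."""
    zs = [math.atanh(max(-clip, min(clip, r))) for r in correlations]
    return math.tanh(sum(zs) / len(zs))
```

Averaging in the z domain avoids the bias of averaging r directly, because atanh stretches values near ±1. For example, averaging 0.9 and 0.1 gives about 0.66 rather than 0.5.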

2603.12914 2026-03-16 eess.SP

Joint and Streamwise Distributed MIMO Satellite Communications with Multi-Antenna Ground Users

Parisa Ramezani, Emil Björnson

Abstract

We consider low Earth orbit (LEO) downlink communication in which multiple satellites jointly serve multi-antenna ground users, transmitting multiple spatial streams per user. Using a line-of-sight-dominant satellite channel model with statistical channel state information, including angular information and large-scale fading, we study two distributed transmission modes with different fronthaul requirements. First, for joint transmission, where all satellites transmit all user streams, we formulate a sum spectral efficiency (SE) maximization problem under general convex power constraints and address the intractability of the exact ergodic SE expression by adopting a tractable approximation. Exploiting the equivalence between sum SE maximization and weighted sum mean square error minimization, we derive a novel iterative transceiver design. Second, to reduce fronthaul load, we propose streamwise transmission, where each stream is sent by a single satellite, and develop an eigenmode-based stream-satellite association using participation factors and a maximum-weight bipartite matching problem solved by the Hungarian algorithm. Numerical simulations evaluate the validity of the SE approximation, demonstrate conditions under which streamwise transmission performs nearly optimally or trades SE for lower overhead, highlight the impact of stream/user loading, and show substantial performance gains over conventional benchmarks.
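The Python sketch below illustrates the matching step. It takes a given stream-by-satellite weight matrix (standing in for participation factors) and finds the one-to-one assignment with maximum total weight. The exhaustive search is a plainly substituted stand-in for the Hungarian algorithm: it reaches the same optimum but only scales to tiny examples. The one-stream-per-satellite constraint and the weights are assumptions. In practice, SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same problem in polynomial time.

```python
from itertools import permutations


def max_weight_assignment(weights):
    """Exhaustive maximum-weight one-to-one matching of streams (rows) to satellites (columns).

    Same optimum as the Hungarian algorithm, but O(S!) time, so only for tiny examples.
    """
    n_streams, n_sats = len(weights), len(weights[0])
    if n_streams > n_sats:
        raise ValueError("assumes at most one stream per satellite")
    best_total, best = float("-inf"), None
    for sats in permutations(range(n_sats), n_streams):
        total = sum(weights[s][sat] for s, sat in enumerate(sats))
        if total > best_total:
            best_total, best = total, sats
    return list(best), best_total
```

With weights `[[0.9, 0.8, 0.1], [0.85, 0.2, 0.3]]`, a greedy pass would give stream 0 satellite 0 and end at a total of 1.2. Optimal matching instead assigns streams 0 and 1 to satellites 1 and 0, for a total of 1.65.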

2603.12899 2026-03-16 cs.ET cs.SY eess.SY

A Physics-Based Digital Human Twin for Galvanic-Coupling Wearable Communication Links

Silvia Mura, Chiara Cavigliano, Anna Marcucci, Pietro Savazzi, Anna Vizziello, Maurizio Magarini

Abstract

This paper presents a systematic characterization of wearable galvanic coupling (GC) channels under narrowband and wideband operation. A physics-consistent digital human twin maps anatomical properties, propagation geometry, and electrode-skin interfaces into complex transfer functions directly usable for communication analysis. Attenuation, phase delay, and group delay are evaluated for longitudinal and radial configurations, and dispersion-induced variability is quantified through attenuation ripple and delay standard deviation metrics versus bandwidth. Results confirm electro-quasistatic, weakly dispersive behavior over 10 kHz-1 MHz. Attenuation is primarily geometry-driven, whereas amplitude ripple and delay variability increase with bandwidth, tightening equalization and synchronization constraints. Interface conditioning (gel and foam) significantly improves amplitude and phase stability, while propagation geometry governs link budget and baseline delay. Overall, the framework quantitatively links tissue electromagnetics to waveform distortion, enabling informed trade-offs among bandwidth, interface design, and transceiver complexity in wearable GC systems.
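The Python sketch below shows how group delay and its spread can be computed from sampled complex transfer-function values, such as those a digital twin produces. It unwraps the phase, takes τ(ω) = −dφ/dω by finite differences, and summarizes variability with a standard deviation. The frequency grid and the pure-delay test response are assumptions for illustration, and the paper's exact ripple and delay-deviation definitions may differ.

```python
import cmath
import math
import statistics


def unwrap(phases):
    """Remove 2*pi jumps from a sequence of phase samples (radians)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out


def group_delay(freqs_hz, response):
    """Finite-difference group delay tau = -d(phase)/d(omega) between adjacent samples."""
    phase = unwrap([cmath.phase(h) for h in response])
    omega = [2 * math.pi * f for f in freqs_hz]
    return [-(phase[i + 1] - phase[i]) / (omega[i + 1] - omega[i]) for i in range(len(omega) - 1)]


def delay_std(freqs_hz, response):
    """Population standard deviation of the group delay across the band."""
    return statistics.pstdev(group_delay(freqs_hz, response))
```

A weakly dispersive channel yields a nearly constant group delay across the band, so its delay standard deviation is small. The phase samples must be dense enough that adjacent phase steps stay below π for the unwrapping to be valid.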

2603.12896 2026-03-16 eess.SP

Environment-aware Near-field UE Tracking under Partial Blockage and Reflection

Hyunwoo Park, Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Sunwoo Kim

Comments 5 pages, 3 figures, conference

详情
英文摘要

This paper proposes an environment-aware near-field (NF) user equipment (UE) tracking method for extremely large aperture arrays. By integrating known surface geometries and tracking the line-of-sight (LOS) and non-line-of-sight (NLOS) indicators per antenna element, the method captures partial blockages and reflections specific to the NF spherical-wavefront regime, which are unavailable under the conventional far-field (FF) assumption. The UE positions are tracked by maximizing the cosine similarity between the predicted and received channels, enabling tracking even under complete LOS obstruction. Simulation results confirm that increasing environment-awareness improves accuracy, and that NF consistently outperforms FF baselines, achieving a $0.22\,\mathrm{m}$ root-mean-square error with full environment-awareness.
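The Python sketch below is a simplified version of cosine-similarity tracking with a spherical-wavefront model. It uses a half-wavelength ULA on the x-axis and a free-space LOS channel per element. An optional per-element visibility mask stands in for the LOS/NLOS indicators, and the position update is a local grid search around the previous estimate. Reflections and the known surface geometry are omitted, and the array size, wavelength, and search parameters are assumptions.

```python
import cmath
import math


def nf_channel(ue, n_elements, wavelength, visible=None):
    """Spherical-wavefront LOS channel for a ULA on the x-axis; `visible` masks blocked elements."""
    spacing = wavelength / 2
    h = []
    for n in range(n_elements):
        ex = (n - (n_elements - 1) / 2) * spacing
        r = math.hypot(ue[0] - ex, ue[1])
        gain = 1.0 if visible is None or visible[n] else 0.0
        h.append(gain / r * cmath.exp(-2j * math.pi * r / wavelength))
    return h


def cosine_similarity(a, b):
    """|a^H b| / (||a|| ||b||); undefined if either vector is all zeros."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    na = math.sqrt(sum(abs(x) ** 2 for x in a))
    nb = math.sqrt(sum(abs(y) ** 2 for y in b))
    return abs(inner) / (na * nb)


def track_step(received, prev, n_elements, wavelength, visible=None, radius=0.5, step=0.05):
    """Grid search in a square around the previous estimate for the best-matching predicted channel."""
    best, best_score = prev, -1.0
    k = int(round(radius / step))
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            cand = (prev[0] + i * step, prev[1] + j * step)
            score = cosine_similarity(nf_channel(cand, n_elements, wavelength, visible), received)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

Take a 256-element array at an assumed λ = 1 cm (aperture about 1.28 m). A UE 3 m away is then well inside the Fraunhofer distance 2D²/λ, roughly 330 m. In that regime the wavefront curvature makes the channel depend on range as well as angle, which is what lets the search resolve both coordinates.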

2603.05441 2026-03-16 eess.SP cs.SY eess.SY

Near-Optimal Low-Complexity MIMO Detection via Structured Reduced-Search Enumeration

Logeshwaran Vijayan

Comments 6 pages, 10 figures

Abstract

Maximum-likelihood (ML) detection in high-order MIMO systems is computationally prohibitive due to its exponential complexity in the number of transmit layers and constellation size. In this white paper, we demonstrate that for practical MIMO dimensions (up to 8x8) and modulation orders, near-ML hard-decision performance can be achieved using a structured reduced-search strategy with complexity linear in the constellation size. Extensive simulations over i.i.d. Rayleigh fading channels show that list sizes of 3|X| for 3x3, 4|X| for 4x4, and 8|X| for 8x8 systems, where |X| is the constellation size, closely match full ML performance, even under high channel condition numbers. In addition, we provide a trellis-based interpretation of the method. We further discuss implications for soft LLR generation and FEC interaction.
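The Python sketch below works out how many candidate vectors each approach examines. Exhaustive ML search evaluates |X|^Nt candidates, while the quoted list sizes are c·|X|, with c = 3, 4, and 8 for 3x3, 4x4, and 8x8. The count covers candidates only and ignores per-candidate metric cost and any preprocessing, so it is a rough illustration rather than the paper's complexity analysis.

```python
def ml_candidates(constellation_size, n_tx):
    """Exhaustive ML search evaluates every transmit vector: |X|^Nt."""
    return constellation_size ** n_tx


def reduced_search_candidates(constellation_size, list_multiplier):
    """List size quoted in the abstract: list_multiplier * |X|."""
    return list_multiplier * constellation_size


QUOTED_MULTIPLIERS = {3: 3, 4: 4, 8: 8}  # MIMO dimension NtxNt -> list multiplier

for qam in (4, 16, 64):
    for nt, c in QUOTED_MULTIPLIERS.items():
        print(f"{nt}x{nt} {qam}-QAM: ML {ml_candidates(qam, nt):>22,}  list {reduced_search_candidates(qam, c):>5}")
```

For 8x8 16-QAM, this is 4,294,967,296 candidates for exhaustive ML versus 128 for the quoted list size.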

2601.07090 2026-03-16 eess.SY cs.SY

Next-Generation Grid Codes: Towards a New Paradigm for Dynamic Ancillary Services

Verena Häberle, Kehao Zhuang, Xiuqiang He, Linbin Huang, Gabriela Hug, Florian Dörfler

Comments 13 pages, 15 figures

Abstract

This paper introduces a conceptual foundation for Next Generation Grid Codes (NGGCs) based on stability and performance certificates, enabling the provision of dynamic ancillary services such as fast frequency and voltage regulation through decentralized frequency-domain criteria. The NGGC framework offers two key benefits: (i) rigorous closed-loop stability guarantees, and (ii) explicit performance guarantees for frequency and voltage dynamics in power systems. Regarding (i) stability, we employ loop-shifting and passivity-based techniques to derive local frequency-domain stability certificates for individual device dynamics. These certificates ensure the closed-loop stability of the entire interconnected power system through fully decentralized verification. Concerning (ii) performance, we establish quantitative bounds on critical time-domain indicators of system dynamics, including the average-mode frequency and voltage nadirs, the rate-of-change-of-frequency (RoCoF), steady-state deviations, and oscillation damping capabilities. The bounds are obtained by expressing the performance metrics as frequency-domain conditions on local device behavior. The NGGC framework is non-parametric, model-agnostic, and accommodates arbitrary device dynamics under mild assumptions. It thus provides a unified, decentralized approach to certifying both stability and performance without requiring explicit device-model parameterizations. Moreover, the NGGC framework can be directly used as a set of specifications for control design, offering a principled foundation for future stability- and performance-oriented grid codes in power systems.
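The Python sketch below is a minimal illustration of a frequency-domain, per-device test. It checks the basic SISO passivity (positive-real) condition Re{G(jω)} ≥ 0 on a logarithmic frequency grid. The NGGC certificates go further, using loop-shifting and covering multivariable device dynamics, and a grid check is only a numerical screen, not a proof. The transfer functions, grid, and tolerance are assumptions.

```python
def evaluate_tf(num, den, s):
    """Evaluate G(s) = num(s)/den(s); coefficients are in descending powers of s."""
    def horner(coeffs):
        acc = 0j
        for c in coeffs:
            acc = acc * s + c
        return acc
    return horner(num) / horner(den)


def passive_on_grid(num, den, w_min=1e-3, w_max=1e3, points=400, tol=0.0):
    """Check Re{G(jw)} >= -tol on a log-spaced grid (assumes G itself is stable)."""
    for k in range(points):
        w = w_min * (w_max / w_min) ** (k / (points - 1))
        if evaluate_tf(num, den, 1j * w).real < -tol:
            return False
    return True
```

G(s) = (s + 2)/(s + 1) has Re{G(jω)} = (2 + ω²)/(1 + ω²) > 0 and passes. G(s) = (s − 1)/(s + 1) has a negative real part for ω < 1 and fails.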

2510.17176 2026-03-16 eess.SY cs.SY

Generalized Group Selection Strategies for Self-sustainable RIS-aided Communication

Lakshmikanta Sau, Priyadarshi Mukherjee, Sasthi C. Ghosh

Comments To appear in IEEE Transactions on Communications

Abstract

Reconfigurable intelligent surface (RIS) is a cutting-edge communication technology that has been proposed as a viable option for beyond-fifth-generation wireless communication networks. This paper investigates various group selection strategies in the context of grouping-based self-sustainable RIS-aided device-to-device (D2D) communication with spatially correlated wireless channels. Specifically, we consider both power splitting (PS) and time switching (TS) configurations of the self-sustainable RIS to analyze the system performance and propose appropriate bounds on the choice of system parameters. The analysis takes into account a simplified linear energy harvesting (EH) model as well as a practical non-linear EH model. Based on the application requirements, we propose various group selection strategies at the RIS. Notably, each strategy schedules the k-th best available group at the RIS based on the end-to-end signal-to-noise ratio (SNR) and also the energy harvested at a particular group of the RIS. Accordingly, by using tools from high order statistics, we derive analytical expressions for the outage probability of each selection strategy. Moreover, by applying tools from extreme value theory, we also investigate an asymptotic scenario where the number of groups available for selection at an RIS approaches infinity. The nontrivial insights obtained from this approach are especially beneficial in applications such as large intelligent surface-aided wireless communication. Finally, the numerical results demonstrate the importance and benefits of the proposed approaches in terms of metrics such as data throughput and outage (both data and energy) performance.
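The Python sketch below shows how order statistics give the outage probability when the k-th best of N groups is scheduled. It assumes i.i.d. exponentially distributed group SNRs with mean γ̄ (Rayleigh fading) and ignores the spatial correlation and energy harvesting that the paper's analysis includes. The k-th best SNR falls below a threshold γ_th exactly when fewer than k groups exceed it. That gives the binomial tail P_out = Σ_{i=0}^{k−1} C(N, i) q^i (1 − q)^{N−i}, with q = exp(−γ_th/γ̄). A Monte Carlo estimate is included as a cross-check.

```python
import math
import random


def outage_kth_best(n_groups, k, threshold, mean_snr=1.0):
    """P(k-th largest of N i.i.d. exponential SNRs < threshold) via a binomial tail."""
    q = math.exp(-threshold / mean_snr)  # probability that one group's SNR exceeds the threshold
    return sum(math.comb(n_groups, i) * q ** i * (1 - q) ** (n_groups - i) for i in range(k))


def outage_kth_best_mc(n_groups, k, threshold, mean_snr=1.0, trials=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        snrs = sorted((rng.expovariate(1.0 / mean_snr) for _ in range(n_groups)), reverse=True)
        outages += snrs[k - 1] < threshold
    return outages / trials
```

Scheduling a lower-ranked group (larger k) raises the outage probability, which is the price paid for reserving better groups, for example for energy harvesting.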

2510.11395 2026-03-16 eess.AS

Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training

Haixin Zhao, Kaixuan Yang, Nilesh Madhu

Comments Accepted by ICASSP2026

Abstract

To further reduce the complexity of lightweight speech enhancement models, we introduce a gating-based Dynamically Slimmable Network (DSN). The DSN comprises static and dynamic components. For architecture-independent applicability, we introduce distinct dynamic structures targeting the commonly used components, namely, grouped recurrent neural network units, multi-head attention, convolutional, and fully connected layers. A policy module adaptively governs the use of dynamic parts at a frame-wise resolution according to the input signal quality, controlling computational load. We further propose Metric-Guided Training (MGT) to explicitly guide the policy module in assessing input speech quality. Experimental results demonstrate that the DSN achieves comparable enhancement performance in instrumental metrics to the state-of-the-art lightweight baseline, while using only 73% of its computational load on average. Evaluations of dynamic component usage ratios indicate that the MGT-DSN can appropriately allocate network resources according to the severity of input signal distortion.

2509.26471 2026-03-16 eess.AS cs.AI

On Deepfake Voice Detection -- It's All in the Presentation

Héctor Delgado, Giorgio Ramondetti, Emanuele Dalmasso, Gennady Karvitsky, Daniele Colibro, Haydar Talib

Comments ICASSP 2026. ©IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract

While the technologies empowering malicious audio deepfakes have dramatically evolved in recent years due to generative AI advances, the same cannot be said of global research into spoofing (deepfake) countermeasures. This paper highlights how current deepfake datasets and research methodologies led to systems that failed to generalize to real-world applications. The main reason is the difference between raw deepfake audio and deepfake audio that has been presented through a communication channel, e.g., by phone. We propose a new framework for data creation and research methodology, allowing for the development of spoofing countermeasures that would be more effective in real-world scenarios. By following the guidelines outlined here, we improved deepfake detection accuracy by 39% in more robust and realistic lab setups, and by 57% on a real-world benchmark. We also demonstrate that improvements in datasets would have a bigger impact on deepfake detection accuracy than choosing larger SOTA models over smaller ones; that is, it would be more important for the scientific community to invest in comprehensive data collection programs than to simply train larger models with higher computational demands.

2509.22327 2026-03-16 eess.SP

Stacked Intelligent Metasurface-Enhanced Wideband Multiuser MIMO OFDM-IM Communications

Zheao Li, Jiancheng An, Chau Yuen

Abstract

Leveraging the multilayer realization of programmable metasurfaces, stacked intelligent metasurfaces (SIM) enable fine-grained wave-domain control. However, their wideband deployment is impeded by two structural factors: (i) a single, quasi-static SIM phase tensor must adapt to all subcarriers, and (ii) multiuser scheduling changes the subcarrier activation pattern frame by frame, requiring rapid reconfiguration. To address both challenges, we develop a SIM-enhanced wideband multiuser transceiver built on orthogonal frequency-division multiplexing with index modulation (OFDM-IM). The sparse activation of OFDM-IM confines high-fidelity equalization to the active tones, effectively widening the usable bandwidth. To make the design reliability-aware, we directly target the worst-link bit-error rate (BER) and adopt a max-min per-tone signal-to-interference-plus-noise ratio (SINR) as a principled surrogate, making the reliability optimization tractable. For frame-rate inference and interpretability, we propose an unfolded projected-gradient-descent network (UPGD-Net) that double-unrolls across the SIM's layers and algorithmic iterations: each cell computes the analytic gradient from the cascaded precoder with a learnable per-iteration step size. Simulations on wideband multiuser downlinks show fast, monotone convergence, an evident layer-depth sweet spot, and consistent gains in worst-link BER and sum rate. By combining structural sparsity with a BER-driven, deep-unfolded optimization backbone, the proposed framework directly addresses the key wideband deficiencies of SIM.
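The sparse activation that OFDM-IM relies on can be illustrated with the textbook bit mapping: a group of n subcarriers activates only k of them, the choice of active indices carries floor(log2 C(n, k)) bits, and each active tone carries log2(M) bits of an M-ary symbol. The Python sketch below uses the combinatorial number system, one standard index-mapping method from the OFDM-IM literature; it is an illustration under that assumption and says nothing about the SIM precoder or UPGD-Net themselves. Function names are hypothetical.

```python
import math
from typing import List, Tuple


def ofdm_im_bits_per_group(n: int, k: int, m: int) -> Tuple[int, int]:
    """Return (index bits, symbol bits) for one group of n subcarriers
    with k active tones, each carrying an M-ary symbol (M a power of two)."""
    index_bits = math.floor(math.log2(math.comb(n, k)))
    symbol_bits = k * int(math.log2(m))
    return index_bits, symbol_bits


def index_pattern(j: int, n: int, k: int) -> List[int]:
    """Map an integer j in [0, C(n, k)) to k active subcarrier indices
    using the combinatorial number system (greedy combinadic decoding)."""
    if not 0 <= j < math.comb(n, k):
        raise ValueError("j out of range")
    active = []
    for i in range(k, 0, -1):
        c = i - 1  # comb(i - 1, i) == 0, so this candidate is always feasible
        while c + 1 < n and math.comb(c + 1, i) <= j:
            c += 1  # largest c with comb(c, i) <= j
        active.append(c)
        j -= math.comb(c, i)
    return sorted(active)


print(ofdm_im_bits_per_group(4, 2, 4))             # (2, 4): 6 bits per group
print([index_pattern(j, 4, 2) for j in range(4)])  # 4 distinct active-tone patterns
```

With n = 4, k = 2, and QPSK, only half of the tones in each group are active, which is the kind of sparsity that lets equalization effort concentrate on fewer tones.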

2508.01410 2026-03-16 physics.flu-dyn cs.SY eess.SY

Upper bound of transient growth in accelerating and decelerating wall-driven flows using the Lyapunov method

Zhengyang Wei, Weichen Zhao, Chang Liu

Comments 6 pages, 8 figures

Abstract

This work analyzes accelerating and decelerating wall-driven flows by quantifying the upper bound of transient energy growth using a Lyapunov-type approach. By formulating the linearized Navier-Stokes equations as a linear time-varying system and constructing a time-dependent Lyapunov function, we obtain an upper bound on transient energy growth by solving linear matrix inequalities. This upper bound closely matches the transient growth computed via the singular value decomposition of the state-transition matrix of the linear time-varying system. Our analysis shows that decelerating base flows exhibit significantly larger transient growth than accelerating flows. The Lyapunov method also offers the advantages of providing a certificate of uniform stability and an invariant set that bounds the solution trajectory.
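The SVD-based reference that the Lyapunov bound is compared against can be sketched for a toy system: integrate the state-transition matrix Φ(t, 0) of dx/dt = A(t)x, then take G(t) = σ_max(Φ(t, 0))². The Python sketch below assumes a 2×2 system and the plain Euclidean norm (the paper works with the linearized Navier-Stokes operator and an energy norm); it does not implement the LMI-based Lyapunov bound itself.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def axpy(a: Matrix, b: Matrix, s: float) -> Matrix:
    """Return a + s * b elementwise."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]


def state_transition(A: Callable[[float], Matrix], t_end: float, steps: int = 2000) -> Matrix:
    """Integrate dPhi/dt = A(t) Phi with Phi(0) = I using classical RK4."""
    phi = [[1.0, 0.0], [0.0, 1.0]]
    h = t_end / steps
    for n in range(steps):
        t = n * h
        k1 = matmul(A(t), phi)
        k2 = matmul(A(t + h / 2), axpy(phi, k1, h / 2))
        k3 = matmul(A(t + h / 2), axpy(phi, k2, h / 2))
        k4 = matmul(A(t + h), axpy(phi, k3, h))
        incr = [[k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j] for j in range(2)] for i in range(2)]
        phi = axpy(phi, incr, h / 6)
    return phi


def transient_growth(A: Callable[[float], Matrix], t_end: float) -> float:
    """G(t) = sigma_max(Phi)^2 = largest eigenvalue of Phi^T Phi (2x2 closed form)."""
    phi = state_transition(A, t_end)
    m = [[sum(phi[k][i] * phi[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    a, b, d = m[0][0], m[0][1], m[1][1]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)


# A stable but non-normal toy operator: energy can grow transiently before decaying.
print(transient_growth(lambda t: [[-0.1, 1.0], [0.0, -0.2]], 5.0))
```

Both eigenvalues of the example operator are negative, yet G(5) exceeds 1 because the matrix is non-normal; a time-varying A(t) (e.g. a coupling term that changes in time) can be passed in the same way.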

2507.15781 2026-03-16 eess.SY cs.SY

Bio-inspired density control of multi-agent swarms via leader-follower plasticity

Gian Carlo Maffettone, Alain Boldini, Mario di Bernardo, Maurizio Porfiri

Journal ref
Volume 188, Automatica, 2026
Abstract

The design of control systems for the spatial self-organization of mobile agents is an open challenge across several engineering domains, including swarm robotics and synthetic biology. Here, we propose a bio-inspired leader-follower solution that accounts for the energy constraints of mobile agents and is suited to large swarms. Akin to many natural systems, control objectives are formulated for the entire collective, and leaders and followers are allowed to plastically switch their roles over time. We frame a density control problem, modeling the agents' population via a system of nonlinear partial differential equations. This approach allows for a compact description that inherently avoids the curse of dimensionality and improves analytical tractability. We derive analytical guarantees for the existence of desired steady-state solutions and their local stability for one-dimensional and higher-dimensional problems. We numerically validate our control methodology, supporting the effectiveness, robustness, and versatility of the proposed bio-inspired control strategy.

2503.14550 2026-03-16 eess.IV cs.AI cs.CV cs.LG

Novel AI-Based Quantification of Breast Arterial Calcification to Predict Cardiovascular Risk

Theodorus Dapamede, Aisha Urooj, Vedant Joshi, Gabrielle Gershon, Frank Li, Mohammadreza Chavoshi, Beatrice Brown-Mulry, Rohan Satya Isaac, Aawez Mansuri, Chad Robichaux, Chadi Ayoub, Reza Arsanjani, Laurence Sperling, Judy Gichoya, Marly van Assen, Charles W. ONeill, Imon Banerjee, Hari Trivedi

Abstract

Women are underdiagnosed and undertreated for cardiovascular disease (CVD). Automatic quantification of breast arterial calcification (BAC) on screening mammography can identify women at risk for CVD and enable earlier treatment and management of disease. In this retrospective study of 116,135 women from two healthcare systems, a transformer-based neural network quantified BAC severity (no BAC, mild, moderate, and severe) on screening mammograms. Outcomes included major adverse cardiovascular events (MACE) and all-cause mortality. BAC severity was independently associated with MACE after adjusting for cardiovascular risk factors, with hazard ratios (HRs) increasing from mild (HR 1.18-1.22) through moderate (HR 1.38-1.47) to severe BAC (HR 2.03-2.22) across datasets (all p<0.001). This association remained significant across all age groups, with even mild BAC indicating increased risk in women under 50. BAC remained an independent predictor when analyzed alongside atherosclerotic cardiovascular disease (ASCVD) risk scores, showing significant associations with myocardial infarction, stroke, heart failure, and mortality (all p<0.005). Automated BAC quantification enables opportunistic cardiovascular risk assessment during routine mammography without additional radiation or cost. This approach provides value beyond traditional risk factors, particularly in younger women, offering potential for early CVD risk stratification in the millions of women undergoing annual mammography.

2502.14720 2026-03-16 physics.app-ph cs.SY eess.SY

Advancing Measurement Capabilities in Lithium-Ion Batteries: Exploring the Potential of Fiber Optic Sensors for Thermal Monitoring of Battery Cells

Florian Krause, Felix Schweizer, Alexandra Burger, Franziska Ludewig, Marcus Knips, Katharina Quade, Andreas Wuersig, Dirk Uwe Sauer

Journal ref
Batteries 2026, 12(3), 95
Abstract

This work demonstrates the potential of fiber optic sensors for measuring thermal effects in lithium-ion batteries, using Optical Frequency Domain Reflectometry (OFDR). This application of fiber sensors allows for spatially resolved temperature measurement and emphasizes the importance of monitoring not just the exterior but also the internal conditions of battery cells. Using inert glass fibers as sensors, which exhibit minimal sensitivity to electric fields, opens up new pathways for their implementation in a wide range of applications, such as battery monitoring. Unlike commonly used Fiber Bragg Grating (FBG) sensors, the sensors used in this work provide real-time information along the entire length of the fiber. Over a temperature range of 0 to 80 °C, the novel sensors presented here show a linear thermal dependency with high sensitivity and a spatial resolution of a few centimeters. Furthermore, this study presents preliminary findings on the application of fiber optic sensors in lithium-ion battery (LIB) cells, demonstrating that the steps required for battery integration do not restrict thermal measurements.
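Because the reported thermal dependency is linear, converting a measured sensor response at one fiber position into temperature reduces to a least-squares line fit against reference temperatures, followed by inverting that line. The Python sketch below assumes a generic scalar "shift" per position in arbitrary units; the class name is hypothetical, and the calibration data in the example are synthetic, not measurements from the paper.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


class LinearFiberCalibration:
    """Hypothetical per-position calibration: shift = slope * T + intercept."""

    def __init__(self, ref_temps_c: Sequence[float], shifts: Sequence[float]):
        self.slope, self.intercept = fit_line(ref_temps_c, shifts)

    def temperature(self, shift: float) -> float:
        """Invert the fitted line to estimate temperature in degrees Celsius."""
        return (shift - self.intercept) / self.slope


# Synthetic calibration points over 0 to 80 degrees C (arbitrary shift units).
temps = [0.0, 20.0, 40.0, 60.0, 80.0]
cal = LinearFiberCalibration(temps, [-1.5 * t + 3.0 for t in temps])
print(cal.temperature(-1.5 * 37.0 + 3.0))  # about 37.0
```

In a distributed measurement, one such fit (or a shared slope) would be applied at every resolved position along the fiber; the sensitivity reported in the paper corresponds to the fitted slope.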

2407.06705 2026-03-16 cs.NI eess.SP

Integrating Atmospheric Sensing and Communications for Resource Allocation in NTNs

Israel Leyva-Mayorga, Fabio Saggese, Lintao Li, Petar Popovski

Comments Submitted for publication to IEEE Transactions on Wireless Communications

Abstract

The integration of Non-Terrestrial Networks (NTNs) with Low Earth Orbit (LEO) satellite constellations into 5G and Beyond is essential to achieve truly global connectivity. A distinctive characteristic of LEO mega constellations is that they constitute a global infrastructure with predictable dynamics, which enables the pre-planned allocation of radio resources. However, the different bands that can be used for ground-to-satellite communication are affected differently by atmospheric conditions such as precipitation, which introduces uncertainty in the attenuation of the communication links at high frequencies. Based on this, we present a compelling case for applying integrated sensing and communications (ISAC) in heterogeneous and multi-layer LEO satellite constellations over wide areas. Specifically, we propose a sensing-assisted communications framework and frame structure that not only enables accurate estimation of the atmospheric attenuation in the communication links through sensing but also leverages this information to determine the optimal serving satellites and allocate resources efficiently for downlink communication with users on the ground. The results show that, by dedicating an adequate amount of resources to sensing and solving the association and resource allocation problems jointly, it is feasible to increase the average throughput by 59% and the fairness by 700% compared with solving these problems separately.
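The basic idea of sensing-assisted association can be sketched as follows: once sensing yields an attenuation estimate per user-satellite link, the user is served by the satellite with the best attenuation-corrected Shannon rate, and the share of resources spent on sensing is subtracted from the throughput. This Python sketch is a simplified, per-user greedy rule with hypothetical names and illustrative numbers; the paper instead solves association and resource allocation jointly across users, which this toy does not attempt.

```python
import math
from typing import Dict, Tuple


def db_to_linear(db: float) -> float:
    return 10 ** (db / 10)


def link_rate_bps(bandwidth_hz: float, clear_sky_snr_db: float, attenuation_db: float) -> float:
    """Shannon rate after subtracting the sensed atmospheric attenuation from the clear-sky SNR."""
    return bandwidth_hz * math.log2(1 + db_to_linear(clear_sky_snr_db - attenuation_db))


def associate(user_links: Dict[str, Tuple[float, float]],
              bandwidth_hz: float,
              sensing_fraction: float) -> Tuple[str, float]:
    """Greedy choice. user_links maps a satellite id to
    (clear-sky SNR in dB, estimated attenuation in dB); returns the chosen
    satellite and the throughput left after reserving `sensing_fraction`."""
    if not 0.0 <= sensing_fraction < 1.0:
        raise ValueError("sensing_fraction must be in [0, 1)")
    best = max(user_links, key=lambda s: link_rate_bps(bandwidth_hz, *user_links[s]))
    return best, (1 - sensing_fraction) * link_rate_bps(bandwidth_hz, *user_links[best])


# Illustrative numbers: rain attenuates the nominally stronger link by 10 dB.
print(associate({"leo_a": (20.0, 10.0), "leo_b": (15.0, 0.0)}, 1e6, 0.1))
```

Without the sensed attenuation, the user would pick "leo_a" for its higher clear-sky SNR; with it, "leo_b" delivers the higher rate.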

2407.03131 2026-03-16 cs.NE cs.AI eess.SP

MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition

Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu

Comments Accepted by ICONIP 2025 (Oral). 16 pages, 5 figures

Abstract

Electroencephalography (EEG), a technique that records electrical activity from the scalp using electrodes, plays a vital role in affective computing. However, fully utilizing the multi-domain characteristics of EEG signals remains a significant challenge. Traditional single-perspective analyses often fail to capture the complex interplay of temporal, frequency, and spatial dimensions in EEG data. To address this, we introduce a multi-view graph transformer (MVGT) based on spatial relations that integrates information across three domains: temporal dynamics from continuous series, frequency features extracted from frequency bands, and inter-channel relationships captured through several spatial encodings. This comprehensive approach allows the model to capture the nuanced properties inherent in EEG signals, enhancing its flexibility and representational power. Evaluation on publicly available datasets demonstrates that MVGT outperforms state-of-the-art methods. The results highlight its ability to extract multi-domain information and effectively model inter-channel relationships, showcasing its potential for EEG-based emotion recognition tasks.
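One common way to inject inter-channel spatial relations into a transformer is to add a bias term, derived from electrode geometry, to the attention scores before the softmax. The Python sketch below assumes a single attention head and a hypothetical distance-based bias (closer electrodes receive a larger bias); it illustrates the general mechanism of a spatial encoding, not MVGT's specific encodings.

```python
import math
from typing import List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [x / s for x in e]


def distance_bias(positions: Sequence[Tuple[float, float]], scale: float = 1.0) -> List[List[float]]:
    """Hypothetical spatial encoding: negative Euclidean distance between
    electrode positions, so nearby channels attend to each other more."""
    return [[-math.dist(p, q) / scale for q in positions] for p in positions]


def biased_attention(q, k, v, bias):
    """Single-head scaled dot-product attention over EEG channels with an
    additive spatial bias; q, k, v are lists of per-channel feature vectors."""
    d = len(q[0])
    n = len(q)
    weights = [softmax([sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) + bias[i][j]
                        for j in range(n)]) for i in range(n)]
    out = [[sum(weights[i][j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
           for i in range(n)]
    return out, weights


# Three electrodes on a line; with zero queries/keys, only geometry drives attention.
pos = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
zeros = [[0.0, 0.0] for _ in pos]
out, w = biased_attention(zeros, zeros, [[1.0], [2.0], [3.0]], distance_bias(pos))
print([round(x, 3) for x in w[0]])
```

In a trained model the queries and keys carry the learned temporal and frequency features, and the bias nudges attention toward spatially related channels.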

2404.07650 2026-03-16 eess.SP

Coexistence of Pull and Push Communication in Wireless Access for IoT Devices

Sara Cavallero, Fabio Saggese, Junya Shiraishi, Shashi Raj Pandey, Chiara Buratti, Petar Popovski

Comments Paper submitted to the 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2024). Copyright may be transferred without further notice

Abstract

We consider an Internet of Things (IoT) setup in which a base station (BS) collects data from nodes that use two different communication modes. The first is pull-based, where the BS retrieves data from specific nodes through queries. The nodes that use pull-based communication contain a wake-up receiver: upon a query, the BS sends a wake-up signal (WuS) to activate the corresponding devices equipped with a wake-up receiver (WuDs). The second is push-based communication, in which the nodes decide when to transmit to the BS. We consider a time-slotted model, where the time slots in each frame are shared between pull-based and push-based communications. This coexistence scenario gives rise to a new type of problem with fundamental trade-offs in sharing communication resources: the objective of serving the maximum number of queries within a specified deadline limits the transmission opportunities for push sensors, and vice versa. This work develops a mathematical model that characterizes these trade-offs, validates it through simulations, and optimizes the frame design to meet the objectives of both pull- and push-based communications.
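The slot-sharing trade-off can be illustrated with a deliberately simple toy model, which is an assumption and not the paper's mathematical model: in a frame of S slots, P slots are reserved for pull (each serves one pending query before the frame deadline), and the remaining S - P slots are used by N push nodes with slotted ALOHA, each transmitting with probability p, so a push slot succeeds with probability N p (1 - p)^(N - 1). Function and parameter names are hypothetical.

```python
from typing import Tuple


def expected_frame_outcome(total_slots: int, pull_slots: int, queries: int,
                           push_nodes: int, tx_prob: float) -> Tuple[int, float]:
    """Toy model: return (queries served within the frame,
    expected successful push transmissions in the remaining slots)."""
    if not 0 <= pull_slots <= total_slots:
        raise ValueError("pull_slots must lie in [0, total_slots]")
    served_queries = min(queries, pull_slots)
    # Slotted ALOHA: a slot succeeds if exactly one of the push nodes transmits.
    p_success = push_nodes * tx_prob * (1 - tx_prob) ** (push_nodes - 1)
    expected_push = (total_slots - pull_slots) * p_success
    return served_queries, expected_push


# Sweeping the pull/push split exposes the trade-off described above.
for p in range(0, 11, 2):
    print(p, expected_frame_outcome(10, p, queries=4, push_nodes=5, tx_prob=0.2))
```

Adding pull slots serves more queries only until all pending queries are covered, while every extra pull slot removes push opportunities; choosing the split is the frame-design question, which the paper addresses with its own model.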

2312.12025 2026-03-16 eess.SP

Control Aspects for Using RIS in Latency-Constrained Mobile Edge Computing

Fabio Saggese, Victor Croisfelt, Francesca Costanzo, Junya Shiraishi, Radosław Kotaba, Paolo Di Lorenzo, Petar Popovski

Comments Paper submitted to Asilomar Conference on Signals, Systems, and Computers 2023. Copyright may be transferred without further notice

Abstract

This paper investigates the role and impact of control operations in dynamic mobile edge computing (MEC) empowered by Reconfigurable Intelligent Surfaces (RISs), in which multiple devices offload their computation tasks, with the help of the RIS, to an access point (AP) equipped with an edge server (ES). Although usually ignored, the control aspects related to channel estimation (CE), resource allocation (RA), and control signaling play a fundamental role in the user-perceived delay and energy consumption. In general, the more resources devoted to control operations, the higher their reliability; however, this introduces an overhead that reduces the resources available for computation offloading, possibly increasing the overall latency. Conversely, a lower control overhead leaves more resources available for computation offloading but degrades CE accuracy and RA flexibility. This paper establishes a basic framework for integrating the impact of control operations into the performance evaluation of the RIS-aided MEC paradigm, clarifying their trade-offs through theoretical analysis and numerical simulations.