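arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.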

2603.13204 2026-03-16 eess.AS eess.IV

Bounds on Agreement between Subjective and Objective Measurements

Jaden Pieper, Stephen D. Voran

Comments Currently under review at IEEE Transactions on Multimedia. Submitted 5 November 2025, revised 3 March 2026

Abstract

Objective estimators of multimedia quality are often judged by comparing estimates with subjective "truth data," most often via Pearson correlation coefficient (PCC) or mean-squared error (MSE). But subjective test results contain noise, so striving for a PCC of 1.0 or an MSE of 0.0 is neither realistic nor repeatable. Numerous efforts have been made to acknowledge and appropriately accommodate subjective test noise in objective-subjective comparisons, typically resulting in new analysis frameworks and figures-of-merit. We take a different approach. By making only basic assumptions, we derive bounds on PCC and MSE that can be expected for a subjective test. Consistent with intuition, these bounds are functions of subjective vote variance. When a subjective test includes vote variance information, the calculation of the bounds is easy, and in this case we say the resulting bounds are "fully data-driven." We provide two options for calculating bounds in cases where vote variance information is not available. One option is to use vote variance information from other subjective tests that do provide such information, and the second option is to use a model for subjective votes. Thus we introduce a binomial-based model for subjective votes (BinoVotes) that naturally leads to a mean opinion score (MOS) model, named BinoMOS, with multiple unique desirable properties. BinoMOS reproduces the discrete nature of MOS values and its dependence on the number of votes per file. This modeling provides vote variance information required by the PCC and MSE bounds and we compare this modeling with data from 18 subjective tests. The modeling yields PCC and MSE bounds that agree very well with those found from the data directly. These results allow one to set expectations for the PCC and MSE that might be achieved for any subjective test, even those where vote variance information is not available.
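The link between vote variance and the best achievable MSE can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' derivation or their BinoVotes definition: it assumes a 5-level rating scale, models each vote as 1 plus a binomial draw with 4 trials, and uses hypothetical function names (`simulate_votes`, `noise_floor_demo`). It compares the MSE of a hypothetical perfect estimator, one that outputs the true underlying quality, against the average per-file vote variance divided by the number of votes.

```python
import random


def simulate_votes(true_quality, n_votes, rng, levels=5):
    """Draw integer votes on a 1..levels scale.

    Assumption (not taken from the paper): each vote is
    1 + Binomial(levels - 1, p), with p chosen so that the
    expected vote equals true_quality.
    """
    p = (true_quality - 1) / (levels - 1)
    return [
        1 + sum(rng.random() < p for _ in range(levels - 1))
        for _ in range(n_votes)
    ]


def sample_variance(xs):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def noise_floor_demo(n_files=2000, n_votes=8, seed=0):
    """Return (MSE of a perfect estimator vs. MOS, mean vote variance / n_votes)."""
    rng = random.Random(seed)
    squared_error = 0.0
    variance_term = 0.0
    for _ in range(n_files):
        quality = rng.uniform(1.0, 5.0)      # hypothetical true quality
        votes = simulate_votes(quality, n_votes, rng)
        mos = sum(votes) / n_votes           # always a multiple of 1 / n_votes
        squared_error += (mos - quality) ** 2
        variance_term += sample_variance(votes) / n_votes
    return squared_error / n_files, variance_term / n_files


if __name__ == "__main__":
    mse, floor = noise_floor_demo()
    print(f"MSE of perfect estimator vs. MOS: {mse:.4f}")
    print(f"Mean vote variance / votes per file: {floor:.4f}")
```

In this toy setting the two printed numbers should be close to each other, which matches the abstract's point that an MSE of 0.0 is not a realistic target when MOS values are averages of a finite number of noisy votes. The MOS values here are also discrete, taking only multiples of 1/`n_votes`.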

2603.13162 2026-03-16 eess.IV cs.CV

DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression

Junqi Shi, Ming Lu, Xingchen Li, Anle Ke, Ruiqi Zhang, Zhan Ma

2603.13136 2026-03-16 eess.SY cs.SY math.OC

Unifying Decision Making and Trajectory Planning in Automated Driving through Time-Varying Potential Fields

David Costa, Francesco Cerrito, Massimo Canale, Carlo Novara

2603.13112 2026-03-16 eess.SP

AirGuard: UAV and Bird Recognition Scheme for Integrated Sensing and Communications System

Hongliang Luo, Zhonghua Chu, Tengyu Zhang, Chuanbin Zhao, Bo Lin, Feifei Gao

2603.13108 2026-03-16 cs.RO cs.CV eess.IV

Panoramic Multimodal Semantic Occupancy Prediction for Quadruped Robots

Guoqiang Zhao, Zhe Yang, Sheng Wu, Fei Teng, Mengfei Duan, Yuanfan Zheng, Kai Luo, Kailun Yang

Comments The dataset and code will be publicly released at https://github.com/SXDR/PanoMMOcc

2603.13082 2026-03-16 cs.CV cs.RO eess.IV

InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing

Yebin Yang, Di Wen, Lei Qi, Weitong Kong, Junwei Zheng, Ruiping Liu, Yufan Chen, Chengzhi Wu, Kailun Yang, Yuqian Fu, Danda Pani Paudel, Luc Van Gool, Kunyu Peng

Comments The dataset and code will be released at https://github.com/YNG916/InterEdit

2603.13050 2026-03-16 eess.SY cs.SY

EMT and RMS Modeling of Thyristor Rectifiers for Stability Analysis of Converter-Based Systems

Ognjen Stanojev, Pol Jane Soneira, Gösta Stomberg, Mario Schweizer

2603.13035 2026-03-16 eess.SP cs.LG

Association-Aware GNN for Precoder Learning in Cell-Free Systems

Mingyu Deng, Shengqian Han

2603.13024 2026-03-16 cs.CV cs.AI cs.LG eess.IV

SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation

Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider, Roger Soberanis-Mukul, Yufan He, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Pengfei Guo, Daguang Xu, Mathias Unberath

Comments The manuscript is under review

Abstract

A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW) -- a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, tissue affordance mask, and 2D tool-tip trajectories. We design a conditional video diffusion approach that reformulates video-to-video diffusion into trajectory-conditioned surgical action synthesis. The backbone diffusion model is fine-tuned on a custom-curated dataset of 12,044 laparoscopic clips with lightweight spatiotemporal conditioning signals, leveraging a depth consistency loss to enforce geometric plausibility without requiring depth at inference. SAW achieves state-of-the-art temporal consistency (CD-FVD: 199.19 vs. 546.82) and strong visual quality on held-out test data. Furthermore, we demonstrate its downstream utility for (a) surgical AI, where augmenting rare actions with SAW-generated videos improves action recognition (clipping F1-score: 20.93% to 43.14%; cutting: 0.00% to 8.33%) on real test data, and (b) surgical simulation, where rendering tool-tissue interaction videos from simulator-derived trajectory points toward a visually faithful simulation engine.

2603.13007 2026-03-16 eess.IV cs.CV cs.LG physics.med-ph

Accelerating Stroke MRI with Diffusion Probabilistic Models through Large-Scale Pre-training and Target-Specific Fine-Tuning

Yamin Arefeen, Sidharth Kumar, Steven Warach, Hamidreza Saber, Jonathan Tamir

Abstract

Purpose: To develop a data-efficient strategy for accelerated MRI reconstruction with Diffusion Probabilistic Generative Models (DPMs) that enables faster scan times in clinical stroke MRI when only limited fully-sampled data samples are available. Methods: Our simple training strategy, inspired by the foundation model paradigm, first trains a DPM on a large, diverse collection of publicly available brain MRI data in fastMRI and then fine-tunes on a small dataset from the target application using carefully selected learning rates and fine-tuning durations. The approach is evaluated on controlled fastMRI experiments and on clinical stroke MRI data with a blinded clinical reader study. Results: DPMs pre-trained on approximately 4000 subjects with non-FLAIR contrasts and fine-tuned on FLAIR data from only 20 target subjects achieve reconstruction performance comparable to models trained with substantially more target-domain FLAIR data across multiple acceleration factors. Experiments reveal that moderate fine-tuning with a reduced learning rate yields improved performance, while insufficient or excessive fine-tuning degrades reconstruction quality. When applied to clinical stroke MRI, a blinded reader study involving two neuroradiologists indicates that images reconstructed using the proposed approach from $2 \times$ accelerated data are non-inferior to standard-of-care in terms of image quality and structural delineation. Conclusion: Large-scale pre-training combined with targeted fine-tuning enables DPM-based MRI reconstruction in data-constrained, accelerated clinical stroke MRI. The proposed approach substantially reduces the need for large application-specific datasets while maintaining clinically acceptable image quality, supporting the use of foundation-inspired diffusion models for accelerated MRI in targeted applications.

2603.12951 2026-03-16 eess.IV cs.CV

Reinforcing the Weakest Links: Modernizing SIENA with Targeted Deep Learning Integration

Riccardo Raciti, Lemuel Puglisi, Francesco Guarnera, Daniele Ravì, Sebastiano Battiato

2603.12949 2026-03-16 eess.IV cs.CR cs.MM

Editing Away the Evidence: Diffusion-Based Image Manipulation and the Failure Modes of Robust Watermarking

Qian Qi, Jiangyun Tang, Jim Lee, Emily Davis, Finn Carter

Comments Preprint

2603.12948 2026-03-16 eess.SP

Identification and Visualization of Correlation Structures in Large-Scale Power Quality Data

Max Domagk, Jan Meyer, Marco Lindner

Comments 5 pages, 10 figures, submitted to IEEE conferences

2603.12914 2026-03-16 eess.SP

Joint and Streamwise Distributed MIMO Satellite Communications with Multi-Antenna Ground Users

Parisa Ramezani, Emil Björnson

2603.12899 2026-03-16 cs.ET cs.SY eess.SY

A Physics-Based Digital Human Twin for Galvanic-Coupling Wearable Communication Links

Silvia Mura, Chiara Cavigliano, Anna Marcucci, Pietro Savazzi, Anna Vizziello, Maurizio Magarini

2603.12896 2026-03-16 eess.SP

Environment-aware Near-field UE Tracking under Partial Blockage and Reflection

Hyunwoo Park, Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Sunwoo Kim

Comments 5 pages, 3 figures, conference

2603.05441 2026-03-16 eess.SP cs.SY eess.SY

Near-Optimal Low-Complexity MIMO Detection via Structured Reduced-Search Enumeration

Logeshwaran Vijayan

Comments 6 pages, 10 figures

2601.07090 2026-03-16 eess.SY cs.SY

Next-Generation Grid Codes: Towards a New Paradigm for Dynamic Ancillary Services

Verena Häberle, Kehao Zhuang, Xiuqiang He, Linbin Huang, Gabriela Hug, Florian Dörfler

Comments 13 pages, 15 figures

2510.17176 2026-03-16 eess.SY cs.SY

Generalized Group Selection Strategies for Self-sustainable RIS-aided Communication

Lakshmikanta Sau, Priyadarshi Mukherjee, Sasthi C. Ghosh

Comments To appear in IEEE Transactions on Communications

2510.11395 2026-03-16 eess.AS

Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training

Haixin Zhao, Kaixuan Yang, Nilesh Madhu

Comments Accepted by ICASSP2026

2509.26471 2026-03-16 eess.AS cs.AI

On Deepfake Voice Detection -- It's All in the Presentation

Héctor Delgado, Giorgio Ramondetti, Emanuele Dalmasso, Gennady Karvitsky, Daniele Colibro, Haydar Talib

Comments ICASSP 2026. © IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

2509.22327 2026-03-16 eess.SP

Stacked Intelligent Metasurface-Enhanced Wideband Multiuser MIMO OFDM-IM Communications

Zheao Li, Jiancheng An, Chau Yuen

2508.01410 2026-03-16 physics.flu-dyn cs.SY eess.SY

Upper bound of transient growth in accelerating and decelerating wall-driven flows using the Lyapunov method

Zhengyang Wei, Weichen Zhao, Chang Liu

Comments 6 pages, 8 figures

2507.15781 2026-03-16 eess.SY cs.SY

Bio-inspired density control of multi-agent swarms via leader-follower plasticity

Gian Carlo Maffettone, Alain Boldini, Mario di Bernardo, Maurizio Porfiri

2503.14550 2026-03-16 eess.IV cs.AI cs.CV cs.LG

Novel AI-Based Quantification of Breast Arterial Calcification to Predict Cardiovascular Risk

Theodorus Dapamede, Aisha Urooj, Vedant Joshi, Gabrielle Gershon, Frank Li, Mohammadreza Chavoshi, Beatrice Brown-Mulry, Rohan Satya Isaac, Aawez Mansuri, Chad Robichaux, Chadi Ayoub, Reza Arsanjani, Laurence Sperling, Judy Gichoya, Marly van Assen, Charles W. ONeill, Imon Banerjee, Hari Trivedi

2502.14720 2026-03-16 physics.app-ph cs.SY eess.SY

Advancing Measurement Capabilities in Lithium-Ion Batteries: Exploring the Potential of Fiber Optic Sensors for Thermal Monitoring of Battery Cells

Florian Krause, Felix Schweizer, Alexandra Burger, Franziska Ludewig, Marcus Knips, Katharina Quade, Andreas Wuersig, Dirk Uwe Sauer

2407.06705 2026-03-16 cs.NI eess.SP

Integrating Atmospheric Sensing and Communications for Resource Allocation in NTNs

Israel Leyva-Mayorga, Fabio Saggese, Lintao Li, Petar Popovski

Comments Submitted for publication to IEEE Transactions on Wireless Communications

2407.03131 2026-03-16 cs.NE cs.AI eess.SP

MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition

Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu

Comments Accepted by ICONIP 2025 (Oral). 16 pages, 5 figures

2404.07650 2026-03-16 eess.SP

Coexistence of Pull and Push Communication in Wireless Access for IoT Devices

Sara Cavallero, Fabio Saggese, Junya Shiraishi, Shashi Raj Pandey, Chiara Buratti, Petar Popovski

Comments Paper submitted to the 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2024). Copyright may be transferred without further notice

2312.12025 2026-03-16 eess.SP

Control Aspects for Using RIS in Latency-Constrained Mobile Edge Computing

Fabio Saggese, Victor Croisfelt, Francesca Costanzo, Junya Shiraishi, Radosław Kotaba, Paolo Di Lorenzo, Petar Popovski

Comments Paper submitted to Asilomar Conference on Signals, Systems, and Computers 2023. Copyright may be transferred without further notice