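arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.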

2603.28173 2026-03-31 cs.LG cs.AI

Skillful Kilometer-Scale Regional Weather Forecasting via Global and Regional Coupling

Weiqi Chen, Wenwei Wang, Qilong Yuan, Lefei Shen, Bingqing Peng, Jiawei Chen, Bo Wu, Liang Sun

Abstract

Data-driven weather models have advanced global medium-range forecasting, yet high-resolution regional prediction remains challenging due to unresolved multiscale interactions between large-scale dynamics and small-scale processes such as terrain-induced circulations and coastal effects. This paper presents a global-regional coupling framework for kilometer-scale regional weather forecasting that synergistically couples a pretrained Transformer-based global model with a high-resolution regional network via a novel bidirectional coupling module, ScaleMixer. ScaleMixer dynamically identifies meteorologically critical regions through adaptive key-position sampling and enables cross-scale feature interaction through dedicated attention mechanisms. The framework produces forecasts at $0.05^\circ$ ($\sim 5\,\mathrm{km}$) spatial and 1-hour temporal resolution over China, significantly outperforming operational NWP and AI baselines on both gridded reanalysis data and real-time weather station observations. It exhibits exceptional skill in capturing fine-grained phenomena such as orographic wind patterns and Foehn warming, demonstrating that it combines global-scale coherence with high-resolution fidelity. The code is available at https://anonymous.4open.science/r/ScaleMixer-6B66.
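
The abstract describes ScaleMixer only at a high level. The toy sketch below is not the authors' code; it only illustrates the general idea of bidirectional cross-scale attention restricted to sampled key positions. It assumes per-position importance scores are simply given (in the paper the sampling is adaptive and learned), keeps the top-k global positions, and lets regional tokens and the sampled global tokens attend to each other with single-head dot-product attention. All names (`top_k_positions`, `attend`, `bidirectional_mix`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_positions(importance: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring positions, returned in ascending order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


def attend(queries: List[Vector], context: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention (keys = values = context) plus a residual add."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, c)) for c in context])
        mixed = [sum(w * c[d] for w, c in zip(weights, context)) for d in range(len(q))]
        out.append([qd + md for qd, md in zip(q, mixed)])
    return out


def bidirectional_mix(regional: List[Vector], global_tokens: List[Vector],
                      importance: List[float], k: int) -> Tuple[List[Vector], List[Vector]]:
    """One hypothetical coupling step; both directions read the pre-update tokens."""
    keep = top_k_positions(importance, k)
    sampled = [global_tokens[i] for i in keep]
    # Global -> regional: each regional token reads context from the sampled global positions.
    new_regional = attend(regional, sampled)
    # Regional -> global: the sampled global positions read back from the regional tokens.
    refreshed = attend(sampled, regional)
    new_global = [list(t) for t in global_tokens]
    for i, vec in zip(keep, refreshed):
        new_global[i] = vec  # unsampled global positions are left unchanged
    return new_regional, new_global
```

For example, `bidirectional_mix([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]], [0.1, 0.9, 0.5, 0.2], k=2)` mixes only global positions 1 and 2 with the regional tokens. Restricting attention to a sampled subset is what keeps the cost of cross-scale interaction manageable when the global grid is large.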

2603.28167 2026-03-31 cs.LG

Automating Early Disease Prediction Via Structured and Unstructured Clinical Data

Ane G Domingo-Aldama, Marcos Merino Prado, Alain García Olea, Josu Goikoetxea, Koldo Gojenola, Aitziber Atutxa

2603.28163 2026-03-31 cs.CL

From Reviews to Requirements: Can LLMs Generate Human-Like User Stories?

Shadman Sakib, Oishy Fatema Akhand, Tasnia Tasneem, Shohel Ahmed

2603.28162 2026-03-31 cs.CV

ColorFLUX: A Structure-Color Decoupling Framework for Old Photo Colorization

Bingchen Li, Zhixin Wang, Fan Li, Jiaqi Xu, Jiaming Guo, Renjing Pei, Xin Li, Zhibo Chen

Comments Accepted by CVPR26

2603.28159 2026-03-31 cs.CV

Event-Based Method for High-Speed 3D Deformation Measurement under Extreme Illumination Conditions

Banglei Guan, Yifei Bian, Zibin Liu, Haoyang Li, Xuanyu Bai, Taihang Lei, Bin Li, Yang Shang, Qifeng Yu

Comments Exp Mech (2026)

DOI: 10.1007/s11340-026-01286-2

Abstract

Background: Large engineering structures, such as space launch towers and suspension bridges, are subjected to extreme forces that cause high-speed 3D deformation and compromise safety. These structures typically operate under extreme illumination conditions. Traditional cameras, with their limited dynamic range, often struggle with strong light and are prone to overexposure. Objective: Event cameras have emerged as a compelling alternative to traditional cameras in high-dynamic-range and low-latency applications. This paper presents an integrated method, from calibration to measurement, that uses a multi-event camera array for high-speed 3D deformation monitoring of structures under extreme illumination conditions. Methods: First, the proposed method combines the characteristics of the asynchronous event stream with temporal correlation analysis to extract the corresponding marker center points. Next, it achieves rapid calibration by solving the Kruppa equations in conjunction with a parameter optimization framework. Finally, by employing a unified coordinate transformation and linear intersection, it measures the 3D deformation of the target structure. Results: Experiments confirmed that the relative measurement error is below 0.08%. Field experiments under extreme illumination conditions, including self-calibration of a multi-event camera array and 3D deformation measurement, verified the performance of the proposed method. Conclusions: This paper addresses the critical limitation of traditional cameras in measuring high-speed 3D deformations under extreme illumination conditions. The experimental results demonstrate that, compared to other methods, the proposed method can accurately measure 3D deformations of structures under harsh lighting conditions, with a relative error in the measured deformation of less than 0.1%.
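
The abstract names "linear intersection" as the final measurement step. As a hedged illustration of that generic technique (standard linear triangulation, not the authors' implementation), the sketch assumes every camera is already calibrated and expressed in a unified world frame as a 3x4 projection matrix, and that the 2D marker center has been extracted in each view. Each view contributes two linear equations in the unknown 3D point, solved in the least-squares sense via the normal equations. Function names and the camera matrices in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point with a 3x4 camera matrix P into image coordinates."""
    Xh = list(X) + [1.0]
    u, v, w = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return u / w, v / w


def solve3(A: Matrix, b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def triangulate(cameras: List[Matrix], points2d: List[Tuple[float, float]]) -> List[float]:
    """Linear intersection of the viewing rays from two or more calibrated cameras."""
    rows, rhs = [], []
    for P, (x, y) in zip(cameras, points2d):
        for coord, r in ((x, 0), (y, 1)):
            # coord * (P[2] . Xh) - (P[r] . Xh) = 0, split into the 3 unknowns and a constant.
            row = [coord * P[2][c] - P[r][c] for c in range(4)]
            rows.append(row[:3])
            rhs.append(-row[3])
    # Least squares via the normal equations (A^T A) X = A^T b.
    AtA = [[sum(a[i] * a[j] for a in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(a[i] * bb for a, bb in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)
```

With two toy cameras `P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]` and `P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]` (a unit baseline along x), triangulating `project(P1, X)` and `project(P2, X)` recovers `X`; adding more cameras just appends more rows, which is why the approach extends naturally to a camera array.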

2603.28156 2026-03-31 cs.RO

Reducing Mental Workload through On-Demand Human Assistance for Physical Action Failures in LLM-based Multi-Robot Coordination

Shoichi Hasegawa, Akira Taniguchi, Lotfi El Hafi, Gustavo Alfonso Garcia Ricardez, Tadahiro Taniguchi

Comments Under review at IEEE RO-MAN 2026. Project page: https://emergentsystemlabstudent.github.io/REPAIR/

2603.28152 2026-03-31 cs.CV

ObjectMorpher: 3D-Aware Image Editing via Deformable 3DGS Models

Yuhuan Xie, Aoxuan Pan, Yi-Hua Huang, Chirui Chang, Peng Dai, Xin Yu, Xiaojuan Qi

Comments 11 pages, 8 figures

2603.28149 2026-03-31 cs.CV

BlankSkip: Early-exit Object Detection onboard Nano-drones

Carlo Marra, Beatrice Alessandra Motetti, Alessio Burrello, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Comments Accepted for publication in the Embedded Vision Workshop of the 2026 Computer Vision and Pattern Recognition (CVPR) conference

2603.28142 2026-03-31 cs.CV cs.AI

RecycleLoRA: Rank-Revealing QR-Based Dual-LoRA Subspace Adaptation for Domain Generalized Semantic Segmentation

Chanseul Cho, Seokju Yun, Jeaseong Jeon, Seungjae Moon, Youngmin Ro

Comments Accepted to CVPR 2026 (Findings)

2603.28141 2026-03-31 cs.CV cs.LG

Intelligent Road Condition Monitoring using 3D In-Air SONAR Sensing

Amber Cassimon, Robin Kerstens, Walter Daems, Jan Steckel

Comments 10 pages, 9 figures, 2 tables

2603.28135 2026-03-31 cs.AI

CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning

Siyuan Ma, Bo Gao, Zikai Xiao, Hailong Wang, Xinlei Yu, Rui Qian, Jiayu Qian, Luqi Gong, Yang Liu

2603.28134 2026-03-31 cs.CV

Robust Remote Sensing Image-Text Retrieval with Noisy Correspondence

Qiya Song, Yiqiang Xie, Yuan Sun, Renwei Dian, Xudong Kang

Abstract

As a pivotal task that bridges visual and linguistic understanding of remote sensing data, Remote Sensing Image-Text Retrieval (RSITR) has attracted considerable research interest in recent years. However, almost all RSITR methods implicitly assume that image-text pairs are perfectly matched. In practice, acquiring a large set of well-aligned data pairs is often prohibitively expensive or even infeasible. We also observe that existing remote sensing datasets (e.g., RSITMD) do contain some inaccurate or mismatched image-text descriptions. Based on these observations, we reveal an important but unexplored problem in RSITR, i.e., Noisy Correspondence (NC). To overcome these challenges, we propose a novel Robust Remote Sensing Image-Text Retrieval (RRSITR) paradigm with a self-paced learning strategy that mimics human cognitive learning patterns, learning from easy to hard samples in multi-modal data with NC. Specifically, we first divide all training sample pairs into three categories based on the loss magnitude of each pair, i.e., clean sample pairs, ambiguous sample pairs, and noisy sample pairs. Then, we estimate the reliability of each training pair by assigning it a weight based on its loss. Further, we design a new multi-modal self-paced function to dynamically regulate the training sequence and weights of the samples, thus establishing a progressive learning process. Finally, for noisy sample pairs, we present a robust triplet loss that dynamically adjusts the soft margin based on semantic similarity, thereby enhancing robustness against noise. Extensive experiments on three popular benchmark datasets demonstrate that the proposed RRSITR significantly outperforms state-of-the-art methods, especially at high noise rates. The code is available at: https://github.com/MSFLabX/RRSITR
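
The abstract gives the ingredients of the training scheme (loss-based three-way split, loss-based weights, a growing self-paced schedule, and a soft-margin triplet loss for noisy pairs) but not their formulas. The sketch below wires those ingredients together under invented assumptions: the two thresholds (`lam_clean0=0.2`, `lam_noisy0=0.6`), the geometric growth schedule, the linear weight decay across the ambiguous band, and the margin that scales with the positive-pair similarity are all hypothetical choices, not the RRSITR definitions.

```python
from typing import List, Tuple


def thresholds(epoch: int, lam_clean0: float = 0.2, lam_noisy0: float = 0.6,
               growth: float = 1.1) -> Tuple[float, float]:
    """Hypothetical schedule: both loss thresholds grow each epoch, admitting harder pairs."""
    g = growth ** epoch
    return lam_clean0 * g, lam_noisy0 * g


def categorize(loss: float, lam_clean: float, lam_noisy: float) -> str:
    """Split a pair by its loss magnitude: small -> clean, large -> noisy, otherwise ambiguous."""
    if loss <= lam_clean:
        return "clean"
    if loss >= lam_noisy:
        return "noisy"
    return "ambiguous"


def self_paced_weight(loss: float, lam_clean: float, lam_noisy: float) -> float:
    """Reliability weight: 1 for clean pairs, linear decay across the ambiguous band, 0 for noisy."""
    category = categorize(loss, lam_clean, lam_noisy)
    if category == "clean":
        return 1.0
    if category == "noisy":
        return 0.0
    return (lam_noisy - loss) / (lam_noisy - lam_clean)


def soft_margin_triplet(pos_sim: float, neg_sim: float, base_margin: float = 0.2) -> float:
    """Hinge triplet loss whose margin shrinks as the positive pair's similarity drops."""
    margin = base_margin * max(0.0, pos_sim)
    return max(0.0, margin - pos_sim + neg_sim)


def epoch_objective(pairs: List[Tuple[float, float, float]], epoch: int) -> float:
    """Average objective over (ranking_loss, pos_sim, hardest_neg_sim) tuples."""
    lam_clean, lam_noisy = thresholds(epoch)
    total = 0.0
    for loss, pos_sim, neg_sim in pairs:
        if categorize(loss, lam_clean, lam_noisy) == "noisy":
            # Noisy pairs skip the weighted loss and use the softer-margin triplet loss instead.
            total += soft_margin_triplet(pos_sim, neg_sim)
        else:
            total += self_paced_weight(loss, lam_clean, lam_noisy) * loss
    return total / len(pairs)
```

Because the thresholds rise with `epoch`, a pair that is treated as noisy early on can later fall into the ambiguous or clean band, which is one simple way to realize the "easy to hard" progression the abstract describes.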

2603.28130 2026-03-31 cs.CV cs.AI

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu

2603.28126 2026-03-31 cs.CV

SVGS: Single-View to 3D Object Editing via Gaussian Splatting

Pengcheng Xue, Yan Tian, Qiutao Song, Ziyi Wang, Linyang He, Weiping Ding, Mahmoud Hassaballah, Karen Egiazarian, Wei-Fa Yang, Leszek Rutkowski

2603.28120 2026-03-31 cs.CV

MedLoc-R1: Performance-Aware Curriculum Reward Scheduling for GRPO-Based Medical Visual Grounding

Guangjing Yang, Ziyuan Qin, Chaoran Zhang, Chenlin Du, Jinlin Wang, Wanran Sun, Zhenyu Zhang, Bing Ji, Qicheng Lao

Comments 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

2603.28116 2026-03-31 cs.RO cs.CV

$AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning

Yuqi Ye, Zijian Zhang, Junhong Lin, Shangkun Sun, Changhao Peng, Wei Gao

Comments Accepted at ICLR 2026 (International Conference on Learning Representations)

Abstract

Vision-language models (VLMs) are increasingly being adopted for end-to-end autonomous driving systems due to their exceptional performance in handling long-tail scenarios. However, current VLM-based approaches suffer from two major limitations: 1) some VLMs directly output planning results without chain-of-thought (CoT) reasoning, bypassing the crucial perception and prediction stages, which creates a significant domain gap and compromises decision-making capability; 2) other VLMs can generate outputs for perception, prediction, and planning tasks but employ a fragmented decision-making approach in which these modules operate separately, leading to a significant lack of synergy that undermines planning performance. To address these limitations, we propose ${AutoDrive\text{-}P^3}$, a novel framework that seamlessly integrates $\textbf{P}$erception, $\textbf{P}$rediction, and $\textbf{P}$lanning through structured reasoning. We introduce the ${P^3\text{-}CoT}$ dataset to facilitate coherent reasoning and propose ${P^3\text{-}GRPO}$, a hierarchical reinforcement learning algorithm that provides progressive supervision across all three tasks. Specifically, ${AutoDrive\text{-}P^3}$ progressively generates CoT reasoning and answers for perception, prediction, and planning: perception provides essential information for subsequent prediction and planning, and perception and prediction together inform the final planning decisions, enabling safer and more interpretable autonomous driving. Additionally, to balance inference efficiency with performance, we introduce dual thinking modes: detailed thinking and fast thinking. Extensive experiments on both open-loop (nuScenes) and closed-loop (NAVSIMv1/v2) benchmarks demonstrate that our approach achieves state-of-the-art performance in planning tasks. Code is available at https://github.com/haha-yuki-haha/AutoDrive-P3.
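
GRPO (Group Relative Policy Optimization) scores each sampled response relative to the other responses drawn for the same prompt, standardizing rewards within the group instead of using a learned value baseline. The abstract does not say how ${P^3\text{-}GRPO}$ combines supervision across the three stages, so the sketch below assumes, hypothetically, a fixed weighted sum of perception, prediction, and planning rewards (the weights `(0.2, 0.3, 0.5)` and the function name `staged_reward` are invented) and then applies the group-relative normalization. It omits the clipped policy-ratio objective and KL penalty of a full GRPO update.

```python
import statistics
from typing import List, Sequence


def staged_reward(perception: float, prediction: float, planning: float,
                  weights: Sequence[float] = (0.2, 0.3, 0.5)) -> float:
    """Hypothetical combination of per-stage rewards, each assumed to lie in [0, 1]."""
    w_per, w_pred, w_plan = weights
    return w_per * perception + w_pred * prediction + w_plan * planning


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group of, say, four sampled answers to the same driving scene, each answer's `staged_reward` is computed and `group_relative_advantages` turns those rewards into zero-mean advantages, so answers that get perception and prediction right as well as planning are pushed up relative to their siblings.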

2603.28115 2026-03-31 cs.LG

Graph Vector Field: A Unified Framework for Multimodal Health Risk Assessment from Heterogeneous Wearable and Environmental Data Streams

Silvano Coletti, Francesca Fallucchi

Comments 25 pages, 6 appendices. Theoretical framework; no empirical experiments

2603.28114 2026-03-31 cs.CV cs.LG

Attention Frequency Modulation: Training-Free Spectral Modulation of Diffusion Cross-Attention

Seunghun Oh, Unsang Park

Comments 16 pages; preprint

2603.28110 2026-03-31 cs.CV

Contour-Guided Query-Based Feature Fusion for Boundary-Aware and Generalizable Cardiac Ultrasound Segmentation

Zahid Ullah, Sieun Choi, Jihie Kim

2603.28105 2026-03-31 cs.CV

RAWIC: Bit-Depth Adaptive Lossless Raw Image Compression

Chunhang Zheng, Tongda Xu, Mingli Xie, Yan Wang, Dou Li

Comments Accepted by ICME 2026

2603.28101 2026-03-31 cs.LG

Heddle: A Distributed Orchestration System for Agentic RL Rollout

Zili Zhang, Yinmin Zhong, Chengxu Yang, Chao Jin, Bingyang Wu, Xinming Wei, Yuliang Liu, Xin Jin

2603.28095 2026-03-31 cs.CV

Octree-based Learned Point Cloud Geometry Compression: A Lossy Perspective

Kaiyu Zheng, Wei Gao, Huiming Zheng

2603.28092 2026-03-31 cs.LG

InkDrop: Invisible Backdoor Attacks Against Dataset Condensation

He Yang, Dongyi Lv, Song Ma, Wei Xi, Zhi Wang, Hanlin Gu, Yajie Wang

2603.28091 2026-03-31 cs.CV cs.RO

SHARP: Short-Window Streaming for Accurate and Robust Prediction in Motion Forecasting

Alexander Prutsch, Christian Fruhwirth-Reisinger, David Schinagl, Horst Possegger

Comments CVPR 2026. Project page at https://a-pru.github.io/sharp

2603.28090 2026-03-31 cs.CV

To View Transform or Not to View Transform: NeRF-based Pre-training Perspective

Hyeonjun Jeong, Juyeb Shin, Dongsuk Kum

Comments The Fourteenth International Conference on Learning Representations (ICLR'26)

2603.28088 2026-03-31 cs.CV

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, Yang Yang

Comments Project Page: https://gems-gen.github.io

2603.28086 2026-03-31 cs.SD cs.AI cs.CL

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions

Kexin Huang, Liwei Fan, Botian Jiang, Yaozhou Jiang, Qian Tu, Jie Zhu, Yuqian Zhang, Yiwei Zhao, Chenchen Yang, Zhaoye Fei, Shimin Li, Xiaogui Yang, Qinyuan Cheng, Xipeng Qiu

2603.28082 2026-03-31 cs.CV cs.MA

LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

Chutian Meng, Fan Ma, Chi Zhang, Jiaxu Miao, Yi Yang, Yueting Zhuang

2603.28079 2026-03-31 cs.RO

Control Without Control: Defining Implicit Interaction Paradigms for Autonomous Assistive Robots

Janavi Gupta, Kavya Puthuveetil, Dimitra Tsakona, Akhil Padmanabha, Yiannis Demiris, Zackory Erickson

Comments 8 pages, 2 figures

2603.28074 2026-03-31 cs.LG math.DS

Koopman-based surrogate modeling for reinforcement-learning-control of Rayleigh-Benard convection

Tim Plotzki, Sebastian Peitz