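arXivDaily daily academic digest: syncs the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and more.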

2604.02966 2026-04-06 cs.CV

Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

Wenhao Li, Zimeng Wu, Yu Wu, Zehua Fu, Jiaxin Chen

Comments CVPR2026 Accepted

Abstract

Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task, especially when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proven effective at improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address these issues, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions in synthesis, combined with a label refinement step that corrects missing, extra, and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches and consistently improves accuracy when integrated with distinct detectors. The source code is available at https://github.com/Sirius-Li/UAVGen.
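The abstract does not say how the label refinement in FRE-DP is carried out. Purely as an illustration of the general idea, and not UAVGen's actual procedure, the sketch below assumes an off-the-shelf detector has already been run on a synthesized image and greedily matches its boxes to the requested layout by IoU: unmatched layout boxes are treated as missing objects (label dropped), low-IoU matches as misaligned (label snapped to the detected box), and unmatched detections as extra objects (label added). The function names `iou` and `refine_labels` and the thresholds `match_thr` and `align_thr` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Label = Tuple[str, Box]                  # (class name, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_labels(layout: List[Label], detections: List[Label],
                  match_thr: float = 0.1, align_thr: float = 0.5
                  ) -> Tuple[List[Label], Dict[str, int]]:
    """Hypothetical refinement: reconcile the requested layout with what a
    detector actually finds in the generated image (greedy, same-class IoU)."""
    refined: List[Label] = []
    used = set()
    report = {"missing": 0, "misaligned": 0, "extra": 0}
    for cls, box in layout:
        # Find the best unused detection of the same class for this layout box.
        best_j, best_iou = -1, 0.0
        for j, (dcls, dbox) in enumerate(detections):
            if j in used or dcls != cls:
                continue
            v = iou(box, dbox)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou < match_thr:
            report["missing"] += 1  # object was not rendered: drop its label
            continue
        used.add(best_j)
        if best_iou < align_thr:
            report["misaligned"] += 1  # rendered off-target: use the detected box
            refined.append((cls, detections[best_j][1]))
        else:
            refined.append((cls, box))  # rendered where requested: keep layout box
    # Any detection not matched to the layout is an unrequested (extra) object.
    for j, det in enumerate(detections):
        if j not in used:
            report["extra"] += 1
            refined.append(det)
    return refined, report


layout = [("car", (0, 0, 10, 10)), ("car", (20, 20, 30, 30)),
          ("person", (50, 50, 60, 60))]
detections = [("car", (0, 0, 10, 10)), ("car", (22, 22, 32, 32)),
              ("truck", (80, 80, 90, 90))]
refined, report = refine_labels(layout, detections)
print(report)  # {'missing': 1, 'misaligned': 1, 'extra': 1}
```

Greedy per-box matching keeps the sketch short; a globally optimal assignment (for example with `scipy.optimize.linear_sum_assignment` on a negative-IoU cost matrix) would avoid order-dependent matches at a modest extra cost.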

2604.02965 2026-04-06 cs.RO cs.CL

Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

Zihua Wang, Zhitao Lin, Ruibo Li, Yu Zhang, Xu Yang, Siya Mi, Xiu-Shen Wei

Comments Under Review

2604.02956 2026-04-06 cs.CV

Collaborative Multi-Mode Pruning for Vision-Language Models

Zimeng Wu, Yunhong Wang, Donghao Wang, Jiaxin Chen

Comments CVPR2026 Accepted

2604.02954 2026-04-06 cs.CL cs.AI

LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation

Yilin Xiao, Jin Chen, Qinggang Zhang, Yujing Zhang, Chuang Zhou, Longhao Yang, Lingfei Ren, Xin Yang, Xiao Huang

2604.02951 2026-04-06 cs.CL cs.AI

How Annotation Trains Annotators: Competence Development in Social Influence Recognition

Maciej Markiewicz, Beata Bajcar, Wiktoria Mieleszczenko-Kowszewicz, Aleksander Szczęsny, Tomasz Adamczyk, Grzegorz Chodak, Karolina Ostrowska, Aleksandra Sawczuk, Jolanta Babiak, Jagoda Szklarczyk, Przemysław Kazienko

Comments Accepted to AIED 2026 (27th Conference on Artificial Intelligence in Education)

2604.02948 2026-04-06 cs.CV

CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation

Zelin Zhang, Kedi Li, Huiqi Liang, Tao Zhang, Chuanzhi Xu

2604.02947 2026-04-06 cs.AI

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

Yunhao Feng, Yifan Ding, Yingshui Tan, Xingjun Ma, Yige Li, Yutao Wu, Yifeng Gao, Kun Zhai, Yanming Guo

2604.02946 2026-04-06 cs.CV cs.AI cs.LG

Learning from Synthetic Data via Provenance-Based Input Gradient Guidance

Koshiro Nagano, Ryo Fujii, Ryo Hachiuma, Fumiaki Sato, Taiki Sekii, Hideo Saito

Comments CVPR 2026

2604.02942 2026-04-06 cs.LG

Explainable Machine Learning Reveals 12-Fold Ucp1 Upregulation and Thermogenic Reprogramming in Female Mouse White Adipose Tissue After 37 Days of Microgravity: First AI/ML Analysis of NASA OSD-970

Md. Rashadul Islam

Comments 11 pages, 9 figures, 5 tables. First AI/ML analysis of NASA OSD-970 (GLDS-790). Code available at https://github.com/Rashadul22/NASA_OSD970_Complete_Output

2604.02937 2026-04-06 cs.SD

If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models

David A. Kelly, Hana Chockler

2604.02935 2026-04-06 cs.CV

Modality-Specific Hierarchical Enhancement for RGB-D Camouflaged Object Detection

Yuzhen Niu, Yangqing Wang, Ri Cheng, Fusheng Li, Rongshen Wang, Zhichen Yang

Comments 11 pages, 7 figures, including supplementary material. Accepted by IEEE ICME 2026

2604.02934 2026-04-06 cs.CV

PolyReal: A Benchmark for Real-World Polymer Science Workflows

Wanhao Liu, Weida Wang, Jiaqing Xie, Suorong Yang, Jue Wang, Benteng Chen, Guangtao Mei, Zonglin Yang, Shufei Zhang, Yuchun Mo, Lang Cheng, Jin Zeng, Houqiang Li, Wanli Ouyang, Yuqiang Li

2604.02930 2026-04-06 cs.CV

BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving

Miguel Antunes-García, Santiago Montiel-Marín, Fabio Sánchez-García, Rodrigo Gutiérrez-Moreno, Rafael Barea, Luis M. Bergasa

Comments 15 pages, 5 figures

2604.02926 2026-04-06 cs.CL

A Multi-head-based architecture for effective morphological tagging in Russian with open dictionary

K. Skibin, M. Pozhidaev, S. Suschenko

Comments 8 pages, 1 figure, submitted to AINL-2026

2604.02920 2026-04-06 cs.LG

Efficient Logistic Regression with Mixture of Sigmoids

Federico Di Gennaro, Saptarshi Chakraborty, Nikita Zhivotovskiy

2604.02915 2026-04-06 cs.CV

GP-4DGS: Probabilistic 4D Gaussian Splatting from Monocular Video via Variational Gaussian Processes

Mijeong Kim, Jungtaek Kim, Bohyung Han

Comments CVPR 2026, Page: https://cv.snu.ac.kr/research/GP4DGS

2604.02913 2026-04-06 cs.SD cs.AI cs.LG

Split and Conquer Partial Deepfake Speech

Inbal Rimon, Oren Gal, Haim Permuter

2604.02911 2026-04-06 cs.RO

Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

Junyang Liang, Yuxuan Liu, Yabin Chang, Junfan Lin, Junkai Ji, Hui Li, Changxin Huang, Jianqiang Li

Comments Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2026

2604.02910 2026-04-06 cs.AI cs.CL

Analysis of Optimality of Large Language Models on Planning Problems

Bernd Bohnet, Michael C. Mozer, Kevin Swersky, Wil Cunningham, Aaron Parisi, Kathleen Kenealy, Noah Fiedel

2604.02905 2026-04-06 cs.CV

UniSpector: Towards Universal Open-set Defect Recognition via Spectral-Contrastive Visual Prompting

Geonuk Kim, Minhoi Kim, Kangil Lee, Minsu Kim, Hyeonseong Jeon, Jeonghoon Han, Hyoungjoon Lim, Junho Yim

Comments Accepted to CVPR 2026

2604.02904 2026-04-06 cs.CL

BioUNER: A Benchmark Dataset for Clinical Urdu Named Entity Recognition

Wazir Ali, Adeeb Noor, Sanaullah Mahar, Alia, Muhammad Mazhar Younas

2604.02903 2026-04-06 cs.CV cs.AI

RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection

Cheng Lu, Mingqian Ji, Shanshan Zhang, Zhihao Li, Jian Yang

2604.02899 2026-04-06 cs.LG

Extracting Money Laundering Transactions from Quasi-Temporal Graph Representation

Haseeb Tariq, Marwan Hassani

2604.02896 2026-04-06 cs.CV

EvaNet: Towards More Efficient and Consistent Infrared and Visible Image Fusion Assessment

Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Tao Zhou, Hui Li, Zhangyong Tang, Josef Kittler

Comments 20 figures, accepted by TPAMI

2604.02893 2026-04-06 cs.CV cs.AI cs.LG

Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models

Hai Nguyen-Truong, Alper Balbay, Tunga Bayrak

Comments 12 pages, 7 figures

2604.02892 2026-04-06 cs.RO

RAGE: A Tightly Coupled Radar-Aided Grip Estimator For Autonomous Race Cars

Davide Malvezzi, Nicola Musiu, Eugenio Mascaro, Francesco Iacovacci, Marko Bertogna

Comments 10 pages, 9 figures

2604.02891 2026-04-06 cs.CV

Progressive Video Condensation with MLLM Agent for Long-form Video Understanding

Yufei Yin, Yuchen Xing, Qianke Meng, Minghao Chen, Yan Yang, Zhou Yu

Comments Accepted to ICME 2026

2604.02883 2026-04-06 cs.CV

Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision

Zhenxiao Liang, Qixing Huang

2604.02881 2026-04-06 cs.CL cs.AI

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

Baban Gain, Asif Ekbal, Trilok Nath Singh

2604.02877 2026-04-06 cs.CV

Unlocking Positive Transfer in Incrementally Learning Surgical Instruments: A Self-reflection Hierarchical Prompt Framework

Yu Zhu, Kang Li, Zheng Li, Pheng-Ann Heng

Comments Accepted by CVPR 2026