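arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.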

2604.24622 2026-04-29 cs.CV cs.AI

CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

Fan Du, Feng Yan, Jianxiong Wu, Xinrun Xu, Weiye Zhang, Weinong Wang, Yu Guo, Bin Qian, Zhihai He, Fei Wang, Heng Yang

Abstract

Flow-based vision-language-action (VLA) policies offer strong expressivity for action generation, but suffer from a fundamental inefficiency: multi-step inference is required to recover action structure from uninformative Gaussian noise, leading to a poor efficiency-quality trade-off under real-time constraints. We address this issue by rethinking the role of the starting point in generative action modeling. Instead of shortening the sampling trajectory, we propose CF-VLA, a coarse-to-fine two-stage formulation that restructures action generation into a coarse initialization step that constructs an action-aware starting point, followed by a single-step local refinement that corrects residual errors. Concretely, the coarse stage learns a conditional posterior over endpoint velocity to transform Gaussian noise into a structured initialization, while the fine stage performs a fixed-time refinement from this initialization. To stabilize training, we introduce a stepwise strategy that first learns a controlled coarse predictor and then performs joint optimization. Experiments on CALVIN and LIBERO show that our method establishes a strong efficiency-performance frontier under low-NFE (Number of Function Evaluations) regimes: it consistently outperforms existing NFE=2 methods, matches or surpasses the NFE=10 $π_{0.5}$ baseline on several metrics, reduces action sampling latency by 75.4%, and achieves the best average real-robot success rate of 83.0%, outperforming MIP by 19.5 points and $π_{0.5}$ by 4.0 points. These results suggest that structured, coarse-to-fine generation enables both strong performance and efficient inference. Our code is available at https://github.com/EmbodiedAI-RoboTron/CF-VLA.

2604.24179 2026-04-29 cs.CL cs.AI

MemeScouts@LT-EDI 2026: Asking the Right Questions -- Prompted Weak Supervision for Meme Hate Speech Detection

Ivo Bueno, Lea Hirlimann, Enkelejda Kasneci

Comments Accepted at Sixth Workshop on Language Technology for Equality, Diversity and Inclusion at ACL2026 (LT-EDI@ACL26)

2604.23829 2026-04-29 cs.AI

Domain-Filtered Knowledge Graphs from Sparse Autoencoder Features

John Winnicki, Abeynaya Gnanasekaran, Eric Darve

2604.22821 2026-04-29 cs.SD cs.LG eess.AS

Audio2Tool: Speak, Call, Act -- A Dataset for Benchmarking Speech Tool Use

Ramit Pahwa, Apoorva Beedu, Parivesh Priye, Rutu Gandhi, Saloni Takawale, Aruna Baijal, Zengli Yang

2604.21101 2026-04-29 cs.LG cs.NA math.NA

A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting

Brooks Kinch, Xiaozhe Hu, Yilong Huang, Martine Dyring Hansen, Sunniva Meltzer, Nathaniel Donald Hamlin, David Sirajuddin, Eric C. Cyr, Nathaniel Trask

Comments 29 pages, 6 figures

2604.17070 2026-04-29 cs.CV

NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report

Andrei Dumitriu, Aakash Ralhan, Florin Miron, Florin Tatui, Radu Tudor Ionescu, Radu Timofte, Abdullah Naeem, Anav Katwal, Ayon Dey, Md Tamjidul Hoque, Asuka Shin, Hiroto Shirono, Kosuke Shigematsu, Gaurav Mahesh, Anjana Nanditha, Jiji CV, Akbarali Vakhitov, Sang-Chul Lee, Xinger Li, Chun'an Yu, Junhao Chen, Yang Yang, Gundluri Yuvateja Reddy, Harshitha Palaram, Gejalakshmi N, Jeevitha S, Jiachen Tu, Guoyi Xu, Yaoxin Jiang, Jiajia Liu, Yaokun Shi, Amitabh Tripathi, Modugumudi Mahesh, Santosh Kumar Vipparthi, Subrahmanyam Murala

Comments Challenge report paper from NTIRE Workshop at CVPR 2026

2604.16171 2026-04-29 cs.LG cs.AI cs.CL

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

Alexandra Dragomir, Ioana Pintilie, Antonio Barbalau, Marius Dragoi, Florin Brad, Cristian Daniel Paduraru, Alexandru Tifrea, Elena Burceanu, Radu Tudor Ionescu

2604.11110 2026-04-29 cs.SD

Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan

Jialing Wang, Yue Zhao, Yuhao Zhang, Jing Yu, Shaosai Li, Zhanchen Dai, Benyou Wang, Haizhou Li

2604.10359 2026-04-29 cs.CV cs.AI

Multinex: Lightweight Low-light Image Enhancement via Multi-prior Retinex

Alexandru Brateanu, Tingting Mu, Codruta Ancuti, Cosmin Ancuti

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

2604.07236 2026-04-29 cs.AI cs.CL

How Much Heavy Lifting Can an Agent Harness Do?: Measuring the LLM's Residual Role in a Planning Agent

Sungwoo Jung, Seonil Son

2604.06171 2026-04-29 cs.CL cs.AI

LLM-Augmented Knowledge Base Construction For Root Cause Analysis

Nguyen Phuc Tran, Brigitte Jaumard, Oscar Delgado, Tristan Glatard, Karthikeyan Premkumar, Kun Ni

Comments This work has been accepted for publication in IEEE Access. The final published version will be available via IEEE Xplore

2604.05594 2026-04-29 cs.CV

RABC-Net: Reliability-Aware Annotation-Free Skin Lesion Segmentation for Low-Resource Dermoscopy

Yujie Yao, Yuhaohang He, Junjie Huang, Zhou Liu, Jiangzhao Li, Yan Qiao, Wen Xiao, Yunsen Liang, Xiaofan Li

2604.05310 2026-04-29 cs.RO

Instantaneous Planning, Control and Safety for Navigation in Unknown Underwater Spaces

Veejay Karthik, Udit Ekansh, Tejal Bedmutha, Shivam Vishwakarma, Rohan Deshpande, Leena Vachhani

Comments Uploaded by mistake. A different version of the study is under process

2604.05030 2026-04-29 cs.CL cs.AI cs.LG

Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space

Gowrav Vishwakarma, Christopher J. Agostino

Comments submitting to APS Open Science, 13 pages, 3 figures, code and training logs available at https://github.com/gowrav-vishwakarma/qllm2

2602.23069 2026-04-29 cs.CV

Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception

Yiding Sun, Jihua Zhu, Haozhe Cheng, Chaoyi Lu, Zhichuan Yang, Lin Chen, Yaonan Wang

Comments Accepted by IEEE Transactions on Multimedia (Regular Paper)

2602.15547 2026-04-29 cs.CL

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

Mohammad Kalim Akram, Saba Sturua, Nastia Havriushenko, Quentin Herreros, Michael Günther, Maximilian Werk, Han Xiao

Comments 14 pages, 8 figures. Model weights: https://huggingface.co/collections/jinaai/jina-embeddings-v5-text

2602.11786 2026-04-29 cs.LG cs.AI

Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing

Keita Broadwater

Comments 23 pages, 9 figures; editorial and LaTeX revisions for clarity; improved presentation of methodology and results; updated figures, tables, and float placement; clarified temperature sensitivity and deployment-risk analysis; expanded reporting from the same experiments; results unchanged in substance

2602.01785 2026-04-29 cs.CL cs.SE

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Yuling Shi, Chaoxiang Xie, Zhensu Sun, Yeheng Chen, Chenxu Zhang, Longfei Yun, Chengcheng Wan, Hongyu Zhang, David Lo, Xiaodong Gu

Comments Accepted to ISSTA 2026. Code and data are available at https://github.com/YerbaPage/CodeOCR

2602.00592 2026-04-29 cs.AI cs.SE

DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder

Jiaran Zhang, Luck Ma, Fanqi Wan, Di Qi, Xu Zhao, Jieyi Hou, Zhe Xie, Mengqiang Ren, Xin Wu, Zhewei Huang, Liangyu Chen, Qi Han, Xiangyu Zhang

2601.19709 2026-04-29 cs.SD cs.AI

Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification

Zhihua Fang, Liang He

Comments 5 pages, 3 figures, Accepted at ICASSP 2026

2601.12052 2026-04-29 cs.CV

Task-Driven Prompt Learning: A Joint Framework for Multi-modal Cloud Removal and Segmentation

Zaiyan Zhang, Jie Li, Shaowei Shi, Qiangqiang Yuan

Comments Accepted by IGARSS 2026 Conference (Oral)

2601.11100 2026-04-29 cs.AI

ReCreate: Reasoning and Creating Domain Agents Driven by Experience

Zhezheng Hao, Hong Wang, Jian Luo, Jianqing Zhang, Yuyan Zhou, Qiang Lin, Can Wang, Hande Dong, Jiawei Chen

2601.05110 2026-04-29 cs.AI

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Wenhao Zeng, Xuteng Zhang, Yuling Shi, Chao Hu, Yuting Chen, Beijun Shen, Xiaodong Gu

Comments Accepted to ACL 2026 Findings. Code available at https://github.com/Zengwh02/GlimpRouter

2601.03136 2026-04-29 cs.CL cs.AI cs.RO

Limited Linguistic Diversity in Embodied AI Datasets

Selma Wanna, Agnes Luhtaru, Jonathan Salfity, Ryan Barron, Juston Moore, Cynthia Matuszek, Mitch Pryor

Comments Accepted to ACL 2026 (Main Conference)

2601.01056 2026-04-29 cs.CV cs.AI

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

Ifeanyi Ezuma, Ugochukwu Ugwu

Comments 10 pages, 8 figures. Code and datasets available upon request

2512.16918 2026-04-29 cs.CV

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Chaoyang Wang, Kaituo Feng, Dongyang Chen, Zhongyu Wang, Zhixun Li, Sicheng Gao, Meng Meng, Xu Zhou, Manyuan Zhang, Yuzhang Shang, Xiangyu Yue

Comments ACL 2026 Findings, Project page: https://github.com/CYWang735/AdaTooler-V

2512.09923 2026-04-29 cs.CV

Splatent: Splatting Diffusion Latents for Novel View Synthesis

Or Hirschorn, Omer Sela, Inbar Huberman-Spiegelglas, Netalee Efrat, Eli Alshan, Ianir Ideses, Frederic Devernay, Yochai Zvik, Lior Fritz

Comments CVPR 2026. Project's webpage at https://orhir.github.io/Splatent/

2512.08323 2026-04-29 cs.CV

Detecting Dental Landmarks from Intraoral 3D Scans: the 3DTeethLand challenge

Achraf Ben-Hamadou, Nour Neifar, Ahmed Rekik, Oussama Smaoui, Firas Bouzguenda, Sergi Pujades, Niels van Nistelrooij, Shankeeth Vinayahalingam, Kaibo Shi, Hairong Jin, Youyi Zheng, Tibor Kubík, Oldřich Kodym, Petr Šilling, Kateřina Trávníčková, Tomáš Mojžiš, Jan Matula, Jeffry Hartanto, Xiaoying Zhu, Kim-Ngan Nguyen, Tudor Dascalu, Huikai Wu, Weijie Liu, Shaojie Zhuang, Guangshun Wei, Yuanfeng Zhou

Comments MICCAI 2024, 3DTeethLand, Challenge report, under review

2512.06757 2026-04-29 cs.SD cs.CV

XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association

Zhihua Fang, Shumei Tao, Junxu Wang, Liang He

Comments FAME 2026 Technical Report

2512.03043 2026-04-29 cs.CV

OneThinker: All-in-one Reasoning Model for Image and Video

Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue

Comments CVPR 2026, Project page: https://github.com/tulerfeng/OneThinker