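arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.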

2604.05614 2026-04-08 cs.RO

Grounding Hierarchical Vision-Language-Action Models Through Explicit Language-Action Alignment

Theodor Wulff, Federico Tavella, Rahul Singh Maharjan, Manith Adikari, Angelo Cangelosi

Abstract (English)

Achieving robot transparency is a critical step toward effective human-robot collaboration. To be transparent, a robot's natural language communication must be consistent with its actions and explicitly grounded in the task and environment. Existing hierarchical Vision-Language-Action (VLA) models can generate language (e.g., through chain-of-thought) and low-level actions. However, current work does not consider explicit alignment between these modalities during training. To address this crucial gap, we propose a novel training framework that explicitly grounds hierarchical VLA sub-task descriptions with respect to the visual observation and action space. Our framework uses a contrastive model to assess the alignment between generated language and corresponding action trajectories. This contrastive model enables direct ranking of different language-trajectory pairs based on their alignment, allowing us to refine the grounding of our hierarchical VLA through offline preference learning. We apply our framework to the LanguageTable dataset, a benchmark dataset of human language-annotated trajectories, and provide critical insights into multimodal grounding representations, all while establishing a strong baseline that achieves performance comparable to fully supervised fine-tuning and minimizing the need for costly data annotations.

2604.05613 2026-04-08 cs.LG

Same Graph, Different Likelihoods: Calibration of Autoregressive Graph Generators via Permutation-Equivalent Encodings

Laurits Fredsgaard, Aaron Thomas, Michael Riis Andersen, Mikkel N. Schmidt, Mahito Sugiyama

Comments Workshop 'Towards Trustworthy Predictions: Theory and Applications of Calibration for Modern AI' at AISTATS 2026, Tangier, Morocco

2604.05610 2026-04-08 cs.RO

Control Architecture and experimental validation of a Novel Surgical Robotic Instrument

Doina Pisla, Ionut Zima, Calin Vaida, Andrei Cailean, Marius Miclaus, Adrian Pisla, Andrei Caprariu, Vasile Bulbucan, Bogdan Gherman, Damien Chablat

2604.05601 2026-04-08 cs.CV

ID-Selection: Importance-Diversity Based Visual Token Selection for Efficient LVLM Inference

Zhaohong Huang, Wenjing Liu, Yuxin Zhang, Fei Chao, Rongrong Ji

2604.05595 2026-04-08 cs.RO cs.CV

Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

Baoshun Tong, Haoran He, Ling Pan, Yang Liu, Liang Lin

2604.05593 2026-04-08 cs.AI cs.CL

Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge

Xin Sun, Di Wu, Sijing Qin, Isao Echizen, Abdallah El Ali, Saku Sugawara

2604.05587 2026-04-08 cs.AI math.OC

ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation

Zhe Zhao, Haibin Wen, Jiaming Ma, Jiachang Zhan, Tianyi Xu, Ye Wei, Qingfu Zhang

2604.05583 2026-04-08 cs.CV

WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval

Yizhuo Xu, Chaojian Yu, Yuanjie Shao, Tongliang Liu, Qinmu Peng, Xinge You

2604.05581 2026-04-08 cs.CV

High-Resolution Single-Shot Polarimetric Imaging Made Easy

Shuangfan Zhou, Chu Zhou, Heng Guo, Youwei Lyu, Boxin Shi, Zhanyu Ma, Imari Sato

2604.05564 2026-04-08 cs.CL

THIVLVC: Retrieval Augmented Dependency Parsing for Latin

Luc Pommeret, Thibault Wagret, Jules Deret

2604.05562 2026-04-08 cs.CV

Physics-Aligned Spectral Mamba: Decoupling Semantics and Dynamics for Few-Shot Hyperspectral Target Detection

Luqi Gong, Qixin Xie, Yue Chen, Ziqiang Chen, Fanda Fan, Shuai Zhao, Chao Li

2604.05558 2026-04-08 cs.CV

Evaluation Before Generation: A Paradigm for Robust Multimodal Sentiment Analysis with Missing Modalities

Rongfei Chen, Tingting Zhang, Xiaoyu Shen, Wei Zhang

Comments 6 pages, 3 figures, conference

2604.05557 2026-04-08 cs.CL

EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal Agents

Xuan Dong, Huanyang Zheng, Tianhao Niu, Zhe Han, Pengzhan Li, Bofei Liu, Zhengyang Liu, Guancheng Li, Qingfu Zhu, Wanxiang Che

2604.05551 2026-04-08 cs.CL cs.AI cs.LG

FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation--Full Version

Dat Nguyen-Cong, Tung Kieu, Hoang Thanh-Tung

Comments camera-ready version, accepted by ACL Findings (ACL 2026)

2604.05547 2026-04-08 cs.AI cs.GR

COSMO-Agent: Tool-Augmented Agent for Closed-loop Optimization,Simulation,and Modeling Orchestration

Liyuan Deng, Shujian Deng, Yongkang Chen, Yongkang Dai, Zhihang Zhong, Linyang Li, Xiao Sun, Yilei Shi, Huaxi Huang

Comments 10 pages, 3 figures, preprint paper

2604.05544 2026-04-08 cs.RO cs.CV

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

Jiahua Ma, Yiran Qin, Xin Wen, Yixiong Li, Yuyu Sun, Yulan Guo, Liang Lin, Ruimao Zhang

2604.05543 2026-04-08 cs.LG

Channel-wise Retrieval for Multivariate Time Series Forecasting

Junhyeok Kang, Jun Seo, Soyeon Park, Sangjun Han, Seohui Bae, Hyeokjun Choe, Soonyoung Lee

Comments Accepted at ICASSP 2026 Oral

2604.05540 2026-04-08 cs.CL

Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting

Jinhu Fu, Yan Bai, Longzhu He, Yihang Lou, Yanxiao Zhao, Li Sun, Sen Su

Comments Accepted by ACL 2026 main conference

2604.05539 2026-04-08 cs.AI

From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement

Cedric Haufe, Frieder Stolzenburg

Comments 16 pages, 2 figures, 4 tables

2604.05537 2026-04-08 cs.AI cs.DS

A canonical generalization of OBDD

Florent Capelli, YooJung Choi, Stefan Mengel, Martín Muñoz, Guy Van den Broeck

Comments Submitted to SAT26

2604.05536 2026-04-08 cs.CL cs.AI

Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system

Zhongxin Yang, Chun Bao, Yuanwei Bin, Xiang I. A. Yang, Shiyi Chen

2604.05535 2026-04-08 cs.AI

SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills

Da Lei, Feng Xiao, Lu Li, Yuzhan Liu

2604.05533 2026-04-08 cs.AI

Experience Transfer for Multimodal LLM Agents in Minecraft Game

Chenghao Li, Jun Liu, Songbo Zhang, Huadong Jian, Hao Ni, Lik-Hang Lee, Sung-Ho Bae, Guoqing Wang, Yang Yang, Chaoning Zhang

2604.05531 2026-04-08 cs.RO

Simulation-Driven Evolutionary Motion Parameterization for Contact-Rich Granular Scooping with a Soft Conical Robotic Hand

Yongliang Wang, Cristian C. Beltran-Hernandez, Tomoya Takahashi, Masashi Hamaya

2604.05530 2026-04-08 cs.AI

Inventory of the 12 007 Low-Dimensional Pseudo-Boolean Landscapes Invariant to Rank, Translation, and Rotation

Arnaud Liefooghe, Sébastien Verel

2604.05527 2026-04-08 cs.CV

Prior-guided Fusion of Multimodal Features for Change Detection from Optical-SAR Images

Xuanguang Liu, Lei Ding, Yujie Li, Chenguang Dai, Zhenchao Zhang, Mengmeng Li, Ziyi Yang, Yifan Sun, Yongqi Sun, Hanyun Wang

2604.05526 2026-04-08 cs.SD cs.AI

Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck

Zhetao Hu, Yiquan Zhou, Wenyu Wang, Zhiyu Wu, Xin Gao, Jihua Zhu

Comments 8 pages, 5 figures

2604.05524 2026-04-08 cs.CV

Cross-Resolution Diffusion Models via Network Pruning

Jiaxuan Ren, Junhan Zhu, Huan Wang

Comments Accepted by CVPR Findings 2026

2604.05522 2026-04-08 cs.CL

Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs

Hongcheng Liu, Yuhao Wang, Zhe Chen, Pingjie Wang, Zhiyuan Zhu, Yixuan Hou, Yanfeng Wang, Yu Wang

2604.05517 2026-04-08 cs.AI

UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning

Xiaolong Wei, Zerun Zhu, Simin Niu, Xingyu Zhang, Peiying Yu, Changxuan Xiao, Yuchen Li, Jicheng Yang, Zhejun Zhao, Chong Meng, Long Xia, Daiting Shi

Comments Accepted to Findings of ACL 2026