2605.17336
2026-05-19
cs.RO
cs.CV
eess.SP
Version update
Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms
Zhixiang Cao, Di Tian, Runwei Guan, Yanzhou Mu, Xiaolou Sun, Shaofeng Liang, Daizong Liu, Tao Huang, Yutao Yue, Henghui Ding, Bin Fang, Alex Zhou, Qing-Long Han, Hui Xiong
Affiliations
School of Electronic Science and Engineering, Xi’an Jiaotong University, China;
Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), China;
State Key Laboratory for Novel Software Technology, Nanjing University, China;
Purple Mountain Laboratory, China;
Institute for Math & AI, Wuhan University, China;
Centre for AI and Data Science Innovation and the School of Science and Engineering, James Cook University, Australia;
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China;
Institute of Big Data, Fudan University, China;
Linkerbot (Beijing) Technology Co., Ltd, China;
School of Engineering, Swinburne University of Technology, Melbourne
AI summary
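This paper surveys research on multimodal tactile fusion in embodied intelligence. It examines how integrating visual, linguistic, and tactile information can better couple physical interaction with semantic reasoning, proposes a hierarchical taxonomy, and summarizes current research challenges and future directions.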