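arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and related fields.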

2603.26690 2026-03-31 cs.RO cs.AI cs.CV

SpatialPoint: Spatial-aware Point Prediction for Embodied Localization

Qiming Zhu, Zhirui Fang, Tianming Zhang, Chuanxiu Liu, Xiaoke Jiang, Lei Zhang

Comments 19 pages, 12 figures, supplementary material included

Abstract

Embodied intelligence fundamentally requires a capability to determine where to act in 3D space. We formalize this requirement as embodied localization -- the problem of predicting executable 3D points conditioned on visual observations and language instructions. We instantiate embodied localization with two complementary target types: touchable points, surface-grounded 3D points enabling direct physical interaction, and air points, free-space 3D points specifying placement and navigation goals, directional constraints, or geometric relations. Embodied localization is inherently a problem of embodied 3D spatial reasoning -- yet most existing vision-language systems rely predominantly on RGB inputs, necessitating implicit geometric reconstruction that limits cross-scene generalization, despite the widespread adoption of RGB-D sensors in robotics. To address this gap, we propose SpatialPoint, a spatial-aware vision-language framework with careful design that integrates structured depth into a vision-language model (VLM) and generates camera-frame 3D coordinates. We construct a 2.6M-sample RGB-D dataset covering both touchable and air points QA pairs for training and evaluation. Extensive experiments demonstrate that incorporating depth into VLMs significantly improves embodied localization performance. We further validate SpatialPoint through real-robot deployment across three representative tasks: language-guided robotic arm grasping at specified locations, object placement to target destinations, and mobile robot navigation to goal positions.

2603.26687 2026-03-31 cs.RO cs.AI

Learning Energy-Efficient Air–Ground Actuation for Hybrid Robots on Stair-Like Terrain

Jiaxing Li, Wen Tian, Xinhang Xu, Junbin Yuan, Sebastian Scherer, Muqing Cao

2603.26686 2026-03-31 cs.RO cs.HC

Bridging the Awareness Gap: Socially Mediated State Externalization for Transparent Distributed Home Robots

Wenzheng Zhao, Manideep Duggi, Fengpei Yuan

Comments 9 pages, 7 figures, 6 tables. Under review for IROS 2026

2603.26685 2026-03-31 cs.RO cs.AI cs.CV cs.LG

Contextual Graph Representations for Task-Driven 3D Perception and Planning

Christopher Agia

Comments University of Toronto Undergraduate Thesis, 2021. 85 pages, 24 figures

2603.26675 2026-03-31 cs.CL cs.LG

GeoBlock: Inferring Block Granularity from Dependency Geometry in Diffusion Language Models

Lipeng Wan, Junjie Ma, Jianhui Gu, Zeyang Liu, Xuyang Lu, Xuguang Lan

Comments 13 pages, 4 figures, Code available upon publication

2603.26674 2026-03-31 cs.RO cs.CY cs.HC

Co-designing a Social Robot for Newcomer Children's Cultural and Language Learning

Neil Fernandes, Tehniyat Shahbaz, Emily Davies-Robinson, Yue Hu, Kerstin Dautenhahn

Comments In proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI 2026)

2603.26671 2026-03-31 cs.LG math.OC

Mitigating Forgetting in Continual Learning with Selective Gradient Projection

Anika Singh, Aayush Dhaulakhandi, Varun Chopade, Likhith Malipati, David Martinez, Kevin Zhu

Comments 15 pages, 2 figures, Accepted to the Student Research Workshop at International Joint Conference on Natural Language Processing & Asia-Pacific Chapter of the Association for Computational Linguistics, 2025

2603.20507 2026-03-31 cs.LG stat.ML

Distributed Gradient Clustering: Convergence and the Effect of Initialization

Aleksandar Armacki, Himkant Sharma, Dragana Bajović, Dušan Jakovetić, Mrityunjoy Chakraborty, Soummya Kar

Comments 9 pages, 3 figures

2512.08492 2026-03-31 cs.AI

Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance

Aliaksei Kaliutau

Comments 21 pages, 4 figures

2509.24968 2026-03-31 cs.CV

Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning

Donghwa Kang, Junho Kim, Dongwoo Kang

Comments 14 pages, 10 figures

2506.03388 2026-03-31 cs.CV

Cross-Modal Urban Sensing: Evaluating Sound-Vision Alignment Across Street-Level and Aerial Imagery

Pengyu Chen, Xiao Huang, Teng Fei, Sicheng Wang

Comments 18 pages, 13 figures

2504.10833 2026-03-31 cs.LG cs.AI cs.CV

Measuring the (Un)Faithfulness of Concept-Based Explanations

Shubham Kumar, Narendra Ahuja

Comments To appear in CVPR 2026

2502.00472 2026-03-31 cs.LG math.DS physics.flu-dyn

Binned Spectral Power Loss for Improved Prediction of Chaotic Systems

Dibyajyoti Chakraborty, Arvind T. Mohan, Romit Maulik

2603.28737 2026-03-31 eess.AS cs.AI cs.CL cs.SD

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

Anuj Diwan, Eunsol Choi, David Harwath

Comments Under review

2603.28735 2026-03-31 cs.SE cs.AI

RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems

Oliver Aleksander Larsen, Mahyar T. Moghaddam

Comments Accepted at ANGE 2026, co-located with IEEE ICSA 2026. 8 pages

2603.28731 2026-03-31 cs.SE cs.AI

SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability

Oliver Aleksander Larsen, Mahyar T. Moghaddam

Comments Accepted at SAGAI 2026, co-located with IEEE ICSA 2026. 8 pages

2603.26359 2026-03-31 quant-ph cs.AI

Automated near-term quantum algorithm discovery for molecular ground states

Fabian Finger, Frederic Rapp, Pranav Kalidindi, Kerry He, Kante Yin, Alexander Koziell-Pipe, David Zsolt Manrique, Gabriel Greene-Diniz, Stephen Clark, Hamza Fawzi, Bernardino Romera-Paredes, Alhussein Fawzi, Konstantinos Meichanetzidis

Comments main: 17 pages, 7 Figures

2511.22442 2026-03-31 cs.PF cs.AI cs.CV cs.LG stat.ML

What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$

Sébastien Piérard, Adrien Deliège, Marc Van Droogenbroeck

Comments CVPR 2026

2508.13197 2026-03-31 cond-mat.mtrl-sci cs.AI

The Rise of Generative AI for Metal-Organic Framework Design and Synthesis

Chenru Duan, Aditya Nandy, Shyam Chand Pal, Xin Yang, Wenhao Gao, Yuanqi Du, Hendrik Kraß, Yeonghun Kang, Varinia Bernales, Zuyang Ye, Tristan Pyle, Ray Yang, Zeqi Gu, Philippe Schwaller, Shengqian Ma, Shijing Sun, Alán Aspuru-Guzik, Seyed Mohamad Moosavi, Robert Wexler, Zhiling Zheng

Comments 10 pages, 5 figures

2508.11662 2026-03-31 cs.CY cs.AI cs.HC

Generative AI in Training and Coaching: Redefining the Design Process of Learning Materials

Alexander Komar, Marc-André Heidelmann, Kristina Schaaff

2505.12578 2026-03-31 stat.ML cs.LG

Stacked conformal prediction

Paulo C. Marques F

Comments 12 pages, 2 figures

2407.19097 2026-03-31 cs.GR cs.CV cs.HC cs.LG

NARVis: Neural Accelerated Rendering for Real-Time Scientific Point Cloud Visualization

Srinidhi Hegde, Kaur Kullman, Thomas Grubb, Leslie Lait, Stephen Guimond, Matthias Zwicker

2310.16472 2026-03-31 cs.LO cs.AI cs.DB

Semiring Provenance for Lightweight Description Logics

Camille Bourgaux, Ana Ozaki, Rafael Peñaloza

Comments This version fixes some issues and improves the presentation. 113 pages

2208.04980 2026-03-31 cs.SI cs.LG stat.AP

An NLP-Assisted Bayesian Time Series Analysis for Prevalence of Twitter Cyberbullying During the COVID-19 Pandemic

Christopher Perez, Sayar Karmakar

Comments 22 pages, 15 figures

1906.05284 2026-03-31 eess.IV cs.CV cs.LG

Image-Adaptive GAN based Reconstruction

Shady Abu Hussein, Tom Tirer, Raja Giryes

Comments Published to AAAI 2020. Code available at https://github.com/shadyabh/IAGAN

2603.28622 2026-03-31 cs.DC cs.AI cs.NI

Trust-Aware Routing for Distributed Generative AI Inference at the Edge

Chanh Nguyen, Erik Elmroth

Comments 11 pages, 10 figures. Preprint accepted at the 22nd Annual International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT 2026)

2603.28596 2026-03-31 cs.HC cs.AI cs.CL

Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection

Seyed Parsa Neshaei, Richard Lee Davis, Tanja Käser

Comments Accepted at AIED 2026

2603.28591 2026-03-31 math.DS cs.LG

Universal Approximation Constraints of Narrow ResNets: The Tunnel Effect

Christian Kuehn, Sara-Viola Kuntz, Tobias Wöhrer

2603.28553 2026-03-31 cs.HC cs.CY cs.LG

Multimodal Analytics of Cybersecurity Crisis Preparation Exercises: What Predicts Success?

Conrad Borchers, Valdemar Švábenský, Sandesh K. Kafle, Kevin K. Tang, Jan Vykopal

Comments Accepted as full paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

2603.28476 2026-03-31 cs.IR cs.LG cs.SI

With a Little Help From My Friends: Collective Manipulation in Risk-Controlling Recommender Systems

Giovanni De Toni, Cristian Consonni, Erasmo Purificato, Emilia Gomez, Bruno Lepri