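arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.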

2604.07957 2026-04-10 cs.AI cs.CV cs.RO

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

Hongjin Chen, Shangyun Jiang, Tonghua Su, Chen Gao, Xinlei Chen, Yong Li, Zhibo Chen

Abstract

Vision-language models (VLMs) and generative world models are opening new opportunities for embodied navigation. VLMs are increasingly used as direct planners or trajectory predictors, while world models support look-ahead reasoning by imagining future views. Yet predicting a reliable trajectory from a single egocentric observation remains challenging. Current VLMs often generate unstable trajectories, and world models, though able to synthesize plausible futures, do not directly provide the grounded signals needed for navigation learning. This raises a central question: how can generated futures be turned into supervision for grounded trajectory prediction? We present WorldMAP, a teacher-student framework that converts world-model-generated futures into persistent semantic-spatial structure and planning-derived supervision. Its world-model-driven teacher builds semantic-spatial memory from generated videos, grounds task-relevant targets and obstacles, and produces trajectory pseudo-labels through explicit planning. A lightweight student with a multi-hypothesis trajectory head is then trained to predict navigation trajectories directly from vision-language inputs. On Target-Bench, WorldMAP achieves the best ADE and FDE among compared methods, reducing ADE by 18.0% and FDE by 42.1% relative to the best competing baseline, while lifting a small open-source VLM to DTW performance competitive with proprietary models. More broadly, the results suggest that, in embodied navigation, the value of world models may lie less in supplying action-ready imagined evidence than in synthesizing structured supervision for navigation learning.
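For readers unfamiliar with the metrics named in this abstract, the sketch below shows the standard definitions of average displacement error (ADE) and final displacement error (FDE), plus the relative-reduction arithmetic behind figures such as "18.0%". It is an illustration only: the function names are hypothetical, the sample trajectories are made up, and taking the best of several hypotheses is an assumption about how a multi-hypothesis head might be scored, not a description of the Target-Bench protocol or the authors' code.

```python
import math

def displacement_errors(pred, gt):
    """Return (ADE, FDE) for one predicted trajectory against the ground truth.

    ADE is the mean Euclidean distance over all waypoints;
    FDE is the Euclidean distance at the final waypoint.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

def best_of_k(hypotheses, gt):
    """Assumed scoring for K hypotheses: lowest ADE and lowest FDE among them."""
    errors = [displacement_errors(h, gt) for h in hypotheses]
    return min(e[0] for e in errors), min(e[1] for e in errors)

def relative_reduction(baseline, ours):
    """Fractional reduction of an error metric relative to a baseline value."""
    return (baseline - ours) / baseline

# Made-up 2D waypoints, purely for illustration.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
drifting = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
exact = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

ade, fde = displacement_errors(drifting, gt)
min_ade, min_fde = best_of_k([drifting, exact], gt)
```

Here the drifting trajectory is off by 0, 1, and 2 units at its three waypoints, so its ADE is 1.0 and its FDE is 2.0. A method whose ADE is 0.82 against a baseline of 1.0 would have a relative reduction of 0.18, which is how a percentage like the one quoted above is typically computed.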

2604.07955 2026-04-10 cs.LG

Rethinking Residual Errors in Compensation-based LLM Quantization

Shuaiting Li, Juncan Deng, Kedong Xu, Rongtao Deng, Hong Gu, Minghan Jiang, Haibin Shen, Kejie Huang

Comments ICLR'26 camera ready

2604.07953 2026-04-10 cs.LG cs.AI

Pruning Extensions and Efficiency Trade-Offs for Sustainable Time Series Classification

Raphael Fischer, Angus Dempster, Sebastian Buschjäger, Matthias Jakobs, Urav Maniar, Geoffrey I. Webb

2604.07952 2026-04-10 cs.LG

Fraud Detection System for Banking Transactions

Ranya Batsyas, Ritesh Yaduwanshi

2604.07945 2026-04-10 cs.RO cs.AI

Incremental Residual Reinforcement Learning Toward Real-World Learning for Social Navigation

Haruto Nagahisa, Kohei Matsumoto, Yuki Tomita, Yuki Hyodo, Ryo Kurazume

2604.07944 2026-04-10 cs.RO cs.AI cs.SY eess.SY

On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning

Amirhossein Afsharrad, Amirhesam Abedsoltan, Ahmadreza Moradipari, Sanjay Lall

2604.07940 2026-04-10 cs.LG

A Systematic Framework for Tabular Data Disentanglement

Ivan Tjuawinata, Andre Gunawan, Anh Quan Tran, Nitish Kumar, Payal Pote, Harsh Bansal, Chu-Hung Chi, Kwok-Yan Lam, Parventanis Murthy

2604.07939 2026-04-10 cs.RO

RAGE-XY: RADAR-Aided Longitudinal and Lateral Forces Estimation For Autonomous Race Cars

Davide Malvezzi, Nicola Musiu, Eugenio Mascaro, Francesco Iacovacci, Marko Bertogna

Comments 6 pages, 5 figures

2604.07931 2026-04-10 cs.LG

Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions

Jing Wang, Yu-Yang Qian, Ke Xue, Chao Qian, Peng Zhao, Zhi-Hua Zhou

2604.07925 2026-04-10 cs.LG cs.AI math.OC

Sinkhorn doubly stochastic attention rank decay analysis

Michela Lapenna, Rita Fioresi, Bahman Gharesifard

2604.07923 2026-04-10 cs.CV

Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation

Hina Kogure, Kei Katsumata, Taiki Miyanishi, Komei Sugiura

2604.07922 2026-04-10 cs.AI cs.CL

SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

Weiyang Huang, Xuefeng Bai, Kehai Chen, Xinyang Chen, Yibin Chen, Weili Guan, Min Zhang

Comments accepted to ACL2026 main conference

2604.07921 2026-04-10 cs.RO cs.CY

The Sustainability Gap in Robotics: A Large-Scale Survey of Sustainability Awareness in 50,000 Research Articles

Antun Skuric, Leandro Von Werra, Thomas Wolf

Comments 29 pages, 17 figures

2604.07916 2026-04-10 cs.CV

Tarot-SAM3: Training-free SAM3 for Any Referring Expression Segmentation

Weiming Zhang, Dingwen Xiao, Songyue Guo, Guangyu Xiang, Shiqi Wen, Minwei Zhao, Lei Chen, Lin Wang

Comments Under review

2604.07914 2026-04-10 cs.CV cs.AI

Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction

Yuanhong Zhang, Zhaoyang Wang, Xin Zhang, Weizhan Zhang, Joey Tianyi Zhou

2604.07912 2026-04-10 cs.CV cs.RO

ParkSense: Where Should a Delivery Driver Park? Leveraging Idle AV Compute and Vision-Language Models

Die Hu, Henan Li

Comments 7 pages, 3 tables. No university resources were used for this work

2604.07907 2026-04-10 cs.AI cs.LO

Capture-Quiet Decomposition: A Verification Theorem for Chess Endgame Tablebases

Alexander Pavlov

Comments 9 pages, 3 tables. Validated on 517 endgames covering 6.5 billion positions

2604.07904 2026-04-10 cs.LG cs.CV cs.NE

Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency

Mingqing Xiao, Yansen Wang, Dongqi Han, Caihua Shan, Dongsheng Li

2604.07901 2026-04-10 cs.CV

PanoSAM2: Lightweight Distortion- and Memory-aware Adaptions of SAM2 for 360 Video Object Segmentation

Dingwen Xiao, Weiming Zhang, Shiqi Wen, Lin Wang

2604.07900 2026-04-10 cs.CV cs.AI

AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning

Jiaming Su, Tengchao Yang, Ruikang Zhang, Zhengan Yan, Haoyu Sun, Linfeng Zhang

2604.07897 2026-04-10 cs.AI cs.LG

Visual Perceptual to Conceptual First-Order Rule Learning Networks

Kun Gao, Davide Soldà, Thomas Eiter, Katsumi Inoue

2604.07895 2026-04-10 cs.AI

DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues

Joonhyeok Shin, Jaehoon Kang, Yujun Lee, Hannah Lee, Yejin Lee, Yoonji Park, Kyuhong Shim

2604.07894 2026-04-10 cs.CL cs.AI

TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

Xinliang Frederick Zhang, Lu Wang

2604.07890 2026-04-10 cs.CV

Sampling-Aware 3D Spatial Analysis in Multiplexed Imaging

Ido Harlev, Tamar Oukhanov, Raz Ben-Uri, Leeat Keren, Shai Bagon

Comments Accepted to The 11th IEEE Workshop on Computer Vision for Multimodal Microscopy Image Analysis (CVMI), a CVPR 2026 workshop

Abstract

Highly multiplexed microscopy enables rich spatial characterization of tissues at single-cell resolution, yet most analyses rely on two-dimensional sections despite inherently three-dimensional tissue organization. Acquiring dense volumetric data in spatial proteomics remains costly and technically challenging, leaving practitioners to choose between 2D sections or 3D serial sections under limited imaging budgets. In this work, we study how sampling geometry impacts the stability of commonly used spatial statistics, and we introduce a geometry-aware reconstruction module that enables sparse yet consistent 3D analysis from serial sections. Using controlled simulations, we show that planar sampling reliably recovers global cell-type abundance but exhibits high variance for local statistics such as cell clustering and cell-cell interactions, particularly for rare or spatially localized populations. We observe consistent behavior in real multiplexed datasets, where interaction metrics and neighborhood relationships fluctuate substantially across individual sections. To support sparse 3D analysis in practice, we present a reconstruction approach that links cell projections across adjacent sections using phenotype and proximity constraints and recovers single-cell 3D centroids using cell-type-specific shape priors. We further analyze the trade-off between section spacing, coverage, and redundancy, identifying acquisition regimes that maximize reconstruction utility under fixed imaging budgets. We validate the reconstruction module on a public imaging mass cytometry dataset with dense axial sampling and demonstrate its downstream utility on an in-house CODEX dataset by enabling structure-level 3D analyses that are unreliable in 2D. Together, our results provide diagnostic tools and practical guidance for deciding when 2D sampling suffices and when sparse 3D reconstruction is warranted.

2604.07888 2026-04-10 cs.LG

Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs

Binxing Xu, Hao Gu, Lujun Li, Hao Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Xintong Yang, Chao Li, Sirui Han, Yike Guo

2604.07885 2026-04-10 cs.CL

Contextualising (Im)plausible Events Triggers Figurative Language

Annerose Eichel, Tonmoy Rakshit, Sabine Schulte im Walde

2604.07884 2026-04-10 cs.CV cs.AI

Reinforcement-Guided Synthetic Data Generation for Privacy-Sensitive Identity Recognition

Xuemei Jia, Jiawei Du, Hui Wei, Jun Chen, Joey Tianyi Zhou, Zheng Wang

2604.07883 2026-04-10 cs.AI cs.CL cs.CY cs.MA

An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks

Gabriel Stefan, Adrian-Marius Dumitran

Comments Accepted for ITS(Intelligent Tutoring Systems) 2026 Full Paper

2604.07882 2026-04-10 cs.CV

ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

Boyuan Wang, Xiaofeng Wang, Yongkang Li, Zheng Zhu, Yifan Chang, Angen Ye, Guosheng Zhao, Chaojun Ni, Guan Huang, Yijie Ren, Yueqi Duan, Xingang Wang

2604.07879 2026-04-10 cs.CV cs.AI

FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding

Jinghan Yang, Yihe Fan, Xudong Pan, Min Yang