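arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.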

2603.03739 2026-03-05 cs.CV cs.AI

PROSPECT: Unified Streaming Vision-Language Navigation via Semantic-Spatial Fusion and Latent Predictive Representation

Zehua Fan, Wenqi Lyu, Wenxuan Song, Linge Zhao, Yifei Yang, Xi Wang, Junjie He, Lida Huang, Haiyan Liu, Bingchuan Sun, Guangjun Bao, Xuanyao Mao, Liang Xu, Yan Wang, Feng Gao

Abstract

Multimodal large language models (MLLMs) have advanced zero-shot end-to-end Vision-Language Navigation (VLN), yet robust navigation requires not only semantic understanding but also predictive modeling of environment dynamics and spatial structure. We propose PROSPECT, a unified streaming navigation agent that couples a streaming Vision-Language-Action (VLA) policy with latent predictive representation learning. PROSPECT uses CUT3R as a streaming 3D foundation spatial encoder to produce long-context, absolute-scale spatial features, and fuses them with SigLIP semantic features via cross-attention. During training, we introduce learnable stream query tokens that query the streaming context and predict next-step 2D and 3D latent features (rather than pixels or explicit modalities), supervised in the latent spaces of frozen SigLIP and CUT3R teachers. The predictive branch shapes internal representations without inference overhead. Experiments on VLN-CE benchmarks and real-robot deployment demonstrate state-of-the-art performance and improved long-horizon robustness under diverse lighting. We will release code for the community soon.
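
The cross-attention fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the PROSPECT implementation: it assumes semantic tokens act as queries and spatial tokens as keys and values, omits the learned projection matrices a real model would have, and adds a residual connection. The function names `softmax` and `cross_attention_fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(semantic, spatial):
    """Fuse semantic tokens with spatial tokens via scaled dot-product attention.

    Assumption (not stated in the abstract): semantic tokens are the queries,
    spatial tokens serve as both keys and values, and no learned projections
    are applied. Both inputs are lists of equal-length float vectors.
    """
    dim = len(semantic[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in semantic:
        # Similarity of this semantic token to every spatial token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in spatial]
        weights = softmax(scores)
        # Attention-weighted average of the spatial tokens.
        attended = [
            sum(w * value[d] for w, value in zip(weights, spatial))
            for d in range(dim)
        ]
        # Residual connection keeps the original semantic content.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


semantic_tokens = [[1.0, 0.0], [0.0, 1.0]]   # toy stand-ins for SigLIP features
spatial_tokens = [[2.0, 3.0], [0.5, -1.0]]   # toy stand-ins for CUT3R features
fused_tokens = cross_attention_fuse(semantic_tokens, spatial_tokens)
print(fused_tokens)
```

Each output token has the same dimension as its semantic input, so the fused sequence could be passed on wherever the semantic tokens were expected.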

2603.03735 2026-03-05 cs.RO

Characterization and Correlation of Robotic Snake Scale Friction and Locomotion Speed

Umit Sen, Andri Mahegan, Gina Olson

Comments Accepted for 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026), 8 pages, 7 figures

2603.03725 2026-03-05 cs.LG cs.AI

Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information

Yifan Zhu, Yibo Miao, Yinpeng Dong, Xiao-Shan Gao

Comments 32 pages, ICLR 2026

2603.03718 2026-03-05 cs.CV

Glass Segmentation with Fusion of Learned and General Visual Features

Risto Ojala, Tristan Ellison, Mo Chen

2603.03714 2026-03-05 cs.CL cs.AI cs.CV cs.MM

Order Is Not Layout: Order-to-Space Bias in Image Generation

Yongkang Zhang, Zonglin Zhao, Yuechen Zhang, Fei Ding, Pei Li, Wenxuan Wang

2603.03701 2026-03-05 cs.RO cs.AI cs.HC cs.SI

UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services

Tonmoy Dey, Lin Jiang, Zheng Dong, Guang Wang

Comments 8 pages, 15 figures. This paper has been accepted by ICRA'26 as a regular paper

2603.03695 2026-03-05 cs.RO

TreeLoc++: Robust 6-DoF LiDAR Localization in Forests with a Compact Digital Forest Inventory

Minwoo Jung, Dongjae Lee, Nived Chebrolu, Haedam Oh, Maurice Fallon, Ayoung Kim

Comments 25 pages, 27 figures and 15 tables

Abstract

Reliable localization is essential for sustainable forest management, as it allows robots or sensor systems to revisit and monitor the status of individual trees over long periods. In modern forestry, this management is structured around Digital Forest Inventories (DFIs), which encode stems using compact geometric attributes rather than raw data. Despite their central role, DFIs have been overlooked in localization research, and most methods still rely on dense gigabyte-sized point clouds that are costly to store and maintain. To improve upon this, we propose TreeLoc++, a global localization framework that operates directly on DFIs as a discriminative representation, eliminating the need to use the raw point clouds. TreeLoc++ reduces false matches in structurally ambiguous forests and improves the reliability of full 6-DoF pose estimation. It augments coarse retrieval with a pairwise distance histogram that encodes local tree-layout context, subsequently refining candidates via DBH-based filtering and yaw-consistent inlier selection to further reduce mismatches. Furthermore, a constrained optimization leveraging tree geometry jointly estimates roll, pitch, and height, enhancing pose stability and enabling accurate localization without reliance on dense 3D point cloud data. Evaluations on 27 sequences recorded in forests across three datasets and four countries show that TreeLoc++ achieves precise localization with centimeter-level accuracy. We further demonstrate robustness to long-term change by localizing data recorded in 2025 against inventories built from 2023 data, spanning a two-year interval. The system represents 15 sessions spanning 7.98 km of trajectories using only 250KB of map data and outperforms both hand-crafted and learning-based baselines that rely on point cloud maps. This demonstrates the scalability of TreeLoc++ for long-term deployment.
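
To make the idea of a pairwise distance histogram concrete, here is a minimal sketch under stated assumptions. It is not the TreeLoc++ descriptor: it assumes trees are given as 2D stem positions in metres, and the function name `distance_histogram` and the parameters `max_range` and `num_bins` are hypothetical choices for illustration.

```python
import math


def distance_histogram(trees, index, max_range=20.0, num_bins=10):
    """Histogram of distances from one tree to its neighbours.

    trees: list of (x, y) stem positions (assumed to be in metres).
    index: which tree to describe.
    Distances at or beyond max_range are ignored. Because only distances are
    used, the result does not change when the whole map is rotated or shifted.
    """
    cx, cy = trees[index]
    bin_width = max_range / num_bins
    hist = [0] * num_bins
    for j, (x, y) in enumerate(trees):
        if j == index:
            continue
        dist = math.hypot(x - cx, y - cy)
        if dist < max_range:
            # Clamp guards against floating-point rounding at the upper edge.
            hist[min(int(dist / bin_width), num_bins - 1)] += 1
    return hist


stems = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (10.0, 0.0), (30.0, 0.0)]
print(distance_histogram(stems, 0))
```

Two observations of the same tree taken from different headings would yield the same histogram in this sketch, which is the property that makes a distance-based layout descriptor attractive for retrieval.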

2603.03681 2026-03-05 cs.CV cs.AI

EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs

Yuhao Chen, Bin Shan, Xin Ye, Cheng Chen

Comments 16 pages, 4 figures, 3 tables

2603.03680 2026-03-05 cs.AI

MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation

Lu Yang, Zelai Xu, Minyang Xie, Jiaxuan Gao, Zhao Shok, Yu Wang, Yi Wu

2603.03677 2026-03-05 cs.CL cs.AI

MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation

Guoyi Li, Shihao Xu, Jiatong Ma, Yunyun Han, Jianhua Chen, Yafeng Deng

2603.03673 2026-03-05 cs.LG stat.ML

A Stein Identity for q-Gaussians with Bounded Support

Sophia Sklaviadis, Thomas Moellenhoff, Andre F. T. Martins, Mario A. T. Figueiredo, Mohammad Emtiyaz Khan

2603.03672 2026-03-05 cs.LG cs.AI cs.DB cs.GT

Local Shapley: Model-Induced Locality and Optimal Reuse in Data Valuation

Xuan Yang, Hsi-Wen Chen, Ming-Syan Chen, Jian Pei

2603.03665 2026-03-05 cs.CV cs.LG

Machine Pareidolia: Protecting Facial Image with Emotional Editing

Binh M. Le, Simon S. Woo

Comments Proceedings of the AAAI Conference on Artificial Intelligence 40

2603.03662 2026-03-05 cs.LG cs.AI

Graph Negative Feedback Bias Correction Framework for Adaptive Heterophily Modeling

Jiaqi Lv, Qingfeng Du, Yu Zhang, Yongqi Han, Sheng Li

2603.03657 2026-03-05 cs.CV cs.AI

InEdit-Bench: Benchmarking Intermediate Logical Pathways for Intelligent Image Editing Models

Zhiqiang Sheng, Xumeng Han, Zhiwei Zhang, Zenghui Xiong, Yifan Ding, Aoxiang Ping, Xiang Li, Tong Guo, Yao Mao

Comments CVPR findings. Project page: https://github.com/SZStrong1/InEdit-Bench

2603.03655 2026-03-05 cs.AI

Mozi: Governed Autonomy for Drug Discovery LLM Agents

He Cao, Siyu Liu, Fan Zhang, Zijing Liu, Hao Li, Bin Feng, Shengyuan Bai, Leqing Chen, Kai Xie, Yu Li

Abstract

Tool-augmented large language model (LLM) agents promise to unify scientific reasoning with computation, yet their deployment in high-stakes domains like drug discovery is bottlenecked by two critical barriers: unconstrained tool-use governance and poor long-horizon reliability. In dependency-heavy pharmaceutical pipelines, autonomous agents often drift into irreproducible trajectories, where early-stage hallucinations multiplicatively compound into downstream failures. To overcome this, we present Mozi, a dual-layer architecture that bridges the flexibility of generative AI with the deterministic rigor of computational biology. Layer A (Control Plane) establishes a governed supervisor-worker hierarchy that enforces role-based tool isolation, limits execution to constrained action spaces, and drives reflection-based replanning. Layer B (Workflow Plane) operationalizes canonical drug discovery stages, from Target Identification to Lead Optimization, as stateful, composable skill graphs. This layer integrates strict data contracts and strategic human-in-the-loop (HITL) checkpoints to safeguard scientific validity at high-uncertainty decision boundaries. Operating on the design principle of "free-form reasoning for safe tasks, structured execution for long-horizon pipelines," Mozi provides built-in robustness mechanisms and trace-level auditability to completely mitigate error accumulation. We evaluate Mozi on PharmaBench, a curated benchmark for biomedical agents, demonstrating superior orchestration accuracy over existing baselines. Furthermore, through end-to-end therapeutic case studies, we demonstrate Mozi's ability to navigate massive chemical spaces, enforce stringent toxicity filters, and generate highly competitive in silico candidates, effectively transforming the LLM from a fragile conversationalist into a reliable, governed co-scientist.
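
Role-based tool isolation, as described for the Control Plane, can be sketched roughly as follows. This is an illustrative assumption, not Mozi's actual design or API: the class `ToolRegistry`, the role names, and the toy tool `filter_by_threshold` are all hypothetical.

```python
class ToolRegistry:
    """Maps tool names to callables and records which roles may call each one."""

    def __init__(self):
        self._tools = {}
        self._allowed = {}  # role -> set of tool names that role may call

    def register(self, name, func, roles):
        """Register a tool and grant access to the listed roles only."""
        self._tools[name] = func
        for role in roles:
            self._allowed.setdefault(role, set()).add(name)

    def call(self, role, name, *args, **kwargs):
        """Run a tool on behalf of a role, refusing anything not granted."""
        if name not in self._allowed.get(role, set()):
            raise PermissionError(f"role {role!r} may not call tool {name!r}")
        return self._tools[name](*args, **kwargs)


def filter_by_threshold(values, limit):
    """Toy stand-in for a screening tool: keep values at or below the limit."""
    return [v for v in values if v <= limit]


registry = ToolRegistry()
# Hypothetical roles: only the "screening_worker" may run the filter.
registry.register("filter_by_threshold", filter_by_threshold, roles=["screening_worker"])

print(registry.call("screening_worker", "filter_by_threshold", [0.2, 0.9, 0.5], 0.5))
```

A call made under any other role, such as a hypothetical `"supervisor"`, raises `PermissionError` before the tool runs, which is the isolation property the sketch is meant to show.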

2603.03654 2026-03-05 cs.CV cs.AI eess.IV

Field imaging framework for morphological characterization of aggregates with computer vision: Algorithms and applications

Haohang Huang

Comments PhD thesis

Abstract

Construction aggregates, including sand and gravel, crushed stone and riprap, are the core building blocks of the construction industry. State-of-the-practice characterization methods mainly rely on visual inspection and manual measurement. State-of-the-art aggregate imaging methods are limited in that they apply only to regular-sized aggregates under well-controlled conditions. This dissertation addresses these major challenges by developing a field imaging framework for the morphological characterization of aggregates as a multi-scenario solution. For individual and non-overlapping aggregates, a field imaging system was designed and the associated segmentation and volume estimation algorithms were developed. For 2D image analyses of aggregates in stockpiles, an automated 2D instance segmentation and morphological analysis approach was established. For 3D point cloud analyses of aggregate stockpiles, an integrated 3D Reconstruction-Segmentation-Completion (RSC-3D) approach was established: 3D reconstruction procedures from multi-view images, 3D stockpile instance segmentation, and 3D shape completion to predict the unseen sides. First, a 3D reconstruction procedure was developed to obtain high-fidelity 3D models of collected aggregate samples, based on which a 3D aggregate particle library was constructed. Next, two datasets were derived from the 3D particle library for 3D learning: a synthetic dataset of aggregate stockpiles with ground-truth instance labels, and a dataset of partial-complete shape pairs, developed with varying-view raycasting schemes. A state-of-the-art 3D instance segmentation network and a 3D shape completion network were trained on the datasets, respectively. The application of the integrated approach was demonstrated on real stockpiles and validated against ground truth, showing good performance in capturing and predicting the unseen sides of aggregates.

2603.03652 2026-03-05 cs.CL

Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification

JaeGeon Yoo, Byoungwook Kim, Yeongwook Yang, Hong-Jun Jang

Comments 16 pages, 1 Figure, Accepted at DASFAA 2026 (Full Research Paper)

2603.03651 2026-03-05 cs.LG

Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm

Septian Enggar Sukmana, Sang Won Bae, Tomohiro Shibata

Comments Accepted at the Activity and Behavior Computing (ABC) 2026 Conference (https://autocare.ai/abc2026); to be published in the International Journal of Activity and Behavior Computing (IJABC)

2603.03650 2026-03-05 cs.LG physics.comp-ph

Adaptive Sensing of Continuous Physical Systems for Machine Learning

Felix Köster, Atsushi Uchida

2603.03646 2026-03-05 cs.CV

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Mohamed Elmoghany, Liangbing Zhao, Xiaoqian Shen, Subhojyoti Mukherjee, Yang Zhou, Gang Wu, Viet Dac Lai, Seunghyun Yoon, Ryan Rossi, Abdullah Rashwan, Puneet Mathur, Varun Manjunatha, Daksh Dangi, Chien Nguyen, Nedim Lipka, Trung Bui, Krishna Kumar Singh, Ruiyi Zhang, Xiaolei Huang, Jaemin Cho, Yu Wang, Namyong Park, Zhengzhong Tu, Hongjie Chen, Hoda Eldardiry, Nesreen Ahmed, Thien Nguyen, Dinesh Manocha, Mohamed Elhoseiny, Franck Dernoncourt

2603.03640 2026-03-05 cs.RO

MistyPilot: An Agentic Fast-Slow Thinking LLM Framework for Misty Social Robots

Xiao Wang, Lu Dong, Jingchen Sun, Ifeoma Nwogu, Srirangaraj Setlur, Venu Govindaraju

2603.03637 2026-03-05 cs.CV cs.AI cs.CR

Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

Neha Nagaraja, Lan Zhang, Zhilong Wang, Bo Zhang, Pawan Patil

Comments 7 pages, published in 2025 3rd International Conference on Foundation and Large Language Models (FLLM), Vienna, Austria

2603.03627 2026-03-05 cs.RO

Touch2Insert: Zero-Shot Peg Insertion by Touching Intersections of Peg and Hole

Masaru Yajima, Yuma Shin, Rei Kawakami, Asako Kanezaki, Kei Ota

Comments Accepted by ICRA 2026 (IEEE International Conference on Robotics and Automation)

2603.03623 2026-03-05 cs.CL econ.EM

A Neural Topic Method Using a Large-Language-Model-in-the-Loop for Business Research

Stephan Ludwig, Peter J. Danaher, Xiaohao Yang

2603.03621 2026-03-05 cs.LG cs.CV cs.NA math.NA math.OC stat.ML

Extending Neural Operators: Robust Handling of Functions Beyond the Training Set

Blaine Quackenbush, Paul J. Atzberger

Comments related open source software see https://web.atzberger.org/

2603.03618 2026-03-05 cs.CV

CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing

Juampablo E. Heras Rivera, Daniel K. Low, Xavier Xiong, Jacob J. Ruzevick, Daniel D. Child, Wen-wai Yim, Mehmet Kurt, Asma Ben Abacha

Comments Under review, MICCAI 2026

2603.03617 2026-03-05 cs.CV

RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation

Hao Li, Yuhao Wang, Wenning Hao, Pingping Zhang, Dong Wang, Huchuan Lu

Comments This work is accepted by CVPR2026. More modifications may be performed

2603.03616 2026-03-05 cs.CV

LeafInst - Unified Instance Segmentation Network for Fine-Grained Forestry Leaf Phenotype Analysis: A New UAV based Benchmark

Taige Luo, Junru Xie, Chenyang Fan, Bingrong Liu, Ruisheng Wang, Yang Shao, Sheng Xu, Lin Cao

2603.03615 2026-03-05 cs.CV

Parallax to Align Them All: An OmniParallax Attention Mechanism for Distributed Multi-View Image Compression

Haotian Zhang, Feiyue Long, Yixin Yu, Jian Xue, Haocheng Tang, Tongda Xu, Zhenning Shi, Yan Wang, Siwei Ma, Jiaqi Zhang

Comments Accepted by CVPR 2026