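arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.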

2601.15541 2026-03-18 cs.RO

CompliantVLA-adaptor: VLM-Guided Variable Impedance Action for Safe Contact-Rich Manipulation

Heng Zhang, Wei-Hsing Huang, Qiyi Tong, Gokhan Solak, Puze Liu, Kaidi Zhang, Sheng Liu, Jan Peters, Yu She, Arash Ajoudani

Comments: Under review

Abstract

We propose a CompliantVLA-adaptor that augments the state-of-the-art Vision-Language-Action (VLA) models with vision-language model (VLM)-informed context-aware variable impedance control (VIC) to improve the safety and effectiveness of contact-rich robotic manipulation tasks. Existing VLA systems (e.g., RDT, Pi0.5, OpenVLA-oft) typically output position, but lack force-aware adaptation, leading to unsafe or failed interactions in physical tasks involving contact, compliance, or uncertainty. In the proposed CompliantVLA-adaptor, a VLM interprets task context from images and natural language to adapt the stiffness and damping parameters of a VIC controller. These parameters are further regulated using real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms the VLA baselines on a suite of complex contact-rich tasks, both in simulation and the real world, with improved success rates and reduced force violations. This work presents a promising path towards a safe foundation model for physical contact-rich manipulation. We release our code, prompts, and force-torque-impedance-scenario context datasets at https://sites.google.com/view/compliantvla.
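
The control idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: the context-to-stiffness table stands in for the VLM, and the function names, gain values, force limit, and proportional scaling rule are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for the VLM: maps a task-context label to
# per-axis Cartesian stiffness in N/m. Values are illustrative only.
CONTEXT_STIFFNESS = {
    "free_motion": np.array([800.0, 800.0, 800.0]),
    "fragile_contact": np.array([150.0, 150.0, 150.0]),
}


def critical_damping(k, mass=1.0):
    """Per-axis damping for a damping ratio of 1: d = 2 * sqrt(k * m)."""
    return 2.0 * np.sqrt(k * mass)


def regulate_stiffness(k, f_measured, f_max, k_min=50.0):
    """Reduce stiffness when the measured force norm exceeds f_max.

    Assumed rule: scale by f_max / |f|, but never drop below k_min.
    """
    f_norm = np.linalg.norm(f_measured)
    if f_norm <= f_max:
        return k.copy()
    return np.maximum(k * (f_max / f_norm), k_min)


def impedance_force(k, d, x_des, x, v_des, v):
    """Impedance law with diagonal gains: F = k*(x_des - x) + d*(v_des - v)."""
    return k * (x_des - x) + d * (v_des - v)


# One control step: the context proposes stiffness, force feedback regulates it.
k = regulate_stiffness(
    CONTEXT_STIFFNESS["free_motion"], np.array([0.0, 0.0, 40.0]), f_max=20.0
)
d = critical_damping(k)
command = impedance_force(
    k, d, np.array([0.01, 0.0, 0.0]), np.zeros(3), np.zeros(3), np.zeros(3)
)
```

Here a measured 40 N contact force against an assumed 20 N limit halves the proposed stiffness before the command is computed. A real controller would also need rate limits and stability checks on the gain changes, which this sketch omits.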

2601.14485 2026-03-18 cs.AI

Scalable Knee-Point Guided Activity Group Selection in Multi-Tree Genetic Programming for Dynamic Multi-Mode Project Scheduling

Yuan Tian, Yi Mei, Mengjie Zhang

Comments: 17 pages, 9 figures. This paper has been accepted by the Pacific Rim International Conference Series on Artificial Intelligence (PRICAI) 2025 but has not yet been published. This is the version submitted for review, not the camera-ready version

2601.14440 2026-03-18 cs.AI cs.CL cs.LG

VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration

Saeed Khaki, Ashudeep Singh, Nima Safaei, Kamal Ginotra

2601.13029 2026-03-18 cs.CV

Think3D: Thinking with Space for Spatial Reasoning

Zaibin Zhang, Yuhan Wu, Lianjie Jia, Yifan Wang, Zhongbo Zhang, Yijiang Li, Binghao Ran, Fuxi Zhang, Zhuohan Sun, Zhenfei Yin, Lijun Wang, Huchuan Lu

2601.12131 2026-03-18 cs.LG cs.HC

SolarGPT-QA: A Domain-Adaptive Large Language Model for Educational Question Answering in Space Weather and Heliophysics

Santosh Chapagain, MohammadReza EskandariNasab, Onur Vural, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi

Comments: This is preliminary work towards a broader SolarGPT framework

2601.10477 2026-03-18 cs.CV cs.AI cs.CY

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Yu Wang, Yi Wang, Rui Dai, Yujie Wang, Kaikui Liu, Xiangxiang Chu, Yansheng Li

2601.07985 2026-03-18 cs.CL

Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset

Z. Melce Hüsünbeyi, Virginie Mouilleron, Leonie Uhling, Daniel Foppe, Tatjana Scheffler, Djamé Seddah

2601.00988 2026-03-18 cs.CV

Few-Shot Video Object Segmentation in X-Ray Angiography Using Local Matching and Spatio-Temporal Consistency Loss

Lin Xi, Yingliang Ma, Xiahai Zhuang

2601.00702 2026-03-18 cs.RO cs.CV

DefVINS: Visual-Inertial Odometry for Deformable Scenes

Samuel Cerezo, Javier Civera

Comments: 4 figures, 2 tables. Submitted to RA-L

2512.22588 2026-03-18 cs.RO

Real-Time Quasi-Static Modeling of UAV Tether Aerodynamics

Max Beffert, Andreas Zell

2512.22223 2026-03-18 cs.LG cs.AI cs.CR cs.NI

ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis

Shaghayegh Shajarian, Kennedy Marsh, James Benson, Sajad Khorsandroo, Mahmoud Abdelsalam

Comments: Accepted to ICNC 2026. This is the accepted author manuscript

2512.19941 2026-03-18 cs.CV cs.AI cs.LG

Block-Recurrent Dynamics in Vision Transformers

Mozes Jacobs, Thomas Fel, Richard Hakim, Alessandra Brondetta, Demba Ba, T. Andy Keller

Comments: 25 pages, 15 figures

Abstract

As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical structure, there is no settled framework that interprets Transformer depth as a well-characterized flow. In this work, we introduce the Block-Recurrent Hypothesis (BRH), arguing that trained ViTs admit a block-recurrent depth structure such that the computation of the original $L$ blocks can be accurately rewritten using only $k \ll L$ distinct blocks applied recurrently. Across diverse ViTs, between-layer representational similarity matrices suggest few contiguous phases. To determine whether these phases reflect genuinely reusable computation, we train block-recurrent surrogates of pretrained ViTs: Recurrent Approximations to Phase-structured TransfORmers (Raptor). In small-scale, we demonstrate that stochastic depth and training promote recurrent structure and subsequently correlate with our ability to accurately fit Raptor. We then provide an empirical existence proof for BRH by training a Raptor model to recover $96\%$ of DINOv2 ImageNet-1k linear probe accuracy in only 2 blocks at equivalent runtime. Finally, we leverage our hypothesis to develop a program of Dynamical Interpretability. We find i) directional convergence into class-dependent angular basins with self-correcting trajectories under small perturbations, ii) token-specific dynamics, where cls executes sharp late reorientations while patch tokens exhibit strong late-stage coherence toward their mean direction, and iii) a collapse to low rank updates in late depth, consistent with convergence to low-dimensional attractors. Altogether, we find a compact recurrent program emerges along ViT depth, pointing to a low-complexity normative solution that enables these models to be studied through principled dynamical systems analysis.
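
The block-recurrent structure described in this abstract, where $k \ll L$ distinct blocks are applied repeatedly to cover $L$ steps of depth, can be sketched in a few lines of Python. This shows only the forward structure and is not the authors' Raptor training procedure: the residual `tanh` block, the random weights, and the 7 + 5 split of an assumed depth of 12 are hypothetical choices for illustration.

```python
import numpy as np


def make_block(rng, dim, scale=0.1):
    """Build a toy residual block x -> x + tanh(x @ W) with random weights."""
    W = rng.normal(0.0, scale / np.sqrt(dim), size=(dim, dim))

    def block(x):
        return x + np.tanh(x @ W)

    return block


def block_recurrent_forward(x, blocks, repeats):
    """Apply each distinct block repeats[i] times in sequence.

    The total depth is sum(repeats), while only len(blocks) sets of
    weights are stored.
    """
    if len(blocks) != len(repeats):
        raise ValueError("blocks and repeats must have the same length")
    for block, n in zip(blocks, repeats):
        for _ in range(n):
            x = block(x)
    return x


rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))                 # 4 tokens, 8 features
blocks = [make_block(rng, 8) for _ in range(2)]  # k = 2 distinct blocks
repeats = [7, 5]                                 # assumed split of depth 12
out = block_recurrent_forward(tokens, blocks, repeats)
```

Each entry of `repeats` plays the role of one contiguous phase. In the setting the abstract describes, the shared blocks would be trained to reproduce a pretrained ViT rather than initialized at random.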

2512.16635 2026-03-18 cs.CV cs.LG

SARMAE: Masked Autoencoder for SAR Representation Learning

Danxu Liu, Di Wang, Hebaixu Wang, Haoyang Chen, Wentao Jiang, Yilin Cheng, Haonan Guo, Wei Cui, Jing Zhang

Comments: The paper is accepted by CVPR 2026! Code and models will be available at https://github.com/MiliLab/SARMAE

2512.14001 2026-03-18 cs.RO cs.CV

CLAIM: Camera-LiDAR Alignment with Intensity and Monodepth

Zhuo Zhang, Yonghui Liu, Meijie Zhang, Feiyang Tan, Yikang Ding

Comments: Accepted by IROS 2025

2512.13644 2026-03-18 cs.RO cs.AI cs.CV

World Models for Learning Dexterous Hand-Object Interactions from Human Videos

Raktim Gautam Goswami, Amir Bar, David Fan, Tsung-Yen Yang, Gaoyue Zhou, Prashanth Krishnamurthy, Michael Rabbat, Farshad Khorrami, Yann LeCun

2512.12633 2026-03-18 cs.CV cs.AI

DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model

Zhou Tao, Shida Wang, Yongxiang Hua, Haoyu Cao, Linli Xu

2512.11508 2026-03-18 cs.CV

On Geometric Understanding and Learned Priors in Feed-forward 3D Reconstruction Models

Jelena Bratulić, Sudhanshu Mittal, Thomas Brox, Christian Rupprecht

2512.11237 2026-03-18 cs.CV cs.GR

WildCap: Facial Albedo Capture in the Wild via Hybrid Inverse Rendering

Yuxuan Han, Xin Ming, Tianxiao Li, Zhuofan Shen, Qixuan Zhang, Lan Xu, Feng Xu

Comments: CVPR 2026. Project page: https://yxuhan.github.io/WildCap/index.html; code: https://github.com/yxuhan/WildCap

2512.11141 2026-03-18 cs.CV

Learning complete and explainable visual representations from itemized text supervision

Yiwei Lyu, Chenhui Zhao, Soumyanil Banerjee, Shixuan Liu, Akshay Rao, Akhil Kondepudi, Honglak Lee, Todd C. Hollon

2512.11130 2026-03-18 cs.CV cs.RO

Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

Bowen Wen, Shaurya Dewan, Stan Birchfield

2512.11098 2026-03-18 cs.CV cs.RO

Vision-Language Models for Infrared Industrial Sensing in Additive Manufacturing Scene Description

Nazanin Mahjourian, Vinh Nguyen

2512.09736 2026-03-18 cs.AI

Analyzing Planner Design Trade-offs for MAPF under ADG-based Realistic Execution

Jingtian Yan, Zhifei Li, William Kang, Stephen F. Smith, Jiaoyang Li

2512.07107 2026-03-18 cs.CV

COREA: Coupled Relightable 3D Gaussians and SDFs for Efficient Normal Alignment

Jaeyoon Lee, Hojoon Jung, Sungtae Hwang, Jihyong Oh, Jongwon Choi

Comments: Project page: https://cau-vilab.github.io/COREA/

2512.06330 2026-03-18 cs.CV

S2WMamba: A Wavelet-Assisted Mamba-Based Dual-Branch Network For Pansharpening

Haoyu Zhang, Junhan Luo, Yugang Cao, Jie Huang, Liang-Jian Deng

2512.05663 2026-03-18 cs.CV

LeAD-M3D: Leveraging Asymmetric Distillation for Real-Time Monocular 3D Detection

Johannes Meier, Jonathan Michel, Oussema Dhaouadi, Yung-Hsu Yang, Christoph Reich, Zuria Bauer, Stefan Roth, Marc Pollefeys, Jacques Kaiser, Daniel Cremers

Comments: Johannes Meier and Jonathan Michel contributed equally. Project page: https://deepscenario.github.io/LeAD-M3D/

2512.04761 2026-03-18 cs.CV

Order Matters: 3D Shape Generation from Sequential VR Sketches

Yizi Chen, Sidi Wu, Tianyi Xiao, Nina Wiedemann, Loic Landrieu

Comments: Accepted at CVPR 2026

2512.01208 2026-03-18 cs.LG cs.AI cs.CL

Language as a Wave Phenomenon: Semantic Phase Locking and Interference in Neural Networks

Alper Yıldırım, İbrahim Yücedağ

Comments: Reframed as a controlled experimental study, removed unsupported claims, added explicit hypotheses and statistical tests. Core results unchanged

2512.00698 2026-03-18 cs.LG stat.ML

Flow Matching for Tabular Data Synthesis

Bahrul Ilmi Nasution, Floor Eijkelboom, Mark Elliot, Richard Allmendinger, Christian A. Naesseth

Comments: Published at TMLR

2511.21083 2026-03-18 cs.RO

Dual-Agent Reinforcement Learning for Adaptive and Cost-Aware Visual-Inertial Odometry

Feiyang Pan, Shenghe Zheng, Chunyan Yin, Guangbin Dou

Comments: Accepted to the CVPR 2026 Main Track

2511.19356 2026-03-18 cs.CV

Rethinking Reward Signals in Video GRPO: When Scores Become Targets

Rui Li, Yuanzhi Liang, Ziqi Ni, Haibing Huang, Chi Zhang, Xuelong Li