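arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.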

2603.06450 2026-03-23 cs.RO

Data Analogies Enable Efficient Cross-Embodiment Transfer

Jonathan Yang, Chelsea Finn, Dorsa Sadigh

Comments 14 pages, 11 Figures, 6 Tables

English abstract

Generalist robot policies are trained on demonstrations collected across a wide variety of robots, scenes, and viewpoints. Yet it remains unclear how to best organize and scale such heterogeneous data so that it genuinely improves performance in a given target setting. In this work, we ask: what form of demonstration data is most useful for enabling transfer across robot set-ups? We conduct controlled experiments that vary end-effector morphology, robot platform appearance, and camera perspective, and compare the effects of simply scaling the number of demonstrations against systematically broadening the diversity in different ways. Our simulated experiments show that while perceptual shifts such as viewpoint benefit most from broad diversity, morphology shifts benefit far less from unstructured diversity and instead see the largest gains from data analogies, i.e. paired demonstrations that align scenes, tasks, and/or trajectories across different embodiments. Informed by the simulation results, we improve real-world cross-embodiment transfer success by an average of $22.5\%$ over large-scale, unpaired datasets by changing only the composition of the data.

2603.04803 2026-03-23 cs.CV cs.AI cs.LG

Guiding Diffusion-based Reconstruction with Contrastive Signals for Balanced Visual Representation

Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Ruochen Cui, Xilin Zhao, Qingming Huang

2603.04035 2026-03-23 cs.LG

mlx-vis: GPU-Accelerated Dimensionality Reduction and Visualization on Apple Silicon

Han Xiao

Comments 8 pages, 8 figures. Software: https://github.com/hanxiao/mlx-vis. v3: VRAM optimization, updated benchmarks, added LocalMAP and MMAE methods

2603.01211 2026-03-23 cs.AI cs.CL cs.CY

A Unified Framework to Quantify Cultural Intelligence of AI

Sunipa Dev, Vinodkumar Prabhakaran, Rutledge Chin Feman, Aida Davani, Remi Denton, Charu Kalia, Piyawat Lertvittayakumjorn, Madhurima Maji, Rida Qadri, Negar Rostamzadeh, Renee Shelby, Romina Stella, Hayk Stepanyan, Erin van Liemt, Aishwarya Verma, Oscar Wahltinez, Edem Wornyo, Andrew Zaldivar, Saška Mojsilović

2603.01038 2026-03-23 cs.CV cs.AI

From Intuition to Investigation: A Tool-Augmented Reasoning MLLM Framework for Generalizable Face Anti-Spoofing

Haoyuan Zhang, Keyao Wang, Guosheng Zhang, Haixiao Yue, Zhiwen Tan, Siran Peng, Tianshuo Zhang, Xiao Tan, Kunbin Chen, Wei He, Jingdong Wang, Ajian Liu, Xiangyu Zhu, Zhen Lei

Comments Accepted by CVPR 2026

2603.00523 2026-03-23 cs.CL cs.AI cs.LG

CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

Swapnil Parekh

2603.00518 2026-03-23 cs.CV

Vision-TTT: Efficient and Expressive Visual Representation Learning with Test-Time Training

Quan Kong, Yanru Xiao, Yuhao Shen, Cong Wang

2602.20945 2026-03-23 cs.CL cs.AI

The Art of Efficient Reasoning: Data, Reward, and Optimization

Taiqiang Wu, Zenan Xu, Bo Zhou, Ngai Wong

Comments Tech Report, Insights on Efficient Reasoning via Reward Shaping

2602.20930 2026-03-23 cs.CV

Computing a Characteristic Orientation for Rotation-Independent Image Analysis

Cristian Valero-Abundio, Emilio Sansano-Sansano, Raúl Montoliu, Marina Martínez García

Comments Accepted for publication at the 21st International Conference on Computer Vision Theory and Applications (VISAPP 2026). 8 pages

2602.14997 2026-03-23 cs.LG cs.AI

Spectral Convolution on Orbifolds for Geometric Deep Learning

Tim Mangliers, Bernhard Mössner, Benjamin Himpel

Comments 17 pages, 5 figures, minor spelling and layout improvements

2602.14855 2026-03-23 cs.LG cs.SI math.CO

A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers

Ryan DeWolfe, Paweł Prałat, François Théberge

Comments 14 pages, 3 figures. v2 fixes a bug in the code provided in the appendix. The experiments and figures were not affected

2602.10525 2026-03-23 cs.CL cs.AI cs.LG

LHAW: Controllable Underspecification for Long-Horizon Tasks

George Pu, Michael S. Lee, Udari Madhushani Sehwag, David J. Lee, Bryan Zhu, Yash Maurya, Mohit Raghavendra, Yuan Xue, Samuel Marc Denton

2602.10052 2026-03-23 cs.CV

Spatio-Temporal Attention for Consistent Video Semantic Segmentation in Automated Driving

Serin Varghese, Kevin Ross, Fabian Hueger, Kira Maag

2602.07451 2026-03-23 cs.CL

DLLM Agent: See Farther, Run Faster

Huiling Zhen, Weizhe Lin, Renxi Liu, Kai Han, Yiming Li, Yuchuan Tian, Hanting Chen, Xiaoguang Li, Xiaosong Li, Chen Chen, Xianzhi Yu, Mingxuan Yuan, Youliang Yan, Peifeng Qin, Jun Wang, Yu Wang, Dacheng Tao, Yunhe Wang

English abstract

Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their implications for agentic multi-step decision making remain underexplored. We ask a concrete question: when the generation paradigm is changed but the agent framework and supervision are held fixed, do diffusion backbones induce systematically different planning and tool-use behaviors, and do these differences translate into end-to-end efficiency gains? We study this in a controlled setting by instantiating DLLM and AR backbones within the same agent workflow (DeepDiver) and performing matched agent-oriented fine-tuning on the same trajectory data, yielding diffusion-backed DLLM Agents and directly comparable AR agents. Across benchmarks and case studies, we find that, at comparable accuracy, DLLM Agents are on average over 30% faster end to end than AR agents, with some cases exceeding 8x speedup. Conditioned on correct task completion, DLLM Agents also require fewer interaction rounds and tool invocations, consistent with higher planner hit rates that converge earlier to a correct action path with less backtracking. We further identify two practical considerations for deploying diffusion backbones in tool-using agents. First, naive DLLM policies are more prone to structured tool-call failures, necessitating stronger tool-call-specific training to emit valid schemas and arguments. Second, for multi-turn inputs interleaving context and action spans, diffusion-style span corruption requires aligned attention masking to avoid spurious context-action information flow; without such alignment, performance degrades. Finally, we analyze attention dynamics across workflow stages and observe paradigm-specific coordination patterns, suggesting stronger global planning signals in diffusion-backed agents.

2602.02632 2026-03-23 cs.LG cs.AI

Performance of Small Language Model Pretraining on FABRIC: An Empirical Study

Praveen Rao

2602.02500 2026-03-23 cs.LG cs.AI cs.NA math.NA

IFNSO: Iteration-Free Newton-Schulz Orthogonalization

Chen Hu, Qianxi Zhao, Xiaochen Yuan, Hong Zhang, Ding Yuan, Yanbin Wu, Xiying Li

Comments The paper is under consideration at Pattern Recognition Letters

2602.02142 2026-03-23 cs.RO cs.CV

FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation

Ruiteng Zhao, Wenshuo Wang, Yicheng Ma, Xiaocong Li, Francis E. H. Tay, Marcelo H. Ang, Haiyue Zhu

Comments ICRA 2026 Accepted

2601.20568 2026-03-23 cs.LG

Reinforcement Unlearning via Group Relative Policy Optimization

Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci

Comments Accepted to ICLR 2026

2601.18734 2026-03-23 cs.LG cs.CL

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, Aditya Grover

Comments code is released here: https://github.com/siyan-zhao/OPSD

2601.12781 2026-03-23 cs.AI cs.CV

VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension

Hyejin Park, Junhyuk Kwon, Suha Kwak, Jungseul Ok

Comments Accepted to CVPR 2026

2601.03018 2026-03-23 cs.CL cs.AI cs.LG

Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis

Choonghan Kim, Hyunmin Hwang, Hangeol Chang, Jaemin Kim, Jinse Park, Jae-Sung Lim, Jong Chul Ye

2512.24903 2026-03-23 cs.CV cs.CE

FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation

Zichen Tang, Haihong E, Rongjin Li, Jiacheng Liu, Linwei Jia, Zhuodi Hao, Zhongjun Yang, Yuanze Li, Haolin Tian, Xinyi Hu, Peizhi Zhao, Yuan Liu, Zhengyu Wang, Xianghe Wang, Yiling Huang, Xueyuan Lin, Ruofei Bai, Zijian Xie, Qian Huang, Ruining Cao, Haocheng Gao

Comments Accepted by AAAI-26 Main Track

2512.21375 2026-03-23 cs.RO cs.AI cs.SY eess.SY

Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks

Yuanshuang Fu, Qianyao Wang, Qihao Wang, Bonan Zhang, Jiaxin Zhao, Yiming Cao, Zhijun Li

2512.19799 2026-03-23 cs.AI hep-lat

PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research

Tingjia Miao, Jiawen Dai, Jingkun Liu, Jinxin Tan, Muhua Zhang, Wenkai Jin, Yuwen Du, Tian Jin, Xianghe Pang, Zexi Liu, Tu Guo, Zhengliang Zhang, Yunjie Huang, Shuo Chen, Rui Ye, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen

Comments 32 pages, 10 figures

2512.17432 2026-03-23 cs.CV

AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments

Georgios Simantiris, Konstantinos Bacharidis, Apostolos Papanikolaou, Petros Giannakakis, Costas Panagiotakis

Comments 36 pages, 19 figures, 8 tables

2512.17276 2026-03-23 cs.LG

Alzheimer's Disease Brain Network Mining

Alireza Moayedikia, Sara Fin

Comments We found bugs in the code that affected the results. The results are no longer valid, so we decided to no longer pursue publishing this paper

2512.15160 2026-03-23 cs.CV

EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence

Jiaxu Wan, Xu Wang, Mengwei Xie, Hang Zhang, Mu Xu, Yang Han, Hong Zhang, Ding Yuan, Yifan Yang

Comments Accepted by CVPR 2026

2512.07558 2026-03-23 cs.LG cs.CV

ReLaX: Reasoning with Latent Exploration for Large Reasoning Models

Shimin Zhang, Xianwei Chen, Yufan Shen, Ziyuan Ye, Jibin Wu

2512.01183 2026-03-23 cs.CL cs.AI

TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness

Yongxin Zhou, Philippe Mulhem, Didier Schwab

Comments LREC 2026, Palma, Mallorca (Spain), 11-16 May 2026

2511.22242 2026-03-23 cs.CV

Rethinking Test Time Scaling for Flow-Matching Generative Models

Qingtao Yu, Changlin Song, Minghao Sun, Zhengyang Yu, Vinay Kumar Verma, Soumya Roy, Sumit Negi, Hongdong Li, Dylan Campbell