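arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.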

2603.24836 2026-03-31 cs.CV

WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching

Yihan Wang, Jia Deng

Abstract

We introduce WAFT-Stereo, a simple and effective warping-based method for stereo matching. WAFT-Stereo demonstrates that cost volumes, a common design used in many leading methods, are not necessary for strong performance and can be replaced by warping with improved efficiency. WAFT-Stereo ranks first on ETH3D (BP-0.5), Middlebury (RMSE), and KITTI (all metrics), reducing the zero-shot error by 81% on ETH3D, while being 1.8-6.7x faster than competitive methods. Code and model weights are available at https://github.com/princeton-vl/WAFT-Stereo.

2603.24257 2026-03-31 cs.CV

Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning

Tommaso Galliena, Stefano Rosa, Tommaso Apicella, Pietro Morerio, Alessio Del Bue, Lorenzo Natale

Comments 24 pages, 7 figures, 7 tables (including Supplementary Materials)

2603.24037 2026-03-31 cs.CV

A$^3$: Towards Advertising Aesthetic Assessment

Kaiyuan Ji, Yixuan Gao, Lu Sun, Yushuo Zheng, Zijian Chen, Jianbo Zhang, Xiangyang Zhu, Yuan Tian, Zicheng Zhang, Guangtao Zhai

Comments Accepted to CVPR 2026

2603.23562 2026-03-31 cs.LG cs.AI

Synthetic Mixed Training: Scaling Parametric Knowledge Acquisition Beyond RAG

Seungju Han, Konwoo Kim, Chanwoo Park, Benjamin Newman, Suhas Kotha, Jaehun Jung, James Zou, Yejin Choi

2603.22859 2026-03-31 cs.RO

DecompGrind: A Decomposition Framework for Robotic Grinding via Cutting-Surface Planning and Contact-Force Adaptation

Shunsuke Araki, Takumi Hachimine, Yuki Saito, Kouhei Ohnishi, Jun Morimoto, Takamitsu Matsubara

Comments Under review

2603.22300 2026-03-31 cs.LG cs.AI

Scaling Attention via Feature Sparsity

Yan Xie, Tiansheng Wen, Tangda Huang, Bo Chen, Chenyu You, Stefanie Jegelka, Yifei Wang

Comments 26 pages, 11 figures; Accepted at ICLR 2026

2603.19292 2026-03-31 cs.CL cs.AI

Automatic Analysis of Collaboration Through Human Conversational Data Resources: A Review

Yi Yu, Maria Boritchev, Chloé Clavel

Comments 9 pages

2603.16430 2026-03-31 cs.CL cs.AI

EngGPT2: Sovereign, Efficient and Open Intelligence

G. Ciarfaglia, A. Rosanova, S. Cipolla, J. Bartoli, A. Di Domenico, C. Fioroni, A. Fontana, M. R. Scoleri, M. I. Mone, D. Franchi, M. C. Del Gaudio, A. Leodori, F. Cinti, M. Capozzi, C. Baston, F. Picariello, M. Gabusi, S. Bonura, V. Morreale, I. Bailo

2603.16407 2026-03-31 cs.RO

Onboard MuJoCo-based Model Predictive Control for Shipboard Crane with Double-Pendulum Sway Suppression

Oscar Pang, Lisa Coiffard, Paul Templier, Luke Beddow, Kamil Dreczkowski, Antoine Cully

Comments 8 pages, 5 figures

2603.14830 2026-03-31 cs.LG stat.ML

Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks

Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi

2603.14790 2026-03-31 cs.CV

Mind-of-Director: Multi-modal Agent-Driven Film Previsualization via Collaborative Decision-Making

Shufeng Nan, Mengtian Li, Sixiao Zheng, Yuwei Lu, Han Zhang, Yanwei Fu

2603.14354 2026-03-31 cs.LG cs.AI cs.RO

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen

2603.13793 2026-03-31 cs.CL cs.AI

GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages

Lawrence Adu Gyamfi, Paul Azunre, Stephen Edward Moore, Joel Budu, Akwasi Asare, Mich-Seth Owusu, Jonathan Ofori Asiamah

2603.13742 2026-03-31 cs.LG stat.ML

Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

Ruiyuan Huang, Zicheng Lyu, Xiaoyi Zhu, Zengfeng Huang

2603.12057 2026-03-31 cs.CV cs.AI

Coarse-Guided Visual Generation via Weighted h-Transform Sampling

Yanghao Wang, Ziqi Jiang, Zhen Wang, Long Chen

2603.11382 2026-03-31 cs.AI cs.ET cs.LG quant-ph

Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

Christopher Altman

Comments 22 pages, 7 figures. v4 adds reference to the Continuation Observatory website as a live test laboratory in the replication/code availability and conclusion sections; no new experiments; empirical results and core conclusions unchanged

2603.09326 2026-03-31 cs.CV

OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

Tengjin Weng, Wenhao Jiang, Jingyi Wang, Ming Li, Lin Ma, Zhong Ming

Comments accepted by CVPR 2026

2603.09104 2026-03-31 cs.CV

Training-free Motion Factorization for Compositional Video Generation

Zixuan Wang, Ziqin Zhou, Feng Chen, Duo Peng, Yixin Hu, Changsheng Li, Yinjie Lei

Comments Accepted by CVPR2026

2603.09094 2026-03-31 cs.CV

Chain of Event-Centric Causal Thought for Physically Plausible Video Generation

Zixuan Wang, Yixin Hu, Haolan Wang, Feng Chen, Yan Liu, Wen Li, Yinjie Lei

Comments Accepted by CVPR2026

2603.08343 2026-03-31 cs.LG cs.CL

Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers

Shubham Aggarwal, Lokendra Kumar

Comments 10 pages, 9 figures, 4 tables

2603.07554 2026-03-31 cs.CL cs.AI cs.SD

Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR

Rishikesh Kumar Sharma, Safal Narshing Shrestha, Jenny Poudel, Rupak Tiwari, Arju Shrestha, Rupak Raj Ghimire, Bal Krishna Bal

Comments Accepted in CHiPSAL@LREC 2026

2603.02842 2026-03-31 cs.CL

A Browser-based Open Source Assistant for Multimodal Content Verification

Rosanna Milner, Michael Foster, Twin Karmakharm, Olesya Razuvayevskaya, Ian Roberts, Valentin Porcellini, Denis Teyssou, Kalina Bontcheva

2603.01506 2026-03-31 cs.CV

OMG-Avatar: One-shot Multi-LOD Gaussian Head Avatar

Jianqiang Ren, Lin Liu, Steven Hoi

2603.01331 2026-03-31 cs.CL cs.AI cs.LG

MetaState: Persistent Working Memory Enhances Reasoning in Discrete Diffusion Language Models

Kejing Xia, Mingzhe Li, Lixuan Wei, Zhenbang Du, Xiangchi Yuan, Dachuan Shi, Qirui Jin, Wenke Lee

2603.01305 2026-03-31 cs.CV cs.AI

AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models

Zhen Qu, Xian Tao, Xiaoyi Bao, Dingrong Wang, ShiChen Qu, Zhengtao Zhang, Xingang Wang

2603.00175 2026-03-31 cs.CV

Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention

Giorgio Roffo, Hazem Abdelkawy, Nilli Lavie, Luke Palmer

Comments This work was initiated and primarily carried out while working at MindVisionLabs. We gratefully acknowledge the support of Toyota Motor Europe (TME) and Equixly API Security for this work

2602.23165 2026-03-31 cs.CV

DyaDiT: A Multi-Modal Diffusion Transformer for Socially Favorable Dyadic Gesture Generation

Yichen Peng, Jyun-Ting Song, Siyeol Jung, Ruofan Liu, Haiyang Liu, Xuangeng Chu, Ruicong Liu, Erwin Wu, Hideki Koike, Kris Kitani

Comments 13 pages, 9 figures

2602.22601 2026-03-31 cs.LG cs.CV

$ϕ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren, Bhiksha Raj, Khoa Luu

Comments Accepted to CVPR'26

2602.19575 2026-03-31 cs.CV

ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization

Minseo Kim, Minchan Kwon, Dongyeun Lee, Yunho Jeon, Junmo Kim

Comments Accepted to CVPR 2026

2602.18792 2026-03-31 cs.CV

MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose

Comments Accepted by CVPR2026