arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.22286 2026-03-24 cs.CV cs.AI cs.CL cs.LG

WorldCache: Content-Aware Caching for Accelerated Video World Models

Umair Nawaz, Ahmed Heakl, Ufaq Khan, Abdelrahman Shaker, Salman Khan, Fahad Shahbaz Khan

Comments 33 Pages

详情

英文摘要

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose \textbf{WorldCache}, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves \textbf{2.3$\times$} inference speedup while preserving \textbf{99.4\%} of baseline quality, substantially outperforming prior training-free caching approaches. Our code can be accessed on \href{https://umair1221.github.io/World-Cache/}{World-Cache}.

URL PDF HTML ☆

赞 0 踩 0

2603.22283 2026-03-24 cs.CV cs.AI cs.GR cs.LG

End-to-End Training for Unified Tokenization and Latent Denoising

Shivam Duggal, Xingjian Bai, Zongze Wu, Richard Zhang, Eli Shechtman, Antonio Torralba, Phillip Isola, William T. Freeman

Comments First two authors contributed equally. Project: https://xingjianbai.com/unite-tokenization-generation/ Code: https://github.com/ShivamDuggal4/UNITE-tokenization-generation

2603.22282 2026-03-24 cs.CV cs.AI

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation

Ziyi Wang, Xinshun Wang, Shuang Chen, Yang Cong, Mengyuan Liu

Comments 42 pages, 16 figures

2603.22280 2026-03-24 cs.CV cs.RO

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models

Zhide Zhong, Junfeng Li, Junjie He, Haodong Yan, Xin Gong, Guanyi Zhao, Yingjie Cai, Jiantao Gao, Xu Yan, Bingbing Liu, Yingcong Chen, Liuqing Yang, Haoang Li

2603.22279 2026-03-24 cs.CV cs.AI

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

Haoyu Zhen, Xiaolong Li, Yilin Zhao, Han Zhang, Sifei Liu, Kaichun Mo, Chuang Gan, Subhashree Radhakrishnan

2603.22276 2026-03-24 cs.LG stat.ML

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

Alexandra Zelenin, Alexandra Zhuravlyova

Comments 30 pages, 15 figures, 15 tables, including appendices. Code and data at https://github.com/sockeye44/dorafactors

2603.22275 2026-03-24 cs.CV

Repurposing Geometric Foundation Models for Multi-view Diffusion

Wooseok Jang, Seonghu Jeon, Jisang Han, Jinhyeok Choi, Minkyung Kwon, Seungryong Kim, Saining Xie, Sainan Liu

Comments project website: https://cvlab-kaist.github.io/GLD/

2603.22271 2026-03-24 cs.CV

DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution

Zhengyao Lv, Menghan Xia, Xintao Wang, Kwan-Yee K. Wong

Comments Accepted to CVPR 2026

2603.22270 2026-03-24 cs.CV

GenOpticalFlow: A Generative Approach to Unsupervised Optical Flow Learning

Yixuan Luo, Feng Qiao, Zhexiao Xiong, Yanjing Li, Nathan Jacobs

2603.22264 2026-03-24 cs.RO

UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

Gu Zhang, Qicheng Xu, Haozhe Zhang, Jianhan Ma, Long He, Yiming Bao, Zeyu Ping, Zhecheng Yuan, Chenhao Lu, Chengbo Yuan, Tianhai Liang, Xiaoyu Tian, Maanping Shao, Feihong Zhang, Mingyu Ding, Yang Gao, Hao Zhao, Hang Zhao, Huazhe Xu

Comments Accepted by CVPR 2026

2603.22263 2026-03-24 cs.RO

DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming

Hung-Chieh Fang, Amber Xie, Jennifer Grannen, Kenneth Llontop, Dorsa Sadigh

Comments Website: https://dexdrummer.github.io/

2603.22262 2026-03-24 cs.CG math.CO

Flip Distance of Non-Crossing Spanning Trees: NP-Hardness and Improved Bounds

Håvard Bakke Bjerkevik, Joseph Dorfer, Linda Kleist, Torsten Ueckerdt, Birgit Vogtenhuber

详情

英文摘要

We consider the problem of reconfiguring non-crossing spanning trees on point sets. For a set $P$ of $n$ points in general position in the plane, the flip graph $F(P)$ has a vertex for each non-crossing spanning tree on $P$ and an edge between any two spanning trees that can be transformed into each other by the exchange of a single edge. This flip graph has been intensively studied, lately with an emphasis on determining its diameter diam$(F(P))$ for sets $P$ of $n$ points in convex position. The current best bounds are $\frac{14}{9}n-O(1) \leq$ diam$(F(P))<\frac{15}{9}n-3$ [Bjerkevik, Kleist, Ueckerdt, and Vogtenhuber; SODA 2025]. The crucial tool for both the upper and lower bound are so-called *conflict graphs*, which the authors stated might be the key ingredient for determining the diameter (up to lower-order terms). In this paper, we pick up the concept of conflict graphs and show that this tool is even more versatile than previously hoped. As our first main result, we use conflict graphs to show that computing the flip distance between two non-crossing spanning trees is NP-hard, even for point sets in convex position. Interestingly, the result still holds for more constrained flip operations, concretely, compatible flips (where the removed and the added edge do not cross) and rotations (where the removed and the added edge share an endpoint). Extending the line of research from [BKUV SODA25], we present new insights on the diameter of the flip graph. Their lower bound is based on a constant-size pair of trees, one of which is *stacked*. We show that if one of the trees is stacked, then the lower bound is indeed optimal up to a constant term, that is, there exists a flip sequence of length at most $\frac{14}{9}(n-1)$ to any other tree. Lastly, we improve the lower bound on the diameter of the flip graph $F(P)$ for $n$ points in convex position to $\frac{11}{7}n-o(n)$.

URL PDF HTML ☆

赞 0 踩 0

2603.22260 2026-03-24 cs.CL

Greater accessibility can amplify discrimination in generative AI

Carolin Holtermann, Minh Duc Bui, Kaitlyn Zhou, Valentin Hofmann, Katharina von der Wense, Anne Lauscher

Comments Preprint

2603.22254 2026-03-24 cond-mat.mtrl-sci cond-mat.mes-hall cs.LG physics.atm-clus physics.chem-ph

Characterizing High-Capacity Janus Aminobenzene-Graphene Anode for Sodium-Ion Batteries with Machine Learning

Claudia Islas-Vargas, L. Ricardo Montoya, Carlos A. Vital-José, Oliver T. Unke, Klaus-Robert Müller, Huziel E. Sauceda

Comments 8 pages, 5 figures, research article

2603.22252 2026-03-24 eess.AS cs.SD

SelfTTS: cross-speaker style transfer through explicit embedding disentanglement and self-refinement using self-augmentation

Lucas H. Ueda, João G. T. Lima, Pedro R. Corrêa, Flávio O. Simões, Mário U. Neto, Paula D. P. Costa

Comments Submitted to Interspeech 2026

2603.22251 2026-03-24 cs.DC

exaCB: Reproducible Continuous Benchmark Collections at Scale Leveraging an Incremental Approach

Jayesh Badwaik, Mathis Bode, Michal Rajski, Andreas Herten

2603.22249 2026-03-24 cs.CV

EgoGroups: A Benchmark For Detecting Social Groups of People in the Wild

Jeffri Murrugarra-Llerena, Pranav Chitale, Zicheng Liu, Kai Ao, Yujin Ham, Guha Balakrishnan, Paola Cascante-Bonilla

Comments Project Page: https://lab-spell.github.io/EgoGroups/

2603.22248 2026-03-24 cs.LG cs.AI cs.IT math.IT stat.ML

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Changxiao Cai, Gen Li

2603.22240 2026-03-24 cs.DS cs.CC

A Dividing Line for Structural Kernelization of Component Order Connectivity via Distance to Bounded Pathwidth

Jakob Greilhuber, Roohani Sharma

Comments Abstract shortened due to arXiv length requirements

2603.22237 2026-03-24 cs.IT math.IT physics.soc-ph

Structure-aware divergences for comparing probability distributions

Rohit Sahasrabuddhe, Renaud Lambiotte

2603.22231 2026-03-24 cs.IR cs.AI cs.GT cs.LG

One Model, Two Markets: Bid-Aware Generative Recommendation

Yanchen Jiang, Zhe Feng, Christopher P. Mah, Aranyak Mehta, Di Wang

2603.22230 2026-03-24 cs.CV

Riverine Land Cover Mapping through Semantic Segmentation of Multispectral Point Clouds

Sopitta Thurachen, Josef Taher, Matti Lehtomäki, Leena Matikainen, Linnea Blåfield, Mikel Calle Navarro, Antero Kukko, Tomi Westerlund, Harri Kaartinen

2603.22229 2026-03-24 cs.CV

Benchmarking Deep Learning Models for Aerial LiDAR Point Cloud Semantic Segmentation under Real Acquisition Conditions: A Case Study in Navarre

Alex Salvatierra, José Antonio Sanz, Christian Gutiérrez, Mikel Galar

Comments 6 pages, 2 figures

2603.22228 2026-03-24 cs.CV cs.AI

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Sashuai Zhou, Qiang Zhou, Junpeng Ma, Yue Cao, Ruofan Hu, Ziang Zhang, Xiaoda Yang, Zhibin Wang, Jun Song, Cheng Yu, Bo Zheng, Zhou Zhao

2603.22227 2026-03-24 cs.HC cs.AI cs.CL

Dyadic: A Scalable Platform for Human-Human and Human-AI Conversation Research

David M. Markowitz

2603.22222 2026-03-24 eess.SY cs.SY

A Portfolio-Level Optimization Framework for Coordinated Market Participation and Operational Scheduling of Hydrogen-Centric Companies

Seyed Amir Mansouri, Kenneth Bruninx

2603.22220 2026-03-24 cs.DB

Accelerating Fresh Data Exploration with Fluid ETL Pipelines

Maxwell Norfolk, Dong Xie

2603.22219 2026-03-24 cs.LG stat.ML

Noise Titration: Exact Distributional Benchmarking for Probabilistic Time Series Forecasting

Qilin Wang

2603.22216 2026-03-24 cs.CL cs.LG

Gumbel Distillation for Parallel Text Generation

Chi Zhang, Xixi Hu, Bo Liu, Qiang Liu

Comments ICLR 2026

2603.22214 2026-03-24 cs.CR cs.AI cs.LG

Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models

Tom Biskupski, Stephan Kleber