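arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.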

2603.25746 2026-03-27 cs.CV

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, Tianfan Xue

Comments Project Page: https://luo0207.github.io/ShotStream/ Code: https://github.com/KlingAIResearch/ShotStream

English abstract

Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulating the task as next-shot generation conditioned on historical context, ShotStream allows users to dynamically instruct ongoing narratives via streaming prompts. We achieve this by first fine-tuning a text-to-video model into a bidirectional next-shot generator, which is then distilled into a causal student via Distribution Matching Distillation. To overcome the challenges of inter-shot consistency and error accumulation inherent in autoregressive generation, we introduce two key innovations. First, a dual-cache memory mechanism preserves visual coherence: a global context cache retains conditional frames for inter-shot consistency, while a local context cache holds generated frames within the current shot for intra-shot consistency. A RoPE discontinuity indicator explicitly distinguishes the two caches to eliminate ambiguity. Second, to mitigate error accumulation, we propose a two-stage distillation strategy. This begins with intra-shot self-forcing conditioned on ground-truth historical shots and progressively extends to inter-shot self-forcing using self-generated histories, effectively bridging the train-test gap. Extensive experiments demonstrate that ShotStream generates coherent multi-shot videos with sub-second latency, achieving 16 FPS on a single GPU. It matches or exceeds the quality of slower bidirectional models, paving the way for real-time interactive storytelling. Training and inference code, as well as the models, are available on our

2603.25745 2026-03-27 cs.CV

Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting

Yixing Lao, Xuyang Bai, Xiaoyang Wu, Nuoyuan Yan, Zixin Luo, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Shiwei Li, Hengshuang Zhao

2603.25743 2026-03-27 cs.CV

RefAlign: Representation Alignment for Reference-to-Video Generation

Lei Wang, YuXin Song, Ge Wu, Haocheng Feng, Hang Zhou, Jingdong Wang, Yaxing Wang, Jian Yang

Comments 17 pages, 11 figures

2603.25740 2026-03-27 cs.RO cs.AI cs.CV cs.LG cs.MA

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li

Comments IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026); Project website: https://dmw-cvpr.github.io/

2603.25739 2026-03-27 cs.CV

MegaFlow: Zero-Shot Large Displacement Optical Flow

Dingxi Zhang, Fangjinhua Wang, Marc Pollefeys, Haofei Xu

Comments Project Page: https://kristen-z.github.io/projects/megaflow Code: https://github.com/cvg/megaflow

2603.25738 2026-03-27 cs.CV

PSDesigner: Automated Graphic Design with a Human-Like Creative Workflow

Xincheng Shuai, Song Tang, Yutong Huang, Henghui Ding, Dacheng Tao

Comments CVPR 2026, Project Page: https://henghuiding.com/PSDesigner/

2603.25737 2026-03-27 cs.AI cs.CL cs.IR

Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment

Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang

Comments 15 pages

2603.25736 2026-03-27 cs.CV

How good was my shot? Quantifying Player Skill Level in Table Tennis

Akihiro Kubota, Tomoya Hasegawa, Ryo Kawahara, Ko Nishino

2603.25734 2026-03-27 cs.CV

Unleashing Guidance Without Classifiers for Human-Object Interaction Animation

Ziyin Wang, Sirui Xu, Chuan Guo, Bing Zhou, Jiangshan Gong, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui

Comments Project Page: http://ziyinwang1.github.io/LIGHT

2603.25733 2026-03-27 cs.CV

SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding

Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi

Comments Accepted to GRAIL-V workshop at CVPR 2026

2603.25732 2026-03-27 cs.CV

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo

2603.25730 2026-03-27 cs.CV cs.AI

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Xiaofeng Mao, Shaohao Rui, Kaining Ying, Bo Zheng, Chuanhao Li, Mingmin Chi, Kaipeng Zhang

2603.25728 2026-03-27 cs.CV cs.AI

PixelSmile: Toward Fine-Grained Facial Expression Editing

Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang

Comments 21 Pages; Project Page: https://ammmob.github.io/PixelSmile/ Code: https://github.com/Ammmob/PixelSmile

2603.25727 2026-03-27 cs.AI cs.MM

Back to Basics: Revisiting ASR in the Age of Voice Agents

Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola

Comments 10 pages, 5 figures

2603.25725 2026-03-27 cs.RO

SoftMimicGen: A Data Generation System for Scalable Robot Learning in Deformable Object Manipulation

Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, Ajay Mandlekar

2603.25720 2026-03-27 cs.AI cs.CV

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

Zirui Zhang, Haoyu Dong, Kexin Pei, Chengzhi Mao

2603.25711 2026-03-27 cs.CV

Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs

Vishal Narnaware, Animesh Gupta, Kevin Zhai, Zhenyi Wang, Mubarak Shah

2603.25707 2026-03-27 cs.CV

TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance

Quynh Phung, Long Mai, Cusuh Ham, Feng Liu, Jia-Bin Huang, Aniruddha Mahapatra

Comments webpage: https://trace-motion.github.io/

2603.25702 2026-03-27 cs.CL

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava

Comments Code is available at https://github.com/phymhan/S2D2

2603.25699 2026-03-27 cs.LG cs.AI

Neural Network Conversion of Machine Learning Pipelines

Man-Ling Sung, Jan Silovsky, Man-Hung Siu, Herbert Gish, Chinnu Pittapally

Comments Submitted and accepted to AutoML 2018 @ ICML/IJCAI-ECAI

2603.25697 2026-03-27 cs.SE cs.AI

The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase

Yannick Roy

2603.25695 2026-03-27 cs.CY

Assessing Age Assurance Technologies: Effectiveness, Side-Effects, and Acceptance

Wouter Lueks, Stephan Dreyer, Hannes Federrath, Judith Simon

Comments 53 pages, 1 figure

2603.25692 2026-03-27 cs.LG cs.AI cs.AR cs.ET

A Unified Memory Perspective for Probabilistic Trustworthy AI

Xueji Zhao, Likai Pei, Jianbo Liu, Kai Ni, Ningyuan Cao

2603.25691 2026-03-27 math.NA cs.NA

Fast and Accurate CP-HIFI Tensor Decompositions: Exploiting Kronecker Structure

Johannes J. Brust, Tamara G. Kolda

2603.25689 2026-03-27 cs.CV

LEMMA: Laplacian pyramids for Efficient Marine SeMAntic Segmentation

Ishaan Gakhar, Laven Srivastava, Sankarshanaa Sagaram, Aditya Kasliwal, Ujjwal Verma

Comments Accepted at the MaCVi Workshop, CVPR 2026

2603.25688 2026-03-27 cs.RO

Intelligent Navigation and Obstacle-Aware Fabrication for Mobile Additive Manufacturing Systems

Yifei Li, Ruizhe Fu, Huihang Liu, Guha Manogharan, Feng Ju, Ilya Kovalenko

Comments 8 pages, 4 figures, conference

2603.25687 2026-03-27 cs.LG

On Neural Scaling Laws for Weather Emulation through Continual Training

Shashank Subramanian, Alexander Kiefer, Arnur Nigmetov, Amir Gholami, Dmitriy Morozov, Michael W. Mahoney

Comments ICLR Foundation Models for Science Workshop 2026, 19 pages, 13 figures

2603.25686 2026-03-27 cs.CV cs.AI

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz

Comments 18 pages, 6 figures

2603.25685 2026-03-27 cs.RO cs.CV

Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning

Jai Bardhan, Patrik Drozdik, Josef Sivic, Vladimir Petrik

Comments 34 pages, 11 figures, 12 tables

2603.25682 2026-03-27 cs.LO math.AT

On the Formalization of Network Topology Matrices in HOL

Kubra Aksoy, Adnan Rashid, Osman Hasan, Sofiene Tahar