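arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.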

Brian B. Moser, Shalini Sarode, Federico Raue, Stanislav Frolov, Krzysztof Adamkiewicz, Arundhati Shanbhag, Joachim Folz, Tobias C. Nauen, Andreas Dengel

Journal ref Transactions on Machine Learning Research, 2026

2511.07399 2026-02-24 cs.CV cs.LG

StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation

Tianrui Feng, Zhi Li, Shuo Yang, Haocheng Xi, Muyang Li, Xiuyu Li, Lvmin Zhang, Keting Yang, Kelly Peng, Song Han, Maneesh Agrawala, Kurt Keutzer, Akio Kodaira, Chenfeng Xu

Comments Accepted by MLSys 2026. Project Page: http://streamdiffusionv2.github.io

Abstract

Generative models are reshaping the live-streaming industry by redefining how content is created, styled, and delivered. Previous image-based streaming diffusion models have powered efficient and creative live-streaming products but have hit limits on temporal consistency because of their image-based design. Recent advances in video diffusion have markedly improved temporal consistency and sampling efficiency for offline generation. However, offline generation systems primarily optimize throughput by batching large workloads. In contrast, live online streaming operates under strict service-level objectives (SLOs): time-to-first-frame must be minimal, and every frame must meet a per-frame deadline with low jitter. In addition, scalable multi-GPU serving for real-time streams remains largely unresolved. To address this, we present StreamDiffusionV2, a training-free pipeline for interactive live streaming with video diffusion models. StreamDiffusionV2 integrates an SLO-aware batching scheduler and a block scheduler, together with a sink-token-guided rolling KV cache, a motion-aware noise controller, and other system-level optimizations. Moreover, we introduce a scalable pipeline orchestration that parallelizes the diffusion process across denoising steps and network layers, achieving near-linear FPS scaling without violating latency guarantees. The system scales seamlessly across heterogeneous GPU environments and supports flexible denoising steps (e.g., 1-4), enabling both ultra-low-latency and higher-quality modes. Without TensorRT or quantization, StreamDiffusionV2 renders the first frame within 0.5s and attains 58.28 FPS with a 14B-parameter model and 64.52 FPS with a 1.3B-parameter model on four H100 GPUs, making state-of-the-art generative live streaming practical and accessible, from individual creators to enterprise-scale platforms.

2511.06450 2026-02-24 cs.CV cs.LG

Countering Multi-modal Representation Collapse through Rank-targeted Fusion

Seulgi Kim, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib

Comments Accepted in 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

2511.05275 2026-02-24 cs.RO cs.LG

TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models

Hokyun Im, Euijin Jeong, Andrey Kolobov, Jianlong Fu, Youngwoon Lee

Comments Accepted to ICLR 2026 (Poster). Project webpage: https://jellyho.github.io/TwinVLA/

2511.03665 2026-02-24 cs.CV

A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential

Mehdi Sefidgar Dilmaghani, Francis Fowley, Peter Corcoran

2511.00958 2026-02-24 cs.LG cs.AI stat.ML

The Hidden Power of Normalization Layers in Neural Networks: Exponential Capacity Control

Khoat Than

2511.00574 2026-02-24 cs.LG

Bayesian Network Structure Discovery Using Large Language Models

Yinghuan Zhang, Yufei Zhang, Parisa Kordjamshidi, Zijun Cui

Comments Accepted to TMLR

2510.27623 2026-02-24 cs.AI cs.CL cs.CV

BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning

Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang

Comments ICLR 2026. Project Page: https://zqs1943.github.io/BEAT/

2510.26376 2026-02-24 cs.LG

Efficient Generative AI Boosts Probabilistic Forecasting of Sudden Stratospheric Warmings

Ningning Tao, Fei Xie, Baoxiang Pan, Hongyu Wang, Han Huang, Zhongpu Qiu, Ke Gui, Jiali Luo, Xiaosong Chen

2510.25850 2026-02-24 cs.RO cs.LG cs.MA

Debate2Create: Robot Co-design via Multi-Agent LLM Debate

Kevin Qiu, Marek Cygan

2510.25232 2026-02-24 cs.AI cs.CL

From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity

Tianxi Wan, Jiaming Luo, Siyuan Chen, Kunyao Lan, Jianhua Chen, Haiyang Geng, Mengyue Wu

2510.23828 2026-02-24 cs.CL

Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language

Mena Attia, Aashiq Muhamed, Mai Alkhamissi, Thamar Solorio, Mona Diab

Comments EACL 2026 Main Conference

2510.23304 2026-02-24 cs.AI

CNOT Minimal Circuit Synthesis: A Reinforcement Learning Approach

Riccardo Romanello, Daniele Lizzio Bosco, Jacopo Cossio, Dusan Sutulovic, Giuseppe Serra, Carla Piazza, Paolo Burelli

2510.22512 2026-02-24 cs.LG cs.AI

Transitive RL: Value Learning via Divide and Conquer

Seohong Park, Aditya Oberai, Pranav Atreya, Sergey Levine

Comments ICLR 2026

2510.21491 2026-02-24 cs.LG cs.DC stat.ML

Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting

Khaled Hallak, Oudom Kem

Comments Accepted for presentation at the FLTA 2025 Conference on Federated Learning. This version corresponds to the camera-ready author manuscript

2510.17448 2026-02-24 cs.RO math.DS

Switching Among Feedback-Linearizing Output Sets (Melds): Dwell-Time and Compatibility Guarantees

Mirko Mizzoni, Pieter van Goor, Barbara Bazzana, Antonio Franchi

2510.16703 2026-02-24 cs.LG cs.AI stat.ME

On the Granularity of Causal Effect Identifiability

Yizuo Chen, Adnan Darwiche

2510.13614 2026-02-24 cs.CL

MemoTime: Memory-Augmented Temporal Knowledge Graph Enhanced Large Language Model Reasoning

Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, Xin Yuan, Liming Zhu, Wenjie Zhang

Comments Accepted by The Web Conference 2026 (WWW, 2026)

2510.13205 2026-02-24 cs.LG cs.AI

CleverCatch: A Knowledge-Guided Weak Supervision Model for Fraud Detection

Amirhossein Mozafari, Kourosh Hashemi, Erfan Shafagh, Soroush Motamedi, Azar Taheri Tayebi, Mohammad A. Tayebi

2510.12924 2026-02-24 cs.RO

Geometric Model Predictive Path Integral for Agile UAV Control with Online Collision Avoidance

Pavel Pochobradský, Ondřej Procházka, Robert Pěnička, Vojtěch Vonásek, Martin Saska

Comments This work has been accepted to the IEEE for possible publication

2510.12206 2026-02-24 cs.RO cs.LG

Controllable Collision Scenario Generation via Collision Pattern Prediction

Pin-Lun Chen, Chi-Hsi Kung, Che-Han Chang, Wei-Chen Chiu, Yi-Ting Chen

Comments 8 pages, 3 figures

2510.12066 2026-02-24 cs.AI cs.LG

AI Agents as Universal Task Solvers

Alessandro Achille, Stefano Soatto

2510.08318 2026-02-24 cs.CV

LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation

Yushi Huang, Xingtong Ge, Ruihao Gong, Chengtao Lv, Jun Zhang

Comments Accepted by CVPR 2026

2510.08233 2026-02-24 cs.LG

Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization

Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen

2510.06940 2026-02-24 cs.LG

Revisiting Node Affinity Prediction in Temporal Graphs

Or Feldman, Krishna Sri Ipsit Mantri, Moshe Eliasof, Chaim Baskin

Comments Accepted at ICLR 2026

2510.06820 2026-02-24 cs.CV cs.LG

Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking

Mitchell Keren Taraday, Shahaf Wagner, Chaim Baskin

Comments Accepted at ICLR 2026

2510.06751 2026-02-24 cs.CV

OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, Huan Wang

2510.05780 2026-02-24 cs.RO cs.SY eess.SY

Human-in-the-loop Optimisation in Robot-assisted Gait Training

Andreas Christou, Andreas Sochopoulos, Elliot Lister, Sethu Vijayakumar

2510.04891 2026-02-24 cs.CL cs.AI cs.LG

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj, Rada Mihalcea, Zhijing Jin