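arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.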

2602.24289 2026-03-02 cs.CV cs.LG

Mode Seeking meets Mean Seeking for Fast Long Video Generation

Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Zhang, Nanye Ma, Hansheng Chen, Maneesh Agrawala, Leonidas Guibas, Gordon Wetzstein, Arash Vahdat

Comments Project website: https://primecai.github.io/mmm/

English abstract

Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local fidelity from long-term coherence based on a unified representation via a Decoupled Diffusion Transformer. Our approach utilizes a global Flow Matching head trained via supervised learning on long videos to capture narrative structure, while simultaneously employing a local Distribution Matching head that aligns sliding windows to a frozen short-video teacher via a mode-seeking reverse-KL divergence. This strategy enables the synthesis of minute-scale videos that learns long-range coherence and motions from limited long videos via supervised flow matching, while inheriting local realism by aligning every sliding-window segment of the student to a frozen short-video teacher, resulting in a few-step fast long video generator. Evaluations show that our method effectively closes the fidelity-horizon gap by jointly improving local sharpness, motion and long-range consistency. Project website: https://primecai.github.io/mmm/.

2602.24288 2026-03-02 cs.AI cs.CL

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

Fan Shu, Yite Wang, Ruofan Wu, Boyi Liu, Zhewei Yao, Yuxiong He, Feng Yan

Comments Published as a conference paper at ICLR 2026. 10 pages plus appendix

2602.24287 2026-03-02 cs.CL cs.AI

Do LLMs Benefit From Their Own Words?

Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas

2602.24286 2026-03-02 cs.LG cs.AI

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Weinan Dai, Hanlin Wu, Qiying Yu, Huan-ang Gao, Jiahao Li, Chengquan Jiang, Weiqiang Lou, Yufan Song, Hongli Yu, Jiaze Chen, Wei-Ying Ma, Ya-Qin Zhang, Jingjing Liu, Mingxuan Wang, Xin Liu, Hao Zhou

2602.24283 2026-03-02 cs.LG cs.AI cs.CL

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

Comments Camera-ready version. Accepted as Oral at ICLR 2026

2602.24281 2026-03-02 cs.LG cs.AI

Memory Caching: RNNs with Growing Memory

Ali Behrouz, Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni

2602.24278 2026-03-02 cs.LG

Who Guards the Guardians? The Challenges of Evaluating Identifiability of Learned Representations

Shruti Joshi, Théo Saulus, Wieland Brendel, Philippe Brouillard, Dhanya Sridhar, Patrik Reizinger

2602.24275 2026-03-02 cs.CV

Hierarchical Action Learning for Weakly-Supervised Action Segmentation

Junxian Huang, Ruichu Cai, Hao Zhu, Juntao Fang, Boyan Xu, Weilin Chen, Zijian Li, Shenghua Gao

2602.24266 2026-03-02 cs.LG cs.AI

Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification

Amir Asiaee

2602.24264 2026-03-02 cs.CV cs.LG

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Arnas Uselis, Andrea Dittadi, Seong Joon Oh

2602.24251 2026-03-02 cs.LG cs.CV

Histopathology Image Normalization via Latent Manifold Compaction

Xiaolong Zhang, Jianwei Zhang, Selim Sevim, Emek Demir, Ece Eksi, Xubo Song

Comments 11 pages

2602.24245 2026-03-02 cs.LG

Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text

Hainan Xu, Vladimir Bataev, Travis M. Bartley, Jagadeesh Balam

Comments Accepted at ICASSP 2026

2602.24240 2026-03-02 cs.CV

Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution

Chengyan Deng, Zhangquan Chen, Li Yu, Kai Zhang, Xue Zhou, Wang Zhang

2602.24233 2026-03-02 cs.CV

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Zhenyu Tang, Chaoran Feng, Yufan Deng, Jie Wu, Xiaojie Li, Rui Wang, Yunpeng Chen, Daquan Zhou

Comments Accepted at CVPR 2026. GitHub: https://github.com/DAGroup-PKU/SpatialT2I Project website: https://dagroup-pku.github.io/SpatialT2I/

2602.24231 2026-03-02 cs.LG

Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference

Hongrui Xie, Junyu Cao, Kan Xu

Comments 30 pages, 3 figures, AISTATS 2026 accepted paper

2602.24222 2026-03-02 cs.CV cs.LG

MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy

Albert Dominguez Mantes, Gioele La Manno, Martin Weigert

Comments Accepted at CVPR 2026

2602.24220 2026-03-02 cs.LG quant-ph

Comparing Classical and Quantum Variational Classifiers on the XOR Problem

Miras Seilkhan, Adilbek Taizhanov

Comments 32 pages, 17 figures. Code and experiment scripts available at https://github.com/mseilkhan/XOR-research-Quantum-ML-vs-Classic

2602.24209 2026-03-02 cs.LG cs.AI

An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks

Mohsen Tajgardan, Atena Shiranzaei, Mahdi Rabbani, Reza Khoshkangini, Mahtab Jamali

2602.24208 2026-03-02 cs.CV cs.LG

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Yasaman Haghighi, Alexandre Alahi

2602.24202 2026-03-02 cs.RO

Evaluating Accuracy of Vine Robot Shape Sensing with Distributed Inertial Measurement Units

Alexis E. Laudenslager, Antonio Alvarez Valdivia, Nathaniel Hanson, Margaret McGuinness

2602.24195 2026-03-02 cs.AI cs.CL cs.CV cs.LG

Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume

Gregory Kang Ruey Lau, Hieu Dao, Nicole Kan Hui Lin, Bryan Kian Hsiang Low

Comments Earlier versions presented at ICLR 2025 QUESTION workshop and ICML 2025 R2-FM workshop

2602.24192 2026-03-02 cs.RO

How IMU Drift Influences Multi-Radar Inertial Odometry for Ground Robots in Subterranean Terrains

Moumita Mukherjee, Magnus Norén, Anton Koval, Avijit Banerjee, George Nikolakopoulos

Comments Accepted at the IEEE International Conference on Robotics and Automation (ICRA), 2026

2602.24188 2026-03-02 cs.CL cs.LG

MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games

Jacob Eisenstein, Fantine Huot, Adam Fisch, Jonathan Berant, Mirella Lapata

2602.24183 2026-03-02 cs.CV cs.LG

A multimodal slice discovery framework for systematic failure detection and explanation in medical image classification

Yixuan Liu, Kanwal K. Bhatia, Ahmed E. Fetit

2602.24182 2026-03-02 cs.LG

Multi-Objective Reinforcement Learning for Large-Scale Tote Allocation in Human-Robot Collaborative Fulfillment Centers

Sikata Sengupta, Guangyi Liu, Omer Gottesman, Joseph W Durham, Michael Kearns, Aaron Roth, Michael Caldara

2602.24180 2026-03-02 cs.AI

Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints

Shishun Zhang, Juzhan Xu, Yidan Fan, Chenyang Zhu, Ruizhen Hu, Yongjun Wang, Kai Xu

Comments 8 pages, 8 figures, conference

2602.24178 2026-03-02 cs.LG cs.CC

Sandwiching Polynomials for Geometric Concepts with Low Intrinsic Dimension

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

Comments 30 pages

2602.24174 2026-03-02 cs.CL cs.AI cs.IT math.IT

Task-Centric Acceleration of Small-Language Models

Dor Tsur, Sharon Adar, Ran Levy

2602.24173 2026-03-02 cs.AI

LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics

Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat

Comments 15 pages, 3 figures, 5 tables

2602.24172 2026-03-02 cs.CL cs.AI

ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models

Adam Dejl, Deniz Gorur, Francesca Toni

Comments AAMAS 2026 Demonstration Track