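arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.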

2603.12267 2026-03-13 cs.CV

EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

Tianwei Xiong, Jun Hao Liew, Zilong Huang, Zhijie Lin, Jiashi Feng, Xihui Liu

Comments Accepted by CVPR 2026. Project page: https://silentview.github.io/EVATok/

Abstract

Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is crucial for balancing reconstruction quality against downstream generation computational cost. Traditional video tokenizers apply a uniform token assignment across temporal blocks of different videos, often wasting tokens on simple, static, or repetitive segments while underserving dynamic or complex ones. To address this inefficiency, we introduce $\textbf{EVATok}$, a framework to produce $\textbf{E}$fficient $\textbf{V}$ideo $\textbf{A}$daptive $\textbf{Tok}$enizers. Our framework estimates optimal token assignments for each video to achieve the best quality-cost trade-off, develops lightweight routers for fast prediction of these optimal assignments, and trains adaptive tokenizers that encode videos based on the assignments predicted by routers. We demonstrate that EVATok delivers substantial improvements in efficiency and overall quality for video reconstruction and downstream AR generation. Enhanced by our advanced training recipe that integrates video semantic encoders, EVATok achieves superior reconstruction and state-of-the-art class-to-video generation on UCF-101, with at least 24.4% savings in average token usage compared to the prior state-of-the-art LARP and our fixed-length baseline.

2603.12266 2026-03-13 cs.CV

MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Haozhan Shen, Shilin Yan, Hongwei Xue, Shuaiqi Lu, Xiaojun Tang, Guannan Zhang, Tiancheng Zhao, Jianwei Yin

Comments Project Page: https://accio-lab.github.io/MM-CondChain

2603.12265 2026-03-13 cs.CV

OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

Yibin Yan, Jilan Xu, Shangzhe Di, Haoning Wu, Weidi Xie

Comments Technical Report. Project Page: https://go2heart.github.io/omnistream/

2603.12264 2026-03-13 cs.CV

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Mingxin Liu, Ziqian Fan, Zhaokai Wang, Leyao Gu, Zirun Zhu, Yiguo He, Yuchen Yang, Changyao Tian, Xiangyu Zhao, Ning Liao, Shaofeng Zhang, Qibing Ren, Zhihang Zhong, Xuanhe Zhou, Junchi Yan, Xue Yang

Comments 49 pages, 23 figures, 10 tables; Project Page: https://grade-bench.github.io/, Code: https://github.com/VisionXLab/GRADE, Dataset: https://huggingface.co/datasets/VisionXLab/GRADE

2603.12263 2026-03-13 cs.RO

$Ψ_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

Songlin Wei, Hongyi Jing, Boqian Li, Zhenyu Zhao, Jiageng Mao, Zhenhao Ni, Sicheng He, Jie Liu, Xiawei Liu, Kaidi Kang, Sheng Zang, Weiduo Yuan, Marco Pavone, Di Huang, Yue Wang

2603.12262 2026-03-13 cs.CV

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Yiran Guan, Liang Yin, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Jian Luan, Yuliang Liu, Xiang Bai

2603.12257 2026-03-13 cs.CV

DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning

Yujie Wei, Xinyu Liu, Shiwei Zhang, Hangjie Yuan, Jinbo Xing, Zhekai Chen, Xiang Wang, Haonan Qiu, Rui Zhao, Yutong Feng, Ruihang Chu, Yingya Zhang, Yike Guo, Xihui Liu, Hongming Shan

Comments Project Page: https://dreamvideo-omni.github.io

2603.12255 2026-03-13 cs.CV cs.LG

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Fangfu Liu, Diankun Wu, Jiawei Chi, Yimo Cai, Yi-Hsin Hung, Xumin Yu, Hao Li, Han Hu, Yongming Rao, Yueqi Duan

Comments Project Page: https://liuff19.github.io/Spatial-TTT

2603.12254 2026-03-13 cs.CV

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

Baifeng Shi, Stephanie Fu, Long Lian, Hanrong Ye, David Eigen, Aaron Reite, Boyi Li, Jan Kautz, Song Han, David M. Chan, Pavlo Molchanov, Trevor Darrell, Hongxu Yin

Comments CVPR 2026. Project page: https://autogaze.github.io/

2603.12250 2026-03-13 cs.CV

DVD: Deterministic Video Depth Estimation with Generative Priors

Hongfei Zhang, Harold Haodong Chen, Chenfei Liao, Jing He, Zixin Zhang, Haodong Li, Yihao Liang, Kanghao Chen, Bin Ren, Xu Zheng, Shuai Yang, Kun Zhou, Yinchuan Li, Nicu Sebe, Ying-Cong Chen

Comments Project: https://dvd-project.github.io/

2603.12247 2026-03-13 cs.CV

Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Xiangyu Zhao, Peiyuan Zhang, Junming Lin, Tianhao Liang, Yuchen Duan, Shengyuan Ding, Changyao Tian, Yuhang Zang, Junchi Yan, Xue Yang

2603.12246 2026-03-13 cs.AI cs.CL cs.LG

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Yixin Liu, Yue Yu, DiJia Su, Sid Wang, Xuewei Wang, Song Jiang, Bo Liu, Arman Cohan, Yuandong Tian, Zhengxing Chen

2603.12245 2026-03-13 cs.CV

One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers

Moayed Haji-Ali, Willi Menapace, Ivan Skorokhodov, Dogyun Park, Anil Kag, Michael Vasilkovsky, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin

Comments Project page: https://snap-research.github.io/elit/

2603.12240 2026-03-13 cs.CV cs.LG

BiGain: Unified Token Compression for Joint Generation and Classification

Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen

Comments CVPR 2026. Code: https://github.com/Greenoso/BiGain

Abstract

Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2) Interpolate-Extrapolate KV Downsampling, which downsamples keys/values via a controllable interextrapolation between nearest and average pooling while keeping queries intact, thereby conserving attention precision. Across DiT- and U-Net-based backbones and ImageNet-1K, ImageNet-100, Oxford-IIIT Pets, and COCO-2017, our operators consistently improve the speed-accuracy trade-off for diffusion-based classification, while maintaining or enhancing generation quality under comparable acceleration. For instance, on ImageNet-1K, with 70% token merging on Stable Diffusion 2.0, BiGain increases classification accuracy by 7.15% while improving FID by 0.34 (1.85%). Our analyses indicate that balanced spectral retention, preserving high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models. To our knowledge, BiGain is the first framework to jointly study and advance both generation and classification under accelerated diffusion, supporting lower-cost deployment.
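The abstract describes the second operator only as a controllable blend between nearest and average pooling applied to keys and values. The sketch below shows one plausible reading of that idea, not the authors' implementation: the function name `interp_extrap_downsample`, the 1D token layout, the choice of the first token in each window as the "nearest" sample, and the linear formula `average + alpha * (nearest - average)` are all assumptions made for illustration.

```python
import numpy as np

def interp_extrap_downsample(x, stride, alpha):
    """Downsample tokens x of shape (n, d) by `stride` (hypothetical helper).

    alpha = 0 gives average pooling, alpha = 1 gives nearest pooling
    (first token of each window), and values outside [0, 1] extrapolate
    along the same line. This formula is an assumption, not taken from
    the paper.
    """
    n, d = x.shape
    if n % stride != 0:
        raise ValueError("number of tokens must be divisible by stride")
    windows = x.reshape(n // stride, stride, d)
    nearest = windows[:, 0, :]       # one representative token per window
    average = windows.mean(axis=1)   # mean of each window
    return average + alpha * (nearest - average)

# Keys and values are shortened; queries are left at full length.
rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 4))
keys = rng.normal(size=(8, 4))
values = rng.normal(size=(8, 4))
keys_small = interp_extrap_downsample(keys, stride=2, alpha=0.5)
values_small = interp_extrap_downsample(values, stride=2, alpha=0.5)
print(queries.shape, keys_small.shape, values_small.shape)
```

Under this reading, the attention matrix shrinks from 8 x 8 to 8 x 4 while every query still produces an output, which is consistent with the abstract's statement that queries are kept intact.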

2603.12238 2026-03-13 cs.CV

SceneAssistant: A Visual Feedback Agent for Open-Vocabulary 3D Scene Generation

Jun Luo, Jiaxiang Tang, Ruijie Lu, Gang Zeng

Comments Code: https://github.com/ROUJINN/SceneAssistant

2603.12237 2026-03-13 cs.LG cs.CR cs.IT math.IT

STAMP: Selective Task-Aware Mechanism for Text Privacy

Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon

Comments EACL 2026

2603.12235 2026-03-13 quant-ph cond-mat.dis-nn cs.ET physics.optics

Transition from Statistical to Hardware-Limited Scaling in Photonic Quantum State Reconstruction

Attila Baumann, Zsolt Kis, János Koltai, Gábor Vattay

Comments 12 pages, 7 figures

2603.12232 2026-03-13 cs.LO cs.AI

Incremental Neural Network Verification via Learned Conflicts

Raya Elsaleh, Liam Davis, Haoze Wu, Guy Katz

2603.12229 2026-03-13 cs.MA

Language Model Teams as Distributed Systems

Elizabeth Mieczkowski, Katherine M. Collins, Ilia Sucholutsky, Natalia Vélez, Thomas L. Griffiths

2603.12228 2026-03-13 cs.LG cs.AI

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Yulu Gan, Phillip Isola

Comments Code is provided at https://github.com/sunrainyg/RandOpt

2603.12227 2026-03-13 cs.SC cs.LG

Interpreting Contrastive Embeddings in Specific Domains with Fuzzy Rules

Javier Fumanal-Idocin, Mohammadreza Jamalifard, Javier Andreu-Perez

2603.12226 2026-03-13 cs.CL cs.AI

Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration

Priyanka Kargupta, Shuhaib Mehri, Dilek Hakkani-Tur, Jiawei Han

Comments Code and dataset provided at https://github.com/pkargupta/idea_catalyst

Abstract

Despite interdisciplinary research leading to larger and longer-term impact, most work remains confined to single-domain academic silos. Recent AI-based approaches to scientific discovery show promise for interdisciplinary research, but many prioritize rapidly designing experiments and solutions, bypassing the exploratory, collaborative reasoning processes that drive creative interdisciplinary breakthroughs. As a result, prior efforts largely prioritize automating scientific discovery rather than augmenting the reasoning processes that underlie scientific disruption. We present Idea-Catalyst, a novel framework that systematically identifies interdisciplinary insights to support creative reasoning in both humans and large language models. Starting from an abstract research goal, Idea-Catalyst is designed to assist the brainstorming stage, explicitly avoiding premature anchoring on specific solutions. The framework embodies key metacognitive features of interdisciplinary reasoning: (a) defining and assessing research goals, (b) awareness of a domain's opportunities and unresolved challenges, and (c) strategic exploration of interdisciplinary ideas based on impact potential. Concretely, Idea-Catalyst decomposes an abstract goal (e.g., improving human-AI collaboration) into core target-domain research questions that guide the analysis of progress and open challenges within that domain. These challenges are reformulated as domain-agnostic conceptual problems, enabling retrieval from external disciplines (e.g., Psychology, Sociology) that address analogous issues. By synthesizing and recontextualizing insights from these domains back into the target domain, Idea-Catalyst ranks source domains by their interdisciplinary potential. Empirically, this targeted integration improves average novelty by 21% and insightfulness by 16%, while remaining grounded in the original research problem.

2603.12224 2026-03-13 cs.AI

Portfolio of Solving Strategies in CEGAR-based Object Packing and Scheduling for Sequential 3D Printing

Pavel Surynek

Comments arXiv admin note: substantial text overlap with arXiv:2503.05071

2603.12222 2026-03-13 cs.CV cs.LG

HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers

Andy Li, Aiden Durrant, Milan Markovic, Georgios Leontidis

Comments 14 pages, 9 figures, 3 Tables

2603.12218 2026-03-13 cs.HC

UniMotion: Self-Supervised Learning for Cross-Domain IMU Motion Recognition

Prerna Khanna, Tanmay Srivastava, Shubham Jain, Aruna Balasubramanian

2603.12217 2026-03-13 cs.CV

Real-World Point Tracking with Verifier-Guided Pseudo-Labeling

Görkay Aydemir, Fatma Güney, Weidi Xie

Comments CVPR 2026

2603.12215 2026-03-13 cs.CV cs.AI

RDNet: Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network in Optical Remote Sensing Images

Bin Wan, Runmin Cong, Xiaofei Zhou, Hao Fang, Yaoqi Sun, Sam Kwong

2603.12211 2026-03-13 cs.DS cs.DB

Bounding the Fragmentation of B-Trees Subject to Batched Insertions

Michael A. Bender, Aaron Bernstein, Nairen Cao, Alex Conway, Martín Farach-Colton, Hanna Komlós, Yarin Shechter, Nicole Wein

Comments To appear at PODS 2026, 30 pages, 5 figures

2603.12208 2026-03-13 cs.CV

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai, Zitong Yu, Jun Wang, Linlin Shen, Yong Xu, Xiaochun Cao

2603.12205 2026-03-13 math.NA cs.CE cs.NA

Parameter unbounded Uzawa and penalty-splitted accelerated algorithms for frictionless contact problems

Daria Koliesnikova, Isabelle Ramière

Abstract

We propose a unified iterative framework for solving frictionless mechanical contact problems that relies exclusively on the solution of standard stiffness systems. The framework is built on a two-step fixed-point algorithm: first, the displacement is computed for given contact forces; second, the contact forces are updated from the displacement solution. The choice of the dual update scheme depends on the numerical contact formulation under consideration. Specifically, the Uzawa iterative scheme is obtained for the Lagrange multiplier formulation, while a penalty-based operator-splitting strategy is proposed for the penalty contact formulation. The main advantage of this displacement-force splitting strategy is that the solving step involves only standard stiffness matrices: no saddle-point or ill-conditioned penalized coefficient matrices have to be handled. Moreover, only the right-hand side of the system is updated throughout the iterations, which enables reuse of the matrix factorization or efficient initialization of iterative solvers. The main limitation of such splitting strategies lies in the inherently slow convergence of the underlying fixed-point iterations. In addition, convergence is guaranteed only within a narrow range of numerical parameter values (i.e., the augmentation or penalty parameter). This work addresses both issues by applying the Crossed-Secant fixed-point acceleration strategy, which substantially improves the convergence rate and renders the iterative schemes effectively parameter-unconstrained. To the best of our knowledge, this contribution provides the first computational demonstration of efficient, parameter-unbounded convergence for such contact formulations. The substantial practical benefits of the proposed approach are illustrated through representative three-dimensional academic and industrial frictionless contact problems.
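The two-step fixed-point loop described in the abstract can be illustrated with the classical, unaccelerated Uzawa iteration on a toy problem. This is a minimal sketch under stated assumptions, not the authors' code: the three-node spring chain, the function name `uzawa_contact`, the sign convention `K u = f - B^T lam` with `B u <= g`, and the step size `r` are all chosen here for illustration, and the Crossed-Secant acceleration from the paper is not implemented.

```python
import numpy as np

def uzawa_contact(K, f, B, g, r, tol=1e-10, max_iter=500):
    """Plain Uzawa iteration for K u + B^T lam = f, B u <= g, lam >= 0.

    Hypothetical helper: K is the stiffness matrix, B the contact
    operator, g the gap, r the augmentation parameter.
    """
    L = np.linalg.cholesky(K)            # factor the stiffness matrix once
    lam = np.zeros(B.shape[0])
    u = np.zeros(K.shape[0])
    for it in range(1, max_iter + 1):
        rhs = f - B.T @ lam              # step 1: only the right-hand side changes
        y = np.linalg.solve(L, rhs)      # reuse the factor: K = L L^T
        u = np.linalg.solve(L.T, y)
        # step 2: update contact forces and project onto lam >= 0
        lam_new = np.maximum(0.0, lam + r * (B @ u - g))
        if np.linalg.norm(lam_new - lam) <= tol:
            return u, lam_new, it
        lam = lam_new
    return u, lam, max_iter

# Toy setup: chain of three unit springs fixed at the left end, unit
# force at the right end, rigid wall at a gap of 1.5 from the last node.
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
f = np.array([0.0, 0.0, 1.0])
B = np.array([[0.0, 0.0, 1.0]])
g = np.array([1.5])

u, lam, iters = uzawa_contact(K, f, B, g, r=0.3)
print(u, lam, iters)   # u is close to [0.5, 1.0, 1.5], lam close to [0.5]
```

For this toy problem the iteration contracts only when `r` is below 2/3 (twice the inverse of the end-node compliance, which is 3), a small-scale instance of the narrow parameter range that the abstract identifies as the limitation of unaccelerated splitting schemes.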
