arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2604.13036 2026-04-15 cs.CV

Lyra 2.0: Explorable Generative 3D Worlds

Tianchang Shen, Sherwin Bahmani, Kai He, Sangeetha Grama Srinivasan, Tianshi Cao, Jiawei Ren, Ruilong Li, Zian Wang, Nicholas Sharp, Zan Gojcic, Sanja Fidler, Jiahui Huang, Huan Ling, Jun Gao, Xuanchi Ren

Comments Project Page: https://research.nvidia.com/labs/sil/projects/lyra2/

详情

英文摘要

Recent advances in video generation enable a new paradigm for 3D scene creation: generating camera-controlled videos that simulate scene walkthroughs, then lifting them to 3D via feed-forward reconstruction techniques. This generative reconstruction approach combines the visual fidelity and creative capacity of video models with 3D outputs ready for real-time rendering and simulation. Scaling to large, complex environments requires 3D-consistent video generation over long camera trajectories with large viewpoint changes and location revisits, a setting where current video models degrade quickly. Existing methods for long-horizon generation are fundamentally limited by two forms of degradation: spatial forgetting and temporal drifting. As exploration proceeds, previously observed regions fall outside the model's temporal context, forcing the model to hallucinate structures when revisited. Meanwhile, autoregressive generation accumulates small synthesis errors over time, gradually distorting scene appearance and geometry. We present Lyra 2.0, a framework for generating persistent, explorable 3D worlds at scale. To address spatial forgetting, we maintain per-frame 3D geometry and use it solely for information routing -- retrieving relevant past frames and establishing dense correspondences with the target viewpoints -- while relying on the generative prior for appearance synthesis. To address temporal drifting, we train with self-augmented histories that expose the model to its own degraded outputs, teaching it to correct drift rather than propagate it. Together, these enable substantially longer and 3D-consistent video trajectories, which we leverage to fine-tune feed-forward reconstruction models that reliably recover high-quality 3D scenes.

URL PDF HTML ☆

赞 0 踩 0

2604.13035 2026-04-15 cs.CV cs.CL

SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla

Comments Project Page: https://lab-spell.github.io/SceneCritic/

2604.13030 2026-04-15 cs.CV

Generative Refinement Networks for Visual Synthesis

Jian Han, Jinlai Liu, Jiahuan Wang, Bingyue Peng, Zehuan Yuan

Comments code: https://github.com/MGenAI/GRN

2604.13029 2026-04-15 cs.CV cs.AI

Visual Preference Optimization with Rubric Rewards

Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu

2604.13028 2026-04-15 cs.CV

Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns

Baris Sarper Tezcan, Hrishikesh Viswanath, Rubab Saher, Daniel Aliaga

Comments Accepted to the CVPR 2026 EarthVision Workshop

2604.13024 2026-04-15 cs.LG cs.DB

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

Benzhao Tang, Shiyu Yang

2604.13023 2026-04-15 cs.SD cs.MM

SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding

Luoyi Sun, Xiao Zhou, Zeqian Li, Ya Zhang, Yanfeng Wang, Weidi Xie

2604.13021 2026-04-15 cs.CV cs.AI

Representation geometry shapes task performance in vision-language modeling for CT enterography

Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham

2604.13017 2026-04-15 cs.AI cs.HC

PAL: Personal Adaptive Learner

Megha Chakraborty, Darssan L. Eswaramoorthi, Madhur Thareja, Het Riteshkumar Shah, Finlay Palmer, Aryaman Bahl, Michelle A Ihetu, Amit Sheth

2604.13013 2026-04-15 cs.AI math.OC

Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem

Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, Jun Chen

2604.12999 2026-04-15 cs.CV

Agentic Discovery with Active Hypothesis Exploration for Visual Recognition

Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei, Vicente Ordonez

2604.12995 2026-04-15 cs.CL cs.CY

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

Han Bao, Penghao Zhang, Yue Huang, Zhengqing Yuan, Yanchi Ru, Rui Su, Yujun Zhou, Xiangqi Wang, Kehan Guo, Nitesh V Chawla, Yanfang Ye, Xiangliang Zhang

Comments Accepted by ACL 2026 findings

2604.12989 2026-04-15 cs.CL

Accelerating Speculative Decoding with Block Diffusion Draft Trees

Liran Ringel, Yaniv Romano

2604.12978 2026-04-15 cs.CL cs.CV

GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts

Amir Hossein Kargaran, Nafiseh Nikeghbal, Jana Diesner, François Yvon, Hinrich Schütze

2604.12969 2026-04-15 cs.CV

AbdomenGen: Sequential Volume-Conditioned Diffusion Framework for Abdominal Anatomy Generation

Yubraj Bhandari, Lavsen Dahal, Paul Segars, Joseph Y. Lo

2604.12968 2026-04-15 cs.LG cs.CV

Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

Tong Zhang, Jiangning Zhang, Zhucun Xue, Juntao Jiang, Yicheng Xu, Chengming Xu, Teng Hu, Xingyu Xie, Xiaobin Hu, Yabiao Wang, Yong Liu, Shuicheng Yan

2604.12967 2026-04-15 cs.AI

Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training

Sohyun An, Shuibenyang Yuan, Hayeon Lee, Cho-Jui Hsieh, Alexander Min

2604.12966 2026-04-15 cs.CV

Boosting Visual Instruction Tuning with Self-Supervised Guidance

Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris

2604.12952 2026-04-15 cs.LG math.CO stat.ML

An Optimal Sauer Lemma Over $k$-ary Alphabets

Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri

Comments 38 pages

2604.12951 2026-04-15 cs.LG

The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime

Jason Z Wang

Comments 25 pages, 16 figures, 6 tables. Code and data at https://github.com/Jason-Wang313/verification-tax

2604.12948 2026-04-15 cs.AI

Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

Benjamin Stern, Peter Nadel

Comments 16 pages, 2 tables, 2 figures

2604.12946 2026-04-15 cs.LG

Parcae: Scaling Laws For Stable Looped Language Models

Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, Daniel Y. Fu

2604.12945 2026-04-15 cs.LG cs.CV

Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks

Amar Gahir, Varshil Patel, Shreyank N Gowda

2604.12944 2026-04-15 cs.CV cs.AI

Distorted or Fabricated? A Survey on Hallucination in Video LLMs

Yiyang Huang, Yitian Zhang, Yizhou Wang, Mingyuan Zhang, Liang Shi, Huimin Zeng, Yun Fu

Comments ACL 2026 findings

2604.12941 2026-04-15 cs.CV

Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection

Tianshuo Zhang, Haoyuan Zhang, Siran Peng, Weisong Zhao, Xiangyu Zhu, Zhen Lei

2604.12935 2026-04-15 cs.CV

Task Alignment: A simple and effective proxy for model merging in computer vision

Pau de Jorge, César Roberto de Souza, Björn Michele, Mert Bülent Sarıyıldız, Philippe Weinzaepfel, Florent Perronnin, Diane Larlus, Yannis Kalantidis

2604.12933 2026-04-15 cs.RO cs.CV

DINO-Explorer: Active Underwater Discovery via Ego-Motion Compensated Semantic Predictive Coding

Yuhan Jin, Nayari Marie Lessa, Mariela De Lucas Alvarez, Melvin Laux, Lucas Amparo Barbosa, Frank Kirchner, Rebecca Adam

2604.12929 2026-04-15 cs.CV

Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions

Ayce Idil Aytekin, Xu Chen, Zhengyang Shen, Thabo Beeler, Helge Rhodin, Rishabh Dabral, Christian Theobalt

Comments Project page: https://aidilayce.github.io/GraG-page/

2604.12917 2026-04-15 cs.CV

M3D-Stereo: A Multiple-Medium and Multiple-Degradation Dataset for Stereo Image Restoration

Deqing Yang, Yingying Liu, Qicong Wang, Zhi Zeng, Dajiang Lu, Yibin Tian

2604.12916 2026-04-15 cs.RO

E2E-Fly: An Integrated Training-to-Deployment System for End-to-End Quadrotor Autonomy

Fangyu Sun, Fanxing Li, Linzuo Zhang, Yu Hu, Renbiao Jin, Shuyu Wu, Wenxian Yu, Danping Zou