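arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.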

2603.30045 2026-04-01 cs.CV

OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

Yuheng Liu, Xin Lin, Xinke Li, Baihan Yang, Chen Wang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Hao Tan, Kai Zhang, Xiaohui Xie, Zifan Shi, Yiwei Hu

Comments Code is available at https://github.com/yuhengliu02/OmniRoam

Abstract

Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to issues of completeness and global consistency. We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, where a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refine stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering. To train our model, we introduce two panoramic video datasets that incorporate both synthetic and real-world captured videos. Experiments show that our framework consistently outperforms state-of-the-art methods in terms of visual quality, controllability, and long-term scene consistency, both qualitatively and quantitatively. We further showcase several extensions of this framework, including real-time video generation and 3D reconstruction. Code is available at https://github.com/yuhengliu02/OmniRoam.

2603.30043 2026-04-01 cs.CV

Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

Kaleb Newman, Tyler Zhu, Olga Russakovsky

2603.30042 2026-04-01 cs.RO cs.HC

HapCompass: A Rotational Haptic Device for Contact-Rich Robotic Teleoperation

Xiangshan Tan, Jingtian Ji, Tianchong Jiang, Pedro Lopes, Matthew R. Walter

Comments Accepted to IEEE International Conference on Robotics and Automation (ICRA), 2026. 8 pages, 5 figures. Project page: https://ripl.github.io/HapCompass/

2603.30038 2026-04-01 cs.CV

Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao

Comments Accepted by CVPR 2026; Project page: https://geocodebench.github.io/

2603.30036 2026-04-01 cs.LG cs.AI

Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

Max Kaufmann, David Lindner, Roland S. Zimmermann, Rohin Shah

2603.30035 2026-04-01 cs.LG cs.CL

Reward-Based Online LLM Routing via NeuralUCB

Ming-Hua Tsai, Phat Tran

2603.30033 2026-04-01 cs.LG cs.AI

Tucker Attention: A generalization of approximate attention mechanisms

Timon Klein, Jonas Kusch, Sebastian Sager, Stefan Schnake, Steffen Schotthöfer

2603.30032 2026-04-01 cs.CL cs.SD

Covertly improving intelligibility with data-driven adaptations of speech timing

Paige Tuttösí, Angelica Lim, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier

Abstract

Human talkers often address listeners with language-comprehension challenges, such as hard-of-hearing or non-native adults, by globally slowing down their speech. However, it remains unclear whether this strategy actually makes speech more intelligible. Here, we take advantage of recent advancements in machine-generated speech allowing more precise control of speech rate in order to systematically examine how targeted speech-rate adjustments may improve comprehension. We first use reverse-correlation experiments to show that the temporal influence of speech rate prior to a target vowel contrast (e.g., the tense-lax distinction) in fact manifests in a scissor-like pattern, with opposite effects in early versus late context windows; this pattern is remarkably stable both within individuals and across native L1-English listeners and L2-English listeners with French, Mandarin, and Japanese L1s. Second, we show that this speech rate structure not only facilitates L2 listeners' comprehension of the target vowel contrast, but that native listeners also rely on this pattern in challenging acoustic conditions. Finally, we build a data-driven text-to-speech algorithm that replicates this temporal structure on novel speech sequences. Across a variety of sentences and vowel contrasts, listeners remained unaware that such targeted slowing improved word comprehension. Strikingly, participants instead judged the common strategy of global slowing as clearer, even though it actually increased comprehension errors. Together, these results show that targeted adjustments to speech rate significantly aid intelligibility under challenging conditions, while often going unnoticed. More generally, this paper provides a data-driven methodology to improve the accessibility of machine-generated speech which can be extended to other aspects of speech comprehension and a wide variety of listeners and environments.

2603.30025 2026-04-01 cs.CL

ContextClaim: A Context-Driven Paradigm for Verifiable Claim Detection

Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga

2603.30022 2026-04-01 cs.RO cs.AI

Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models

Md Saad, Sajjad Hussain, Mohd Suhaib

2603.30017 2026-04-01 cs.LG cs.CR stat.ML

Refined Detection for Gumbel Watermarking

Tor Lattimore

2603.30002 2026-04-01 cs.LG cs.CL

Tracking Equivalent Mechanistic Interpretations Across Neural Networks

Alan Sun, Mariya Toneva

Comments 32 pages, 5 figures, ICLR 2026

2603.29997 2026-04-01 cs.CL cs.AI

Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives

Mohammadhossein Khojasteh, Yifan Jiang, Stefano De Giorgis, Frank van Harmelen, Filip Ilievski

2603.29993 2026-04-01 cs.AI

Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

Nathan Heath

2603.29990 2026-04-01 cs.CV

SurgNavAR: An Augmented Reality Surgical Navigation Framework for Optical See-Through Head Mounted Displays

Abdullah Thabit, Mohamed Benmahdjoub, Rafiuddin Jinabade, Hizirwan S. Salim, Marie-Lise C. van Veelen, Mark G. van Vledder, Eppo B. Wolvius, Theo van Walsum

Comments This work has been submitted to the IEEE for possible publication

2603.29979 2026-04-01 cs.CL cs.HC cs.IR

Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior

Junwei Yu, Mufeng Yang, Yepeng Ding, Hiroyuki Sato

Comments 12 pages, 5 figures. This paper proposes GEO-SFE, a structural feature engineering framework for generative engine optimization

2603.29974 2026-04-01 cs.LG

Meteorology-Driven GPT4AP: A Multi-Task Forecasting LLM for Atmospheric Air Pollution in Data-Scarce Settings

Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan

Comments This manuscript is under review

2603.29968 2026-04-01 cs.CV cs.AI

Trimodal Deep Learning for Glioma Survival Prediction: A Feasibility Study Integrating Histopathology, Gene Expression, and MRI

Iain Swift, JingHua Ye

Comments 6 pages, 1 figure, submitted to the IEEE CBMS 2026 conference, still waiting for notification

2603.29967 2026-04-01 cs.CV

Learning Structural-Functional Brain Representations through Multi-Scale Adaptive Graph Attention for Cognitive Insight

Badhan Mazumder, Sir-Lord Wiafe, Aline Kotoski, Vince D. Calhoun, Dong Hye Ye

Comments Preprint version of the paper accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026). This is the author's accepted manuscript. The final published version will appear in IEEE Xplore

2603.29960 2026-04-01 cs.CV

NeuroBRIDGE: Behavior-Conditioned Koopman Dynamics with Riemannian Alignment for Early Substance Use Initiation Prediction from Longitudinal Functional Connectome

Badhan Mazumder, Sir-Lord Wiafe, Vince D. Calhoun, Dong Hye Ye

Comments Preprint version of the paper accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026). This is the author's accepted manuscript. The final published version will appear in IEEE Xplore

2603.29953 2026-04-01 cs.AI cs.HC

Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect

Peng Gang

Comments 25 pages, figures, tables, and appendix. Third paper in a cumulative research series on PPS and 5W3H structured intent representation, extending prior work to cross-model robustness, framework comparison, and user-study validation

2603.29950 2026-04-01 cs.AI cs.CL

Physiological and Semantic Patterns in Medical Teams Using an Intelligent Tutoring System

Xiaoshan Huang, Conrad Borchers, Jiayi Zhang, Susanne P. Lajoie

Comments Accepted as short paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

2603.29946 2026-04-01 cs.LG

Real-Time Explanations for Tabular Foundation Models

Luan Borges Teodoro Reis Sena, Francisco Galuppo Azevedo

Comments Accepted at the 2nd DATA4Science Workshop at ICLR 2026, Rio de Janeiro, Brazil. OpenReview: https://openreview.net/forum?id=StSMBSZqxx

2603.29943 2026-04-01 cs.CV

EC-Bench: Enumeration and Counting Benchmark for Ultra-Long Videos

Fumihiko Tsuchiya, Taiki Miyanishi, Mahiro Ukai, Nakamasa Inoue, Shuhei Kurita, Yusuke Iwasawa, Yutaka Matsuo

Comments The first two authors contributed equally. The data and code are publicly available at: https://github.com/matsuolab/EC-Bench

2603.29941 2026-04-01 cs.CV cs.LG

Better than Average: Spatially-Aware Aggregation of Segmentation Uncertainty Improves Downstream Performance

Vanessa Emanuela Guarino, Claudia Winklmayr, Jannik Franzen, Josef Lorenz Rumberger, Manuel Pfeuffer, Sonja Greven, Klaus Maier-Hein, Carsten T. Lüth, Christoph Karg, Dagmar Kainmueller

Comments 27 pages, 13 figures, 6 tables. Accepted at CVPR 2026 (The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026)

2603.29937 2026-04-01 cs.CL cs.IR

Rewrite the News: Tracing Editorial Reuse Across News Agencies

Soveatin Kuntur, Nina Smirnova, Anna Wroblewska, Philipp Mayr, Sebastijan Razboršek Maček

Comments The paper is accepted to SoCon-NLPSI 2026: Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI) workshop co-located with LREC 2026

2603.29931 2026-04-01 cs.CV

Gloria: Consistent Character Video Generation via Content Anchors

Yuhang Yang, Fan Zhang, Huaijin Pi, Shuai Guo, Guowei Xu, Wei Zhai, Yang Cao, Zheng-Jun Zha

Comments Accepted by CVPR 2026 Main, project: https://yyvhang.github.io/Gloria_Page/

2603.29927 2026-04-01 cs.CV cs.AI cs.LG

End-to-End Image Compression with Segmentation Guided Dual Coding for Wind Turbines

Raül Pérez-Gonzalo, Andreas Espersen, Søren Forchhammer, Antonio Agudo

Comments Accepted to TNNLS 2026

2603.29924 2026-04-01 cs.CV

Abstraction in Style

Min Lu, Yuanfeng He, Anthony Chen, Jianhuang He, Pu Wang, Daniel Cohen-Or, Hui Huang

Comments SIGGRAPH 2026 conditionally accepted paper

2603.29922 2026-04-01 cs.CV cs.AI

Training deep learning based dynamic MR image reconstruction using synthetic fractals

Anirudh Raman, Olivier Jaubert, Mark Wrobel, Tina Yao, Ruaraidh Campbell, Rebecca Baker, Ruta Virsinskaite, Daniel Knight, Michael Quail, Jennifer Steeden, Vivek Muthurangu