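arXivDaily: a daily academic digest that syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.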

2603.16871 2026-03-18 cs.CV

WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation

Jisu Nam, Yicong Hong, Chun-Hao Paul Huang, Feng Liu, JoungBin Lee, Jiyoung Kim, Siyoon Jin, Yunsung Lee, Jaeyoon Jung, Suhwan Choi, Seungryong Kim, Yang Zhou

Comments Project page is available at https://cvlab-kaist.github.io/WorldCam/

Abstract

Recent advances in video diffusion transformers have enabled interactive gaming world models that allow users to explore generated environments over extended horizons. However, existing approaches struggle with precise action control and long-horizon 3D consistency. Most prior works treat user actions as abstract conditioning signals, overlooking the fundamental geometric coupling between actions and the 3D world, whereby actions induce relative camera motions that accumulate into a global camera pose within a 3D world. In this paper, we establish camera pose as a unifying geometric representation to jointly ground immediate action control and long-term 3D consistency. First, we define a physics-based continuous action space and represent user inputs in the Lie algebra to derive precise 6-DoF camera poses, which are injected into the generative model via a camera embedder to ensure accurate action alignment. Second, we use global camera poses as spatial indices to retrieve relevant past observations, enabling geometrically consistent revisiting of locations during long-horizon navigation. To support this research, we introduce a large-scale dataset comprising 3,000 minutes of authentic human gameplay annotated with camera trajectories and textual descriptions. Extensive experiments show that our approach substantially outperforms state-of-the-art interactive gaming world models in action controllability, long-horizon visual quality, and 3D spatial consistency.
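
The abstract's two geometric ingredients, turning a continuous action expressed in the Lie algebra into a 6-DoF camera pose that accumulates into a global pose, and using global poses as spatial indices to look up past observations, can be illustrated with a small NumPy sketch. Everything here is an assumption rather than WorldCam's actual design: each action is a twist (v, w) in se(3) expressed in the current camera frame with the camera looking along +z, relative motions are composed by right-multiplication, and retrieval ranks stored frames by translation distance plus a weighted rotation angle (`rot_weight` and `k` are hypothetical knobs). The paper's camera embedder and retrieval rule are not reproduced.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix [w]x, so that hat(w) @ u == np.cross(w, u)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def se3_exp(v, w):
    """Exponential map from a twist (v, w) in se(3) to a 4x4 matrix in SE(3)."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:  # small-angle limit of the closed form below
        R, V = np.eye(3) + W, np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)  # Rodrigues' rotation formula
        V = np.eye(3) + B * W + C * (W @ W)  # maps v to the translation part
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T


def integrate_actions(actions, dt=1.0):
    """Accumulate per-step relative motions into global camera-to-world poses.

    Assumption: each action is a (v, w) twist in the current camera frame.
    """
    pose = np.eye(4)
    poses = [pose]
    for v, w in actions:
        pose = pose @ se3_exp(np.asarray(v) * dt, np.asarray(w) * dt)
        poses.append(pose)
    return np.stack(poses)


def pose_distance(Ta, Tb, rot_weight=1.0):
    """Translation distance plus weighted geodesic rotation angle (hypothetical metric)."""
    trans = np.linalg.norm(Ta[:3, 3] - Tb[:3, 3])
    R_rel = Ta[:3, :3].T @ Tb[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return trans + rot_weight * np.arccos(cos_angle)


def retrieve_past(query, history, k=2, rot_weight=1.0):
    """Indices of the k stored poses closest to the query pose."""
    d = [pose_distance(query, T, rot_weight) for T in history]
    return [int(i) for i in np.argsort(d)[:k]]


# Walk 3 steps, turn around, walk back, turn around again: the camera revisits the start.
forward = ([0.0, 0.0, 1.0], [0.0, 0.0, 0.0])
turn = ([0.0, 0.0, 0.0], [0.0, np.pi, 0.0])
poses = integrate_actions([forward] * 3 + [turn] + [forward] * 3 + [turn])
print(retrieve_past(poses[-1], poses[:-1], k=2))  # [0, 1]
```

Because the final pose coincides with the first one, the lookup returns the very first frame as the most relevant past observation, which is the kind of revisit the abstract's pose-indexed retrieval targets.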

2603.16868 2026-03-18 cs.CV cs.AI cs.RO

MessyKitchens: Contact-rich object-level 3D scene reconstruction

Junaid Ahmed Ansari, Ran Ding, Fabio Pizzati, Ivan Laptev

2603.16866 2026-03-18 cs.RO cs.AI cs.GR cs.LG cs.SE

ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K

Kaixuan Wang, Tianxing Chen, Jiawei Liu, Honghao Su, Shaolong Zhu, Minxuan Wang, Zixuan Li, Yue Chen, Huan-ang Gao, Yusen Qin, Jiawei Wang, Qixuan Zhang, Lan Xu, Jingyi Yu, Yao Mu, Ping Luo

Comments Website: https://manitwin.github.io/

2603.16864 2026-03-18 cs.CV cs.AI

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Jiongze Yu, Xiangbo Gao, Pooja Verlani, Akshay Gadde, Yilin Wang, Balu Adsumilli, Zhengzhong Tu

2603.16862 2026-03-18 cs.CL

Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory

Sahil Sen, Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah

2603.16860 2026-03-18 cs.RO

DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models

Emily Yue-Ting Jia, Weiduo Yuan, Tianheng Shi, Vitor Guizilini, Jiageng Mao, Yue Wang

2603.16859 2026-03-18 cs.AI

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Tianyu Xie, Jinfa Huang, Yuexiao Ma, Rongfang Luo, Yan Yang, Wang Chen, Yuhui Zeng, Ruize Fang, Yixuan Zou, Xiawu Zheng, Jiebo Luo, Rongrong Ji

Comments Code is available at https://github.com/MAC-AutoML/SocialOmni and dataset is available at https://huggingface.co/datasets/alexisty/SocialOmni

2603.16858 2026-03-18 cs.CV cs.AI

SOMA: Unifying Parametric Human Body Models

Jun Saito, Jiefeng Li, Michael de Ruyter, Miguel Guerrero, Edy Lim, Ehsan Hassani, Roger Blanco Ribera, Hyejin Moon, Magdalena Dadela, Marco Di Lucca, Qiao Wang, Xueting Li, Jan Kautz, Simon Yuen, Umar Iqbal

2603.16857 2026-03-18 cs.LG

Long-Horizon Traffic Forecasting via Incident-Aware Conformal Spatio-Temporal Transformers

Mayur Patil, Qadeer Ahmed, Shawn Midlam-Mohler, Stephanie Marik, Allen Sheldon, Rajeev Chhajer, Nithin Santhanam

2603.16856 2026-03-18 cs.CL

Online Experiential Learning for Language Models

Tianzhu Ye, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei

2603.16853 2026-03-18 cs.RO cs.GR

BrickSim: A Physics-Based Simulator for Manipulating Interlocking Brick Assemblies

Haowei Wen, Ruixuan Liu, Weiyi Piao, Siyu Li, Changliu Liu

Comments 9 pages, 9 figures

2603.16851 2026-03-18 eess.SY cs.SY math.OC

Koopman Lifted Finite Memory Identification via Truncated Grunwald Letnikov Kernels

Navid Mojahed, Mahdis Rabbani, Shima Nazari

Comments 6 pages, 1 figure, submitted to IEEE Control Systems Letters (L-CSS)

2603.16850 2026-03-18 math.NA cs.AI cs.DC cs.NA math.DS math.OC

Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks

Xavier Gonzalez

Comments PhD Dissertation; Stanford University

DOI: 10.25740/vf943fc9855

Abstract

Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical systems can in fact be parallelized across the sequence length by reframing their evaluation as a system of nonlinear equations, which can be solved with Newton's method using a parallel associative scan. However, these parallel Newton methods struggled with limitations, primarily inefficiency, instability, and lack of convergence guarantees. This thesis addresses these limitations with methodological and theoretical contributions, drawing particularly from optimization. Methodologically, we develop scalable and stable parallel Newton methods, based on quasi-Newton and trust-region approaches. The quasi-Newton methods are faster and more memory efficient, while the trust-region approaches are significantly more stable. Theoretically, we unify many fixed-point methods into our parallel Newton framework, including Picard and Jacobi iterations. We establish a linear convergence rate for these techniques that depends on the method's approximation accuracy and stability. Moreover, we give a precise condition, rooted in dynamical stability, that characterizes when parallelization provably accelerates a dynamical system and when it cannot. Specifically, the sign of the Largest Lyapunov Exponent of a dynamical system determines whether or not parallel Newton methods converge quickly. In sum, this thesis unlocks scalable and stable methods for parallelizing sequential computation, and provides a firm theoretical basis for when such techniques will and will not work. This thesis also serves as a guide to parallel Newton methods for researchers who want to write the next chapter in this ongoing story.
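
The central reformulation in the abstract, evaluating a recurrence s_t = f(s_{t-1}) by solving all residuals s_t - f(s_{t-1}) = 0 jointly with Newton's method, where each Newton step reduces to a linear recurrence that an associative scan can solve, can be sketched for a scalar state in NumPy. This is a minimal illustration under stated assumptions, not the thesis's quasi-Newton or trust-region methods: the Hillis-Steele scan below is only vectorised, not run on parallel hardware (in JAX one would typically use `jax.lax.associative_scan` for this step), and setting the Jacobian to zero shows how a Jacobi-style fixed-point iteration fits the same template. The example map f is hypothetical and contracting (|f'| <= 0.5), i.e. it has a negative largest Lyapunov exponent.

```python
import numpy as np


def linear_scan(a, b, s0):
    """Solve s_t = a_t * s_{t-1} + b_t (t = 1..T) as an inclusive prefix scan.

    Each element is an affine map x -> A x + B; composing element t after an
    earlier element u gives (A_t * A_u, A_t * B_u + B_t), which is associative.
    Hillis-Steele doubling needs ceil(log2 T) rounds, each vectorised over t.
    """
    A = np.array(a, dtype=float)
    B = np.array(b, dtype=float)
    shift = 1
    while shift < len(A):
        A_prev, B_prev = A[:-shift].copy(), B[:-shift].copy()
        B[shift:] = A[shift:] * B_prev + B[shift:]
        A[shift:] = A[shift:] * A_prev
        shift *= 2
    return A * s0 + B  # the prefix map for step t sends s0 to s_t


def sequential_eval(f, s0, T):
    """Reference: evaluate the recurrence one step at a time."""
    out, s = [], s0
    for _ in range(T):
        s = f(s)
        out.append(s)
    return np.array(out)


def parallel_newton(f, df, s0, T, jacobi=False, tol=1e-12, max_iters=None):
    """Solve s_t - f(s_{t-1}) = 0 for all t at once; returns (trajectory, iterations).

    Newton: linearising f around the current guess turns the update into the
    linear recurrence s_t = J_t s_{t-1} + (f(prev_t) - J_t prev_t).
    jacobi=True drops the Jacobian (J_t = 0), giving the update s_t = f(prev_t).
    """
    max_iters = max_iters or T + 1
    s = np.zeros(T)  # initial guess for s_1..s_T
    for k in range(1, max_iters + 1):
        prev = np.concatenate(([s0], s[:-1]))  # guessed s_{t-1} for every t
        fx = f(prev)                           # all T evaluations at once
        J = np.zeros(T) if jacobi else df(prev)
        new = linear_scan(J, fx - J * prev, s0)
        if np.max(np.abs(new - s)) < tol:
            return new, k
        s = new
    return s, max_iters


f = lambda s: 0.5 * np.sin(s) + 1.0   # hypothetical contracting map
df = lambda s: 0.5 * np.cos(s)
ref = sequential_eval(f, 0.0, 64)
s_newton, n_newton = parallel_newton(f, df, 0.0, 64)
s_jacobi, n_jacobi = parallel_newton(f, df, 0.0, 64, jacobi=True)
print(np.allclose(s_newton, ref), np.allclose(s_jacobi, ref), n_newton, n_jacobi)
```

After k iterations the first k states are exact for both variants, so neither can need more than T sweeps; the interesting regime is when they converge in far fewer, which the abstract ties to the sign of the largest Lyapunov exponent.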

2603.16848 2026-03-18 cs.CL

Mediocrity is the key for LLM as a Judge Anchor Selection

Shachar Don-Yehiya, Asaf Yehudai, Leshem Choshen, Omri Abend

2603.16846 2026-03-18 cs.LG

Dynamic Meta-Layer Aggregation for Byzantine-Robust Federated Learning

Reek Das, Biplab Kanti Sen

Comments 15 pages, 3 figures

2603.16844 2026-03-18 cs.CV

M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM

Kerui Ren, Guanghao Li, Changjian Jiang, Yingxiang Xu, Tao Lu, Linning Xu, Junting Dong, Jiangmiao Pang, Mulin Yu, Bo Dai

Comments Project page: https://city-super.github.io/M3/

2603.16843 2026-03-18 cs.AI

Internalizing Agency from Reflective Experience

Rui Ge, Yichao Fu, Yuyang Qian, Junda Su, Yiming Zhao, Peng Zhao, Hao Zhang

Comments 17 pages, 5 figures; Submitted to ICML 2026

2603.16842 2026-03-18 cs.LG cond-mat.dis-nn cond-mat.stat-mech cs.SY eess.SY physics.bio-ph

Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning

Jello Zhou, Vudtiwat Ngampruetikorn, David J. Schwab

Comments 18 pages, 17 figures

2603.16841 2026-03-18 eess.SY cs.SY math.PR

Typical models of the distribution system restoration process

Arslan Ahmad, Ian Dobson

2603.16840 2026-03-18 cs.CV cond-mat.mtrl-sci

What DINO saw: ALiBi positional encoding reduces positional bias in Vision Transformers

Moritz Pawlowsky, Antonis Vamvakeros, Alexander Weiss, Anja Bielefeld, Samuel J. Cooper, Ronan Docherty

2603.16839 2026-03-18 cs.AI

Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

Karthik Ragunath Ananda Kumar, Subrahmanyam Arunachalam

Comments 12 pages, 11 figures, 13 tables, 26 references. Code: https://github.com/pushing-the-frontier/slide-forge-llm Dataset: https://huggingface.co/datasets/KarthikRagunathAnandaKumar/sliderl-multi-turn-rollouts

2603.16835 2026-03-18 cs.CV

An assessment of data-centric methods for label noise identification in remote sensing data sets

Felix Kröber, Genc Hoxha, Ribana Roscher

Comments Accepted for publication in International Society for Photogrammetry and Remote Sensing (ISPRS) Annals 2026

2603.16832 2026-03-18 eess.SY cs.SY math.PR

Measuring outage resilience in a distribution system with the number of outages in large events

Arslan Ahmad, Ian Dobson

2603.16829 2026-03-18 stat.ML cs.LG math.ST stat.ME stat.TH

Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing

Saksham Jain, Alex Luedtke

2603.16827 2026-03-18 cs.AI cs.CL

Prompt Programming for Cultural Bias and Alignment of Large Language Models

Maksim Eren, Eric Michalak, Brian Cook, Johnny Seales

Comments 10 pages, pre-print

2603.16825 2026-03-18 cs.RO cs.AI cs.HC

Real-Time Decoding of Movement Onset and Offset for Brain-Controlled Rehabilitation Exoskeleton

Kanishka Mitra, Satyam Kumar, Frigyes Samuel Racz, Deland Liu, Ashish D. Deshpande, José del R. Millán

Comments Accepted to ICRA 2026. 8 pages, 5 figures. Project page available at https://mitrakanishka.github.io/projects/startstop-bci/

2603.16823 2026-03-18 cs.CV

Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines

Sourya Saha, Saptarshi Debroy

Comments Accepted at the 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2026)

2603.16822 2026-03-18 cs.AI

Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

Zhitao Zeng, Mengya Xu, Jian Jiang, Pengfei Guo, Yunqiu Xu, Zhu Zhuo, Chang Han Low, Yufan He, Dong Yang, Chenxi Lin, Yiming Gu, Jiaxin Guo, Yutong Ban, Daguang Xu, Qi Dou, Yueming Jin

2603.16818 2026-03-18 cs.PF

Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper)

Xiaoyu Chu, Shashikant Ilager, Yizhen Zang, Sacheendra Talluri, Alexandru Iosup

DOI: 10.1145/3777911.3801103
Journal ref: 17th ACM/SPEC International Conference on Performance Engineering (ICPE Companion 2026)

Abstract

Incident management is essential to maintain the reliability and availability of cloud computing services. Cloud vendors typically disclose incident reports to the public, summarizing the failures and recovery process to help minimize their impact. However, such reports are often lengthy and unstructured, making them difficult to understand, analyze, and use for long-term dependability improvements. The emergence of LLMs offers new opportunities to address this challenge, but how to achieve this is currently understudied. In this paper, we explore the use of cutting-edge LLMs to extract key information from unstructured cloud incident reports. First, we collect more than 3,000 incident reports from 3 leading cloud service providers (AWS, Azure, and GCP), and manually annotate these collected samples. Then, we design and compare 6 prompt strategies to extract and classify different types of information. We consider 6 LLMs, including 3 lightweight and 3 state-of-the-art (SotA) models, and evaluate model accuracy, latency, and token cost across datasets, models, prompts, and extracted fields. Our study has uncovered the following key findings: (1) LLMs achieve high metadata extraction accuracy, 75%–95% depending on the dataset. (2) Few-shot prompting generally improves accuracy for metadata fields except for classification, and has better (lower) latency due to fewer output tokens, but requires 1.5–2× more input tokens. (3) Lightweight models (e.g., Gemini 2.0, GPT-3.5) offer favorable trade-offs in accuracy, cost, and latency; SotA models yield higher accuracy at significantly greater cost and latency. Our study provides tools, methodologies, and insights for leveraging LLMs to accurately and efficiently extract incident-report information. The FAIR data and code are publicly available at https://github.com/atlarge-research/llm-cloud-incident-extraction.
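
The workflow the abstract evaluates (prompt an LLM zero-shot or few-shot, parse its structured answer, score per-field accuracy against manual annotations) can be sketched without calling any model. Everything below is hypothetical: the field names in `FIELDS`, the instruction wording, the JSON answer format, and the exact-match scoring rule are assumptions, not the paper's six prompt strategies or annotation schema, and the actual request to Gemini, GPT, or any other LLM is deliberately left out.

```python
import json

# Hypothetical extraction schema; the paper's annotated fields may differ.
FIELDS = ["service", "region", "start_time", "root_cause_category"]

INSTRUCTION = (
    "Extract the following fields from the cloud incident report and answer "
    "with a single JSON object whose keys are: " + ", ".join(FIELDS) + ". "
    "Use null when a field is not mentioned."
)


def build_prompt(report, examples=()):
    """Zero-shot when `examples` is empty, few-shot otherwise.

    Each example is a (report_text, annotated_fields_dict) pair.
    """
    parts = [INSTRUCTION]
    for ex_report, ex_fields in examples:
        parts.append(f"Report:\n{ex_report}\nAnswer:\n{json.dumps(ex_fields)}")
    parts.append(f"Report:\n{report}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(text):
    """Take the outermost {...} span of a model reply; unparsable or missing fields -> None."""
    start, end = text.find("{"), text.rfind("}")
    try:
        data = json.loads(text[start:end + 1]) if start != -1 and end > start else {}
    except json.JSONDecodeError:
        data = {}
    return {k: data.get(k) for k in FIELDS}


def field_accuracy(predictions, annotations):
    """Per-field exact-match accuracy, ignoring case and surrounding whitespace."""
    norm = lambda v: None if v is None else str(v).strip().lower()
    return {
        k: sum(norm(p[k]) == norm(a.get(k)) for p, a in zip(predictions, annotations))
        / len(annotations)
        for k in FIELDS
    }


# Made-up reply and annotation, for illustration only.
reply = 'Sure: {"service": "EC2", "region": "us-east-1", "start_time": null, "root_cause_category": "network"}'
pred = parse_answer(reply)
gold = {"service": "ec2", "region": "us-east-1", "start_time": None, "root_cause_category": "power"}
print(field_accuracy([pred], [gold]))
# {'service': 1.0, 'region': 1.0, 'start_time': 1.0, 'root_cause_category': 0.0}
```

Each few-shot example is prepended in full, so the input grows with every example added, which is the mechanism behind the abstract's observation that few-shot prompting costs more input tokens.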

2603.16817 2026-03-18 cs.AI cs.CL cs.LG

Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights

Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar, Caitlyn Heqi Yin, Ramya Korlakai Vinayak

Comments 56 pages