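arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.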

2601.04090 2026-03-24 cs.CV

Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction

Jiaxin Huang, Yuanbo Yang, Bangbang Yang, Lin Ma, Yuewen Ma, Yiyi Liao

Comments Project page: https://xdimlab.github.io/Gen3R/

Abstract

We present Gen3R, a method that bridges the strong priors of foundational reconstruction models and video diffusion models for scene-level 3D generation. We repurpose the VGGT reconstruction model to produce geometric latents by training an adapter on its tokens, which are regularized to align with the appearance latents of pre-trained video diffusion models. By jointly generating these disentangled yet aligned latents, Gen3R produces both RGB videos and corresponding 3D geometry, including camera poses, depth maps, and global point clouds. Experiments demonstrate that our approach achieves state-of-the-art results in single- and multi-image conditioned 3D scene generation. Additionally, our method can enhance the robustness of reconstruction by leveraging generative priors, demonstrating the mutual benefit of tightly coupling reconstruction and generative models.
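The abstract says Gen3R trains an adapter on VGGT tokens and regularizes its output to align with a video diffusion model's appearance latents, but it does not state the adapter's form or the alignment loss. The following is a minimal sketch under those unknowns, assuming a linear adapter and a cosine-similarity penalty. `linear_adapter`, `alignment_loss`, `adapter_objective`, and the weight `lam` are hypothetical names, not the paper's code.

```python
import math
from typing import List

Vector = List[float]


def linear_adapter(tokens: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Hypothetical adapter: project each reconstruction token with one linear map.

    `weight` has shape (latent_dim, token_dim); a real adapter would be a trained network.
    """
    return [[sum(w * t for w, t in zip(row, tok)) for row in weight] for tok in tokens]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unaligned
    return dot / (norm_a * norm_b)


def alignment_loss(geometric: List[Vector], appearance: List[Vector]) -> float:
    """Assumed regularizer: 1 - mean cosine similarity between paired latents.

    0.0 means every geometric latent points the same way as its appearance latent;
    2.0 means every pair points in opposite directions.
    """
    if len(geometric) != len(appearance) or not geometric:
        raise ValueError("expected the same non-zero number of geometric and appearance latents")
    sims = [cosine_similarity(g, a) for g, a in zip(geometric, appearance)]
    return 1.0 - sum(sims) / len(sims)


def adapter_objective(tokens: List[Vector], weight: List[Vector],
                      appearance: List[Vector], recon_loss: float, lam: float = 0.1) -> float:
    # Hypothetical total objective: a reconstruction term plus a weighted alignment penalty.
    return recon_loss + lam * alignment_loss(linear_adapter(tokens, weight), appearance)
```

For example, with `tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]` and `weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]`, the adapter output is `[[1.0, 0.0], [0.0, 1.0]]`. Its alignment loss is 0.0 against `[[2.0, 0.0], [0.0, 3.0]]` and 1.0 against the swapped, orthogonal `[[0.0, 1.0], [1.0, 0.0]]`. Cosine similarity is used here because it constrains direction without forcing the two latent spaces to share a scale. That is a design assumption, not a detail from the abstract.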

2601.03385 2026-03-24 cs.LG math.PR

SIGMA: Scalable Spectral Insights for LLM Model Collapse

Yi Gu, Lingyou Pang, Xiangkun Ye, Tianyu Wang, Jianyu Lin, Carey E. Priebe, Alexander Aue

2601.01547 2026-03-24 cs.CV cs.AI cs.LG

Vision-language models lag human performance on physical dynamics and intent reasoning

Tianjun Gu, Jingyu Gong, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan, Athanasios V

2601.00834 2026-03-24 cs.LG cs.AI

Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds

Julian Evan Chrisnanto, Salsabila Rahma Alia, Nurfauzi Fadillah, Yulison Herry Chrisnanto

Comments 19 pages, 7 figures

2601.00614 2026-03-24 cs.RO cs.SY eess.SY

From 2D to 3D terrain-following area coverage path planning

Mogens Plessen

Comments 6 pages, 10 figures, 1 table, IEEE ICARSC 2026

2512.19735 2026-03-24 cs.LG

Improving Fairness of Large Language Model-Based ICU Mortality Prediction via Case-Based Prompting

Gangxiong Zhang, Yongchao Long, Yuxi Zhou, Yong Zhang, Shenda Hong

2512.17495 2026-03-24 cs.CV

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Rang Li, Lei Li, Shuhuai Ren, Hao Tian, Shuhao Gu, Shicheng Li, Zihao Yue, Yudong Wang, Wenhan Ma, Zhe Yang, Jingyuan Ma, Zhifang Sui, Fuli Luo

2512.16523 2026-03-24 cs.CV cs.AI

TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models

Zhiwei Li, Yitian Pang, Weining Wang, Zhenan Sun, Qi Li

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2512.11192 2026-03-24 cs.CL

SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing

Luca Foppiano, Sotaro Takeshita, Pedro Ortiz Suarez, Ekaterina Borisova, Raia Abu Ahmad, Malte Ostendorff, Fabio Barth, Julian Moreno-Schneider, Georg Rehm

Comments 13 pages, 3 figures, 3 tables

2512.09278 2026-03-24 cs.CV

LoGoColor: Local-Global 3D Colorization for 360° Scenes

Yeonjin Chang, Juhwan Cho, Seunghyeon Seo, Wonsik Shin, Nojun Kwak

Comments Project page is available at: https://yeonjin-chang.github.io/LoGoColor/

2512.08713 2026-03-24 cs.CL cs.AI

Automatic Essay Scoring and Feedback Generation in Basque Language Learning

Ekhi Azurmendi, Xabier Arregi, Oier Lopez de Lacalle

Comments Accepted to LREC 2026

2512.08441 2026-03-24 cs.CV

Leveraging Multispectral Sensors for Color Correction in Mobile Cameras

Luca Cogo, Marco Buzzelli, Simone Bianco, Javier Vazquez-Corral, Raimondo Schettini

Comments Accepted to CVPR 2026. Camera-ready version

2512.05905 2026-03-24 cs.CV

SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

Wenhao Yan, Sheng Ye, Zhuoyi Yang, Jiayan Teng, ZhenHui Dong, Kairui Wen, Xiaotao Gu, Yong-Jin Liu, Jie Tang

2512.04619 2026-03-24 cs.CV

Denoise to Track: Harnessing Video Diffusion Priors for Robust Correspondence

Tianyu Yuan, Yuanbo Yang, Lin-Zhuo Chen, Yao Yao, Zhuzhong Qian

2512.03989 2026-03-24 cs.CL

Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pre-trained Models

Taido Purason, Pavel Chizhov, Ivan P. Yamshchikov, Mark Fishel

Comments Accepted to Findings of EACL 2026

2512.03903 2026-03-24 cs.CL cs.AI

BERnaT: Basque Encoders for Representing Natural Textual Diversity

Ekhi Azurmendi, Joseba Fernandez de Landa, Jaione Bengoetxea, Maite Heredia, Julen Etxaniz, Mikel Zubillaga, Ander Soraluze, Aitor Soroa

Comments Under review for the Journal Procesamiento de Lenguaje Natural 2026

2512.03290 2026-03-24 cs.LG physics.app-ph

ASPEN: An Adaptive Spectral Physics-Enabled Network for Ginzburg-Landau Dynamics

Julian Evan Chrisnanto, Nurfauzi Fadillah, Yulison Herry Chrisnanto

Comments 15 pages, 7 figures

Abstract

Physics-Informed Neural Networks (PINNs) have emerged as a powerful, mesh-free paradigm for solving partial differential equations (PDEs). However, they notoriously struggle with stiff, multi-scale, and nonlinear systems due to the inherent spectral bias of standard multilayer perceptron (MLP) architectures, which prevents them from adequately representing high-frequency components. In this work, we introduce the Adaptive Spectral Physics-Enabled Network (ASPEN), a novel architecture designed to overcome this critical limitation. ASPEN integrates an adaptive spectral layer with learnable Fourier features directly into the network's input stage. This mechanism allows the model to dynamically tune its own spectral basis during training, enabling it to efficiently learn and represent the precise frequency content required by the solution. We demonstrate the efficacy of ASPEN by applying it to the complex Ginzburg-Landau equation (CGLE), a canonical and challenging benchmark for nonlinear, stiff spatio-temporal dynamics. Our results show that a standard PINN architecture catastrophically fails on this problem, diverging into non-physical oscillations. In contrast, ASPEN successfully solves the CGLE with exceptional accuracy. The predicted solution is visually indistinguishable from the high-resolution ground truth, achieving a low median physics residual of 5.10 × 10^-3. Furthermore, we validate that ASPEN's solution is not only pointwise accurate but also physically consistent, correctly capturing emergent physical properties, including the rapid free energy relaxation and the long-term stability of the domain wall front. This work demonstrates that by incorporating an adaptive spectral basis, our framework provides a robust and physically consistent solver for complex dynamical systems where standard PINNs fail, opening new options for machine learning in challenging physical domains.
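The abstract's key mechanism is an input layer of learnable Fourier features whose spectral basis is tuned during training. The following is a minimal sketch of that idea, assuming the common construction [sin(2π·Bx), cos(2π·Bx)] with a trainable frequency matrix B and a hand-written chain-rule gradient in place of an autodiff framework. The class name `AdaptiveFourierFeatures` and its methods are hypothetical and do not reproduce ASPEN's actual layer.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


class AdaptiveFourierFeatures:
    """Hypothetical Fourier-feature input layer with a trainable frequency matrix B.

    B has shape (num_features, input_dim); each row is one frequency vector applied to
    the input coordinates (for a CGLE-style problem, e.g. (x, t)). Illustrative only.
    """

    def __init__(self, frequencies: Matrix):
        self.B = [list(row) for row in frequencies]

    def _phases(self, x: Sequence[float]) -> List[float]:
        # phase_k = 2*pi * <B_k, x>
        return [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x)) for row in self.B]

    def __call__(self, x: Sequence[float]) -> List[float]:
        # Output ordering: all sine features first, then all cosine features.
        phases = self._phases(x)
        return [math.sin(p) for p in phases] + [math.cos(p) for p in phases]

    def grad_wrt_B(self, x: Sequence[float], upstream: Sequence[float]) -> Matrix:
        """Chain rule: dL/dB[k][j], given upstream = dL/d(features) in __call__ order."""
        m = len(self.B)
        grads = []
        for k, p in enumerate(self._phases(x)):
            # d sin(p)/dp = cos(p), d cos(p)/dp = -sin(p), dp/dB[k][j] = 2*pi*x[j]
            dl_dp = upstream[k] * math.cos(p) - upstream[m + k] * math.sin(p)
            grads.append([dl_dp * 2.0 * math.pi * xi for xi in x])
        return grads

    def step(self, grads: Matrix, lr: float) -> None:
        # Gradient-descent update on B itself: the spectral basis moves during training.
        self.B = [[b - lr * g for b, g in zip(row, g_row)]
                  for row, g_row in zip(self.B, grads)]
```

For instance, `AdaptiveFourierFeatures([[1.0, 0.0], [0.0, 0.25]])([0.25, 1.0])` gives approximately `[1.0, 1.0, 0.0, 0.0]`, because both phases equal π/2. A fixed Fourier-feature layer would simply skip `step` on B. Making B trainable is what allows the frequencies to drift toward the solution's content instead of relying on a hand-picked basis.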

2512.01495 2026-03-24 cs.CV

ELVIS: Enhance Low-Light for Video Instance Segmentation in the Dark

Joanne Lin, Ruirui Lin, Yini Li, David Bull, Nantheera Anantrasirichai

Comments Accepted to CVPR 2026

2512.00385 2026-03-24 cs.CV

EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation

Louis Geist, Loic Landrieu, Damien Robert

Comments Accepted at ICRA 2026. Camera-ready version with Appendix

2512.00021 2026-03-24 cs.RO cs.CV

Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges

Kemal Oksuz, Alexandru Buburuzan, Anthony Knittel, Yuhan Yao, Puneet K. Dokania

Comments Accepted to TMLR (Survey Certification)

2511.23455 2026-03-24 cs.LG cs.AI cs.CY

The Price of Progress: Price Performance and the Future of AI

Hans Gundlach, Jayson Lynch, Matthias Mertens, Neil Thompson

2511.22169 2026-03-24 cs.CV cs.AI

Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization

Inha Kang, Eunki Kim, Wonjeong Ryu, Jaeyo Shin, Seungjun Yu, Yoon-Hee Kang, Seongeun Jeong, Eunhye Kim, Soontae Kim, Hyunjung Shim

Comments 31 pages

2511.21565 2026-03-24 cs.CV

UAVLight: A Benchmark for Illumination-Robust 3D Reconstruction in Unmanned Aerial Vehicle (UAV) Scenes

Kang Du, Xue Liao, Junpeng Xia, Chaozheng Guo, Yi Gu, Yirui Guan, Duotun Wang, Sheng Huang, Zeyu Wang

Comments 10 pages, 6 figures

2511.20279 2026-03-24 cs.CV

SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors

Fabian Gülhan, Emil Mededovic, Yuli Wu, Johannes Stegmaier

Comments 18 pages, 7 figures, 7 tables

2511.19299 2026-03-24 cs.LG cs.AI

Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning

James R. M. Black, Moritz S. Hanke, Aaron Maiwald, Tina Hernandez-Boussard, Oliver M. Crook, Jaspreet Pannu

Comments 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Biosecurity Safeguards for Generative AI

2511.19235 2026-03-24 cs.CV

IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes

Carl Lindström, Mahan Rafidashti, Maryam Fatemi, Lars Hammarstrand, Martin R. Oswald, Lennart Svensson

2511.17561 2026-03-24 cs.CL cs.AI

LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

Huimin Ren, Yan Liang, Baiqiao Su, Chaobo Sun, Hengtong Lu, Kaike Zhang, Chen Wei

2511.15700 2026-03-24 cs.CV

First Frame Is the Place to Go for Video Content Customization

Jingxi Chen, Zongxia Li, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermuller, Brandon Y. Feng, Yiannis Aloimonos

Comments Accepted to CVPR 2026

2511.14977 2026-03-24 cs.RO cs.AI

SVBRD-LLM: Self-Verifying Behavioral Rule Discovery for Autonomous Vehicle Identification

Xiangyu Li, Tianyi Wang, Junfeng Jiao, Christian Claudel, Zhaomiao Guo

Abstract

As autonomous vehicles (AVs) are increasingly deployed on public roads, understanding their real-world behaviors is critical for traffic safety analysis and regulatory oversight. However, many data-driven methods lack interpretability and cannot provide verifiable explanations of AV behavior in mixed traffic. This paper proposes SVBRD-LLM, a self-verifying behavioral rule discovery framework that automatically extracts interpretable behavioral rules from real-world traffic videos through zero-shot large language model (LLM) reasoning. The framework first derives vehicle trajectories using YOLOv26-based detection and ByteTrack-based tracking, then computes kinematic features and contextual information. It then employs GPT-5 zero-shot prompting to perform comparative behavioral analysis between AVs and human-driven vehicles (HDVs) across lane-changing and normal driving behaviors, generating 26 structured rule hypotheses that comprise both numerical thresholds and statistical behavioral patterns. These rules are subsequently evaluated through the AV identification task using an independent validation dataset, and iteratively refined through failure case analysis to filter spurious correlations and improve robustness. The resulting rule library contains 20 high-confidence behavioral rules, each including a semantic description, quantitative thresholds or behavioral patterns, applicable context, and validation confidence. Experiments conducted on over 1,500 hours of real-world traffic videos from Waymo's commercial operating area demonstrate that the proposed framework achieves 90.0% accuracy, a 93.3% F1-score, and 98.0% recall in AV identification. The discovered rules capture key AV traits in smoothness, conservatism, and lane discipline, informing safety assessment, regulatory compliance, and traffic management in mixed traffic. The dataset is available at: svbrd-llm-roadside-video-av.
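The abstract describes each rule as a description, a quantitative threshold, an applicable context, and a validation confidence, evaluated on an AV identification task scored by accuracy, F1, and recall. The following minimal sketch shows how such a rule library could be represented and scored, assuming one threshold test per rule and a confidence-weighted vote. The class, function, and feature names (`BehavioralRule`, `predict_av`, `max_abs_jerk`, and so on) are hypothetical and are not taken from the paper. The metric definitions are the standard ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class BehavioralRule:
    """One entry of a rule library; fields mirror the abstract, names are hypothetical."""
    description: str
    feature: str        # hypothetical kinematic feature key, e.g. "max_abs_jerk"
    threshold: float
    av_if_below: bool   # True: the rule votes "AV" when feature < threshold
    context: str        # e.g. "lane_change" or "normal_driving"
    confidence: float   # validation confidence in [0, 1]

    def fires(self, features: Dict[str, float]) -> bool:
        value = features[self.feature]
        return value < self.threshold if self.av_if_below else value > self.threshold


def predict_av(rules: List[BehavioralRule], features: Dict[str, float],
               context: str, min_score: float = 0.5) -> bool:
    # Assumed aggregation: confidence-weighted vote over rules that apply to this context.
    applicable = [r for r in rules if r.context == context and r.feature in features]
    total = sum(r.confidence for r in applicable)
    if total == 0:
        return False  # no applicable evidence: default to "human-driven"
    score = sum(r.confidence for r in applicable if r.fires(features)) / total
    return score >= min_score


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    # Standard accuracy / precision / recall / F1 with "AV" as the positive class.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label sequences")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

Consider a vehicle that satisfies only the higher-confidence rule of two applicable rules (confidences 0.9 and 0.6). It scores 0.9 / 1.5 = 0.6 and is labeled an AV at the default `min_score` of 0.5. A voting scheme like this keeps each prediction traceable to the specific rules that fired, in line with the interpretability goal in the abstract.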

2511.14783 2026-03-24 cs.CL cs.CY

Human or LLM as Standardized Patients? A Comparative Study for Medical Education

Bingquan Zhang, Xiaoxiao Liu, Yuchi Wang, Lei Zhou, Qianqian Xie, Benyou Wang

Comments 24 pages, 13 figures, 10 tables