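arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.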

2604.16299 2026-04-20 cs.CV

Repurposing 3D Generative Model for Autoregressive Layout Generation

Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng

Comments https://fenghora.github.io/LaviGen-Page/

Abstract

We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric relations and physical constraints among objects, producing coherent and physically plausible 3D scenes. To further enhance this process, we propose an adapted 3D diffusion model that integrates scene, object, and instruction information and employs a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy. Extensive experiments on the LayoutVLM benchmark show LaviGen achieves superior 3D layout generation performance, with 19% higher physical plausibility than the state of the art and 65% faster computation. Our code is publicly available at https://github.com/fenghora/LaviGen.

2604.16298 2026-04-20 cs.CV cs.RO

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo

Comments Accepted by CVPR 2026 Findings

2604.16284 2026-04-20 cs.CV

Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan

Shivarth Rai, Tejeswar Pokuri

Comments Accepted at CV4Animals Workshop, CVPR 2025

2604.16282 2026-04-20 cs.LG math.DS math.PR

Geometric regularization of autoencoders via observed stochastic dynamics

Sean Hill, Felix X.-F. Ye

Abstract

Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in high-dimensional ambient space. Building a reduced simulator from short-burst ambient ensembles is a long-standing problem: local-chart methods like ATLAS suffer from exponential landmark scaling and per-step reprojection, while autoencoder alternatives leave tangent-bundle geometry poorly constrained, and the errors propagate into the learned drift and diffusion. We observe that the ambient covariance $\Lambda$ already encodes coordinate-invariant tangent-space information, its range spanning the tangent bundle. Using this, we construct a tangent-bundle penalty and an inverse-consistency penalty for a three-stage pipeline (chart learning, latent drift, latent diffusion) that learns a single nonlinear chart and the latent SDE. The penalties induce a function-space metric, the $\rho$-metric, strictly weaker than the Sobolev $H^1$ norm yet achieving the same chart-quality generalization rate up to logarithmic factors. For the drift, we derive an encoder-pullback target via Itô's formula on the learned encoder and prove a bias decomposition showing the standard decoder-side formula carries systematic error for any imperfect chart. Under a $W^{2,\infty}$ chart-convergence assumption, chart-level error propagates controllably to weak convergence of the ambient dynamics and to convergence of radial mean first-passage times. Experiments on four surfaces embedded in up to $201$ ambient dimensions reduce radial MFPT error by $50$--$70\%$ under rotation dynamics and achieve the lowest inter-well MFPT error on most surface--transition pairs under metastable Müller--Brown Langevin dynamics, while reducing end-to-end ambient coefficient errors by up to an order of magnitude relative to an unregularized autoencoder.
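
The observation that the ambient covariance already carries tangent-space information can be illustrated on a toy case. The sketch below is not the paper's pipeline: it assumes a unit sphere in three ambient dimensions, isotropic tangential noise, and a simple normalize-to-sphere retraction, and the names `burst_covariance` and `tangent_projector` are hypothetical. It estimates the second moment of short-burst increments at one point and checks that its leading eigenvectors span the tangent plane there.

```python
import numpy as np

def burst_covariance(x0, dt=1e-4, n_samples=20000, seed=0):
    """Second moment of short-burst increments at x0, scaled by 1/dt.

    Assumption: x0 lies on the unit sphere and the noise is isotropic
    in the tangent plane (a toy stand-in for an ambient SDE).
    """
    rng = np.random.default_rng(seed)
    # Exact tangent projector at x0; used only to generate the bursts.
    proj = np.eye(3) - np.outer(x0, x0)
    noise = rng.standard_normal((n_samples, 3)) @ proj
    x1 = x0 + np.sqrt(dt) * noise
    # Retract each sample back onto the sphere.
    x1 /= np.linalg.norm(x1, axis=1, keepdims=True)
    dx = x1 - x0
    return dx.T @ dx / (n_samples * dt)

def tangent_projector(cov, d):
    """Orthogonal projector onto the span of the top-d eigenvectors of cov."""
    _, vecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    top = vecs[:, -d:]
    return top @ top.T

x0 = np.array([0.0, 0.6, 0.8])
cov = burst_covariance(x0)
P_est = tangent_projector(cov, d=2)
P_true = np.eye(3) - np.outer(x0, x0)
err = np.linalg.norm(P_est - P_true)
print("projector error (Frobenius):", err)
```

In this toy setting the estimated covariance has two eigenvalues near one and one near zero, so its range recovers the tangent plane without reference to any chart.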

2604.16280 2026-04-20 cs.AI

Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing

Thomas Bayer, Alexander Lohr, Sarah Weiß, Bernd Michelberger, Wolfram Höpken

Comments 14 pages, 8 figures, Submitted to conference

2604.16279 2026-04-20 cs.LG physics.chem-ph

Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design

Shriram Chennakesavalu, Kirill Shmilovich, Hayley Weir, Colin Grambow, John Bradshaw, Patricia Suriana, Chen Cheng, Kangway Chuang

2604.16275 2026-04-20 cs.CL

No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus

Hitesh Mehta, Arjit Saxena, Garima Chhikara, Rohit Kumar

2604.16270 2026-04-20 cs.CL cs.AI

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text

Van-Truong Le

Comments 7 pages, 2 figures. Accepted at the FISU Joint Conference on Artificial Intelligence (FJCAI 2026), Vietnam

2604.16266 2026-04-20 cs.CV

Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement

Tejeswar Pokuri, Shivarth Rai

Comments Accepted at AI4ES Workshop AAAI 2026

2604.16265 2026-04-20 cs.LG

FL-MHSM: Spatially-adaptive Fusion and Ensemble Learning for Flood-Landslide Multi-Hazard Susceptibility Mapping at Regional Scale

Aswathi Mundayatt, Jaya Sreevalsan-Nair

Abstract

Existing multi-hazard susceptibility mapping (MHSM) studies often rely on spatially uniform models, treat hazards independently, and provide limited representation of cross-hazard dependence and uncertainty. To address these limitations, this study proposes a deep learning (DL) workflow for joint flood-landslide multi-hazard susceptibility mapping (FL-MHSM) that combines two-level spatial partitioning, probabilistic Early Fusion (EF), a tree-based Late Fusion (LF) baseline, and a soft-gating Mixture of Experts (MoE) model, with MoE serving as the final predictive model. The proposed design preserves spatial heterogeneity through zonal partitions and enables data-parallel large-area prediction using overlapping lattice grids. In Kerala, EF remained competitive with LF, improving flood recall from 0.816 to 0.840 and reducing Brier score from 0.092 to 0.086, while MoE provided the strongest performance for flood susceptibility, achieving an AUC-ROC of 0.905, recall of 0.930, and F1-score of 0.722. In Nepal, EF similarly improved flood recall from 0.820 to 0.858 and reduced Brier score from 0.057 to 0.049 relative to LF, while MoE outperformed both EF and LF for landslide susceptibility, achieving an AUC-ROC of 0.914, recall of 0.901, and F1-score of 0.559. GeoDetector analysis of MoE outputs further showed that dominant factors varied more across zones in Kerala, where susceptibility was shaped by different combinations of topographic, land-cover, and drainage-related controls, while Nepal showed a more consistent influence of topographic and glacier-related factors across zones. These findings show that EF and LF provide complementary predictive behavior, and that their spatially adaptive integration through MoE yields robust overall predictive performance for FL-MHSM while supporting interpretable characterization of multi-hazard susceptibility in spatially heterogeneous landscapes.
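
For readers unfamiliar with the two quantities the abstract leans on, the sketch below shows a generic soft-gating combination of two experts' probabilities and the Brier score used to compare them. It is an illustration only: the abstract does not describe the gate architecture, so `soft_gate`, the zero gate logits, and the four-sample toy data are all assumptions, and none of the numbers come from the paper.

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((probs - labels) ** 2))

def soft_gate(expert_probs, gate_logits):
    """Blend per-expert probabilities with softmax gate weights.

    expert_probs, gate_logits: arrays of shape (n_samples, n_experts).
    The gate here is a plain softmax; a real model would learn the logits.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    gate_logits = np.asarray(gate_logits, dtype=float)
    # Subtract the row maximum for numerical stability before exponentiating.
    z = gate_logits - gate_logits.max(axis=1, keepdims=True)
    weights = np.exp(z)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * expert_probs).sum(axis=1)

# Toy data: two hypothetical experts scoring four locations.
labels = np.array([1, 0, 1, 0])
expert_probs = np.array([[0.9, 0.6],
                         [0.2, 0.4],
                         [0.7, 0.8],
                         [0.1, 0.3]])
gate_logits = np.zeros_like(expert_probs)  # equal weights for both experts

mixed = soft_gate(expert_probs, gate_logits)
score = brier_score(mixed, labels)
print("mixed probabilities:", mixed)
print("Brier score:", score)
```

A lower Brier score means better-calibrated probabilities, which is why the abstract reports it alongside recall when comparing EF with LF.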

2604.16264 2026-04-20 cs.CV cs.LG

Information Router for Mitigating Modality Dominance in Vision-Language Models

Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib

2604.16263 2026-04-20 cs.RO

Semantic Area Graph Reasoning for Multi-Robot Language-Guided Search

Ruiyang Wang, Hao-Lun Hsu, Jiwoo Kim, Miroslav Pajic

2604.16262 2026-04-20 cs.CL

SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough, Saman Jayasinghe

Comments 6 pages, 5 tables, 1 figure, Accepted to SemEval 2026

2604.16259 2026-04-20 cs.LG cs.AI

Beyond Distribution Sharpening: The Importance of Task Rewards

Sarthak Mittal, Leo Gagnon, Guillaume Lajoie

2604.16258 2026-04-20 cs.AI

Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

Reham Alharbi, Valentina Tamma, Terry R. Payne, Jacopo de Berardinis

Comments arXiv admin note: text overlap with arXiv:2507.02989

2604.16256 2026-04-20 cs.CV cs.CL

Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap

Yige Xu, Yongjie Wang, Zizhuo Wu, Kaisong Song, Jun Lin, Zhiqi Shen

2604.16248 2026-04-20 cs.CV

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization

Siddhant Bharadwaj, Ashish Vashist, Fahimul Aleem, Shruti Vyas

Comments Accepted to the CVPR EarthVision 2026 Workshop

2604.16247 2026-04-20 cs.LG cs.AI

Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization

Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin

2604.16242 2026-04-20 cs.LG cs.CL

Detecting and Suppressing Reward Hacking with Gradient Fingerprints

Songtao Wang, Quang Hieu Pham, Fangcong Yin, Xinpeng Wang, Jocelyn Qiaochu Chen, Greg Durrett, Xi Ye

2604.16241 2026-04-20 cs.CL cs.AI

BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

Jiacheng Shen, Masato Hagiwara, Milad Alizadeh, Ellen Gilsenan-McMahon, Marius Miron, David Robinson, Emmanuel Chemla, Sara Keen, Gagan Narula, Mathieu Laurière, Matthieu Geist, Olivier Pietquin

Comments 28 pages, 3 figures

2604.16240 2026-04-20 cs.CV

CollideNet: Hierarchical Multi-scale Video Representation Learning with Disentanglement for Time-To-Collision Forecasting

Nishq Poorav Desai, Ali Etemad, Michael Greenspan

Comments Accepted to ICPR 2026

2604.16238 2026-04-20 cs.LG physics.ao-ph stat.ML

Enhancing AI and Dynamical Subseasonal Forecasts with Probabilistic Bias Correction

Hannah Guan, Soukayna Mouatadid, Paulo Orenstein, Judah Cohen, Haiyu Dong, Zekun Ni, Jeremy Berman, Genevieve Flaspohler, Alex Lu, Jakob Schloer, Joshua Talib, Jonathan A. Weyn, Lester Mackey

2604.16235 2026-04-20 cs.CL

Optimizing Korean-Centric LLMs via Token Pruning

Hoyeol Kim, Hyeonwoo Kim

Comments 5 pages

2604.16234 2026-04-20 cs.CV cs.AI

A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection

Van-Truong Le, Le-Khanh Nguyen, Trong-Doanh Nguyen

Comments 7 pages, 5 figures. Accepted at the FISU Joint Conference on Artificial Intelligence (FJCAI 2026), Vietnam

2604.16232 2026-04-20 cs.LG cs.AI cs.CE cs.SC

Neuro-Symbolic ODE Discovery with Latent Grammar Flow

Karin Yu, Eleni Chatzi, Georgios Kissas

2604.16231 2026-04-20 cs.CV

Dental Panoramic Radiograph Analysis Using YOLO26: From Tooth Detection to Disease Diagnosis

Khawaja Azfar Asif, Rafaqat Alam Khan

2604.16220 2026-04-20 cs.LG

OT on the Map: Quantifying Domain Shifts in Geographic Space

Haoran Zhang, Livia Betti, Konstantin Klemmer, Esther Rolf, David Alvarez-Melis

2604.16217 2026-04-20 cs.CL cs.AI

Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

Yanli Wang, Peng Kuang, Xiaoyu Han, Kaidi Xu, Haohan Wang

2604.16214 2026-04-20 cs.CV

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

Deepak Kumar, Abhishek Pratap Singh, Puneet Kumar, Xiaobai Li, Balasubramanian Raman

2604.16201 2026-04-20 cs.RO cs.CV

DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs

Nikhil Behari, Diego Rivero, Luke Apostolides, Suman Ghosh, Paul Pu Liang, Ramesh Raskar