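arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.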

2604.02346 2026-04-06 cs.LG cs.AI cs.SE q-bio.BM

DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery

Tianyu Liu, Sihan Jiang, Fan Zhang, Kunyang Sun, Teresa Head-Gordon, Hongyu Zhao

Comments 29 pages, 6 figures

2604.02345 2026-04-06 cs.LG cs.AI

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, Xinyi Wang, Wei Dai, Gang Cao, Yuetang Deng, Zhi Gong, Dezhi Ran, Linyi Li, Wei Yang, Tao Xie

2604.02344 2026-04-06 cs.LG cs.DC cs.PF

Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

Jędrzej Maczan

Abstract

WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true cost of this overhead is poorly characterized. We present a systematic characterization of WebGPU dispatch overhead for LLM inference at batch size 1, spanning four GPU vendors (NVIDIA, AMD, Apple, Intel), two native implementations (Dawn, wgpu-native), three browsers (Chrome, Safari, Firefox), and two model sizes (Qwen2.5-0.5B and 1.5B). Our primary contribution is a sequential-dispatch methodology that reveals that naive single-operation benchmarks overestimate dispatch cost by ~20×. The true per-dispatch cost of WebGPU API overhead alone is 24-36 μs on Vulkan and 32-71 μs on Metal, while the total per-operation overhead including Python cost is ~95 μs, a distinction that is critical for optimization. On Vulkan, kernel fusion improves throughput by 53%, while CUDA fusion provides no benefit, confirming that per-operation overhead is a primary differentiator. LLM inference was tested across three major operating systems (Linux, Windows, macOS). We built torch-webgpu, a PrivateUse1-based out-of-tree PyTorch backend and an FX-to-WebGPU compiler, which on our reference platform achieves 11-12% of CUDA performance. At dtype-matched float32, RTX PRO 2000 achieves 1.4× WebGPU's throughput despite ~6× less compute than RTX 5090. For dispatch overhead, backend choice is the dominant factor, although implementation choice also matters substantially within a backend (2.2× for Metal). In terms of dispatch vs kernel compute efficiency, we conclude that at batch=1 with the current dispatch-heavy pipeline, per-operation overhead dominates regardless of kernel quality. All code, benchmarks, and raw data are open source.

2604.02342 2026-04-06 cs.LG

Homophily-aware Supervised Contrastive Counterfactual Augmented Fair Graph Neural Network

Mahdi Tavassoli Kejani, Fadi Dornaika, Charlotte Laclau, Jean-Michel Loubes

Comments This paper has been accepted for publication at the IEEE Conference on Secure and Trustworthy Machine Learning, 2026

2604.02341 2026-04-06 cs.LG cs.AI

LLM Reasoning with Process Rewards for Outcome-Guided Steps

Mohammad Rezaei, Jens Lehmann, Sahar Vahdati

Comments 8 pages, 3 figures, 2 tables, submitted to IJCNN 2026 conference

2604.02339 2026-04-06 cs.LG cs.CL

SIEVE: Sample-Efficient Parametric Learning from Natural Language

Parth Asawa, Alexandros G. Dimakis, Matei Zaharia

2604.02338 2026-04-06 cs.LG cs.CL cs.CV

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

Md Kowsher, Haris Mansoor, Nusrat Jahan Prottasha, Ozlem Garibay, Victor Zhu, Zhengping Ji, Chen Chen

2604.02337 2026-04-06 cs.LG

Generating Counterfactual Patient Timelines from Real-World Data

Yu Akagi, Tomohisa Seki, Toru Takiguchi, Hiromasa Ito, Yoshimasa Kawazoe, Kazuhiko Ohe

2604.02335 2026-04-06 cs.LG cs.NA math.NA

Convolutional Surrogate for 3D Discrete Fracture-Matrix Tensor Upscaling

Martin Špetlík, Jan Březina

Comments 28 pages, 9 figures, published, https://github.com/martinspetlik/MLMC-DFM/tree/MS_3d

Details

DOI: 10.1016/j.cageo.2026.106105
Journal ref: Computers and Geosciences 209, 106105 (2026)

Abstract

Modeling groundwater flow in three-dimensional fractured crystalline media requires accounting for strong spatial heterogeneity induced by fractures. Fine-scale discrete fracture-matrix (DFM) simulations can capture this complexity but are computationally expensive, especially when repeated evaluations are needed. To address this, we aim to employ a multilevel Monte Carlo (MLMC) framework in which numerical homogenization is used to upscale sub-resolution fracture effects when transitioning between accuracy levels. To reduce the cost of conventional 3D numerical homogenization, we develop a surrogate model that predicts the equivalent hydraulic conductivity tensor Keq from a voxelized 3D domain representing tensor-valued random fields of matrix and fracture conductivities. Fracture size, orientation, and aperture are sampled from distributions informed by natural observations. The surrogate architecture combines a 3D convolutional neural network with feed-forward layers, enabling it to capture both local spatial features and global interactions. Three surrogates are trained on data generated by DFM simulations, each corresponding to a different fracture-to-matrix conductivity contrast. Performance is evaluated across a wide range of fracture network parameters and matrix-field correlation lengths. The trained models achieve high accuracy, with normalized root-mean-square errors below 0.22 across most test cases. Practical applicability is demonstrated by comparing numerically homogenized conductivities with surrogate predictions in two macro-scale problems: computing equivalent conductivity tensors and predicting outflow from a constrained 3D domain. In both cases, surrogate-based upscaling preserves accuracy while substantially reducing computational cost, achieving speedups exceeding 100x when inference is performed on a GPU.

2604.02334 2026-04-06 cs.AI cs.MA

Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web

Xiaohang Nie, Zihan Guo, Zicai Cui, Jiachi Yang, Zeyi Chen, Leheyi De, Yu Zhang, Junwei Liao, Bo Huang, Yingxuan Yang, Zhi Han, Zimian Peng, Linyao Chen, Wenzheng Tom Tang, Zongkai Liu, Tao Zhou, Botao Amber Hu, Shuyang Tang, Jianghao Lin, Weiwen Liu, Muning Wen, Yuanjian Zhou, Weinan Zhang

Comments 38 pages, 8 figures, and 4 tables

2604.02315 2026-04-06 cs.AI

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

Sarath Shekkizhar, Romain Cosentino, Adam Earle

2604.01989 2026-04-06 cs.CV cs.AI

Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

Boyang Gong, Yu Zheng, Fanye Kong, Jie Zhou, Jiwen Lu

2604.01949 2026-04-06 cs.LG q-bio.GN

annbatch unlocks terabyte-scale training of biological data in anndata

Ilan Gold, Felix Fischer, Lucas Arnoldt, F. Alexander Wolf, Fabian J. Theis

2604.01903 2026-04-06 cs.CV

Light-ResKAN: A Parameter-Sharing Lightweight KAN with Gram Polynomials for Efficient SAR Image Recognition

Pan Yi, Weijie Li, Xiaodong Chen, Jiehua Zhang, Li Liu, Yongxiang Liu

Comments 16 pages, 8 figures, accepted by JSTARS

2604.01848 2026-04-06 cs.CV

Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance

Jason Qiu, Zachary Meurer, Xavier Thomas, Deepti Ghadiyaram

2604.01833 2026-04-06 cs.CV cs.CL cs.LG

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

Yaxin Luo, Zhiqiang Shen

Comments Main manuscript: 13 pages, 9 figures. Appendix: 8 pages, 5 figures. Accepted in Transactions on Machine Learning Research (TMLR) 2026

2604.01732 2026-04-06 cs.AI cs.LO

Solving the Two-dimensional single stock size Cutting Stock Problem with SAT and MaxSAT

Tuyen Van Kieu, Chi Linh Hoang, Khanh Van To

2604.01730 2026-04-06 cs.LG cs.SY eess.SY

Koopman-Based Nonlinear Identification and Adaptive Control of a Turbofan Engine

David Grasev

Comments 21 pages, 23 figures

2604.01624 2026-04-06 cs.AI cs.CL

OSCAR: Orchestrated Self-verification and Cross-path Refinement

Yash Shah, Abhijit Chakraborty, Naresh Kumar Devulapally, Vishnu Lokhande, Vivek Gupta

2604.01581 2026-04-06 cs.CV

Satellite-Free Training for Drone-View Geo-Localization

Tao Liu, Yingzhi Zhang, Kan Ren, Xiaoqi Zhao

Abstract

Drone-view geo-localization (DVGL) aims to determine the location of drones in GPS-denied environments by retrieving the corresponding geotagged satellite tile from a reference gallery given UAV observations of a location. In many existing formulations, these observations are represented by a single oblique UAV image. In contrast, our satellite-free setting is designed for multi-view UAV sequences, which are used to construct a geometry-normalized UAV-side location representation before cross-view retrieval. Existing approaches rely on satellite imagery during training, either through paired supervision or unsupervised alignment, which limits practical deployment when satellite data are unavailable or restricted. In this paper, we propose a satellite-free training (SFT) framework that converts drone imagery into cross-view compatible representations through three main stages: drone-side 3D scene reconstruction, geometry-based pseudo-orthophoto generation, and satellite-free feature aggregation for retrieval. Specifically, we first reconstruct dense 3D scenes from multi-view drone images using 3D Gaussian splatting and project the reconstructed geometry into pseudo-orthophotos via PCA-guided orthographic projection. This rendering stage operates directly on reconstructed scene geometry without requiring camera parameters at rendering time. Next, we refine these orthophotos with lightweight geometry-guided inpainting to obtain texture-complete drone-side views. Finally, we extract DINOv3 patch features from the generated orthophotos, learn a Fisher vector aggregation model solely from drone data, and reuse it at test time to encode satellite tiles for cross-view retrieval. Experimental results on University-1652 and SUES-200 show that our SFT framework substantially outperforms satellite-free generalization baselines and narrows the gap to methods trained with satellite imagery.

2604.01479 2026-04-06 cs.CV

UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

Zhisheng Huang, Jiahao Chen, Cheng Lin, Chenyu Hu, Hanzhuo Huang, Zhengming Yu, Mengfei Li, Yuheng Liu, Zekai Gu, Zibo Zhao, Yuan Liu, Xin Li, Wenping Wang

2604.01447 2026-04-06 cs.CV cs.AI

Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

Derek Austin

2604.01432 2026-04-06 cs.CL

Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation

Hexuan Wang, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi

2604.01202 2026-04-06 cs.AI

Therefore I am. I Think

Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani

2604.01021 2026-04-06 cs.LG cs.AI

Transfer learning for nonparametric Bayesian networks

Rafael Sojo, Pedro Larrañaga, Concha Bielza

Comments An earlier version was previously posted on SSRN. This version includes improvements in experiments and evaluation metrics following reviewer comments. Revision submitted to Knowledge-Based Systems

2604.00901 2026-04-06 cs.AI

Experience as a Compass: Multi-agent RAG with Evolving Orchestration and Agent Prompts

Sha Li, Naren Ramakrishnan

2604.00799 2026-04-06 cs.CV cs.CL cs.LG

Multimodal Language Models Cannot Spot Spatial Inconsistencies

Om Khangaonkar, Hadi J. Rad, Hamed Pirsiavash

2603.29512 2026-04-06 cs.RO cs.SY eess.SY

Communication Outage-Resistant UUV State Estimation: A Variational History Distillation Approach

Shuyue Li, Miguel López-Benítez, Eng Gee Lim, Fei Ma, Qian Dong, Mengze Cao, Limin Yu, Xiaohui Qin

Comments 7 pages, 2 figures. Accepted for publication in 2026 IEEE/OES OCEANS Sanya. © 2026 IEEE. Personal use of this material is permitted. See PDF for the full IEEE copyright notice

2603.29093 2026-04-06 cs.CL cs.AI cs.IR

APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

Pratyay Banerjee, Masud Moshtaghi, Ankit Chadha

Comments 17 pages, 13 figures

2603.28225 2026-04-06 cs.LG

Detecting the Unexpected: AI-Driven Anomaly Detection in Smart Bridge Monitoring

Rahul Jaiswal, Joakim Hellum, Halvor Heiberg

Comments 6 pages, 14 figures