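arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.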

2603.24594 2026-03-26 cs.LG cs.NA math.NA stat.ML

Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

Arthur Jacot

Abstract

We introduce the Multilevel Euler-Maruyama (ML-EM) method to compute solutions of SDEs and ODEs using a range of approximators $f^1,\dots,f^k$ to the drift $f$ with increasing accuracy and computational cost, requiring only a few evaluations of the most accurate $f^k$ and many evaluations of the less costly $f^1,\dots,f^{k-1}$. If the drift lies in the so-called Harder than Monte Carlo (HTMC) regime, i.e. it requires $\varepsilon^{-\gamma}$ compute to be $\varepsilon$-approximated for some $\gamma>2$, then ML-EM $\varepsilon$-approximates the solution of the SDE with $\varepsilon^{-\gamma}$ compute, improving over the traditional EM rate of $\varepsilon^{-\gamma-1}$. In other words, it allows us to solve the SDE at the same cost as a single evaluation of the drift. In the context of diffusion models, the different levels $f^{1},\dots,f^{k}$ are obtained by training UNets of increasing sizes, and ML-EM allows us to perform sampling with the equivalent of a single evaluation of the largest UNet. Our numerical experiments confirm our theory: we obtain up to fourfold speedups for image generation on the CelebA dataset downscaled to 64x64, where we measure $\gamma\approx 2.5$. Given that this is a polynomial speedup, we expect even stronger speedups in practical applications, which involve networks that are orders of magnitude larger.
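The idea of combining many cheap drift evaluations with a few expensive ones can be illustrated with a toy sketch. The version below uses a randomized telescoping sum over the levels, which is one common multilevel construction and an assumption made here for illustration; it is not necessarily the scheme used in the paper. The toy drift $f(x)=-x$, the approximators in `make_levels`, the inclusion probabilities `probs`, and the function name `ml_em` are all hypothetical.

```python
import numpy as np


def make_levels(k):
    """Hypothetical approximators f^1..f^k of the toy drift f(x) = -x.

    levels[j-1] plays the role of f^j and has relative error 2^-j.
    In a real setting, higher levels would also be more expensive.
    """
    return [lambda x, j=j: -(1.0 - 2.0 ** -j) * x for j in range(1, k + 1)]


def ml_em(x0, levels, probs, n_steps, T=1.0, sigma=1.0, rng=None):
    """Euler-Maruyama with a randomized telescoping drift estimate.

    probs[j] (j >= 1) is the chance of evaluating the correction
    levels[j] - levels[j-1] at a given step. probs[0] is unused because
    the cheapest level is always evaluated.
    Returns the final state and the number of evaluations per level.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = T / n_steps
    x = float(x0)
    evals = [0] * len(levels)

    for _ in range(n_steps):
        cache = {}

        def val(j):
            # Evaluate level j at the current state at most once per step.
            if j not in cache:
                cache[j] = levels[j](x)
                evals[j] += 1
            return cache[j]

        drift = val(0)
        for j in range(1, len(levels)):
            if rng.random() < probs[j]:
                # Dividing by probs[j] keeps the estimate unbiased:
                # its expectation is levels[-1](x).
                drift += (val(j) - val(j - 1)) / probs[j]

        noise = sigma * np.sqrt(h) * rng.standard_normal()
        x = x + h * drift + noise

    return x, evals


# Usage: the most accurate level is evaluated on only a fraction of steps.
levels = make_levels(3)
x_final, evals = ml_em(1.0, levels, probs=[1.0, 0.5, 0.1], n_steps=200,
                       rng=np.random.default_rng(0))
```

In this sketch the cheapest level is evaluated at every step, while the costlier corrections are sampled rarely and reweighted. How the probabilities should scale with accuracy and cost is exactly what the theory in the paper addresses, and it is not reproduced here.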

2603.24584 2026-03-26 cs.CV cs.RO

TAG: Target-Agnostic Guidance for Stable Object-Centric Inference in Vision-Language-Action Models

Jiaying Zhou, Zhihao Zhan, Ruifeng Zhai, Qinhan Lyu, Hao Liu, Keze Wang, Liang Lin, Guangrun Wang

2603.24582 2026-03-26 cs.AI

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Biplab Pal, Santanu Bhattacharya

Comments 22 pages, 5 figures, submitted to Engineering Applications of Artificial Intelligence

2603.24581 2026-03-26 cs.CV cs.RO

Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving

Linbo Wang, Yupeng Zheng, Qiang Chen, Shiwei Li, Yichen Zhang, Zebin Xing, Qichao Zhang, Xiang Li, Deheng Qian, Pengxuan Yang, Yihang Dong, Ce Hao, Xiaoqing Ye, Junyu Han, Yifeng Pan, Dongbin Zhao

2603.24580 2026-03-26 cs.CL cs.AI cs.CY cs.IR cs.LG

Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA

Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, Tunazzina Islam

2603.24579 2026-03-26 cs.CL

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie Hu, Yu Qin, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

2603.24578 2026-03-26 cs.CV eess.IV

Vision-Language Models vs Human: Perceptual Image Quality Assessment

Imran Mehmood, Imad Ali Shah, Ming Ronnier Luo, Brian Deegan

2603.24575 2026-03-26 cs.CV cs.AI

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna

2603.24572 2026-03-26 cs.AI

Completeness of Unbounded Best-First Minimax and Descent Minimax

Quentin Cohen-Solal

2603.24571 2026-03-26 cs.CV

Towards Training-Free Scene Text Editing

Yubo Li, Xugong Qin, Peng Zhang, Hailun Lin, Gangyan Zeng, Kexin Zhang

Comments Accepted by CVPR 2026

2603.24570 2026-03-26 cs.CV cs.AI

Anti-I2V: Safeguarding your photos from malicious image-to-video generation

Duc Vu, Anh Nguyen, Chi Tran, Anh Tran

Comments Accepted to CVPR 2026 (Main Conference)

2603.24558 2026-03-26 cs.CV cs.AI

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan

Comments To be published in CVPR 2026

2603.24552 2026-03-26 cs.CV

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mirela Tulbure, Patrick Hostert, Stefan Erasmi

Abstract

Organic farming is a key element in achieving more sustainable agriculture. For a better understanding of the development and impact of organic farming, comprehensive, spatially explicit information is needed. This study presents an approach for the discrimination of organic and conventional farming systems using intra-annual Sentinel-2 time series. In addition, it examines two factors influencing this discrimination: the joint learning of crop type information in a concurrent task and the role of spatial context. A Vision Transformer model based on the Temporo-Spatial Vision Transformer (TSViT) architecture was used to construct a classification model for the two farming systems. The model was extended for simultaneous learning of the crop type, creating a multitask learning setting. By varying the patch size presented to the model, we tested the influence of spatial context on the classification accuracy of both tasks. We show that discrimination between organic and conventional farming systems using multispectral remote sensing data is feasible. However, classification performance varies substantially across crop types. For several crops, such as winter rye, winter wheat, and winter oat, F1 scores of 0.8 or higher can be achieved. In contrast, other agricultural land use classes, such as permanent grassland, orchards, grapevines, and hops, cannot be reliably distinguished, with F1 scores for the organic management class of 0.4 or lower. Joint learning of farming system and crop type provides only limited additional benefits over single-task learning. In contrast, incorporating wider spatial context improves the performance of both farming system and crop type classification. Overall, we demonstrate that a classification of agricultural farming systems is possible in a diverse agricultural region using multispectral remote sensing data.

2603.24549 2026-03-26 cs.CL cs.AI cs.CV cs.SD

A Sociolinguistic Analysis of Automatic Speech Recognition Bias in Newcastle English

Dana Serditova, Kevin Tang

Comments 54 pages, 11 figures

2603.24541 2026-03-26 cs.CV cs.AI

SEGAR: Selective Enhancement for Generative Augmented Reality

Fanjun Bu, Chenyang Yuan, Hiroshi Yasuda

2603.24539 2026-03-26 cs.CV cs.AI

CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition

Florian Stilz, Vinkle Srivastav, Nassir Navab, Nicolas Padoy

2603.24535 2026-03-26 cs.CL cs.CY

Representation Learning to Study Temporal Dynamics in Tutorial Scaffolding

Conrad Borchers, Jiayi Zhang, Ashish Gurung

Comments Accepted as short paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

2603.24533 2026-03-26 cs.LG cs.AI cs.CV

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Zichuan Lin, Feiyu Liu, Yijun Yang, Jiafei Lyu, Yiming Gao, Yicheng Liu, Zhicong Lu, Yangbin Yu, Mingyu Yang, Junyou Li, Deheng Ye, Jie Jiang

Comments Code and models are available at https://github.com/ui-voyager/UI-Voyager

2603.24528 2026-03-26 cs.CV

Cross-Modal Prototype Alignment and Mixing for Training-Free Few-Shot Classification

Dipam Goswami, Simone Magistri, Gido M. van de Ven, Bartłomiej Twardowski, Andrew D. Bagdanov, Tinne Tuytelaars, Joost van de Weijer

Comments Preprint

2603.24527 2026-03-26 cs.AI

From Liar Paradox to Incongruent Sets: A Normal Form for Self-Reference

Shalender Singh, Vishnu Priya Singh Parmar

Comments 46 pages

Abstract

We introduce incongruent normal form (INF), a structural representation for self-referential semantic sentences. An INF replaces a self-referential sentence with a finite family of non-self-referential sentences that are individually satisfiable but not jointly satisfiable. This transformation isolates the semantic obstruction created by self-reference while preserving classical semantics locally and is accompanied by correctness theorems characterizing when global inconsistency arises from locally compatible commitments. We then study the role of incongruence as a structural source of semantic informativeness. Using a minimal model-theoretic notion of informativeness (understood as the ability of sentences to distinguish among admissible models), we show that semantic completeness precludes informativeness, while incongruence preserves it. Moreover, incongruence is not confined to paradoxical constructions: any consistent incomplete first-order theory admits finite incongruent families arising from incompatible complete extensions. In this sense, incompleteness manifests structurally as locally realizable but globally incompatible semantic commitments, providing a minimal formal basis for semantic knowledge. Finally, we introduce a quantitative semantic framework. In a canonical finite semantic-state setting, we model semantic commitments as Boolean functions and define a Fourier-analytic notion of semantic energy based on total influence. We derive uncertainty-style bounds relating semantic determinacy, informativeness, and spectral simplicity, and establish a matrix inequality bounding aggregate semantic variance by total semantic energy. These results show quantitatively that semantic informativeness cannot collapse into a single determinate state without unbounded energy cost, identifying incongruence as a fundamental structural and quantitative feature of semantic representation.
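The abstract bases its notion of semantic energy on the total influence of Boolean functions. The paper's exact definition is not given here, so the sketch below only illustrates the standard notion it builds on: for $f:\{-1,1\}^n\to\{-1,1\}$, total influence is $\sum_i \Pr_x[f(x)\neq f(x^{\oplus i})]$, which equals the spectral sum $\sum_S |S|\,\hat f(S)^2$. The function names and the majority example are assumptions chosen for illustration.

```python
from itertools import combinations, product
from math import prod


def total_influence(f, n):
    """Sum over coordinates i of Pr_x[f(x) != f(x with bit i flipped)]."""
    points = list(product([-1, 1], repeat=n))
    total = 0.0
    for i in range(n):
        flips = 0
        for x in points:
            y = x[:i] + (-x[i],) + x[i + 1:]
            flips += f(x) != f(y)
        total += flips / len(points)
    return total


def fourier_coefficient(f, n, subset):
    """Coefficient of the character prod_{i in subset} x_i (uniform measure)."""
    points = product([-1, 1], repeat=n)
    return sum(f(x) * prod(x[i] for i in subset) for x in points) / 2 ** n


def spectral_total_influence(f, n):
    """Same quantity computed as sum_S |S| * fhat(S)^2."""
    return sum(
        len(subset) * fourier_coefficient(f, n, subset) ** 2
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    )


def majority3(x):
    """Majority vote of three +/-1 inputs (illustrative example)."""
    return 1 if sum(x) > 0 else -1


# Both routes give the same number for the same function.
combinatorial = total_influence(majority3, 3)
spectral = spectral_total_influence(majority3, 3)
```

Both functions enumerate all $2^n$ inputs, so they are only practical for small $n$. They are meant to make the definition concrete, not to reproduce the paper's bounds.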

2603.24524 2026-03-26 cs.LG cs.AI

No Single Metric Tells the Whole Story: A Multi-Dimensional Evaluation Framework for Uncertainty Attributions

Emily Schiller, Teodor Chiaburu, Marco Zullich, Luca Longo

Comments Accepted at the Fourth World Conference on Explainable Artificial Intelligence, xAI 2026, Fortaleza, Brazil, July 1-3, 2026

2603.24518 2026-03-26 cs.LG

TuneShift-KD: Knowledge Distillation and Transfer for Fine-tuned Models

Yushi Guan, Jeanine Ohene-Agyei, Daniel Kwan, Jean Sebastien Dandurand, Yifei Zhang, Nandita Vijaykumar

2603.24517 2026-03-26 cs.LG

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi

2603.24503 2026-03-26 cs.LG cs.RO cs.SY eess.SY

Towards Safe Learning-Based Non-Linear Model Predictive Control through Recurrent Neural Network Modeling

Mihaela-Larisa Clement, Mónika Farsang, Agnes Poks, Johannes Edelmann, Manfred Plöchl, Radu Grosu, Ezio Bartocci

2603.24500 2026-03-26 cs.LG physics.flu-dyn

Project and Generate: Divergence-Free Neural Operators for Incompressible Flows

Xigui Li, Hongwei Zhang, Ruoxi Jiang, Deshu Chen, Chensen Lin, Limei Han, Yuan Qi, Xin Guo, Yuan Cheng

2603.24493 2026-03-26 cs.LG math.ST stat.TH

Uniform Laws of Large Numbers in Product Spaces

Ron Holzman, Shay Moran, Alexander Shlimovich

2603.24484 2026-03-26 cs.CV

Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models

Siqi Liu, Xinyang Li, Bochao Zou, Junbao Zhuo, Huimin Ma, Jiansheng Chen

Comments 20 pages, 7 figures, accepted at CVPR 2026, project page: see https://founce.github.io/VisionToM

2603.24475 2026-03-26 cs.LG cs.SY eess.SY

Conformalized Transfer Learning for Li-ion Battery State of Health Forecasting under Manufacturing and Usage Variability

Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Marcello Canova

Comments Submitted to the 2026 American Control Conference (ACC)

2603.24465 2026-03-26 cs.CL

Mechanic: Sorrifier-Driven Formal Decomposition Workflow for Automated Theorem Proving

Ruichen Qiu, Yichuan Cao, Junqi Liu, Dakai Guo, Xiao-Shan Gao, Lihong Zhi, Ruyong Feng

2603.24461 2026-03-26 cs.RO

Design, Modelling and Characterisation of a Miniature Fibre-Reinforced Soft Bending Actuator for Endoluminal Interventions

Xiangyi Tan, Aoife McDonald-Bowyer, Danail Stoyanov, Agostino Stilli