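arXivDaily: a daily digest of academic papers, synced with the full arXiv dataset, with AI-generated summaries and translations. It covers artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.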

2507.15085 2026-03-24 cs.CV

OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities

Peirong Zhang, Haowei Xu, Jiaxin Zhang, Xuhan Zheng, Guitao Xu, Yuyi Zhang, Junle Liu, Zhenhua Yang, Wei Zhou, Lianwen Jin

Abstract

Improving visual text synthesis has long been a challenging and evolving frontier for image generation models. While recent state-of-the-art (SOTA) models have made remarkable strides in text generation capabilities, existing benchmarks inadequately assess their true performance due to narrow scope (scene text and posters only), isolated evaluation (T2I generation or editing separately), and insufficient difficulty (lacking challenging scenarios). To bridge this gap, we pioneer the unification of text-centric T2I generation, text editing, and OCR-related image-to-image translation to evaluate a model's holistic visual text synthesis abilities, i.e., OCR generative capabilities. Accordingly, we propose OCRGenBench, the most comprehensive benchmark to date for evaluating these abilities. OCRGenBench covers five common text categories and 33 OCR generative tasks, encompassing T2I generation, text editing, and other image-to-image OCR tasks (e.g., document dewarping and handwriting removal). The benchmark includes 1,060 human-annotated samples consisting of instruction-image-GT triplets, deliberately featuring high text density, diverse generation scales, varied aspect ratios, and bilingual content to capture real-world complexity. Furthermore, we introduce OCRGenScore, a unified metric integrating text accuracy, aesthetic quality, and instruction following. Extensive experiments on 19 cutting-edge generative models reveal that most score below 60/100. Our analysis exposes critical, previously overlooked limitations, including poor text localization, unintended content modifications, and failures with dense or small-scale text. We hope OCRGenBench establishes a robust standard to evaluate OCR generative capabilities, driving the evolution of reliable visual text synthesis. The benchmark and evaluation code are available at https://github.com/NiceRingNode/Awesome-Generative-Models-for-OCR.

2507.11474 2026-03-24 cs.CV

HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing

Pan Du, Mingqi Xu, Xiaozhi Zhu, Jian-xun Wang

Comments 64 pages, 9 figures, 6 supplementary figures

2506.19609 2026-03-24 cs.LG nlin.CD physics.comp-ph

Beyond Static Models: Hypernetworks for Adaptive and Generalizable Forecasting in Complex Parametric Dynamical Systems

Pantelis R. Vlachas, Konstantinos Vlachas, Eleni Chatzi

2506.11128 2026-03-24 cs.CL cs.AI

Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning

Andrew Keenan Richardson, Ryan Othniel Kearns, Sean Moss, Vincent Wang-Mascianica, Philipp Koralus

2506.10634 2026-03-24 cs.CV cs.AI

Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models

Francisco Caetano, Christiaan Viviers, Peter H. N. De With, Fons van der Sommen

Comments AAAI 2026

2505.10347 2026-03-24 cs.LG cs.AI

Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning

Gabriel S. Gama, Valdir Grassi

2503.19740 2026-03-24 cs.CV

LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings

Chengan Che, Chao Wang, Tom Vercauteren, Sophia Tsoka, Luis C. Garcia-Peraza-Herrera

Comments Accepted at CVPR2026 main conference

2503.05721 2026-03-24 cs.CL

What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets

Marco Antonio Stranisci, Christian Hardmeier

2501.16562 2026-03-24 cs.LG stat.ME

C-HDNet: Hyperdimensional Computing for Causal Effect Estimation from Observational Data Under Network Interference

Abhishek Dalvi, Neil Ashtekar, Vasant Honavar

Comments Published at Social Network Analysis and Mining

2410.14038 2026-03-24 cs.LG

Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Murilo L. da Luz, Telma W. de L. Soares, Luckeciano C. Melo

Comments Accepted at ICML 2025

2405.14108 2026-03-24 cs.LG cs.AI q-bio.BM q-bio.QM

Assessing the potential of deep learning for protein-ligand docking

Alex Morehead, Nabin Giri, Jian Liu, Pawan Neupane, Jianlin Cheng

Comments 55 pages, 2 tables, 37 figures. Results updated (in v1.1.0) after addressing primary-ligand scoring bug (in v1.0.0). Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench

Abstract

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of the latest docking and structure prediction methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to new proteins); (2) binding multiple (cofactor) ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for generalization to unknown pockets). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL methods for apo-to-holo protein-ligand docking and protein-ligand structure prediction using both primary ligand and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL co-folding methods generally outperform comparable conventional and DL docking baseline algorithms, yet popular methods such as AlphaFold 3 are still challenged by prediction targets with novel binding poses; (2) certain DL co-folding methods are highly sensitive to their input multiple sequence alignments, while others are not; and (3) DL methods struggle to strike a balance between structural accuracy and chemical specificity when predicting novel or multi-ligand protein targets. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

2404.12339 2026-03-24 cs.RO cs.CV

SPOT: Point Cloud Based Stereo Visual Place Recognition for Similar and Opposing Viewpoints

Spencer Carmichael, Rahul Agrawal, Ram Vasudevan, Katherine A. Skinner

Comments Expanded version with added appendix. Published in ICRA 2024. Project page: https://umautobots.github.io/spot

2402.00261 2026-03-24 cs.CV cs.LG

Understanding Neural Network Systems for Image Analysis using Vector Spaces and Inverse Maps

Rebecca Pattichis, Marios S. Pattichis

Comments Accepted to IEEE's Southwest Symposium on Image Analysis and Interpretation 2024

2312.04140 2026-03-24 cs.CV eess.IV

Polarimetric Light Transport Analysis for Specular Inter-reflection

Ryota Maeda, Shinsaku Hiura

Comments Accepted to IEEE Transactions on Computational Imaging (TCI)

2303.10371 2026-03-24 cs.LG

Geometric Imbalance in Semi-Supervised Node Classification

Liang Yan, Shengzhong Zhang, Bisheng Li, Menglin Yang, Chen Yang, Min Zhou, Weiyang Ding, Yutong Xie, Zengfeng Huang

Comments Accepted by NeurIPS 2025

2603.22254 2026-03-24 cond-mat.mtrl-sci cond-mat.mes-hall cs.LG physics.atm-clus physics.chem-ph

Characterizing High-Capacity Janus Aminobenzene-Graphene Anode for Sodium-Ion Batteries with Machine Learning

Claudia Islas-Vargas, L. Ricardo Montoya, Carlos A. Vital-José, Oliver T. Unke, Klaus-Robert Müller, Huziel E. Sauceda

Comments 8 pages, 5 figures, research article

2603.22252 2026-03-24 eess.AS cs.SD

SelfTTS: cross-speaker style transfer through explicit embedding disentanglement and self-refinement using self-augmentation

Lucas H. Ueda, João G. T. Lima, Pedro R. Corrêa, Flávio O. Simões, Mário U. Neto, Paula D. P. Costa

Comments Submitted to Interspeech 2026

2603.22231 2026-03-24 cs.IR cs.AI cs.GT cs.LG

One Model, Two Markets: Bid-Aware Generative Recommendation

Yanchen Jiang, Zhe Feng, Christopher P. Mah, Aranyak Mehta, Di Wang

2603.22227 2026-03-24 cs.HC cs.AI cs.CL

Dyadic: A Scalable Platform for Human-Human and Human-AI Conversation Research

David M. Markowitz

2603.22214 2026-03-24 cs.CR cs.AI cs.LG

Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models

Tom Biskupski, Stephan Kleber

2603.22195 2026-03-24 hep-th cs.AI cs.LG math.CO math.GR

CayleyPy-4: AI-Holography. Towards analogs of holographic string dualities for AI tasks

A. Chervov, F. Levkovich-Maslyuk, A. Smolensky, F. Khafizov, I. Kiselev, D. Melnikov, I. Koltsov, S. Kudashev, D. Shiltsov, M. Obozov, S. Krymskii, V. Kirova, E. V. Konstantinova, A. Soibelman, S. Galkin, L. Grunwald, A. Kotov, A. Alexandrov, S. Lytkin, D. Fedoriaka, A. Chevychelov, Z. Kogan, A. Natyrova, L. Cheldieva, O. Nikitina, S. Fironov, A. Vakhrushev, A. Lukyanenko, V. Ilin, D. Gorodkov, N. Bogachev, I. Gaiur, M. Zaitsev, F. Petrov, L. Petrov, T. Gaintseva, A. Gavrilova, M. N. Smirnov, N. Kalinin, A. Khan, K. Jung, H. Mousset, H. Isambert, O. Debeaupuis

Comments 20+120 pages

2603.22174 2026-03-24 cs.HC cs.RO

Feasibility of Augmented Reality-Guided Robotic Ultrasound with Cone-Beam CT Integration for Spine Procedures

Tianyu Song, Felix Pabst, Feng Li, Yordanka Velikova, Miruna-Alexandra Gafencu, Yuan Bi, Ulrich Eck, Nassir Navab

Comments 8 pages, 7 figures

2603.22160 2026-03-24 stat.AP cs.LG

Data Curation for Machine Learning Interatomic Potentials by Determinantal Point Processes

Joanna Zou, Youssef Marzouk

Comments Original publication at https://openreview.net/forum?id=PKGP7tg65A

2603.22152 2026-03-24 cs.HC cs.AI

More Isn't Always Better: Balancing Decision Accuracy and Conformity Pressures in Multi-AI Advice

Yuta Tsuchiya, Yukino Baba

Comments 21 pages, 12 figures, accepted to CHI 2026

2603.22146 2026-03-24 eess.SY cs.RO cs.SY

From Singleton Obstacles to Clutter: Translation Invariant Compositional Avoid Sets

Prashant Solanki, Jasper Van Beers, Coen De Visser

2603.22055 2026-03-24 cs.GR cs.RO

MineRobot: A Unified Framework for Kinematics Modeling and Solving of Underground Mining Robots in Virtual Environments

Shengzhe Hou, Xinming Lu, Tianyu Zhang, Changqing Yan, Xingli Zhang

2603.22050 2026-03-24 stat.ML cs.LG

MAGPI: Multifidelity-Augmented Gaussian Process Inputs for Surrogate Modeling from Scarce Data

Atticus Rex, Elizabeth Qian, David Peterson

2603.22008 2026-03-24 cs.IR cs.CL

On the Challenges and Opportunities of Learned Sparse Retrieval for Code

Simon Lupart, Maxime Louis, Thibault Formal, Hervé Déjean, Stéphane Clinchant

Comments 15 pages, 5 figures, 12 tables

2603.22006 2026-03-24 astro-ph.CO astro-ph.IM cs.LG stat.ME

A plug-and-play approach with fast uncertainty quantification for weak lensing mass mapping

Hubert Leterme, Andreas Tersenov, Jalal Fadili, Jean-Luc Starck

Abstract

Upcoming stage-IV surveys such as Euclid and Rubin will deliver vast amounts of high-precision data, opening new opportunities to constrain cosmological models with unprecedented accuracy. A key step in this process is the reconstruction of the dark matter distribution from noisy weak lensing shear measurements. Current deep learning-based mass mapping methods achieve high reconstruction accuracy, but either require retraining a model for each new observed sky region (limiting practicality) or rely on slow MCMC sampling. Efficient exploitation of future survey data therefore calls for a new method that is accurate, flexible, and fast at inference. In addition, uncertainty quantification with coverage guarantees is essential for reliable cosmological parameter estimation. We introduce PnPMass, a plug-and-play approach for weak lensing mass mapping. The algorithm produces point estimates by alternating between a gradient descent step with a carefully chosen data fidelity term, and a denoising step implemented with a single deep learning model trained on simulated data corrupted by Gaussian white noise. We also propose a fast, sampling-free uncertainty quantification scheme based on moment networks, with calibrated error bars obtained through conformal prediction to ensure coverage guarantees. Finally, we benchmark PnPMass against both model-driven and data-driven mass mapping techniques. PnPMass achieves performance close to that of state-of-the-art deep-learning methods while offering fast inference (converging in just a few iterations) and requiring only a single training phase, independently of the noise covariance of the observations. It therefore combines flexibility, efficiency, and reconstruction accuracy, while delivering tighter error bars than existing approaches, making it well suited for upcoming weak lensing surveys.
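The alternating scheme described in the abstract (a gradient step on a data-fidelity term followed by a denoising step) can be illustrated with a generic plug-and-play iteration. The sketch below is not the PnPMass implementation. It assumes a 1D signal, a binary observation mask as the forward operator, a quadratic data-fidelity term, and a simple neighbour-averaging filter standing in for the learned deep denoiser. The names `smooth_denoiser` and `pnp_reconstruct` are hypothetical.

```python
import numpy as np


def smooth_denoiser(x, strength=0.5):
    """Stand-in denoiser (assumption): blend each sample with the mean of its
    two neighbours. A trained network would take this role in practice."""
    padded = np.pad(x, 1, mode="edge")
    neighbour_mean = 0.5 * (padded[:-2] + padded[2:])
    return (1.0 - strength) * x + strength * neighbour_mean


def pnp_reconstruct(y, mask, step=1.0, n_iter=50, denoiser=smooth_denoiser):
    """Generic plug-and-play iteration for the toy model y = mask * x + noise."""
    x = np.zeros_like(y)
    for _ in range(n_iter):
        # Gradient of the quadratic data-fidelity term 0.5 * ||mask * x - y||^2.
        grad = mask * (mask * x - y)
        # Gradient descent step, followed by the denoising step.
        x = denoiser(x - step * grad)
    return x


# Toy problem: a sine wave observed through a random mask with Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
truth = np.sin(2.0 * np.pi * t)
mask = (rng.random(200) > 0.3).astype(float)
y = mask * (truth + 0.3 * rng.normal(size=200))

estimate = pnp_reconstruct(y, mask)
err_obs = np.sqrt(np.mean((y - truth) ** 2))
err_pnp = np.sqrt(np.mean((estimate - truth) ** 2))
print(f"RMSE of raw observation: {err_obs:.3f}")
print(f"RMSE after plug-and-play iterations: {err_pnp:.3f}")
```

In this toy setting the denoiser is passed in as an argument, so a different prior can be substituted without changing the gradient step. That separation is the general idea behind plug-and-play methods.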

2603.21975 2026-03-24 cs.CR cs.AI cs.CL cs.LG

SecureBreak -- A dataset towards safe and secure models

Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera