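arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.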

2603.25892 2026-03-30 cs.CV

THFM: A Unified Video Foundation Model for 4D Human Perception and Beyond

Letian Wang, Andrei Zanfir, Eduard Gabriel Bazavan, Misha Andriluka, Cristian Sminchisescu

English abstract

We present THFM, a unified video foundation model for human-centric perception that jointly addresses dense tasks (depth, normals, segmentation, dense pose) and sparse tasks (2D/3D keypoint estimation) within a single architecture. THFM is derived from a pretrained text-to-video diffusion model, repurposed as a single-forward-pass perception model and augmented with learnable tokens for sparse predictions. Modulated by the text prompt, our single unified model can perform various perception tasks. Crucially, our model is on par with or surpasses state-of-the-art specialized models on a variety of benchmarks despite being trained exclusively on synthetic data (i.e., without training on real-world or benchmark-specific data). We further highlight intriguing emergent properties of our model, which we attribute to the underlying diffusion-based video representation. For example, our model trained on videos with a single human in the scene generalizes to multiple humans and to other object classes such as anthropomorphic characters and animals, a capability that has not been demonstrated in the past.

2603.25891 2026-03-30 cs.CV

Few Shots Text to Image Retrieval: New Benchmarking Dataset and Optimization Methods

Ofer Idan, Vladi Vexler, Gil Lederman, Dima Sivov, Aviad Cohen Zada, Shir Niego Komforti

2603.25889 2026-03-30 cs.CV

Polarization-Based Eye Tracking with Personalized Siamese Architectures

Beyza Kalkanli, Tom Bu, Mahsa Shakeri, Alexander Fix, Dave Stronks, Dmitri Model, Mantas Žurauskas

Comments: Accepted to ETRA 2026 as full paper

2603.25887 2026-03-30 cs.CV

World Reasoning Arena

PAN Team, Qiyue Gao, Kun Zhou, Jiannan Xiang, Zihan Liu, Dequan Yang, Junrong Chen, Arif Ahmad, Cong Zeng, Ganesh Bannur, Xinqi Huang, Zheqi Liu, Yi Gu, Yichi Yang, Guangyi Liu, Zhiting Hu, Zhengzhong Liu, Eric Xing

2603.25886 2026-03-30 cs.CV

Automated Quality Assessment of Blind Sweep Obstetric Ultrasound for Improved Diagnosis

Prasiddha Bhandari, Kanchan Poudel, Nishant Luitel, Bishram Acharya, Angelina Ghimire, Tyler Wellman, Kilian Koepsell, Pradeep Raj Regmi, Bishesh Khanal

2603.25872 2026-03-30 cs.LG

DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease

Runsheng Bai, Chengyu Zhang, Yangdong Deng

2603.25870 2026-03-30 cs.CV cs.LG

Speech-Synchronized Whiteboard Generation via VLM-Driven Structured Drawing Representations

Suraj Prasad, Pinak Mahapatra

2603.25867 2026-03-30 cs.CV

Seeing Through Smoke: Surgical Desmoking for Improved Visual Perception

Jingpei Lu, Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Omid Mohareri

Comments: 8 pages, 4 figures, 3 tables

2603.25864 2026-03-30 cs.CV cs.AI cs.HC

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks

Saelyne Yang, Jaesang Yu, Yi-Hao Peng, Kevin Qinghong Lin, Jae Won Cho, Yale Song, Juho Kim

Comments: Accepted at CVPR 2026

2603.25863 2026-03-30 cs.CV cs.AI

Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation

Jasmine Moreira

Comments: 6 pages, 10 figures, 1 table

2603.25862 2026-03-30 cs.CL cs.AI

Methods for Knowledge Graph Construction from Text Collections: Development and Applications

Vanni Zavarella

English abstract

Virtually every sector of society is experiencing dramatic growth in the volume of unstructured textual data that is generated and published, from news and online social media interactions to open-access scholarly communications and observational data in the form of digital health records and online drug reviews. The volume and variety of data across these domains have created both unprecedented opportunities and pressing challenges for extracting actionable knowledge for several application scenarios. However, the extraction of rich semantic knowledge demands scalable and flexible automatic methods that are adaptable across text genres and schema specifications. Moreover, the full potential of these data can only be unlocked by coupling information extraction methods with Semantic Web techniques for the construction of full-fledged Knowledge Graphs that are semantically transparent, explainable by design, and interoperable. In this thesis, we experiment with the application of Natural Language Processing, Machine Learning and Generative AI methods, powered by Semantic Web best practices, to the automatic construction of Knowledge Graphs from large text corpora, in three use-case applications: the analysis of the Digital Transformation discourse in global news and social media platforms; the mapping and trend analysis of recent research in the Architecture, Engineering, Construction and Operations domain from a large corpus of publications; and the generation of causal relation graphs of biomedical entities from electronic health records and patient-authored drug reviews. The contributions of this thesis to the research community are benchmark evaluation results, the design of customized algorithms, and the creation of data resources in the form of Knowledge Graphs, together with data analysis results built on top of them.

2603.25861 2026-03-30 cs.LG cs.AI cs.CR

Why Safety Probes Catch Liars But Miss Fanatics

Kristiyan Haralambiev

Comments: 18 pages, 4 figures, 14 tables

2603.25857 2026-03-30 cs.LG

In-Context Molecular Property Prediction with LLMs: A Blinding Study on Memorization and Knowledge Conflicts

Matthias Busch, Marius Tacke, Sviatlana V. Lamaka, Mikhail L. Zheludkevich, Christian J. Cyron, Christian Feiler, Roland C. Aydin

2603.25855 2026-03-30 cs.LG

Incorporating contextual information into KGWAS for interpretable GWAS discovery

Cheng Jiang, Brady Ryan, Megan Crow, Kipper Fletez-Brant, Kashish Doshi, Sandra Melo Carlos, Kexin Huang, Burkhard Hoeckendorf, Heming Yao, David Richmond

2603.25841 2026-03-30 cs.CV cs.AI

GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding

Trong Thang Pham, Hien Nguyen, Ngan Le

2603.25839 2026-03-30 cs.LG cs.AI

A Compression Perspective on Simplicity Bias

Tom Marty, Eric Elmoznino, Leo Gagnon, Tejas Kasetty, Mizu Nishikawa-Toomey, Sarthak Mittal, Guillaume Lajoie, Dhanya Sridhar

2603.25836 2026-03-30 cs.CL

Gradient-Informed Training for Low-Resource Multilingual Speech Translation

Ruiyan Sun, Satoshi Nakamura

2603.25834 2026-03-30 cs.RO

Massive Parallel Deep Reinforcement Learning for Active SLAM

Martín Arce Llobera, Julio A. Placed, Mariano De Paula, Pablo De Cristóforis

2603.25827 2026-03-30 cs.CV

Fus3D: Decoding Consolidated 3D Geometry from Feed-forward Geometry Transformer Latents

Laura Fink, Linus Franke, George Kopanas, Marc Stamminger, Peter Hedman

2603.25823 2026-03-30 cs.CV cs.AI

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

Haonan Han, Jiancheng Huang, Xiaopeng Sun, Junyan He, Rui Yang, Jie Hu, Xiaojiang Peng, Lin Ma, Xiaoming Wei, Xiu Li

2603.25821 2026-03-30 cs.CL cs.AI cs.LG cs.MA

Doctorina MedBench: End-to-End Evaluation of Agent-Based Medical AI

Anna Kozlova, Stanislau Salavei, Pavel Satalkin, Hanna Plotnitskaya, Sergey Parfenyuk

2603.25819 2026-03-30 cs.CV

Geo$^\textbf{2}$: Geometry-Guided Cross-view Geo-Localization and Image Synthesis

Yancheng Zhang, Xiaohan Zhang, Guangyu Sun, Zonglin Lyu, Safwan Wshah, Chen Chen

2603.25813 2026-03-30 cs.LG cs.AI

MAGNET: Autonomous Expert Model Generation via Decentralized Autoresearch and BitNet Training

Yongwan Kim, Sungchul Park

Comments: 20 pages, 4 figures, 8 tables

2603.25804 2026-03-30 cs.CL

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

2603.25803 2026-03-30 cs.CV cs.LG

Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment

Spiros Baxevanakis, Platon Karageorgis, Ioannis Dravilas, Konrad Szewczyk

Comments: Preprint. Submitted to Transactions on Machine Learning Research (TMLR). 26 pages, 17 figures

2603.25802 2026-03-30 cs.CV

LEMON: a foundation model for nuclear morphology in Computational Pathology

Loïc Chadoutaud, Alice Blondel, Hana Feki, Jacqueline Fontugne, Emmanuel Barillot, Thomas Walter

2603.25798 2026-03-30 cs.CV

End-to-end Feature Alignment: A Simple CNN with Intrinsic Class Attribution

Parniyan Farvardin, David Chapman

2603.25791 2026-03-30 cs.CV

ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions

Zikai Wang, Zhilu Zhang, Yiqing Wang, Hui Li, Wangmeng Zuo

Comments: Accepted to CVPR 2026

2603.25779 2026-03-30 cs.LG cs.AI

Pure and Physics-Guided Deep Learning Solutions for Spatio-Temporal Groundwater Level Prediction at Arbitrary Locations

Matteo Salis, Gabriele Sartor, Rosa Meo, Stefano Ferraris, Abdourrahmane M. Atto

English abstract

Groundwater is a key element of the water cycle, yet it exhibits intricate and context-dependent relationships that make it challenging to model. Theory-based models have been the cornerstone of scientific understanding. However, their computational demands, simplifying assumptions, and calibration requirements limit their use. In recent years, data-driven models have emerged as powerful alternatives. In particular, deep learning has proven to be a leading approach thanks to its design flexibility and ability to learn complex relationships. We proposed an attention-based pure deep learning model, named STAINet, to predict weekly groundwater levels at an arbitrary and variable number of locations, leveraging both spatially sparse groundwater measurements and spatially dense weather information. Then, to enhance the model's trustworthiness and generalization ability, we considered different physics-guided strategies to inject the groundwater flow equation into the model. First, in STAINet-IB, by introducing an inductive bias, we also estimated the governing equation components. Then, by adopting a learning bias strategy, we proposed STAINet-ILB, trained with additional loss terms that add supervision on the estimated equation components. Lastly, we developed STAINet-ILRB, leveraging the groundwater body recharge zone information estimated by domain experts. STAINet-ILB performed best, achieving strong test performance in a rollout setting (median MAPE 0.16%, KGE 0.58). Furthermore, it predicted sensible equation components, providing insights into the model's physical soundness. Physics-guided approaches represent a promising opportunity to enhance both generalization ability and trustworthiness, thereby paving the way to a new generation of hybrid deep learning Earth system models.
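
For readers unfamiliar with the two scores quoted in the abstract above, the following is a minimal sketch of how MAPE and KGE are commonly computed. It is not taken from the paper: the function names and sample values are illustrative assumptions, KGE is shown in its original (Gupta et al., 2009) form, and the abstract does not state which KGE variant the authors use or how the median is taken (presumably across locations).

```python
import math


def mape(observed, simulated):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    errors = [abs(o - s) / abs(o) for o, s in zip(observed, simulated)]
    return 100.0 * sum(errors) / len(errors)


def kge(observed, simulated):
    """Kling-Gupta efficiency, 2009 form (an assumption); 1.0 means a perfect match."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    # Population standard deviations of both series.
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    # Pearson correlation between observations and simulations.
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated)) / n
    r = cov / (sd_o * sd_s)
    alpha = sd_s / sd_o      # variability ratio
    beta = mean_s / mean_o   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical weekly groundwater levels, for illustration only.
obs = [10.0, 12.0, 11.0, 13.0]
sim = [10.1, 11.8, 11.2, 12.9]
print(f"MAPE = {mape(obs, sim):.2f}%  KGE = {kge(obs, sim):.3f}")
```

Lower MAPE is better, while KGE rewards correlation, variability, and bias jointly, so a model can reach a very low MAPE and still score well below 1.0 on KGE.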

2603.25778 2026-03-30 cs.CV

Focus-to-Perceive Representation Learning: A Cognition-Inspired Hierarchical Framework for Endoscopic Video Analysis

Yuan Zhang, Sihao Dou, Kai Hu, Shuhua Deng, Chunhong Cao, Fen Xiao, Xieping Gao

Comments: Accepted to CVPR 2026