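arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.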

2603.26015 2026-03-30 cs.CV cs.AI

VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation

Rakib Hossain Sajib, Md Kishor Morol, Rajan Das Gupta, Mohammad Sakib Mahmood, Shuvra Smaran Das

English abstract

Human age estimation from facial images is a challenging computer vision task with significant applications in biometrics, healthcare, and human-computer interaction. Traditional deep learning approaches require extensive labeled datasets and domain-specific training, whereas recent large vision-language models (LVLMs) offer the potential for zero-shot age estimation. This study presents a comprehensive zero-shot evaluation of state-of-the-art LVLMs for facial age estimation, a task traditionally dominated by domain-specific convolutional networks and supervised learning. We assess the performance of GPT-4o, Claude 3.5 Sonnet, and LLaMA 3.2 Vision on two benchmark datasets, UTKFace and FG-NET, without any fine-tuning or task-specific adaptation. Using eight evaluation metrics (MAE, MSE, RMSE, MAPE, MBE, $R^2$, CCC, and $\pm$5-year accuracy), we demonstrate that general-purpose LVLMs can deliver competitive performance in zero-shot settings. Our findings highlight the emergent capabilities of LVLMs for accurate biometric age estimation. We also report performance disparities linked to image quality and demographic subgroups, underscoring the need for fairness-aware multimodal inference. This work introduces a reproducible benchmark and positions LVLMs as promising tools for real-world applications in forensic science, healthcare monitoring, and human-computer interaction. Remaining challenges include prompt sensitivity, interpretability, computational cost, and demographic fairness.
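
The eight metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function name `age_metrics`, the sign convention for MBE (predicted minus true), the use of population variances for CCC, and the choice to skip zero true ages when computing MAPE are all assumptions made for illustration.

```python
import math


def age_metrics(y_true, y_pred, tolerance=5.0):
    """Compute common age-estimation metrics (illustrative sketch only).

    Assumptions: MBE is mean(pred - true); CCC uses population variances;
    samples with a true age of 0 are skipped for MAPE to avoid dividing by zero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    mbe = sum(errors) / n                          # mean bias error (signed)

    # Mean absolute percentage error over samples with a non-zero true age.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if t != 0]
    mape = (
        100.0 * sum(abs(e) / abs(t) for t, e in nonzero) / len(nonzero)
        if nonzero
        else float("nan")
    )

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    # Coefficient of determination: 1 - SS_res / SS_tot.
    r2 = 1.0 - (mse * n) / (var_t * n) if var_t > 0 else float("nan")

    # Lin's concordance correlation coefficient.
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    ccc = 2.0 * cov / denom if denom > 0 else float("nan")

    # Fraction of predictions within +/- tolerance years of the true age.
    acc = sum(1 for e in errors if abs(e) <= tolerance) / n

    return {
        "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape,
        "MBE": mbe, "R2": r2, "CCC": ccc, "ACC@5": acc,
    }


if __name__ == "__main__":
    # Made-up ages for demonstration; not data from the paper.
    true_ages = [20, 30, 40, 50]
    predicted_ages = [22, 28, 45, 44]
    for name, value in age_metrics(true_ages, predicted_ages).items():
        print(f"{name}: {value:.4f}")
```

The sample ages in the usage block are invented for demonstration and carry no relation to the paper's results. Reporting a signed metric (MBE) next to MAE shows whether predictions lean systematically older or younger, which an absolute error alone would hide.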

2603.26013 2026-03-30 cs.CL

Toward Culturally Grounded Natural Language Processing

Sina Bagheri Nezhad

2603.26008 2026-03-30 cs.CV cs.AI

FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants

Mahesh Bhosale, Abdul Wasi, Shantam Srivastava, Shifa Latif, Tianyu Luan, Mingchen Gao, David Doermann, Xuan Gong

Comments Accepted to CVPR 2026

2603.26005 2026-03-30 cs.AI

AutoB2G: A Large Language Model-Driven Agentic Framework For Automated Building-Grid Co-Simulation

Borui Zhang, Nariman Mahdavi, Subbu Sethuvenkatraman, Shuang Ao, Flora Salim

2603.25994 2026-03-30 cs.CV cs.CR

Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models

Zhuan Shi, Alireza Dehghanpour Farashah, Rik de Vries, Golnoosh Farnadi

Comments Accepted by CVPR 2026 main

2603.25993 2026-03-30 cs.CV

FAST3DIS: Feed-forward Anchored Scene Transformer for 3D Instance Segmentation

Changyang Li, Xueqing Huang, Shin-Fang Chng, Huangying Zhan, Qingan Yan, Yi Xu

2603.25985 2026-03-30 cs.CV

JRM: Joint Reconstruction Model for Multiple Objects without Alignment

Qirui Wu, Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Richard Newcombe, Angel X. Chang, Jakob Engel, Henry Howard-Jenkins

2603.25981 2026-03-30 cs.RO cs.AI cs.CL

Policy-Guided World Model Planning for Language-Conditioned Visual Navigation

Amirhosein Chahe, Lifeng Zhou

2603.25977 2026-03-30 cs.CV

Diffusion MRI Transformer with a Diffusion Space Rotary Positional Embedding (D-RoPE)

Gustavo Chau Loo Kung, Mohammad Abbasi, Camila Blank, Juze Zhang, Alan Q. Wang, Sophie Ostmeier, Akshay Chaudhari, Kilian Pohl, Ehsan Adeli

2603.25976 2026-03-30 cs.LG

Second-Order, First-Class: A Composable Stack for Curvature-Aware Training

Mikalai Korbit, Mario Zanon

Comments 22 pages, 3 figures. Code available at https://github.com/cor3bit/somax

2603.25973 2026-03-30 cs.CL

MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization

Weizhi Zhang, Xiaokai Wei, Wei-Chieh Huang, Zheng Hui, Chen Wang, Michelle Gong, Philip S. Yu

Comments Published as a workshop paper in Lifelong Agent @ ICLR 2026

2603.25963 2026-03-30 cs.CV

BEVMAPMATCH: Multimodal BEV Neural Map Matching for Robust Re-Localization of Autonomous Vehicles

Shounak Sural, Ragunathan Rajkumar

Comments 8 pages, 5 figures

2603.25960 2026-03-30 cs.CL cs.AI

When Chain-of-Thought Backfires: Evaluating Prompt Sensitivity in Medical Language Models

Binesh Sadanandan, Vahid Behzadan

2603.25956 2026-03-30 cs.LG

Adversarial-Robust Multivariate Time-Series Anomaly Detection via Joint Information Retention

Hadi Hojjati, Narges Armanfard

Comments 22 pages, 4 figures

2603.25955 2026-03-30 cs.LG

EngineAD: A Real-World Vehicle Engine Anomaly Detection Dataset

Hadi Hojjati, Christopher Roth, Rory Woods, Ken Sills, Narges Armanfard

Comments 12 pages, 2 figures

2603.25954 2026-03-30 cs.LG math.OC

Online Learning for Dynamic Constellation Topologies

João Norberto, Ricardo Ferreira, Cláudia Soares

2603.25951 2026-03-30 cs.CV

Low-Rank-Modulated Functa: Exploring the Latent Space of Implicit Neural Representations for Interpretable Ultrasound Video Analysis

Julia Wolleb, Cristiana Baloescu, Alicia Durrer, Hemant D. Tagare, Xenophon Papademetris

2603.25946 2026-03-30 cs.CV cs.AI cs.LG

Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets

Alex Koran, Dimitrios Sinodinos, Hadi Hojjati, Takuya Nanri, Fangge Chen, Narges Armanfard

Comments 33 pages, 11 figures

2603.25944 2026-03-30 cs.CL cs.AI

Can Small Models Reason About Legal Documents? A Comparative Study

Snehit Vaddi

Comments 17 pages, 9 models, 5 prompting strategies, 3 legal benchmarks, 405 experiments

2603.25942 2026-03-30 cs.CV cs.AI

Reinforcing Structured Chain-of-Thought for Video Understanding

Peiyao Wang, Haotian Xu, Noranart Vesdapunt, Rui Hou, Jingyi Zhang, Haibin Ling, Oleksandr Obiednikov, Ning Zhou, Kah Kuen Fu

Comments Accepted to CVPR 2026 (Main Conference)

2603.25935 2026-03-30 cs.CV cs.AI

DenseSwinV2: Channel Attentive Dual Branch CNN Transformer Learning for Cassava Leaf Disease Classification

Shah Saood, Saddam Hussain Khan

Comments 30 Pages, 12 Figures, 3 Tables

2603.25931 2026-03-30 cs.CV cs.AI

DiReCT: Disentangled Regularization of Contrastive Trajectories for Physics-Refined Video Generation

Abolfazl Meyarian, Amin Karimi Monsefi, Rajiv Ramnath, Ser-Nam Lim

2603.25926 2026-03-30 cs.CL

Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei

2603.25925 2026-03-30 cs.LG cs.CY

Personalizing Mathematical Game-based Learning for Children: A Preliminary Study

Jie Gao, Adam K. Dubé

Comments Short research paper accepted at 27th International Conference on AI in Education (AIED 2026)

2603.25923 2026-03-30 cs.LG

Preventing Data Leakage in EEG-Based Survival Prediction: A Two-Stage Embedding and Transformer Framework

Yixin Zhou, Zhixiang Liu, Vladimir I. Zadorozhny, Jonathan Elmer

Comments 9 pages, 2 figures. Preliminary version

2603.25916 2026-03-30 cs.LG stat.ML

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale

Comments 10 pages. v1: AISTATS 2026

2603.25906 2026-03-30 cs.CV

Shared Representation for 3D Pose Estimation, Action Classification, and Progress Prediction from Tactile Signals

Isaac Han, Seoyoung Lee, Sangyeon Park, Ecehan Akan, Yiyue Luo, Joseph DelPreto, Kyung-Joong Kim

2603.25902 2026-03-30 cs.RO

Chasing Autonomy: Dynamic Retargeting and Control Guided RL for Performant and Controllable Humanoid Running

Zachary Olkin, William D. Compton, Ryan M. Bena, Aaron D. Ames

Comments This work has been submitted to the IEEE for possible publication

2603.25901 2026-03-30 cs.LG cs.AI cs.CV

Decoding Defensive Coverage Responsibilities in American Football Using Factorized Attention Based Transformer Models

Kevin Song, Evan Diewald, Ornob Siddiquee, Chris Boomhower, Keegan Abdoo, Mike Band, Amy Lee

Comments 19 pages, 8 figures, ISACE 2026

2603.25894 2026-03-30 cs.LG

Data-Driven Plasticity Modeling via Acoustic Profiling

Khalid El-Awady