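arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.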

2603.25227 2026-03-27 cs.CL

Comparing Natural and Synthetic Structured Data: A Study of the Passive Verb Alternation in French and Italian

Giuseppe Samo, Paola Merlo

Comments 13 pages, 8 figures, paper accepted at the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

Abstract

This study compares the impact of natural and synthetic data on training and evaluating large language models (LLMs), using the case of passive verb alternation in French and Italian. We use Blackbird Language Matrices (BLMs), structured datasets designed to probe linguistic knowledge of underlying patterns across sentence sets. We compare structured templates instantiated with natural sentences extracted from Universal Dependencies to structured templates of synthetic sentences. Experiments show that while models achieve ceiling performance when trained and tested on synthetic datasets, they do not reliably generalize to natural sentences. In contrast, models trained on natural data exhibit robust performance across both natural and synthetic test suites, demonstrating their superior ability to capture abstract linguistic patterns. These results corroborate the value of natural data and of structured setups in linguistic evaluation for probing LLMs' syntactic and semantic knowledge.

2603.25222 2026-03-27 cs.CL cs.LG

Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages

Danlu Chen, Ka Sing He, Jiahe Tian, Chenghao Xiao, Zhaofeng Wu, Taylor Berg-Kirkpatrick, Freda Shi

2603.25221 2026-03-27 cs.LG math.OC

Gap Safe Screening Rules for Fast Training of Robust Support Vector Machines under Feature Noise

Tan-Hau Nguyen, Thu-Le Tran, Kien Trung Nguyen

Comments 19 pages

2603.25218 2026-03-27 cs.CV

SDD-YOLO: A Small-Target Detection Framework for Ground-to-Air Anti-UAV Surveillance with Edge-Efficient Deployment

Pengyu Chen, Haotian Sa, Yiwei Hu, Yuhan Cheng, Junbo Wang

2603.25209 2026-03-27 cs.CV cs.AI

Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction

Jiahao Tian, Chenxi Song, Wei Cheng, Chi Zhang

Comments Accepted to CVPR 2026. Code: https://github.com/Westlake-AGI-Lab/FreeLOC

2603.25204 2026-03-27 cs.LG

A CDF-First Framework for Free-Form Density Estimation

Chenglong Song, Mazharul Islam, Lin Wang, Bing Chen, Bo Yang

2603.25203 2026-03-27 cs.CV cs.CL

Probabilistic Concept Graph Reasoning for Multimodal Misinformation Detection

Ruichao Yang, Wei Gao, Xiaobin Zhu, Jing Ma, Hongzhan Lin, Ziyang Luo, Bo-Wen Zhang, Xu-Cheng Yin

Comments Accepted by CVPR 2026

2603.25202 2026-03-27 cs.CV cs.MM

CIV-DG: Conditional Instrumental Variables for Domain Generalization in Medical Imaging

Shaojin Bai, Yuting Su, Weizhi Nie

Comments 10 pages, 2 figures

2603.25201 2026-03-27 cs.CL cs.CY

SafeMath: Inference-time Safety improves Math Accuracy

Sagnik Basu, Subhrajit Mitra, Aman Juneja, Somnath Banerjee, Rima Hazra, Animesh Mukherjee

Comments Submitted to ARR March 2026

2603.25199 2026-03-27 cs.CV

TacSIm: A Dataset and Benchmark for Football Tactical Style Imitation

Peng Wen, Yuting Wang, Qiurui Wang

Comments Accepted to CVPR 2026

2603.25196 2026-03-27 cs.CL cs.AI

A Decade-Scale Benchmark Evaluating LLMs' Clinical Practice Guidelines Detection and Adherence in Multi-turn Conversations

Andong Tan, Shuyu Dai, Jinglu Wang, Fengtao Zhou, Yan Lu, Xi Wang, Yingcong Chen, Can Yang, Shujie Liu, Hao Chen

Abstract

Clinical practice guidelines (CPGs) play a pivotal role in ensuring evidence-based decision-making and improving patient outcomes. While Large Language Models (LLMs) are increasingly deployed in healthcare scenarios, it is unclear to what extent LLMs can identify and adhere to CPGs during conversations. To address this gap, we introduce CPGBench, an automated framework benchmarking the clinical guideline detection and adherence capabilities of LLMs in multi-turn conversations. We collect 3,418 CPG documents from 9 countries/regions and 2 international organizations, published in the last decade and spanning 24 specialties. From these documents, we extract 32,155 clinical recommendations with the corresponding publication institute, date, country, specialty, recommendation strength, evidence level, etc. One multi-turn conversation is generated for each recommendation to evaluate the detection and adherence capabilities of 8 leading LLMs. We find that 71.1%-89.6% of recommendations can be correctly detected, while only 3.6%-29.7% of the corresponding titles can be correctly referenced, revealing the gap between knowing the guideline contents and knowing where they come from. The adherence rates range from 21.8% to 63.2% across models, indicating a large gap between knowing the guidelines and being able to apply them. To confirm the validity of our automatic analysis, we further conduct a comprehensive human evaluation involving 56 clinicians from different specialties. To our knowledge, CPGBench is the first benchmark systematically revealing which clinical recommendations LLMs fail to detect or adhere to during conversations. Given that each clinical recommendation may affect a large population and that clinical applications are inherently safety-critical, addressing these gaps is crucial for the safe and responsible deployment of LLMs in real-world clinical practice.

2603.25194 2026-03-27 cs.CV

CardioDiT: Latent Diffusion Transformers for 4D Cardiac MRI Synthesis

Marvin Seyfarth, Sarah Kaye Müller, Arman Ghanaat, Isabelle Ayx, Fabian Fastenrath, Philipp Wild, Alexander Hertel, Theano Papavassiliu, Salman Ul Hassan Dar, Sandy Engelhardt

2603.25189 2026-03-27 cs.CL

A Catalog of Basque Dialectal Resources: Online Collections and Standard-to-Dialectal Adaptations

Jaione Bengoetxea, Itziar Gonzalez-Dios, Rodrigo Agerri

2603.25188 2026-03-27 cs.CV

AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References

Jiahao Wang, Hualian Sheng, Sijia Cai, Yuxiao Yang, Weizhan Zhang, Caixia Yan, Bing Deng, Jieping Ye

2603.25187 2026-03-27 cs.CL cs.AI

Probing the Lack of Stable Internal Beliefs in LLMs

Yifan Luo, Kangping Xu, Yanzhen Lu, Yang Yuan, Andrew Chi-Chih Yao

Comments Accepted by NeurIPS 2025 Workshop Mexico City PersonaNLP

2603.25186 2026-03-27 cs.LG

Knowledge-Guided Retrieval-Augmented Generation for Zero-Shot Psychiatric Data: Privacy Preserving Synthetic Data Generation

Adam Jakobsen, Sushant Gautam, Hugo Lewi Hammer, Susanne Olofsdotter, Miriam S Johanson, Pål Halvorsen, Vajira Thambawita

Comments Submitted to CBMS 2026

2603.25183 2026-03-27 cs.CL

Cross-Preference Learning for Sentence-Level and Context-Aware Machine Translation

Ying Li, Xinglin Lyu, Junhui Li, Jinlong Yang, Hengchao Shang, Min Zhang, Shimin Tao, Daimeng Wei

2603.25181 2026-03-27 cs.CV

VolDiT: Controllable Volumetric Medical Image Synthesis with Diffusion Transformers

Marvin Seyfarth, Salman Ul Hassan Dar, Yannik Frisch, Philipp Wild, Norbert Frey, Florian André, Sandy Engelhardt

2603.25178 2026-03-27 cs.CV cs.CL

Bilingual Text-to-Motion Generation: A New Benchmark and Baselines

Wanjiang Weng, Xiaofeng Tan, Xiangbo Shu, Guo-Sen Xie, Pan Zhou, Hongsong Wang

Comments 11 pages, 7 figures

2603.25176 2026-03-27 cs.CL

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

Hieu Xuan Le, Benjamin Goh, Quy Anh Tang

Comments 16 pages, 3 figures

2603.25175 2026-03-27 cs.CV

AG-EgoPose: Leveraging Action-Guided Motion and Kinematic Joint Encoding for Egocentric 3D Pose Estimation

Md Mushfiqur Azam, John Quarles, Kevin Desai

2603.25170 2026-03-27 cs.CV cs.AI

Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling

Shiji Zhao, Shukun Xiong, Maoxun Yuan, Yao Huang, Ranjie Duan, Qing Guo, Jiansheng Chen, Haibin Duan, Xingxing Wei

Comments Accepted for publication in the International Journal of Computer Vision (IJCV)

2603.25169 2026-03-27 cs.CL

To Write or to Automate Linguistic Prompts, That Is the Question

Marina Sánchez-Torrón, Daria Akselrod, Jason Rauchwerk

Comments 10 pages, to be submitted to EAMT 2026

2603.25168 2026-03-27 cs.CV

ET-SAM: Efficient Point Prompt Prediction in SAM for Unified Scene Text Detection and Layout Analysis

Xike Zhang, Maoyuan Ye, Juhua Liu, Bo Du

Comments 20 pages, 8 figures, 8 tables. Submitted to ECCV 2026

2603.25163 2026-03-27 cs.CV

SportSkills: Physical Skill Learning from Sports Instructional Videos

Kumar Ashutosh, Chi Hsuan Wu, Kristen Grauman

Comments Technical report

2603.25159 2026-03-27 cs.CV

A Semantically Disentangled Unified Model for Multi-category 3D Anomaly Detection

SuYeon Kim, Wongyu Lee, MyeongAh Cho

Comments Accepted by CVPR 2026

2603.25155 2026-03-27 cs.CV cs.AI

Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models

Chengyu Fang, Heng Guo, Zheng Jiang, Chunming He, Xiu Li, Minfeng Xu

Comments Accepted by ICLR 2026

2603.25150 2026-03-27 cs.CL cs.AI cs.HC cs.LG

Goodness-of-pronunciation without phoneme time alignment

Jeremy H. M. Wong, Nancy F. Chen

2603.25145 2026-03-27 cs.CV cs.LG

Learning to Rank Caption Chains for Video-Text Alignment

Ansel Blume, Burak Uzkent, Shalini Chaudhuri, Garin Kessler

2603.25144 2026-03-27 cs.CV cs.AI

FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation

Hongxu Ma, Guang Li, Shijie Wang, Dongzhan Zhou, Baoli Sun, Takahiro Ogawa, Miki Haseyama, Zhihui Wang