It's Time to Get It Right: Improving Analog Clock Reading and Clock-Hand Spatial Reasoning in Vision-Language Models
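AI summary: To address the difficulty vision-language models have reading analog clocks in real-world settings, the paper proposes the TickTockVQA dataset and the Swap-DPO fine-tuning framework, which significantly improve clock-reading accuracy and robustness.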
Comments: Accepted to CVPR 2026 Findings