2605.11405
2026-05-14
cs.LG
20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone
DatologyAI: Siddharth Joshi, Haoli Yin, Rishabh Adiga, Haakon Mongstad, Alvin Deng, Aldo Carranza, Alex Fang, Amro Abbas, Anshuman Suri, Brett Larsen, Daniel Zayas, Darren Teh, David Schwab, Diego Kiner, Fan Pan, Jack Urbanek, Jason Lee, Jason Telanoff, Josh Wills, Kaleigh Mentzer, Luke Merrick, Maximilian Böther, Parth Doshi, Paul Burstein, Pratyush Maini, Ties Robroek, Tony Jiang, Vidhi Jain, Vineeth Dorna, Zhengping Wang, Bogdan Gaza, Ari Morcos, Matthew Leavitt
AI Summary
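This study investigates whether data curation alone can improve the performance of vision-language models (VLMs). Holding the model architecture, training recipe, and compute budget fixed, the authors curate the MAmmoTH-VL dataset and significantly improve model performance across multiple public benchmarks and capability dimensions. Experiments show that the 2-billion-parameter model trained on the curated data outperforms existing models on several metrics and has clear advantages in reliability, generalization, behavior, and inference efficiency, demonstrating the potential of data curation as a high-leverage tool for building efficient VLMs.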