MMClima: A Framework for Multimodal Climate Science Data and Evaluation
Muhammad Umer Sheikh, Hassan Abid, Khawar Shehzad, Ufaq Khan, Muhammad Haris Khan
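AI summary: Introduces MMClima, a multimodal climate question answering framework with 100k+ expert-validated question-answer pairs covering text, video, and figures, for evaluating multimodal language models on climate science.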
Climate change research increasingly requires AI systems that reason across text, dynamic visual content, and scientific figures, yet existing climate QA benchmarks are small, mostly textual, and cover a narrow range of models. We introduce MMClima, a large-scale multimodal climate question answering framework with 104k+ expert-validated question-answer pairs spanning articles, video transcriptions, and figures across five core climate science domains. MMClima is constructed via automated claim extraction and QA synthesis with human-in-the-loop validation to ensure both scale and reliability. Using MMClima, we benchmark state-of-the-art multimodal language models on tasks requiring factual recall, visual interpretation, and cross-modal synthesis. We additionally fine-tune on the textual split to produce mmclima-70b-txt, a domain-adapted baseline that outperforms strong open- and closed-source models on textual QA. We release the dataset, evaluation pipeline, fine-tuned model weights, and data creation framework to support standardized multimodal evaluation for climate science.