Earth-OneVision: Extending Remote Sensing Multimodal Large Language Models to More Sensor Modalities and Tasks
Affiliations: National Key Laboratory of Science and Technology on Space-Born Intelligent Information Processing (SBIIP), Beijing Institute of Technology; Aerospace Information Research Institute, Chinese Academy of Sciences; Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences; Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology; School of Mechatronical Engineering, Beijing Institute of Technology; School of Earth and Space Sciences, Peking University; School of Electronics, Peking University; School of Computer Science and Hubei Key Laboratory of Intelligent Geo-Information Processing
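AI summary: The paper proposes Earth-OneVision, a 2B-parameter remote sensing multimodal large language model (RS-MLLM). It unifies six sensor modalities and nine task categories through three mechanisms: full-granularity vision-language alignment, spatial-language isomorphic serialization, and progressive cross-modal adaptation. On multiple benchmarks it matches or surpasses models with 4B to 72B parameters.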