2605.12500
2026-05-13
cs.CV
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
Haiwen Diao, Penghao Wu, Hanming Deng, Jiahao Wang, Shihao Bai, Silei Wu, Weichen Fan, Wenjie Ye, Wenwen Tong, Xiangyu Fan, Yan Li, Yubo Wang, Zhijie Cao, Zhiqian Lin, Zhitao Yang, Zhongang Cai, Yuwei Niu, Yue Zhu, Bo Liu, Chengguang Lv, Haojia Yu, Haozhe Xie, Hongli Wang, Jianan Fan, Jiaqi Li, Jiefan Lu, Jingcheng Ni, Junxiang Xu, Kaihuan Liang, Lianqiang Shi, Linjun Dai, Linyan Wang, Oscar Qian, Peng Gao, Pengfei Liu, Qingping Sun, Rui Shen, Ruisi Wang, Shengnan Ma, Shuang Yang, Siyi Xie, Siying Li, Tianbo Zhong, Xiangli Kong, Xuanke Shi, Yang Gao, Yongqiang Yao, Yves Wang, Zhengqi Bai, Zhengyu Lin, Zixin Yin, Wenxiu Sun, Ruihao Gong, Quan Wang, Lewei Lu, Lei Yang, Ziwei Liu, Dahua Lin
AI Summary
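This paper presents SenseNova-U1, a unified multimodal model that addresses the separation of understanding and generation in current vision-language models. Built on the NEO-unify architecture, the model treats understanding and generation as synergistic views of the same underlying process, with the aim of a more natural form of multimodal intelligence. The study reports superior performance across a variety of tasks and describes the design and training strategies in detail, offering a new direction for multimodal research.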