BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models
Fei Deng, Yanwu Xu, Zhipeng Bao, Zhixing Zhang, Haolin Jia, Karthik Raveendran, Jianing Wei
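AI summary: Proposes BlazeEdit, a lightweight generalist image-editing diffusion model with only 195M parameters. By eliminating the text-conditioning components and adopting a multi-task architecture, it enables fast, privacy-preserving image editing on mobile devices.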
Details
- Comments: Accepted to CVPR 2026 EDGE Workshop
The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts, which necessitate server-side inference with significant computational costs and potential privacy risks. Consequently, there is growing momentum toward developing efficient on-device alternatives. While recent efforts have optimized text-to-image models for mobile hardware, they remain relatively bulky, typically ranging from 0.5B to 1B parameters. We present BlazeEdit, a highly efficient, generalist image-to-image diffusion model tailored for on-device deployment. By identifying that many practical image editing tasks do not require text-based guidance, we eliminate the text-conditioning components and develop a multi-task architecture that consolidates object removal, outpainting, tone correction, relighting, and sticker generation into a single, compact model of only 195M parameters. BlazeEdit achieves a substantial reduction in download size and memory overhead while maintaining competitive generation quality. It completes a full inference pass in just 290ms on a Pixel 10, delivering a seamless, privacy-preserving, and lightning-fast experience for generalist image editing on the edge.
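To make the download-size and memory claim concrete, the sketch below estimates raw weight storage as parameter count times bytes per parameter. The parameter counts (195M, 0.5B, 1B) come from the abstract; the numeric precisions, the function names, and the choice to ignore activations, runtime buffers, and file-format overhead are illustrative assumptions, not details reported by the authors.

```python
# Back-of-the-envelope weight storage: parameters x bytes per parameter.
# The precisions below are illustrative assumptions; the abstract does not
# state which numeric format BlazeEdit ships in.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts quoted in the abstract.
MODELS = {
    "BlazeEdit (195M)": 195_000_000,
    "mobile text-to-image, low end (0.5B)": 500_000_000,
    "mobile text-to-image, high end (1B)": 1_000_000_000,
}


def weight_size_mb(num_params: int, precision: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


def size_table() -> dict:
    """Map each model name to its estimated weight size per precision."""
    return {
        name: {p: weight_size_mb(n, p) for p in BYTES_PER_PARAM}
        for name, n in MODELS.items()
    }


if __name__ == "__main__":
    for name, sizes in size_table().items():
        row = ", ".join(f"{p}: {mb:.0f} MB" for p, mb in sizes.items())
        print(f"{name}: {row}")
```

Under these assumptions, the weights alone would be roughly 2.6 to 5.1 times smaller than those of a 0.5B to 1B parameter model at the same precision. Actual download size and peak memory depend on factors the abstract does not specify.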