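arXivDaily: daily arXiv digest, updated Monday to Friday

Vision and Robotics

Image Generation

Image generation, text-to-image, image editing, diffusion models, and controllable generation.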

Collected for today (current date): 4 · Sources: cs.CV, cs.GR, cs.MM
2606.19195 2026-06-18 cs.CV New submission 95%

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Kangsheng Duan, Ziyang Xu, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang

Affiliations: Huazhong University of Science and Technology; VIVO AI Lab

Topic match: Image inpainting (a lightweight image inpainting framework)

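AI summary: Proposes Moebius, a lightweight image inpainting framework that combines a Local-λ Mix Interaction (LλMI) block with an adaptive multi-granularity distillation strategy. With 0.22B parameters it matches or exceeds the generation quality of the 10B-level model FLUX.1-Fill-Dev (11.9B) and is more than 15× faster in total inference time.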

Abstract

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a >15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.
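The abstract says LλMI summarizes spatial context and global semantic priors into fixed-size linear matrices. That description resembles the content lambda of LambdaNetworks. In that design, a key matrix (softmax-normalized over context positions) is contracted with a value matrix into a single k × v summary, which is then applied to every query as a linear map. The sketch below shows only that generic mechanism. It assumes, without confirmation from the abstract, that LλMI builds on something similar. It is not Moebius's block, and `content_lambda` and `apply_lambda` are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_over_positions(keys: Matrix) -> Matrix:
    """Softmax each key channel across the n context positions (column-wise)."""
    n, k = len(keys), len(keys[0])
    out = [[0.0] * k for _ in range(n)]
    for j in range(k):
        col = [keys[i][j] for i in range(n)]
        peak = max(col)  # subtract the max for numerical stability
        exps = [math.exp(c - peak) for c in col]
        total = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / total
    return out


def content_lambda(keys: Matrix, values: Matrix) -> Matrix:
    """Hypothetical helper: summarize n positions into a fixed k x v matrix, softmax(K)^T V.

    The summary's size depends only on the key/value widths, never on n.
    """
    k_norm = softmax_over_positions(keys)
    n, k, v = len(keys), len(keys[0]), len(values[0])
    return [
        [sum(k_norm[i][a] * values[i][b] for i in range(n)) for b in range(v)]
        for a in range(k)
    ]


def apply_lambda(lam: Matrix, query: List[float]) -> List[float]:
    """Hypothetical helper: apply the summary to one query as a linear map, y = lambda^T q."""
    k, v = len(lam), len(lam[0])
    return [sum(lam[a][b] * query[a] for a in range(k)) for b in range(v)]


# With uniform (all-zero) keys, the summary is just the mean of the values.
lam = content_lambda([[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
print(lam)                            # [[2.0, 3.0], [2.0, 3.0]]
print(apply_lambda(lam, [1.0, 0.0]))  # [2.0, 3.0]
```

The design point is that the summary stays k × v however many context positions there are. As a result, no n × n attention map is ever formed.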

2603.05010 2026-06-18 cs.CV Version update 90%

How far have we gone in Generative Image Restoration? A study on its capability, limitations and evaluation practices

Xiang Yin, Jinfan Hu, Zhiyuan You, Kainan Yan, Yu Tang, Chao Dong, Jinjin Gu

Affiliations: Fudan University; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; University of the Chinese Academy of Sciences; Multimedia Laboratory, The Chinese University of Hong Kong; Shenzhen University of Advanced Technology

Topic match: Image inpainting (studies generative image restoration, including diffusion and GAN models)

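AI summary: Uses a multi-dimensional evaluation pipeline to systematically compare generative models (diffusion, GAN, and others) with PSNR-oriented models. It reveals a paradigm shift from insufficient detail toward detail quality and semantic control, and it trains an IQA model that better matches human perception.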

Comments: Accepted by CVPR 2026 Findings

Abstract

Generative Image Restoration (GIR) has achieved impressive perceptual realism, but how far have its practical capabilities truly advanced compared with previous methods? To answer this, we present a large-scale study grounded in a new multi-dimensional evaluation pipeline that assesses models on detail, sharpness, semantic correctness, and overall quality. Our analysis covers diverse architectures, including diffusion-based, GAN-based, PSNR-oriented, and general-purpose generation models, revealing critical performance disparities. Furthermore, our analysis uncovers a key evolution in failure modes that signifies a paradigm shift for the perception-oriented low-level vision field. The central challenge is evolving from the previous problem of detail scarcity (under-generation) to the new frontier of detail quality and semantic control (preventing over-generation). We also leverage our benchmark to train a new IQA model that better aligns with human perceptual judgments. Ultimately, this work provides a systematic study of modern generative image restoration models, offering crucial insights that redefine our understanding of their true state and chart a course for future development.
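Agreement between an IQA model and human judgments is usually reported as the Spearman rank-order correlation (SRCC) between predicted scores and human opinion scores. The abstract does not name the statistic the authors use. The following is therefore only a sketch of that standard metric, and the scores in the usage lines are made up.

```python
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def srcc(predicted: Sequence[float], human: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(predicted), average_ranks(human))


human_mos = [1.0, 2.5, 3.0, 4.5]     # made-up human opinion scores
iqa_scores = [0.10, 0.30, 0.20, 0.90]  # made-up model scores; images 2 and 3 swapped
print(round(srcc(iqa_scores, human_mos), 3))  # 0.8
```

Rank correlation is used rather than raw Pearson correlation because IQA scores and opinion scores usually live on different, non-linearly related scales. Only the ordering is compared.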

2602.00176 2026-06-18 cs.CV cs.AI Version update 70%

Posterior Continuation with Noise-Conditioned Frequency Exposure for Diffusion Inverse Problems

Feng Tian, Yixuan Li, Weili Zeng, Weitian Zhang, Yichao Yan, Xiaokang Yang

Affiliations: Shanghai Jiao Tong University

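Topic match: Image inpainting (proposes a posterior continuation framework for diffusion inverse problems, including image inpainting)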

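AI summary: Proposes a posterior continuation framework that progressively exposes measurement frequencies according to the diffusion noise level. Combined with a stabilized sampler, it achieves competitive-to-state-of-the-art performance on super-resolution, inpainting, and deblurring.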

Abstract

Diffusion posterior sampling solves inverse problems by combining a pretrained diffusion prior with measurement-consistency guidance. However, full-band guidance can be unreliable at high noise levels, where clean estimates contain score-induced errors and high-frequency measurement directions are weakly identifiable. We argue that posterior guidance should expose measurement frequencies according to the instantaneous diffusion noise level. Based on this principle, we propose a posterior continuation framework that constructs a family of intermediate posteriors whose likelihood emphasizes currently reliable frequency bands and gradually returns to full-band consistency. We instantiate this framework with a stabilized sampler that combines a diffusion predictor, frequency-limited likelihood refinement, and a Haar-domain commitment rule that commits reliable coarse corrections while deferring weakly identifiable details. Across super-resolution, inpainting, and deblurring, our method achieves competitive-to-state-of-the-art restoration performance, including up to 5 dB PSNR improvement on motion deblurring over strong baselines in evaluations on FFHQ and ImageNet.
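The abstract's guiding principle can be illustrated on a 1-D signal: expose only the measurement frequencies that are reliable at the current noise level, and commit only coarse corrections. Several pieces below are hypothetical choices for illustration: the linear schedule in `cutoff_for_noise`, the one-level Haar split in `haar_commit_coarse`, and all other names. The abstract does not specify the paper's schedule, its frequency-limited likelihood refinement, or its commitment rule.

```python
import cmath
from typing import List


def lowpass(signal: List[float], cutoff: int) -> List[float]:
    """Zero every DFT bin whose absolute frequency exceeds `cutoff`, then invert.

    Uses a naive O(n^2) DFT to stay dependency-free.
    """
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    for f in range(n):
        if min(f, n - f) > cutoff:  # bins f and n - f share one absolute frequency
            spectrum[f] = 0j
    return [
        (sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n).real
        for t in range(n)
    ]


def cutoff_for_noise(sigma: float, sigma_max: float, n: int) -> int:
    """Hypothetical schedule: DC only at sigma_max, widening linearly to the full band at 0."""
    frac = 1.0 - min(max(sigma / sigma_max, 0.0), 1.0)
    return round(frac * (n // 2))


def haar_commit_coarse(correction: List[float]) -> List[float]:
    """Hypothetical commit rule: one-level Haar split of an even-length correction.

    Keeps the pairwise averages (coarse part) and defers the pairwise
    differences (detail part) by leaving them out of this step's update.
    """
    if len(correction) % 2:
        raise ValueError("expects an even-length signal")
    out: List[float] = []
    for i in range(0, len(correction), 2):
        avg = (correction[i] + correction[i + 1]) / 2
        out.extend([avg, avg])
    return out


residual = [1.0, 3.0, 5.0, 7.0]  # made-up measurement residual
early = lowpass(residual, cutoff_for_noise(80.0, 80.0, len(residual)))  # high noise
late = lowpass(residual, cutoff_for_noise(0.0, 80.0, len(residual)))    # no noise
print([round(v, 6) for v in early])  # [4.0, 4.0, 4.0, 4.0]
print([round(v, 6) for v in late])   # [1.0, 3.0, 5.0, 7.0]
print(haar_commit_coarse(residual))  # [2.0, 2.0, 6.0, 6.0]
```

In a real sampler, the filter would act on a 2-D measurement-consistency gradient, and both the schedule and the commit decision would depend on the forward operator and the data. The sketch only shows the coarse-to-fine ordering that the abstract argues for.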

2204.14224 2026-06-18 cs.CV cs.LG eess.IV Version update 65%

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Darkhan Kurmangaliyev, Daniyar Nurseitov, Tatyana Dedova, Larissa Balakay, Serik Nurakynov

Affiliations: Satbayev University; Institute of Ionosphere LLP; Information Technology Department; Assiut University

Topic match: Image inpainting (uses a GAN for image inpainting to reconstruct missing details)

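AI summary: Proposes an end-to-end framework that combines object detection, GAN-based (CRA) inpainting, and Transformer/CNN classification. Reconstruction quality is high (PSNR 28.7 dB), but classification accuracy reaches only 53%. A confidence-based hybrid ensemble raises MCA from 48% to 58%. The results suggest that generative models produce semantically ambiguous features.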

Comments: IEEE Access

Abstract

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7 dB, FID 74.01), classification accuracy plateaued at 53%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48% to 58%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49–53% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition
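The abstract does not expand MCA. Assuming it means mean class accuracy (the average of per-class recall), the sketch below shows why MCA exposes minority-class failures that plain accuracy hides. It pairs this with one simple confidence-based ensemble rule: take the prediction of whichever model is most confident. The rule, the function names, and the toy probabilities are hypothetical illustrations, not the paper's hybrid ensemble.

```python
from typing import Dict, List, Sequence


def mean_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average of per-class recall, so every class counts equally regardless of size."""
    correct: Dict[int, int] = {}
    total: Dict[int, int] = {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (t == p)  # bool adds as 0 or 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def max_confidence_ensemble(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Hypothetical rule: per sample, use the argmax of the most confident model.

    prob_sets[m][i] is model m's class-probability vector for sample i.
    """
    preds = []
    for i in range(len(prob_sets[0])):
        best = max((probs[i] for probs in prob_sets), key=max)  # highest top-1 probability
        preds.append(max(range(len(best)), key=best.__getitem__))
    return preds


y_true = [0, 0, 0, 1]  # class 1 is the minority
cnn = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]  # always predicts class 0
vit = [[0.6, 0.4], [0.4, 0.6], [0.9, 0.1], [0.1, 0.9]]

print(mean_class_accuracy(y_true, [0, 0, 0, 0]))  # 0.5 (plain accuracy would be 0.75)
combined = max_confidence_ensemble([cnn, vit])
print(combined)                                   # [0, 0, 0, 1]
print(mean_class_accuracy(y_true, combined))      # 1.0
```

The toy data are chosen so the effect is obvious. A real ensemble would typically need calibrated probabilities before confidences from different architectures can be compared directly.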