MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

Accepted by ICCV 2023

Jing Zhao1 , Heliang Zheng2, Chaoyue Wang2, Long Lan1, Wenjing Yang1†

National University of Defense Technology1; JD Explore Academy2


Abstract

The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models trained on various datasets. However, few explorations have been conducted on ensembling such models to combine their strengths. In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that empowers fused text-guided diffusion models to achieve more controllable generation. Specifically, we experimentally find that the responses of classifier-free guidance are highly related to the saliency of generated images. Thus, we propose to trust each model in its area of expertise by blending the predicted noises of two diffusion models in a saliency-aware manner. SNB is training-free and can be completed within a DDIM sampling process. Moreover, it automatically aligns the semantics of the two noise spaces without requiring additional annotations such as masks. Extensive experiments demonstrate the effectiveness of SNB across various applications.
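To make the idea concrete, below is a minimal sketch of one SNB sampling step in PyTorch. All names (snb_step, model_a, model_b, the cfg and salience helpers) are hypothetical placeholders, and the saliency heuristic (guidance magnitude, smoothed and normalized) is our reading of the abstract and figure, not the authors' released code.

```python
# Sketch of Saliency-aware Noise Blending (SNB) for one sampling step.
# Assumption: model_a and model_b are epsilon-prediction diffusion models
# sharing a latent space, called as model(x_t, t, conditioning).
import torch
import torch.nn.functional as F

@torch.no_grad()
def snb_step(model_a, model_b, x_t, t, cond_a, cond_b, uncond, guidance_scale=7.5):
    """Blend two diffusion models in noise space for the current step."""
    # Classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond).
    def cfg(model, cond):
        eps_c = model(x_t, t, cond)    # conditional noise prediction
        eps_u = model(x_t, t, uncond)  # unconditional noise prediction
        return eps_u + guidance_scale * (eps_c - eps_u), eps_c - eps_u

    eps_a, guide_a = cfg(model_a, cond_a)
    eps_b, guide_b = cfg(model_b, cond_b)

    # "Noise to salience map": use the per-pixel magnitude of the guidance
    # term, channel-averaged, lightly smoothed, and normalized to [0, 1].
    def salience(guide):
        s = guide.abs().mean(dim=1, keepdim=True)                # (B, 1, H, W)
        s = F.avg_pool2d(s, kernel_size=3, stride=1, padding=1)  # smooth
        return s / (s.amax(dim=(2, 3), keepdim=True) + 1e-8)

    # Saliency-aware mask: trust each model where its response dominates.
    mask = (salience(guide_a) > salience(guide_b)).float()

    # Blend the predicted noises according to the mask.
    return mask * eps_a + (1.0 - mask) * eps_b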

An overview of our Saliency-aware Noise Blending. Given two diffusion models, we first design a "Noise to salience map" module to obtain salience maps. Based on these maps, we then generate saliency-aware masks. Finally, we blend the two diffusion models in the noise space according to the mask. (*) The classifier-free guidances are noises rather than noisy images; we show images here only for visualization.
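Since SNB is completed within a standard DDIM sampling process, the blended noise from the sketch above can be plugged directly into the usual deterministic DDIM update. The snippet below is the textbook DDIM step (eta = 0), not code from the authors' repository; alpha_t and alpha_prev denote the cumulative noise-schedule products (alpha-bar) at the current and previous timesteps.

```python
# Deterministic DDIM update consuming the blended noise prediction eps.
import torch

@torch.no_grad()
def ddim_update(x_t: torch.Tensor, eps: torch.Tensor,
                alpha_t: torch.Tensor, alpha_prev: torch.Tensor) -> torch.Tensor:
    """One DDIM step (eta = 0) from x_t to the previous latent."""
    # Recover the predicted clean image x0 from x_t and eps.
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    # Move to the less noisy latent along the deterministic DDIM trajectory.
    return alpha_prev.sqrt() * x0_pred + (1 - alpha_prev).sqrt() * eps
```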

Fine-grained Fusion

Recontextualization

Cross-domain Fusion

More experimental results.

Fine-grained Fusion

Recontextualization

Cross-domain Fusion