StyleGAN-Fusion: Diffusion Guided Image Generator Domain Adaptation

Abstract

Can a text-to-image diffusion model be used as a training objective for adapting a GAN generator to another domain? In this paper, we show that classifier-free guidance can be leveraged as a critic, enabling generators to distill knowledge from large-scale text-to-image diffusion models. Generators can be efficiently shifted into new domains indicated by text prompts without access to ground-truth samples from the target domains. We demonstrate the effectiveness and controllability of our method through extensive experiments. Although not trained to minimize CLIP loss, our model achieves equally high CLIP scores and significantly lower FID than prior work on short prompts, and outperforms the baseline qualitatively and quantitatively on long and complicated prompts. To the best of our knowledge, the proposed method is the first attempt at incorporating large-scale pre-trained diffusion models and distillation sampling for text-driven image generator domain adaptation, and it achieves a quality that was previously out of reach. Moreover, we extend our work to 3D-aware style-based generators.

Model Overview

Overview of our StyleGAN-Fusion framework. The style-based generator receives gradients backpropagated from the diffusion UNet through the encoder. All noise and noisy images shown are decoded from their corresponding latents for visualization purposes.
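
As a rough illustration of the training signal described above, the sketch below combines an unconditional and a text-conditioned noise prediction with classifier-free guidance and uses the difference from the injected noise as a gradient. Everything in it is an assumption made for illustration: the function names (`cfg_noise`, `critic_grad`, `make_toy_predictor`, `adapt`) are hypothetical, the "UNet" is a closed-form toy predictor in which each domain is a single point in latent space, and the latent itself is updated directly. In the actual framework a pre-trained diffusion UNet would supply the predictions and the gradient would flow through the encoder into the generator weights. Timestep weighting is omitted.

```python
import math
import random


def cfg_noise(eps_uncond, eps_text, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_text)]


def add_noise(latent, noise, alpha_bar):
    """Forward diffusion of a latent to the noise level alpha_bar in (0, 1)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + s * n for z, n in zip(latent, noise)]


def critic_grad(latent, noise, alpha_bar, predict_noise, guidance_scale):
    """Gradient signal for the latent: guided noise prediction minus the
    noise that was actually added (timestep weighting omitted)."""
    noisy = add_noise(latent, noise, alpha_bar)
    eps_uncond, eps_text = predict_noise(noisy, alpha_bar)
    eps_hat = cfg_noise(eps_uncond, eps_text, guidance_scale)
    return [e - n for e, n in zip(eps_hat, noise)]


def make_toy_predictor(source, target):
    """Hypothetical stand-in for the diffusion UNet: the exact noise
    predictor when each 'domain' is a single point in latent space."""
    def predict_noise(noisy, alpha_bar):
        a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        eps_uncond = [(x - a * m) / s for x, m in zip(noisy, source)]
        eps_text = [(x - a * m) / s for x, m in zip(noisy, target)]
        return eps_uncond, eps_text
    return predict_noise


def adapt(latent, predict_noise, steps=200, lr=0.05, guidance_scale=1.0, seed=0):
    """Toy adaptation loop: sample a noise level and noise, then take a
    gradient step on the latent (a real setup would update generator weights)."""
    rng = random.Random(seed)
    latent = list(latent)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.2, 0.8)
        noise = [rng.gauss(0.0, 1.0) for _ in latent]
        grad = critic_grad(latent, noise, alpha_bar, predict_noise, guidance_scale)
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent


if __name__ == "__main__":
    source, target = [0.0, 0.0], [1.0, -2.0]
    predictor = make_toy_predictor(source, target)
    print(adapt(source, predictor))  # drifts from the source point toward the target
```

In this toy setting a guidance scale of 1 moves the latent to the text-conditioned point, while larger scales extrapolate past it along the direction from the unconditional point to the text-conditioned one, which loosely mirrors how stronger guidance pushes samples further toward the prompt.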

Experiments

AFHQ-cat to eight other animals

Uncurated samples from our method on AFHQ-cat to dog, otter, hamster, fox, badger, lion, bear, and pig.

FFHQ experiments

Prompt:

"3d human face, closeup cute and adorable, cute big circular reflective eyes, Pixar render, unreal engine cinematic smooth, intricate detail, cinematic"

Prompts:

"Mark Zuckerberg, portrait, face"
"a very beautiful anime girl, full body, long braided curly silver hair, sky blue eyes, full round face, short smile, casual clothes, ice snowy lake setting, cinematic lightning, medium shot, mid-shot, highly detailed, trending on Artstation, Unreal Engine 4k, cinematic wallpaper by Stanley Artgerm Lau, WLOP, Rossdraws, James Jean, Andrei Riabovitchev, Marc Simonetti, and Sakimichan"
"Werewolf"
"sketch portrait, closeup face, pen and ink sketch"
"The Joker"
"very beautiful portrait of an extremely cute and adorable face, smooth, perfect face, fantasy, character design by mark ryden and pixar and hayao miyazaki, sharp focus, concept art, harvest fall vibrancy, intricate detail, cinematic lighting, hyperrealistic, 35 mm, diorama macro photography, 8k, 4k"

AFHQ-Cat experiments

Prompt:

"3d cat face, closeup cute and adorable, cute big circular reflective eyes, Pixar render, unreal engine cinematic smooth, intricate detail, cinematic"

Extending our work to EG3D-Face and EG3D-Cat:

We extend our method to 3D geometry-aware generators from EG3D, using the face and cat models provided by its authors.

Prompt: "3d human face, closeup cute and adorable, cute big circular reflective eyes, Pixar render, unreal engine cinematic smooth, intricate detail, cinematic"

Prompt: "3d cat face, closeup cute and adorable, cute big circular reflective eyes, Pixar render, unreal engine cinematic smooth, intricate detail, cinematic"

For more details, please check out our paper on arXiv: https://arxiv.org/abs/2212.04473

BibTeX

@article{song2022diffusion,
      title={Diffusion Guided Domain Adaptation of Image Generators},
      author={Song, Kunpeng and Han, Ligong and Liu, Bingchen and Metaxas, Dimitris and Elgammal, Ahmed},
      journal={arXiv preprint arXiv:2212.04473},
      year={2022}
}

Generated images after adapting FFHQ/AFHQ-cat generators to a 3D rendering style.