Awesome Efficient Diffusion


A curated list of methods that focus on improving the efficiency of diffusion models

Index

  • Algorithms:
      0. Basics
      1. Arch-level Compression
      2. Timestep-level Compression
      3. Data-precision-level Compression
      4. Input-level Compression
      5. Efficient Tuning
  • Applications:
    • a. Personalization
    • b. Controllable Generation
    • c. Multi-Modal Generation
  • Deployment:
    • Ⅰ. GPU
    • Ⅱ. Mobile
    • Ⅲ. Miscellaneous Devices

Design Principles

  • Simple: summarize the structural points as each paper's description and omit the details (don't get lost in low-information text)
  • Quantitative: give the relative speedup for each method (how far have we come?)

🔬 Algorithms

0. Basics

Some basic diffusion model papers, specifying the preliminaries. Note that the main focus of this awesome list is efficiency methods; therefore this section only contains the minimum essential preliminary studies you need to know before accelerating diffusion.

The [🤗 Huggingface Diffusers Doc] is also a good way to get started with diffusion.
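As a hands-on starting point, here is a minimal text-to-image sketch using the 🤗 Diffusers API (the checkpoint id `runwayml/stable-diffusion-v1-5` is just an example; any Stable Diffusion checkpoint works):

```python
# Minimal text-to-image generation with 🤗 Diffusers (illustrative sketch, not part of this repo).
# Assumes `diffusers`, `transformers`, and `torch` are installed and a GPU is available.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=50).images[0]
image.save("astronaut.png")
```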

0.1. Methods (Variants/Techniques)

Some famous diffusion models (e.g., DDPM, Stable Diffusion) and key techniques.
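To make the DDPM preliminary concrete, a minimal sketch of one reverse (denoising) step; `eps_model` is a hypothetical placeholder for the trained noise-prediction U-Net, and the variance choice sigma_t^2 = beta_t is one common option:

```python
# One DDPM reverse step (sketch):
# x_{t-1} = 1/sqrt(alpha_t) * (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps_theta(x_t, t)) + sigma_t * z
import torch

def ddpm_reverse_step(x_t, t, eps_model, alphas, alphas_cumprod, betas):
    eps = eps_model(x_t, t)                         # predicted noise eps_theta(x_t, t) (placeholder model)
    alpha_t, alpha_bar_t = alphas[t], alphas_cumprod[t]
    mean = (x_t - (1 - alpha_t) / torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)
    if t == 0:
        return mean                                 # no noise is added at the final step
    z = torch.randn_like(x_t)
    return mean + torch.sqrt(betas[t]) * z          # sigma_t^2 = beta_t (one common choice)
```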

0.2. Architecture Components

  • "Vision-Language Model"
    • CLIP, T5
    • Containing Operations:
      • Self-Attention (Cross-Attention)
      • FFN (FC)
      • LayerNorm (GroupNorm)
  • Diffusion Model (also sometimes used for super-resolution)
    • U-Net
    • Containing Operations:
      • Conv
      • DeConv (ConvTransposed, Interpolation)
      • Long-range Shortcut Connection
  • Encoder-Decoder (for latent-space diffusion)
    • VAE (in latent diffusion)
    • Containing Operations:
      • Conv
      • DeConv (ConvTransposed, Interpolation)

0.3. Solver (Sampler)

  • [ICLR21 - DDIM]: "Denoising Diffusion Implicit Models";

    • deterministic sampling
    • reduce time-steps
  • [NeurIPS22 - DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";

    • follow-up work: [Arxiv2211 - DPMSolver++]: "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models";
    • multi-order ODE solver, faster convergence (a scheduler-swap sketch follows this list)
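In 🤗 Diffusers, switching the solver is a one-line scheduler swap; a hedged sketch, assuming `pipe` is an already-loaded Stable Diffusion pipeline:

```python
# Swap samplers and reduce the step count (illustrative sketch; assumes a loaded `pipe`).
from diffusers import DDIMScheduler, DPMSolverMultistepScheduler

# DDIM: deterministic sampling, far fewer steps than ancestral DDPM sampling
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
image_ddim = pipe("a castle at sunset", num_inference_steps=50).images[0]

# DPM-Solver++ (multistep): typically converges in ~10-25 steps
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image_dpm = pipe("a castle at sunset", num_inference_steps=20).images[0]
```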

0.4. Evaluation Metric

  • InceptionDistance

    • Fréchet Inception Distance (FID): compares two sets of images via the distance between their InceptionNet intermediate-feature statistics (reference vs. generated images); lower is better
    • Kernel Inception Distance
    • Inception Score
    • limitation: the Inception network is pre-trained on ImageNet-1K, so the metric is less reliable for models trained on large image-caption datasets (e.g., LAION-5B); Stable Diffusion's pre-training set may also overlap with the reference images. Scores further depend on:
      • The specific Inception model used during computation.
      • The image format (scores differ if we start from PNGs vs. JPGs)
  • Clip-related

    • CLIP score: compatibility of an image-text pair (higher is better)
    • CLIP directional similarity: consistency between the change across two images and the change across their two captions
    • limitation: the captions were crawled from the web and may not align with human descriptions
  • Other Metrics (referring to Schuture/Benchmarking-Awesome-Diffusion-Models); a metric-computation sketch follows below
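A hedged sketch of computing FID and CLIP score with `torchmetrics` (the package, tensors, and model id below are assumptions for illustration; other implementations such as clean-fid also exist):

```python
# FID / CLIP score computation sketch using torchmetrics (not part of this repo).
# `real_images` / `fake_images` are uint8 tensors of shape (N, 3, H, W); `prompts` is a list of captions.
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

# FID: distance between InceptionNet feature statistics of real vs. generated images (lower is better)
fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())

# CLIP score: image-text compatibility (higher is better)
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
print("CLIP score:", clip_score(fake_images, prompts).item())
```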

0.5. Datasets & Settings

0.5.1 Unconditional Generation

  • CIFAR-10:

  • CelebA:

0.5.2 Text-to-Image Generation

0.5.3 Image/Depth-to-Image Generation

1. Arch-level Compression

reduce the cost of the diffusion model (the repeatedly invoked U-Net) with pruning / neural architecture search (NAS) techniques
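For flavor, a minimal magnitude-pruning sketch with `torch.nn.utils.prune` applied to the U-Net's conv layers (the 30% sparsity is an arbitrary illustration, not a value from any listed paper):

```python
# Unstructured L1 magnitude pruning of U-Net conv weights (illustrative sketch only).
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_unet_convs(unet: nn.Module, amount: float = 0.3):
    for module in unet.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)  # zero the smallest 30% of weights
            prune.remove(module, "weight")                               # bake the mask into the weight tensor
    return unet
```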

2. Timestep-level Compression

reduce the number of timesteps (the number of U-Net inferences)

2.1 Improved Sampler

Improved samplers: faster convergence, fewer timesteps

  • [ICLR21 - DDIM]: "Denoising Diffusion Implicit Models";
    • 📊 Typical result: 50~100 steps -> 10~20 steps with moderate perf. loss
  • [NeurIPS22 - DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
    • 📊 Typical result: NFE (number of U-Net forwards) = 10 achieves performance similar to DDIM at NFE = 100 (a timing sketch follows this list)
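To quantify the step-reduction speedup on your own hardware, a rough timing sketch (assumes a loaded `pipe`, e.g. with the DPM-Solver scheduler from the sketch in 0.3):

```python
# Rough wall-clock comparison across step counts (illustrative; assumes a loaded `pipe`).
import time

for steps in (100, 20, 10):
    start = time.perf_counter()
    pipe("a watercolor landscape", num_inference_steps=steps)
    print(f"NFE={steps}: {time.perf_counter() - start:.2f}s")
```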

2.2 Improved Training

Distillation/New Scheme

  • [Arxiv2305 - CatchUpDistillation]: "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling";

  • [ICML23 - ReDi]: "ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval";

    • Skip intermediate steps:
    • Retrieval: find a similar partially generated trajectory from the early steps and reuse it
  • [Arxiv2303 - Consistency Model]: "Consistency Models";

    • New objective: consistency-based

3. Data-precision-level Compression

quantization & low-bit inference/training
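The simplest data-precision reduction is half-precision inference; a hedged sketch (fp16 here, whereas the papers in this category target true low-bit quantization such as int8/int4):

```python
# Half-precision (fp16) inference: ~2x smaller weights and faster matmuls on most GPUs (sketch).
import torch
from diffusers import StableDiffusionPipeline

pipe_fp16 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint, same as above
    torch_dtype=torch.float16,          # store and compute weights/activations in 16-bit floats
).to("cuda")
```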

4. Input-level Compression

4.1 Adaptive Inference

save computation depending on the sample condition (noise / prompt / task)

4.2 Patched Inference

reduce the processing resolution
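As the simplest resolution knob (not the patching scheme of any listed paper), a hedged sketch of generating at a lower resolution and upscaling afterwards; the 384 px value is an arbitrary illustration:

```python
# Generate at reduced resolution, then upscale (illustrative; assumes a loaded `pipe`).
low_res = pipe("a city skyline at night", height=384, width=384,
               num_inference_steps=30).images[0]
high_res = low_res.resize((768, 768))   # naive PIL upscaling; a super-resolution model would do better
```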

5. Efficient Tuning

5.1. Low-Rank

The LoRA family
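The core idea is to freeze the pre-trained weight W and learn a low-rank update (alpha / r) * B * A on top of it; a minimal sketch of a LoRA-style linear layer (names and defaults are illustrative, not any specific library's API):

```python
# Minimal LoRA-style linear layer: y = base(x) + (alpha / r) * x A^T B^T, with the base weight frozen (sketch).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                              # freeze the pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01) # small random init
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))       # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```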

🖨 Applications

a. Personalization

b. Controllable Generation

c. Multi-modal Generation

🔋 Deployments

Ⅰ. GPU

Ⅱ. Mobile

  • [Arxiv2306 - SnapFusion]: "SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds";
    • Platform: iPhone 14 Pro, 1.84 s
    • Model evolution: 3.8x fewer parameters compared with SD-v1.5
    • Step distillation into 8 steps

Ⅲ. Miscellaneous Devices

Related

License

This list is released under a Creative Commons license.
