Awesome Efficient Diffusion

A curated list of methods that focus on improving the efficiency of diffusion models

Index

Algorithms:
- 1. Basics
- 1. Arch-level Compression
- 1. Time-step-level Compression
- 1. Data-precision-level Compression
- 1. Input-level Compression
- 1. Efficient Tuning
Applications:
- a. Personalization
- b. Controllable Generation
- c. Multi-Media Generation
Deployment:
- Ⅰ. GPU
- Ⅱ. Mobile
- Ⅲ. Miscellaneous Devices

Design Principles

Simple: Summarize structural points as paper description, omit the details (dont get lost in low information texts)
Quantitative: give the relative speedup for certain method (how far have we come?)

🔬 Algotithms

0. Basics

Some basic diffusion model papers, specifying the preliminaries. Noted that the main focus of this awesome list is the efficient method part. Therefore it only contains minimum essential preliminary studies you need to know before acclerating diffusion.

The [🤗 Huggingface Diffuser Doc] is also a good way of getting started with diffusion

0.1. Methods (Variants/Techniques)

some famous diffusion models (e.g., DDPM, Stable-Diffusion), and key techniques

[ICML15 - DPM] : "Deep Unsupervised Learning using Nonequilibrium Thermodynamics";
- Pre-DDPM, early diffusion model
[NeurIPS20 - DDPM]: "Denoising Diffusion Probabilistic Models"
- Discrete time diffusion model
[ICLR21 - SDE]: "Score-Based Generative Modeling through Stochastic Differential Equations";
- Continuous time Neural SDE formulation of diffusion
[Arxiv2105 - Classifier-Guidance]: "Diffusion Models Beat GANs on Image Synthesis";
- Conditional generation with classifier guidance
[CVPR22 - Stable-Diffusion]: "High-Resolution Image Synthesis with Latent Diffusion Models";
- Latent space diffusion (with VAE)
- Latent class guidance (with CLIP embedding fed into cross_attn)
- [📎 Code:CompVis/stable-diffusion]
[TechReport-2204 - DeepFlyoid-IF] ;
- following [NeurIPS22 - Imagen]: “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding”;
- Larger Language Model (T5 over CLIP)
- Pixel-space Diffusion
- Diffusion for SR
- [📎 Code: sdeep-floyd/IF]

0.2. Architecture Components

"Vision-Language Model"
- CLIP, T5
- Containing Operations:
  - Self-Attention (Cross-Attention)
  - FFN (FC)
  - LayerNorm (GroupNorm)
Diffusion Model (also sometimes used for SuperReoslution)
- U-Net
- Containing Operations:
  - Conv
  - DeConv (ConvTransposed, Interpolation)
  - Low-range Shortcut Connection
Encoder-Decoder (for latent-space diffusion)
- VAE (in latent diffusion)
- Containing Operations:
  - Conv
  - DeConv (ConvTransposed, Interpolation)

0.3. Solver (Sampler)

[ICLR21 - DDIM]: "Denoising Diffusion Implicit Models";
- determinstic sampling,
- reduce time-steps
[NeurIPS22 - DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
- follow-up work: [Arxiv2211 - DPMSolver++]: "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models";
- multi-order ODE, faster convergence

0.4. Evaluation Metric

InceptionDistance
- Fréchet Inception Distance: evaluting 2 set of image, intermediate feature distance of InceptionNet between reference image and generated image, lower the better
- Kernel Inception Distance
- Inception Score
- limitation: when model trained under large image-caption dataset (LAION-5B), for that the Inception is pre-trained on ImageNet-1K. (StableDiffusion pre-trained set may have overlap)
  - The specific Inception model used during computation.
  - The image format (not the same if we start from PNGs vs JPGs)
Clip-related
- CLIP score: compatibility of image-text pair
- CLIP directional similarity: compatibility of image-text pair
- limitation: The captions tags were crawled from the web, may not align with human description.
Other Metrics (Refering from Schuture/Benchmarking-Awesome-Diffusion-Models)

0.5. Datasets & Settings

0.5.1 Unconditional Generation

0.5.2 Text-to-Image Generation

CIFAR-10:
CelebA:

0.5.3 Image/Depth-to-Image Generation

1. Arch-level compression

reduce the diffusion model cost (the repeatedly inference u-net) with pruning / neural architecture search (nas) techniques

[Arxiv2305] "Structural Pruning for Diffusion Models";
- Code

2. Timestep-level Compression

reduce the timestep (the number of u-net inference)

2.1 Improved Sampler

Improved sampler, faster convergence, less timesteps

[ICLR21 - DDIM]: "Denoising Diffusion Implicit Models";
- 📊 典型结果：50~~100 Steps -> 10~~20 Steps with moderate perf. loss
[NeurIPS22 - DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
- 📊 典型结果：NFE(num of unet forward)=10 achieves similar performance with DDIM NFE=100

2.2 Improved Training

Distillation/New Scheme

[Arxiv2305 - CatchUpDistillation]: "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling";
[ICML23 - ReDi]: "ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval";
- Skip intemediate steps:
- Retrieval: find similar partially generated scheduling in early stage
[Arxiv2303 - Consistency Model]: "Consistency Models";
- New objective: consistency based

3. Data-precision-level Compression

quantization & low-bit inference/training

[Arxiv2305 - PTQD]: "PTQD: Accurate Post-Training Quantization for Diffusion Models";
[Arxiv2304 - BiDiffusion] "Binary Latent Diffusion";

4. Input-level Compression

4.1 Adaptive Inference

save computation for different sample condition (noise/prompt/task)

[Arxiv2304 - ToMe]: "Token Merging for Fast Stable Diffusion";

4.2 Patched Inference

reduce the processing resolution

[Arxiv2304 - PatchDiffusion]: "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models"
[CVPR23W - MemEffPatchGen]: "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation";

5. Efficient Tuning

[Arxiv2304 - DiffFit]: "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning"
[Arxiv2303 - ParamEffTuningSummary]: "A Closer Look at Parameter-Efficient Tuning in Diffusion Models";

5.1. Low-Rank

The LORA family

🖨 Applications

a. Personalization

b. Controllable Generation

c. Multi-modal Generation

🔋 Deployments

Ⅰ. GPU

Ⅱ. Mobile

[Arxiv2306 - SnapFusion]: "SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds";
- Platform: Iphone 14 Pro, 1.84s
- Model Evolution: 3.8x less param compared with SD-V1.5
- Step Distilaltion into 8 steps

Ⅲ. Miscellaneous Devices

License

This list is under the Creative Commons licenses License.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md

A-suozhang/Awesome-Efficient-Diffusion

Folders and files

Latest commit

History

README.md

README.md

Repository files navigation

Awesome Efficient Diffusion

Index

Design Principles

🔬 Algotithms

0. Basics

0.1. Methods (Variants/Techniques)

0.2. Architecture Components

0.3. Solver (Sampler)

0.4. Evaluation Metric

0.5. Datasets & Settings

0.5.1 Unconditional Generation

0.5.2 Text-to-Image Generation

0.5.3 Image/Depth-to-Image Generation

1. Arch-level compression

2. Timestep-level Compression

2.1 Improved Sampler

2.2 Improved Training

3. Data-precision-level Compression

4. Input-level Compression

4.1 Adaptive Inference

4.2 Patched Inference

5. Efficient Tuning

5.1. Low-Rank

🖨 Applications

a. Personalization

b. Controllable Generation

c. Multi-modal Generation

🔋 Deployments

Ⅰ. GPU

Ⅱ. Mobile

Ⅲ. Miscellaneous Devices

Related

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages