A curated list of methods for improving the efficiency of diffusion models.
- Algorithms:
  - Basics
  - Arch-level Compression
  - Time-step-level Compression
  - Data-precision-level Compression
  - Input-level Compression
  - Efficient Tuning
- Applications:
- a. Personalization
- b. Controllable Generation
- c. Multi-Media Generation
- Deployment:
- Ⅰ. GPU
- Ⅱ. Mobile
- Ⅲ. Miscellaneous Devices
- Simple: summarize each paper's structural points as its description; omit the details (don't get lost in low-information text)
- Quantitative: give the relative speedup of each method (how far have we come?)
Some basic diffusion model papers, specifying the preliminaries. Note that the main focus of this awesome list is the efficient-method part; therefore it only contains the minimum essential preliminary studies you need to know before accelerating diffusion.
The [🤗 Huggingface Diffuser Doc] is also a good way to get started with diffusion.
Some famous diffusion models (e.g., DDPM, Stable-Diffusion) and key techniques:
- [ICML15 - DPM]: "Deep Unsupervised Learning using Nonequilibrium Thermodynamics";
  - Pre-DDPM, early diffusion model
- [NeurIPS20 - DDPM]: "Denoising Diffusion Probabilistic Models";
  - Discrete-time diffusion model
- [ICLR21 - SDE]: "Score-Based Generative Modeling through Stochastic Differential Equations";
  - Continuous-time SDE formulation of diffusion
- [Arxiv2105 - Classifier-Guidance]: "Diffusion Models Beat GANs on Image Synthesis";
  - Conditional generation with classifier guidance
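As a reminder of how classifier guidance works: the classifier's gradient on the noisy sample shifts the noise prediction toward the desired class. A minimal sketch (function and argument names are illustrative, not from the paper's code):

```python
import numpy as np

def classifier_guided_eps(eps_uncond, grad_log_p_y, sigma_t, scale):
    """Shift the noise prediction against the classifier gradient.

    eps_uncond:    unconditional noise prediction from the diffusion U-Net
    grad_log_p_y:  gradient of log p(y | x_t) from a noisy-image classifier
    sigma_t:       sqrt(1 - alpha_bar_t) at the current timestep
    scale:         guidance scale s (larger s strengthens the condition)
    """
    return eps_uncond - scale * sigma_t * grad_log_p_y
```

With `scale = 0` (or a zero gradient) this reduces to ordinary unconditional sampling.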
- [CVPR22 - Stable-Diffusion]: "High-Resolution Image Synthesis with Latent Diffusion Models";
  - Latent-space diffusion (with VAE)
  - Latent class guidance (CLIP embedding fed into cross_attn)
  - [📎 Code: CompVis/stable-diffusion]
- [TechReport-2204 - DeepFloyd-IF];
  - following [NeurIPS22 - Imagen]: "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding";
  - Larger language model (T5 instead of CLIP)
  - Pixel-space diffusion
  - Diffusion for super-resolution (SR)
  - [📎 Code: deep-floyd/IF]
- "Vision-Language Model"
- Diffusion Model (also sometimes used for super-resolution)
  - U-Net
    - Containing Operations:
      - Conv
      - DeConv (ConvTransposed, interpolation)
    - Long-range shortcut connection
  - Encoder-Decoder (for latent-space diffusion)
    - VAE (in latent diffusion)
    - Containing Operations:
      - Conv
      - DeConv (ConvTransposed, interpolation)
- [ICLR21 - DDIM]: "Denoising Diffusion Implicit Models";
  - Deterministic sampling
  - Reduced number of time-steps
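The deterministic DDIM update (η = 0) can be sketched as follows (a minimal illustration; variable names are mine, not from the paper's code):

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM step (eta = 0).

    x_t:       current noisy sample
    eps_pred:  U-Net noise prediction at timestep t
    abar_t:    cumulative alpha-bar at timestep t
    abar_prev: cumulative alpha-bar at the (possibly much earlier) target step
    """
    # Predict the clean sample x0 from x_t and the noise estimate
    x0 = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    # Move deterministically toward the target timestep
    return np.sqrt(abar_prev) * x0 + np.sqrt(1.0 - abar_prev) * eps_pred
```

Because the step is deterministic, the target timestep can be far from the current one, which is what lets DDIM skip most of the training timesteps at sampling time.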
- [NeurIPS22 - DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
  - Follow-up work: [Arxiv2211 - DPMSolver++]: "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models";
  - Multi-order ODE solver, faster convergence
- Fréchet Inception Distance (FID): compares two sets of images via the distance between InceptionNet intermediate features of reference and generated images; lower is better
- Kernel Inception Distance
- Inception Score
  - Limitation: Inception is pre-trained on ImageNet-1K, so these metrics are less reliable for models trained on large image-caption datasets (e.g., LAION-5B); Stable Diffusion's pre-training set may also overlap with the reference images
  - The specific Inception model used during computation also matters
  - So does the image format (results differ if we start from PNGs vs. JPGs)
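Given Inception features for the two image sets, FID is the Fréchet distance between Gaussians fitted to them. A minimal NumPy sketch (the matrix square root here is a naive eigendecomposition, not the more robust `scipy.linalg.sqrtm` used by standard implementations):

```python
import numpy as np

def fid(feats_ref, feats_gen):
    """Fréchet Inception Distance between two feature sets of shape (n, dim)."""
    mu1, mu2 = feats_ref.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_ref, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    # Naive matrix square root of s1 @ s2 via eigendecomposition
    w, v = np.linalg.eig(s1 @ s2)
    covmean = (v * np.sqrt(np.abs(w))) @ np.linalg.inv(v)
    # ||mu1 - mu2||^2 + Tr(s1 + s2 - 2 * sqrt(s1 s2))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean.real))
```

Identical feature sets give FID ≈ 0; shifting one set's mean by a vector d adds ||d||² to the score.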
- CLIP score: compatibility of an image-text pair
- CLIP directional similarity: consistency between a change in captions and the corresponding change in images (useful for evaluating editing)
  - Limitation: the captions were crawled from the web and may not align with human descriptions
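CLIP score is essentially a scaled, clamped cosine similarity between the CLIP embeddings of an image and its caption; a minimal sketch, assuming the embeddings have already been computed by a CLIP model:

```python
import numpy as np

def clip_score(image_emb, text_emb):
    """Scaled cosine similarity between CLIP image and text embeddings."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    # Common convention: clamp at 0 and scale by 100
    return 100.0 * float(max(image_emb @ text_emb, 0.0))
```

A perfectly aligned pair scores 100; orthogonal or opposed embeddings score 0.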
- Other Metrics (referring to Schuture/Benchmarking-Awesome-Diffusion-Models)
- CIFAR-10:
- CelebA:
Reduce the diffusion model cost (the repeatedly invoked U-Net) with pruning / neural architecture search (NAS) techniques.
Reduce the number of timesteps (i.e., the number of U-Net inferences).
Improved samplers: faster convergence, fewer timesteps.
- [ICLR21 - DDIM]: "Denoising Diffusion Implicit Models";
  - 📊 Typical result: 50–100 steps match 1000-step quality with moderate perf. loss
- [NeurIPS22 - DPMSolver]: "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
  - 📊 Typical result: NFE (number of U-Net forwards) = 10 achieves performance similar to DDIM with NFE = 100
Distillation/New Scheme
- [Arxiv2305 - CatchUpDistillation]: "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling";
- [ICML23 - ReDi]: "ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval";
  - Skips intermediate steps
  - Retrieval: finds a similar partially-generated trajectory from an early stage
- [Arxiv2303 - Consistency Model]: "Consistency Models";
  - New consistency-based training objective
Quantization & low-bit inference/training.
- [Arxiv2305 - PTQD]: "PTQD: Accurate Post-Training Quantization for Diffusion Models";
- [Arxiv2304 - BiDiffusion]: "Binary Latent Diffusion";
Save computation across different sampling conditions (noise/prompt/task).
- [Arxiv2304 - ToMe]: "Token Merging for Fast Stable Diffusion";
Reduce the processing resolution.
- [Arxiv2304 - PatchDiffusion]: "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models";
- [CVPR23W - MemEffPatchGen]: "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation";
- [Arxiv2304 - DiffFit]: "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning";
- [Arxiv2303 - ParamEffTuningSummary]: "A Closer Look at Parameter-Efficient Tuning in Diffusion Models";
The LoRA family
- [Arxiv2306 - SnapFusion]: "SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds";
  - Platform: iPhone 14 Pro, 1.84 s per image
  - Model evolution: 3.8× fewer parameters than SD-V1.5
  - Step distillation down to 8 steps
- heejkoo/Awesome-Diffusion-Models
- awesome-stable-diffusion/awesome-stable-diffusion
- hua1995116/awesome-ai-painting
- PRIV-Creation/Awesome-Diffusion-Personalization
- Schuture/Benchmarking-Awesome-Diffusion-Models
- shogi880/awesome-controllable-stable-diffusion
- Efficient Diffusion Models for Vision: A Survey
- Tracking Papers on Diffusion Models
This list is under the Creative Commons license.