Diffusion models are generative AI models that excel at producing high-quality images, video, text, and audio. They work by gradually adding random noise to data and then learning to reverse this corruption process, so that sampling from pure noise recovers the original data distribution. The idea is rooted in statistical physics, specifically diffusion processes, which describe how molecules move from high-concentration to low-concentration areas.
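The forward (noising) step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: the linear beta schedule, the function name `forward_diffusion`, and all parameter values are illustrative assumptions, and it uses the standard closed-form expression for sampling a noised version of the data at an arbitrary timestep.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample a noised version of x0 at timestep t in closed form:
    sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]  # cumulative product of (1 - beta)
    eps = rng.standard_normal(x0.shape)  # Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Hypothetical linear noise schedule over T steps (values are illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(32)  # stand-in for a flattened data sample

x_early = forward_diffusion(x0, 10, betas, rng)      # still close to x0
x_late = forward_diffusion(x0, T - 1, betas, rng)    # nearly pure noise
```

By the final timestep the cumulative signal weight is close to zero, so `x_late` is essentially Gaussian noise; a model trained to invert these steps can then generate new samples starting from noise.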
Let's ask GPT-4 to explain diffusion models:
Diffusion models were first proposed in the 2015 research paper ‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’ by Jascha Sohl-Dickstein et al., from Stanford University and the University of California, Berkeley. Since then, the approach has been widely adopted across the AI industry. Notable products include OpenAI’s DALL·E 3 for text-to-image generation, OpenAI’s Sora for text-to-video, and Stability AI’s Stable Diffusion 3.