April 26, 2024

Diffusion Models

 




Diffusion models are generative AI models that excel at generating high-quality images, video, text, sound, and more.  These models work by gradually adding random noise to training data and then learning to reverse this process step by step, so that they can turn pure noise into new samples that resemble the original data distribution.  The idea is rooted in statistical physics, specifically in diffusion processes, which describe how molecules move from areas of high concentration to areas of low concentration.
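The following is a minimal Python sketch of the forward (noise-adding) half of the process. It assumes the DDPM formulation of Ho et al. (2020), a later refinement of the original diffusion model, with a linear noise schedule. The function names, the schedule values (1e-4 to 0.02 over 1,000 steps) and the toy 4-value "data point" are illustrative choices, not taken from any particular product.

Under these assumptions, any noise level t can be sampled directly as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The part that is actually learned is a neural network trained to predict eps, which is what lets the process run in reverse.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, increasing linearly.
    Assumes num_steps >= 2."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s).
    Close to 1 means 'mostly data', close to 0 means 'mostly noise'."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(x0, t, a_bars, rng):
    """Sample x_t from q(x_t | x_0) in one jump:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    Returns (x_t, eps); a denoising network would be trained to predict eps."""
    a = a_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, eps)]
    return xt, eps

if __name__ == "__main__":
    rng = random.Random(0)                 # seeded for repeatability
    a_bars = alpha_bars(linear_beta_schedule(1000))
    x0 = [1.0, -0.5, 0.25, 0.8]            # toy "data point", e.g. 4 pixel values
    for t in (0, 250, 999):
        xt, _ = add_noise(x0, t, a_bars, rng)
        print(t, round(a_bars[t], 5), [round(v, 3) for v in xt])
```

Running it shows alpha_bar_t shrinking toward zero as t grows. As a result, x_t drifts from the original values toward pure Gaussian noise, and generation runs this journey backwards, starting from noise.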


Let's ask GPT-4 to explain diffusion models:






Diffusion models were first proposed in the research paper ‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’ by Jascha Sohl-Dickstein et al., from Stanford University and the University of California, Berkeley, in 2015.  Since then, they have been widely adopted in the AI industry.  Industry products built on diffusion models include OpenAI’s DALL-E 3 for text-to-image, OpenAI’s Sora for text-to-video, and Stability AI’s Stable Diffusion 3.


Reference for cat picture.


April 19, 2024

GAN

 



A generative adversarial network (GAN) is a machine learning framework, typically used for unsupervised learning, in which two neural networks compete against each other.  Both networks are trained simultaneously through this adversarial process, and each gets better at its task as the other improves.


A GAN consists of two networks: a generator, which turns random noise into synthetic outputs designed to look real, and a discriminator, which tries to tell whether a given sample is real or generated.  As training progresses, the generator learns to produce samples that are increasingly difficult for the discriminator to distinguish from real ones.  Ideally, at convergence, the generator produces samples that are almost indistinguishable from real data.
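The adversarial game can be written as a pair of loss functions, which this minimal, dependency-free Python sketch illustrates. It assumes the discriminator outputs a probability that its input is real:

- The discriminator is penalized for calling real samples fake or fake samples real.
- The generator is penalized whenever the discriminator spots its samples.

The generator loss uses the "non-saturating" form (maximize log D(G(z))), which the 2014 GAN paper by Goodfellow et al. suggests in practice instead of directly minimizing log(1 - D(G(z))). The networks themselves and the gradient updates are omitted, and the function names are hypothetical.

```python
import math

def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability p (clamped for safety)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """D wants D(x) -> 1 on real samples and D(G(z)) -> 0 on generated ones.
    d_real / d_fake are D's output probabilities on a batch of each."""
    real = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real + fake

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D(G(z)) -> 1, i.e. to fool D."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

if __name__ == "__main__":
    # Early in training: D easily separates real (high scores) from fake (low scores).
    print(discriminator_loss([0.95, 0.9], [0.05, 0.1]), generator_loss([0.05, 0.1]))
    # Idealized equilibrium: D outputs 0.5 everywhere and cannot tell real from fake.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))
```

At the idealized equilibrium, the discriminator can do no better than guessing. It outputs 0.5 for every sample, so its loss settles at 2 * log 2, about 1.386.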


The GAN was first introduced in the research paper ‘Generative Adversarial Nets’ by Ian Goodfellow et al. from the University of Montreal in 2014.  Since then, GANs have been widely used for image, video, and text generation, because they are designed to generate entirely new samples rather than to classify or retrieve existing ones.  A major benefit of GANs is that they can create new, synthetic data where collecting real data is difficult or impossible.


Some real-world examples of GAN models include (1) NVIDIA's GANverse3D, which generates 3D models from single 2D images, (2) The Fabricant, a digital fashion house that creates innovative digital clothing designs, and (3) This Person Does Not Exist, a website that generates lifelike images of human faces that don't belong to real people.

 

In the next post, we will explore a different model that also excels at image, video, and text generation.



April 5, 2024

Inference

 



Inference is the process in which a trained AI model uses new data to make predictions or solve a task.  An AI model typically goes through two phases:


1)  The first phase is training, in which the model develops its intelligence by learning patterns from large amounts of (often labeled) data.  For example, if you’re training a model to identify stop signs, you would feed it thousands of labeled images so that it learns the visual features that distinguish a stop sign, rather than storing the images to look up later.


2)  The second phase is inference, the AI model’s shining moment to prove that the intelligence it developed during training lets it make correct predictions or solve a task.  During inference, the model applies its learned knowledge to new, real-world data to make predictions or generate outputs such as images, text, or video.  This allows businesses to make data-driven decisions in real time and improve efficiency.
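The two phases can be illustrated with a deliberately tiny classifier. This Python sketch uses a nearest-centroid classifier, a simple stand-in that is not how production stop-sign detectors (typically neural networks) work. It also uses two hypothetical, hand-picked features per image.

Training runs once and condenses the labeled examples into learned parameters, one average feature vector per label. Inference then uses only those parameters to label an input the model has never seen.

```python
import math

def train(examples):
    """Training phase (hypothetical helper): learn one centroid, i.e. the mean
    feature vector, per label. `examples` is a list of (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def infer(model, features):
    """Inference phase (hypothetical helper): predict the label whose learned
    centroid is closest (Euclidean distance) to the new input's features."""
    return min(model, key=lambda label: math.dist(model[label], features))

if __name__ == "__main__":
    # Hypothetical features: (fraction of red pixels, how octagonal the shape is)
    training_data = [
        ((0.90, 0.95), "stop sign"),
        ((0.85, 0.90), "stop sign"),
        ((0.20, 0.10), "not a stop sign"),
        ((0.35, 0.05), "not a stop sign"),
    ]
    model = train(training_data)          # done once, up front
    print(infer(model, (0.80, 0.85)))     # new, unseen input; prints "stop sign"
```

Note that after training, the model keeps only two small vectors, not the training images. Every new prediction is a cheap call to `infer`, which is exactly the step that gets repeated for every user request.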



Inferencing is very expensive


Both training and inferencing are computationally expensive; however, training is more or less a one-time cost.  Inferencing, on the other hand, is ongoing: every time a user asks an LLM a question and expects an answer, that’s inferencing.


Now multiply that by millions of users asking millions of questions, and you can imagine the huge compute cost the AI system incurs.  In fact, up to 90% of an AI model’s life might be spent in inference mode.  A single inference request is far cheaper than training, but over the model’s lifetime the cumulative cost of inferencing can exceed the cost of training by an order of magnitude.
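The gap between one-time training and ongoing inference can be estimated with a widely used rule of thumb for dense transformer LLMs (Kaplan et al., 2020). Training takes about 6 * N * D FLOPs for N parameters and D training tokens. Inference takes about 2 * N FLOPs per token processed.

Setting 6 * N * D = 2 * N * T gives T = 3 * D. In other words, cumulative inference compute matches training compute once the model has processed about three times as many tokens as it was trained on. The sketch below assumes these approximations, which ignore attention overhead, hardware utilization and batching. The model size, dataset size and traffic figures are hypothetical.

```python
def training_flops(n_params, n_training_tokens):
    """Approximate one-time training compute: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_training_tokens

def inference_flops(n_params, n_tokens_served):
    """Approximate ongoing inference compute: ~2 FLOPs per parameter per processed token."""
    return 2 * n_params * n_tokens_served

def break_even_tokens(n_training_tokens):
    """Tokens served at which cumulative inference compute equals training compute.
    6*N*D == 2*N*T  =>  T == 3*D (the parameter count N cancels out)."""
    return 3 * n_training_tokens

if __name__ == "__main__":
    # Hypothetical model and traffic numbers, for illustration only.
    n_params = 7_000_000_000                # 7B-parameter model
    n_training_tokens = 2_000_000_000_000   # 2T training tokens
    queries_per_day = 10_000_000
    tokens_per_query = 1_000
    tokens_per_day = queries_per_day * tokens_per_query
    days = break_even_tokens(n_training_tokens) / tokens_per_day
    print(f"Inference compute matches training compute after ~{days:.0f} days")
    # prints: Inference compute matches training compute after ~600 days
```

With these made-up numbers, inference overtakes training after roughly 600 days of service. A more popular model, or one that gives longer answers, would reach that point sooner, and every day after that pushes the inference share of lifetime compute higher.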