April 5, 2024

Inference
Inference is the process by which a trained AI model uses new data to make predictions or solve a task.  An AI model's life typically has two phases:


1)  The first phase is training the model, or developing its intelligence, by storing, recording, and labeling data.  For example, if you're training a model to identify a stop sign, you would feed it thousands of labeled stop-sign images so that it learns patterns it can apply later.


2)  The second phase is inference, the AI model's shining moment: proving that the intelligence it developed during training can make correct predictions or solve tasks.  During inference, the model applies its learned knowledge to real data to produce accurate predictions or generate outputs such as images, text, or video.  This allows businesses to make real-time, data-driven decisions and improve efficiency.
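
To make the two phases concrete, here is a minimal sketch in Python using scikit-learn on synthetic numeric data.  It is a stand-in for the stop-sign example (a real image model would be a deep neural network, and the features and labels below are invented purely for illustration), but the train-once, infer-repeatedly split is the same:

# Minimal two-phase sketch: synthetic data stands in for real images.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Phase 1: training -- done once, on labeled examples.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))        # stand-in for image features
y_train = (X_train[:, 0] > 0).astype(int)   # stand-in labels: stop sign or not
model = LogisticRegression().fit(X_train, y_train)

# Phase 2: inference -- repeated for every new, unseen input.
X_new = rng.normal(size=(1, 4))             # one new example arriving at run time
print("prediction:", model.predict(X_new))  # e.g. [1] -> "stop sign"

The key asymmetry to notice: fit() runs once, while predict() runs again and again for as long as the model is in service.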



Inferencing is very expensive


Both training and inferencing are computationally expensive; however, training is more or less a one-time compute cost.  Inferencing, on the other hand, is ongoing: every time a user asks an LLM a question and expects an answer, that's inferencing.


Now multiply that by millions of users asking millions of questions, and you can imagine the enormous compute cost incurred by the AI system.  In fact, up to 90% of an AI model's life may be spent in inference mode, and over that lifetime inferencing can be an order of magnitude more computationally expensive than training.
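
A back-of-envelope calculation makes the point.  Every number below is a hypothetical assumption chosen only to illustrate the shape of the cost, not a measured figure for any real model:

# Hypothetical numbers (assumptions, not measurements) comparing one-time
# training compute against recurring inference compute.
TRAINING_GPU_HOURS = 100_000     # assumed one-time training cost
SECONDS_PER_QUERY  = 1.0         # assumed GPU-seconds per user query
QUERIES_PER_DAY    = 10_000_000  # assumed: millions of users, millions of questions
DAYS_IN_SERVICE    = 365         # one year of serving traffic

inference_gpu_hours = SECONDS_PER_QUERY * QUERIES_PER_DAY * DAYS_IN_SERVICE / 3600
total_gpu_hours = TRAINING_GPU_HOURS + inference_gpu_hours
print(f"training:  {TRAINING_GPU_HOURS:>12,.0f} GPU-hours (one-time)")
print(f"inference: {inference_gpu_hours:>12,.0f} GPU-hours in year one")
print(f"inference is {inference_gpu_hours / TRAINING_GPU_HOURS:.1f}x training, "
      f"{100 * inference_gpu_hours / total_gpu_hours:.0f}% of total compute")

With these assumed numbers, inference works out to roughly 10x the training compute and about 91% of the total, which is how a model ends up spending most of its life, and most of its budget, in inference mode.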

