Transformers are considered revolutionary in the field of AI. This deep learning architecture was first introduced in Google's 2017 research paper "Attention Is All You Need" by Vaswani et al. The transformer model has since become the foundation for many state-of-the-art AI models and products, such as ChatGPT, Google Bard, and Midjourney.
Before the transformer model, recurrent neural networks (RNNs), which process data sequentially, were a popular language AI architecture. The transformer architecture instead uses the attention mechanism to look at the relationships among all the words simultaneously, from many different angles, and determine which words carry more weight.
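To make the attention idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention described in "Attention Is All You Need". It is illustrative only: the embeddings are random toy data, and real transformers use learned projections for the queries, keys, and values rather than the raw input.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each row of the output is a weighted mix of all value rows,
    with weights based on how similar each query is to each key."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled by sqrt(d_k)
    # Softmax over each row turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention: 3 "words", each a 4-dimensional embedding.
# In self-attention, queries, keys, and values all come from the same input.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(X, X, X)
print(weights)  # each row shows how much one word attends to every word
```

Because every word's weights are computed against all other words at once, the whole matrix is produced in parallel, which is the key contrast with an RNN's one-word-at-a-time processing.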
Let’s ask GPT-4 to explain transformers in an easy-to-understand way.