Distilling the core components of generative LLMs into an accessible framework…
Over the past few years, we have witnessed a rapid evolution of generative large language models (LLMs), culminating in the creation of unprecedented tools like ChatGPT. Generative AI has now become a popular topic among both researchers and the general public. Now more than ever, it is important that researchers and engineers (i.e., those building the technology) develop an ability to communicate the nuances of their creations to others. A failure to communicate the technical aspects of AI in an understandable and accessible manner could lead to widespread public skepticism (e.g., research on nuclear energy went down a comparable path) or the enactment of overly restrictive legislation that hinders forward progress in our field. Within this overview, we will take a small step towards solving these issues by proposing and outlining a simple, three-part framework for understanding and explaining generative LLMs.
Presentation resources. This post was inspired by a presentation that I recently gave for O’Reilly on the basics of LLMs. The goal of this presentation was to provide a “primer” that brought everyone up to speed with how generative LLMs work. The presentation lasted ~20 minutes (hence, the title of this article). For those interested in using the resources from this presentation, the slides are here.
The purpose of this overview is simple. The quality of generative language models has drastically improved in the last year (see above), and we want to understand what changes and new techniques catalyzed this boost in quality. Here, we will stick to transformer-based language models, though the concept of a language model predates the transformer architecture — dating back to recurrent neural network-based approaches (e.g., ULMFiT [4]) or even n-gram language models.
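To make the idea of a language model concrete before diving into transformers, here is a minimal sketch of one of those older approaches: a bigram (n-gram with n=2) language model. The corpus, function name, and special `<s>`/`</s>` boundary tokens below are illustrative choices, not from any particular library — the point is simply that a language model assigns a probability to each next token given the preceding context.

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count bigram frequencies, then normalize into P(next token | previous token)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Wrap each sentence in start/end markers so boundaries are modeled too
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize raw counts into conditional probabilities
    return {
        prev: {tok: c / sum(nxts.values()) for tok, c in nxts.items()}
        for prev, nxts in counts.items()
    }

# Toy training corpus (purely illustrative)
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_lm(corpus)
# "the" is followed by "cat" twice and "dog" once,
# so P(cat | the) = 2/3 and P(dog | the) = 1/3
```

Modern generative LLMs do conceptually the same thing — predict a distribution over the next token — but replace these simple counts with a neural network conditioned on the full preceding context.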
Top-level view. To explain generative LLMs in a clear and simple manner, we must first identify the key ideas…