An Intuition for How Models like ChatGPT Work | by David Hundley

Providing an intuition on the ideas behind popular transformer models like ChatGPT and other large language models (LLMs)

As we wind down 2023, it’s incredible to think about how much Generative AI has already impacted our daily lives. Since ChatGPT’s release in November 2022, this space has evolved so quickly that it’s hard to believe all of these advancements have arrived in just one year.

While the results are quite amazing, the underlying complexity has led a lot of people to publicly speculate on how these large language models (LLMs) work. Some have speculated that these models pull from a preformulated database of responses, and some have gone as far as to speculate that these LLMs have gained human-level sentience. These are extreme stances, and as you might guess, both are incorrect.

You may have heard that these LLMs are next-word predictors, meaning that they use probability to determine the next word that should come in a sentence. This understanding is technically correct, but it’s a little too high level to sufficiently understand these models. To build a stronger intuition, we need to go deeper. The intention of this post is to give business leaders a deep enough understanding of these models that they can make educated decisions on how to approach Generative AI for their respective companies. We’ll keep things at a conceptual and intuitive level and steer away from the deep math behind these models.

Consider the sentence, “I like to drink _______ in the morning.” How might you fill in that blank? Most reasonable people might fill in answers like coffee, water, or juice. The sillier among us might say something like beer or sour milk, but all these options hinge on one important context clue: drinking. That alone narrows down what that blank could be, but those who took in the full context of the sentence also noticed the word “morning” and thus narrowed the possibilities even further. In other words, “drink” + “morning” = something in the neighborhood of a breakfast beverage.
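For readers who like to see an idea in code, here is a minimal Python sketch of that narrowing. Everything in it is made up for illustration: the six candidate words, the scores in `CLUE_SCORES`, and the function name `next_word_probabilities` are assumptions, not values or methods from any real model. A real LLM learns its probabilities from data and works over a far larger vocabulary, so treat this only as a picture of how each added context clue reshapes the odds.

```python
# Toy illustration only: these scores are invented, not taken from any real model.
# Each candidate word gets a hand-picked "fit" score for each context clue.
CLUE_SCORES = {
    "drink": {
        "coffee": 0.9, "water": 0.9, "juice": 0.8,
        "beer": 0.8, "pancakes": 0.05, "gravel": 0.01,
    },
    "morning": {
        "coffee": 0.9, "water": 0.5, "juice": 0.7,
        "beer": 0.1, "pancakes": 0.8, "gravel": 0.01,
    },
}


def next_word_probabilities(clues):
    """Multiply each word's scores across the clues, then normalize to sum to 1."""
    combined = {}
    for word in CLUE_SCORES["drink"]:
        score = 1.0
        for clue in clues:
            score *= CLUE_SCORES[clue][word]
        combined[word] = score
    total = sum(combined.values())
    return {word: score / total for word, score in combined.items()}


# Compare the odds with one clue versus both clues.
drink_only = next_word_probabilities(["drink"])
full_context = next_word_probabilities(["drink", "morning"])

ranked = sorted(full_context, key=full_context.get, reverse=True)
for word in ranked:
    print(f"{word:10s} drink only: {drink_only[word]:.2f}   "
          f"drink + morning: {full_context[word]:.2f}")
```

With only “drink” as a clue, beer sits close to coffee and water. Once “morning” is added, coffee pulls ahead and beer drops sharply, which mirrors the intuition that the two clues together point toward a breakfast beverage.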