The driving force behind modern transformer models stems to a large extent from their pretraining data, allowing for strong in-context learning capabilities.
Generative Artificial Intelligence and its popular transformer models are advertised everywhere these days, and new models seem to be released every hour (witness the current inflation of AI announcements). In this rapidly evolving field, the value these models could bring seems endless. Large Language Models (LLMs) like ChatGPT have already made it into every engineer’s pile of resources, writers use them to support their articles, and designers create first visuals or seek inspiration from the output of computer vision models.
However, even though these achievements are impressive and generative AI genuinely enhances productivity, it is important to recall that modern Machine Learning models (like LLMs or Vision Transformers) are not performing any magic at all, just as ML and statistical models in general never have. Even though the remarkable abilities of these models may be perceived as magic-like, and some experts in the field even talk about models “hallucinating”, the foundation of every model is still just math and statistical probabilities (sometimes complex, but still math). This leads to the fundamental question: if it is not magic, what really powers these impressive transformer models?
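Before answering that question, it helps to make the “just math” point concrete. The sketch below (my own minimal illustration in plain NumPy, not any particular model’s actual code, with a made-up four-word vocabulary and made-up logit values) shows the last step of language-model text generation: the softmax function turning raw scores into a probability distribution over next tokens, from which one token is sampled.

```python
import numpy as np

# Hypothetical raw scores (logits) a model might assign to candidate next tokens
vocab = ["cat", "dog", "car", "cloud"]
logits = np.array([2.0, 1.0, 0.1, -1.2])

# Softmax: exponentiate and normalize, turning scores into probabilities
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs /= probs.sum()

for token, p in zip(vocab, probs):
    print(f"P(next token = {token!r}) = {p:.3f}")

# "Generation" is then just sampling from this probability distribution
next_token = np.random.choice(vocab, p=probs)
print("sampled:", next_token)
```

Nothing in this loop is magical: every generated word is a draw from a distribution that the model’s parameters, learned from data, have shaped.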
As with any model (statistical or ML), it is the training data that has the largest impact on the later model performance. If you don’t have a high volume of quality data reflecting the relationships you would like the model to learn, there is nothing to train on, and the resulting model will perform poorly (the famous GIGO principle: Garbage In, Garbage Out). This fundamental principle of data modeling has not changed at all over the years. Behind every revolutionary new transformer model stands first of all just one thing: data. It is the amount, quality, and context of that data that ultimately determine how well the resulting model performs.
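As a toy illustration of GIGO (a minimal sketch of my own using scikit-learn; the data and the simple linear model are made up for demonstration), the snippet below fits the same model twice: once on labels that follow a real relationship, and once on labels that are pure noise. Only the first run produces a model that generalizes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(500, 1))

# Quality data: labels follow a real relationship (plus mild noise)
y_clean = 2.5 * X[:, 0] + rng.normal(0, 0.5, size=500)

# Garbage data: labels are pure noise, with no relationship to learn
y_garbage = rng.normal(0, 1, size=500)

for name, y in [("clean", y_clean), ("garbage", y_garbage)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    print(f"{name:8s} data -> R^2 on held-out set: {model.score(X_te, y_te):.3f}")
```

The clean run scores close to 1.0, while the garbage run scores around 0: the same algorithm, separated only by what it was trained on.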