A deep dive into tailoring pre-trained LLMs for custom use cases using a RAG approach, featuring LangChain and Hugging Face integration
This post was co-authored with Rafael Guedes.
Since the release of ChatGPT in November 2022, Large Language Models (LLMs) have been the hot topic in the AI community for their ability to understand and generate human-like text, pushing the boundaries of what was previously possible in natural language processing (NLP).
LLMs have proven remarkably versatile: because they are not limited to a single task, they can tackle use cases across many industries. This adaptability to different domains makes them attractive to organizations and the research community alike. Applications explored so far include content generation, chatbots, code generation, creative writing, virtual assistants, and many more.
Another characteristic that makes LLMs so attractive is the availability of open-source options. Companies like Meta have made their pre-trained LLMs (Llama2 🦙) available in repositories like Hugging Face 🤗. But are these pre-trained LLMs good enough for each company's specific use case out of the box? Certainly not.
Organizations could train an LLM from scratch with their own data, but the vast majority of them (almost all, in fact) have neither the data nor the computing capacity required: training from scratch demands datasets with trillions of tokens, thousands of GPUs, and several months of work. The more practical option is to take a pre-trained LLM and tailor it to a specific use case, and there are two main approaches for doing so: fine-tuning and RAG (Retrieval-Augmented Generation).
In this article, we will compare the performance of an isolated pre-trained Llama2 with a pre-trained Llama2 integrated into a RAG system, answering questions about the latest news regarding OpenAI. We will start by explaining how RAG works and the architecture of its sub-modules (the retriever and the generator). We will finish with a step-by-step implementation of how we can build a RAG system for any use case using LangChain 🦜️ and Hugging Face.
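To preview where we are headed, here is a minimal sketch of how the retriever and generator fit together using LangChain's high-level API. This is illustrative only: the document texts are placeholders, a small public model (`gpt2`) stands in for Llama2, and the chunk sizes and generation parameters are assumptions rather than the settings used later in the article.

```python
# Minimal RAG sketch with LangChain + Hugging Face (illustrative assumptions
# throughout: placeholder texts, gpt2 instead of Llama2, arbitrary parameters).
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Hypothetical documents standing in for the OpenAI news articles.
raw_texts = ["...news article 1...", "...news article 2..."]

# 1. Split the documents into chunks the retriever can index.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.create_documents(raw_texts)

# 2. Embed the chunks and store them in a vector database (FAISS here).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vector_store = FAISS.from_documents(docs, embeddings)

# 3. Load a Hugging Face model as the generator (swap in Llama2 if you
#    have access to its weights).
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",  # placeholder; the article uses Llama2
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 100},
)

# 4. Chain retriever and generator: the retrieved chunks are injected into
#    the prompt as context before the LLM answers.
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vector_store.as_retriever())
print(qa.run("What did OpenAI announce this week?"))
```

The key idea, which the rest of the article unpacks, is that the generator never answers from its frozen weights alone: the retriever first pulls the most relevant chunks from the vector store, and those chunks ground the model's answer in up-to-date information.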