Retrieval, a cornerstone of Generative AI systems, is still challenging. Retrieval Augmented Generation, or RAG for short, is an approach to building AI-powered chatbots that answer questions grounded in your own data — data the underlying AI model, an LLM, was not necessarily trained on.
Evaluation data from sources like WikiEval shows that out-of-the-box natural language retrieval accuracy is often low. This means you will probably need to run experiments to tune RAG parameters for your GenAI system before deploying it. However, before you can experiment with RAG, you need a way to evaluate which experiments produced the best results!
Using Large Language Models (LLMs) as judges has gained prominence in modern RAG evaluation. This approach involves using powerful language models, like OpenAI’s GPT-4, to assess the quality of components in RAG systems. LLMs serve as judges by evaluating the relevance, precision, adherence to instructions, and overall quality of the responses produced by the RAG system.
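To make the LLM-as-judge idea concrete, here is a minimal sketch of scoring a RAG answer's faithfulness to its retrieved context. The prompt wording, the 1–5 scale, and the `call_llm` function are all illustrative assumptions — in practice `call_llm` would wrap a real client such as OpenAI's chat API, and frameworks like Ragas package prompts like this for you.

```python
import re

# Illustrative judge prompt; real evaluation frameworks use more carefully
# calibrated prompts and rubrics.
JUDGE_PROMPT = """You are an impartial judge. Given a question, retrieved context,
and an answer, rate how faithful the answer is to the context on a scale of 1-5.
Reply with only the number.

Question: {question}
Context: {context}
Answer: {answer}
Rating:"""


def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM call (e.g. GPT-4 via an API client).
    # It always returns a fixed rating so the sketch runs offline.
    return "4"


def judge_faithfulness(question: str, context: str, answer: str) -> int:
    """Ask the judge LLM for a 1-5 faithfulness rating and parse it."""
    prompt = JUDGE_PROMPT.format(question=question, context=context, answer=answer)
    reply = call_llm(prompt)
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"Judge returned no parseable rating: {reply!r}")
    return int(match.group())


score = judge_faithfulness(
    question="When was the Eiffel Tower completed?",
    context="The Eiffel Tower was completed in 1889.",
    answer="It was completed in 1889.",
)
print(score)
```

The same pattern extends to other criteria — answer relevance, context precision, instruction adherence — by swapping in a different rubric prompt, which is why a single judge LLM can evaluate multiple components of a RAG pipeline.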