What Are The Dimensions For Creating Retrieval Augmented Generation (RAG) Pipelines?

Editor
4 Min Read


In the dynamic realm of Artificial Intelligence, Natural Language Processing (NLP), and Information Retrieval, advanced architectures like Retrieval Augmented Generation (RAG) have gained significant attention. However, most data science researchers advise against leaping into sophisticated RAG models until the evaluation pipeline is completely reliable and robust.

Carefully assessing RAG pipelines is vital, but it is frequently overlooked in the rush to incorporate cutting-edge features. Researchers and practitioners are advised to strengthen their evaluation setup as a top priority before tackling intricate model improvements.

Comprehending the assessment nuances of RAG pipelines is critical because these models depend on both retrieval quality and generation capability. The dimensions fall into two important categories, which are as follows.

1. Retrieval Dimensions

a. Context Precision: It measures whether the ground-truth-relevant items in the retrieved context are ranked higher than the irrelevant ones.

b. Context Recall: It assesses the degree to which the retrieved context covers the ground-truth response. It is computed from the retrieved context and the ground truth.

c. Context Relevance: It evaluates how relevant the retrieved context is to the given question, penalizing retrieval results that are padded with unrelated information.

d. Context Entity Recall: It calculates the recall of the retrieved context in terms of entities: the number of entities present in both the ground truth and the retrieved context, divided by the number of entities present in the ground truth alone.

e. Noise Robustness: It assesses the model’s ability to handle noise documents that are related to the question but contain little useful information.
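As a rough illustration of the rank- and set-based retrieval metrics above, the following sketch computes Context Precision and Context Entity Recall in plain Python. It assumes binary relevance judgments per retrieved chunk and pre-extracted entity lists; real frameworks such as Ragas typically obtain these judgments and entities with an LLM, so this is a simplification, not any framework's actual implementation.

```python
def context_precision(relevance):
    """Mean precision@k over the positions of relevant chunks.

    `relevance` is a list of 0/1 flags, one per retrieved chunk in
    ranked order (1 = the chunk supports the ground-truth answer).
    A score near 1.0 means relevant chunks were ranked at the top.
    """
    hits, score = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at each relevant position
    return score / hits if hits else 0.0


def context_entity_recall(gt_entities, context_entities):
    """Fraction of ground-truth entities that also appear in the
    retrieved context (entities common to both / entities in GT)."""
    gt, ctx = set(gt_entities), set(context_entities)
    return len(gt & ctx) / len(gt) if gt else 0.0
```

For example, a retrieval run where the 1st and 3rd chunks are relevant scores (1/1 + 2/3) / 2 ≈ 0.83 on context precision, reflecting that one irrelevant chunk was ranked above a relevant one.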

2. Generation Dimensions

a. Faithfulness: It evaluates the generated response’s factual consistency with the given context.

b. Answer Relevance: It measures how well the generated response addresses the given question. Answers that contain redundant or missing information receive lower scores.

c. Negative Rejection: It assesses the model’s capacity to hold off on responding when the documents it has obtained don’t include enough information to address a query. 

d. Information Integration: It evaluates how well the model can integrate data from different documents to provide answers to complex questions.

e. Counterfactual Robustness: It assesses the model’s ability to recognize and ignore known errors in the retrieved documents, even when it has been warned about potential misinformation.
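Frameworks usually score faithfulness by decomposing the answer into claims and asking an LLM judge whether each claim is entailed by the context. As a toy stand-in for that judge, the sketch below approximates "supported" with simple word containment; the claim list and the overlap check are illustrative simplifications, not how Ragas or TruLens actually compute the metric.

```python
def faithfulness(claims, context):
    """Toy faithfulness score: the fraction of answer claims whose
    words all appear in the retrieved context. Real frameworks
    replace this containment check with an LLM entailment judge.
    """
    ctx_words = set(context.lower().split())
    supported = sum(
        all(word in ctx_words for word in claim.lower().split())
        for claim in claims
    )
    return supported / len(claims) if claims else 0.0
```

Under this toy scheme, an answer whose claims are "paris is the capital of france" and "paris has ten million residents", checked against the context "paris is the capital of france and a major city", scores 0.5: the second claim introduces information absent from the context, which is exactly the kind of hallucination a faithfulness metric is meant to flag.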

Here are some frameworks that implement these dimensions, which can be accessed via the following links.

1. Ragas – https://docs.ragas.io/en/stable/

2. TruLens – https://www.trulens.org/

3. ARES – https://ares-ai.vercel.app/

4. DeepEval – https://docs.confident-ai.com/docs/getting-started

5. Tonic Validate – https://docs.tonic.ai/validate

6. LangFuse – https://langfuse.com/


This article is inspired by this LinkedIn post.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

