Introduction
Not too long ago, I attempted to build a simple custom chatbot that ran entirely on my CPU.
The results were appalling: the application crashed frequently. That said, this was not a shocking outcome. As it turns out, running a 13B-parameter model on a $600 computer is the programming equivalent of asking a toddler to trek up a mountain.
This time, I made a more serious attempt: an end-to-end project that uses AWS to host the models and provide access to them for the application.
The following article details my efforts in leveraging retrieval-augmented generation (RAG) to build a performant research chatbot that answers questions with information from research papers.
Objective
The aim of this project is to build a question-answering (QA) chatbot using the RAG framework. It will answer questions using the content of PDF documents available in the arXiv repository.
Before delving into the project, let’s consider the architecture, the tech stack, and the procedure for building the chatbot.
Chatbot Architecture
The diagram above illustrates the workflow for the LLM application.
When a user submits a query through the user interface, the query is transformed into an embedding by an embedding model. The vector database then retrieves the stored chunks whose embeddings are most similar to the embedded query and passes them, along with the query, to the LLM. The LLM uses this retrieved context to generate a grounded response, which is displayed to the user in the interface.
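The retrieval step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: it assumes query and chunk embeddings are already available as plain lists of floats (in the real application an embedding model and a vector database would produce and store these), and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, k=2):
    """Return the text of the k chunks most similar to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return [item["text"] for item in ranked[:k]]

def build_prompt(question, context_chunks):
    """Assemble the context-augmented prompt sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

A dedicated vector database replaces the `sorted` scan with an approximate nearest-neighbor index, but the contract is the same: embed, rank by similarity, and stuff the top matches into the prompt.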
Tech Stack
Building the RAG application with the components shown in the architecture requires several tools. The most noteworthy are the following:
- Amazon Bedrock
Amazon Bedrock is a serverless service that allows users access to models via API…
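As a rough illustration of what that API access looks like, the snippet below builds a request body in the Messages format used by Anthropic models on Bedrock. This is a hedged sketch: the helper name is my own, the model ID in the comment is an example, and the actual invocation (commented out) requires AWS credentials and `boto3`.

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build the JSON body for an Anthropic model on Bedrock (Messages API shape)."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# Actually invoking the model needs credentials and the bedrock-runtime client:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.invoke_model(
#     modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
#     body=json.dumps(build_claude_request("What is RAG?")),
# )
```

The appeal of the serverless model is visible here: there is no endpoint to provision or scale; the application simply signs a request against the Bedrock runtime API.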