A practical overview of the processes, roles, and complexities involved in the development of a GPT-based virtual assistant
Throughout 2023, we have been developing a GPT-based virtual assistant for the employees of Enefit (one of the largest energy companies in the Baltics). In the first article (read here), I gave an overview of the problem, the development process, and the initial results. In this article, I will delve into the non-technology-related challenges of developing a virtual assistant.
At the beginning of 2023, it was clear that a breakthrough had occurred in the technology of large language models. Unlike the chatbots of the past decade, which often disappointed users by the second or third question, ChatGPT proved precise, versatile, and genuinely helpful. The decision by OpenAI and Microsoft to offer programmatic access to their GPT models through an open API service created the opportunity to implement company-specific use cases.
We approached the Enefit virtual assistant project knowing that the base technology was ready, internal interest was high, and that the software development challenge, while novel and complex, was achievable with good specialists.
In the early stages of development, this narrative proved correct: nearly 80% of the project activities consisted of software development tasks, and 20% were non-technology-related activities. As the project progressed, these proportions changed drastically, creating the need for entirely new processes and roles.
A virtual assistant can provide company-specific information only as accurately as the underlying documents permit. In other words, if the foundational documents contain incorrect, poorly structured, or outdated information, the virtual assistant can’t provide much better answers. This is commonly referred to as the GIGO (garbage in, garbage out) principle, which sets fundamental limits to AI capabilities.
Therefore, a significant part of building a virtual assistant is ensuring the quality of the data/information. This includes:
- Assigning an owner to every document/information group who will be responsible for the accuracy of the information.
- Agreeing on a feedback mechanism that allows virtual assistant users to report bad answers or misinformation.
- Establishing a feedback management process to ensure that user feedback reaches the information owner and is acted upon.
Essentially, this means that all parties are involved in data management: users who provide continuous feedback and data owners responsible for responding to that feedback.
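To make this concrete, the feedback mechanism can be as simple as a structured record that is routed to the responsible information owner. Below is a minimal Python sketch; the record fields, document groups, and addresses are hypothetical illustrations, not Enefit’s actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry mapping each document/information group to its owner.
DOCUMENT_OWNERS = {
    "hr-policies": "hr.owner@enefit.example",
    "it-guidelines": "it.owner@enefit.example",
}

@dataclass
class AnswerFeedback:
    """A user's report about a poor or incorrect answer."""
    question: str
    answer: str
    source_document_group: str
    comment: str
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def route_feedback(feedback: AnswerFeedback) -> str:
    """Return the owner responsible for acting on this piece of feedback."""
    return DOCUMENT_OWNERS.get(
        feedback.source_document_group,
        "data.governance@enefit.example",  # fallback when no owner is assigned
    )
```

The key point is not the code itself but the guarantee it encodes: every reported answer has exactly one owner who must act on it.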
Document owners can also contribute to improving how the virtual assistant finds the information under their purview by enriching document sections with keywords, testing the virtual assistant’s accuracy, restructuring content when necessary, testing, improving, testing, improving, … In essence, information owners should view the virtual assistant as a colleague with whom they need to collaborate!
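As a rough illustration of keyword enrichment, here is a sketch in which owners attach keywords to document sections and retrieval boosts sections whose keywords match the question. The structure and scoring are assumptions made for illustration; a real system would combine this with embedding-based search rather than replace it.

```python
# Owner-enriched document sections: the "keywords" field is maintained by the
# information owner to help the assistant find the right section.
sections = [
    {
        "text": "Employees may work remotely up to three days per week...",
        "keywords": ["remote work", "home office", "hybrid"],
        "owner": "hr.owner@enefit.example",  # hypothetical
    },
]

def keyword_score(question: str, section: dict) -> int:
    """Count how many owner-supplied keywords appear in the question."""
    q = question.lower()
    return sum(1 for kw in section["keywords"] if kw in q)

# Pick the section that best matches the user's question.
best = max(sections, key=lambda s: keyword_score("What are the home office rules?", s))
```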
To conclude this section, I’ll touch upon Microsoft’s new Copilot, whose launch currently has all eyes on it. Most tech enthusiasts have watched the demo videos and expect a plug-and-play product that semi-magically provides good answers to company-related questions. This expectation will likely lead to disappointment, however, as even Copilot isn’t immune to the GIGO principle.
Looking beyond Copilot’s marketing videos, we find extensive documentation on document management requirements. In summary, Microsoft expects that (read more):
- All outdated documents should be deleted.
- All documents should contain accurate and relevant information.
- Companies should establish a new data governance process to ensure the above.
- Documents should be enriched with keywords to enhance the search.
These are demanding requirements, especially when we’re talking about documents stored on employees’ computers.
To be clear, I consider Copilot a fantastic new technology. However, it’s crucial to emphasize that no virtual assistant technology can be successfully implemented without a data governance process.
Large pre-trained language models (e.g., GPT, Llama) are robotic logic machines. This means that if we want them to fulfill a specific role (e.g., executive assistant, contract assistant, legal expert), we need to guide them and provide style examples.
Directing the virtual assistant means giving the language model both the user’s question and a response guide. For example, “You are a virtual assistant of Enefit who knows about company policies and rules. If you can’t find the answer in the available information, say you don’t know…”
With this type of guidance, we can instruct the virtual assistant on how to behave, dictate the format it should use to respond, and highlight what it should avoid.
However, a general guide is often insufficient. For instance, a company may want the virtual assistant to follow a specific style (formal, friendly, etc.). In such cases, style examples, which are essentially question-answer pairs, can be provided. As language models are trained to continue existing text, the virtual assistant tries to answer user questions similarly to the provided style examples.
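Putting the response guide and style examples together, here is a minimal sketch using the OpenAI Python client (v1-style API). The model name, the example question-answer pair, and the exact wording are illustrative assumptions rather than Enefit’s actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # The response guide: tells the model its role and how to behave.
    {"role": "system", "content": (
        "You are a virtual assistant of Enefit who knows about company "
        "policies and rules. If you can't find the answer in the available "
        "information, say you don't know."
    )},
    # A style example: the model imitates the tone and format of this pair.
    {"role": "user", "content": "How many vacation days do I get?"},
    {"role": "assistant", "content": (
        "According to the HR policy, you are entitled to 28 calendar days "
        "of vacation per year."
    )},
    # The actual user question.
    {"role": "user", "content": "Can I carry unused vacation days into next year?"},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```

In practice, the relevant document excerpts retrieved for the question would also be injected into the messages before the final user turn, so the model answers from company information rather than from memory.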
Creating response guides and style examples, testing different versions, and refining them constitutes the third significant part of virtual assistant development.
The ‘Virtual Assistant Trainer/Guide’ role is entirely new and can be effectively filled only by someone knowledgeable in the domain for which the virtual assistant is created. Effective development of the virtual assistant necessitates close collaboration among software developers, information owners, and the virtual assistant trainer, since the cause of each ‘poor’ response could lie with different specialists.
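One practical way to organize this collaboration is a shared regression test: a fixed set of questions that is re-run whenever the response guide, the style examples, or the underlying documents change. Below is a minimal sketch, assuming a hypothetical ask_assistant(question, guide) wrapper around the deployed assistant; the test cases are invented for illustration.

```python
# Hypothetical test cases; "must_contain" is a keyword an acceptable answer
# should include. Real suites would use richer checks (or human review).
TEST_CASES = [
    {"question": "How many vacation days do I get?", "must_contain": "28"},
    {"question": "Who approves business travel?", "must_contain": "manager"},
]

def evaluate(ask_assistant, guide: str) -> float:
    """Return the share of test cases the assistant answers acceptably."""
    passed = 0
    for case in TEST_CASES:
        answer = ask_assistant(case["question"], guide)
        if case["must_contain"].lower() in answer.lower():
            passed += 1
    return passed / len(TEST_CASES)

# Compare two versions of the response guide before deploying one:
# score_a = evaluate(ask_assistant, GUIDE_V1)
# score_b = evaluate(ask_assistant, GUIDE_V2)
```

With such a suite, the trainer, the developers, and the information owners can immediately see whether a change to a prompt or a document improved or degraded answer quality.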
Developing a chatbot that is accurate 80% of the time is easy with today’s technologies, but creating a virtual assistant that is accurate 95% of the time is a complex task.
At first glance, one might think that 80% is sufficient, so why go through so much trouble for the last 20 percentage points? In reality, based on the experience of the past decade with chatbots, we know that a chatbot that is precise 80% of the time does not surpass the “cognitive usefulness threshold” of users.
This cognitive usefulness threshold is a hidden benchmark that exists in all our minds, but we cannot precisely define where this limit lies. However, using technology, we quickly understand whether or not this limit has been crossed. If the quality of the technology falls below this threshold, we will completely abandon the use of the given technology.
In other words, the difference between 80% and 95% accuracy is that in the first case, no one will start using the technology, while in the second, it becomes an everyday assistant for many employees.
To achieve the last 15–20 percentage points, it is necessary to implement a data management system that keeps the foundational information relevant, create the new roles and processes associated with developing the virtual assistant, train all parties on the new technology, and support implementation and adoption at both the strategic and operational levels. Therefore, technology constitutes only about a third of virtual assistant development, while organizational and process-related challenges make up the rest.