Portable Document Format files (PDFs) have been in circulation since Adobe introduced the format in the early 1990s. Designed to preserve formatting across different devices, PDFs quickly became the go-to format for sharing everything from contracts to annual reports and complex financial documents.
In finance, legal services, and many other sectors, PDFs remain a mainstay to this day. Anyone can open a PDF, and it displays essentially the same way regardless of the reader used. This is an advantage for files that should not change, unlike, say, editable Word or PowerPoint files.
One disadvantage of PDFs is that they are designed for human eyes, not for machines. If you want to process a 400-page report, you may first need to open it manually and scroll through to find the relevant sections yourself. This becomes a problem when working with large volumes of data stored in PDFs.
Training chatbots on such large files remains challenging, not to mention energy-consuming. Even when you succeed, state-of-the-art chatbots give unreliable answers at best when queried about the contents. Fine-tuning such chatbots to the type…