The History of Open-Source LLMs: Imitation and Alignment (Part Three)
by Cameron R. Wolfe, Ph.D. | Nov. 2023



Open-source LLMs need alignment to become truly remarkable…

(Photo by Joanna Kosinska on Unsplash)

A majority of prior research on open-source large language models (LLMs) focused heavily on creating pre-trained base models. Because these models have not undergone any fine-tuning, they fail to match the quality of top closed-source LLMs (e.g., ChatGPT or Claude) due to their lack of alignment. Proprietary models are aligned extensively using techniques like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which greatly enhances their usability. In comparison, open-source models are typically fine-tuned to a lesser extent using smaller, public datasets. In this overview, we will take a look at recent research that aims to improve the quality of open-source LLMs via more extensive fine-tuning and alignment.

(from [1, 2, 12])
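To make the SFT step mentioned above concrete, here is a minimal sketch of the supervised fine-tuning loss in PyTorch. Everything in it is illustrative rather than drawn from any particular codebase: `ToyLM` is a stand-in for a real pre-trained LLM, and `sft_loss` is a hypothetical helper name. The one real convention it relies on is PyTorch's `ignore_index=-100`, which lets us mask prompt tokens so the loss is computed only over the response the model should learn to produce.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLM(nn.Module):
    # Stand-in for a pre-trained LLM: maps token ids of shape
    # (batch, seq_len) to logits of shape (batch, seq_len, vocab_size).
    def __init__(self, vocab_size=128, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        return self.head(self.embed(ids))

def sft_loss(model, input_ids, labels):
    # Next-token cross-entropy computed only over response tokens:
    # logits at position t are trained to predict token t + 1, and
    # prompt positions in `labels` are masked with -100.
    logits = model(input_ids)[:, :-1, :]
    targets = labels[:, 1:]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,  # PyTorch skips these positions in the loss
    )

# Example: a 4-token prompt followed by a 4-token "response".
model = ToyLM()
ids = torch.randint(0, 128, (1, 8))
labels = ids.clone()
labels[:, :4] = -100  # mask the prompt so only the response is learned
sft_loss(model, ids, labels).backward()
```

RLHF then builds on an SFT model rather than replacing this step: a reward model is trained on human preference data, and the LLM is optimized against it (e.g., with PPO), which is beyond the scope of this short sketch.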

This overview is the third (and final) part of my series on the history of open-source LLMs. In the first part of the series, we looked at initial attempts at creating open-source language models. Although these initial pre-trained LLMs performed poorly, they were quickly followed by much better open-source base models, which we covered in part two of this series. Now, we will cover how these better open-source models can be fine-tuned and aligned to improve their quality, closing the performance gap between open-source and proprietary LLMs and completing the journey from initial models like OPT to the incredibly high-performing open-source LLMs that we have today (e.g., LLaMA-2-Chat).

(from [17, 18])

The alignment process. This overview will study the fine-tuning and alignment process for open-source LLMs. Before studying research in this area, however, we need to understand what alignment is and how it is accomplished. Recall that the training process for language models proceeds in several parts. As shown above, we begin with pre-training, which is followed by several fine-tuning steps. After pre-training, the LLM can accurately perform next-token prediction, but it still needs fine-tuning and alignment before it is truly useful to human users.
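As a rough illustration of what "next-token prediction" means at inference time, here is a hedged sketch of greedy decoding: the base model repeatedly predicts the most likely next token and appends it to the sequence. `generate_greedy` is a hypothetical helper, and `model` can be any module mapping token ids to per-position logits (e.g., the `ToyLM` stand-in from the earlier sketch).

```python
import torch

@torch.no_grad()
def generate_greedy(model, ids, num_new_tokens=16):
    # `ids` has shape (batch, seq_len); the model returns logits of
    # shape (batch, seq_len, vocab_size). Each step takes the logits
    # at the final position and appends the highest-probability token.
    for _ in range(num_new_tokens):
        logits = model(ids)
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy choice
        ids = torch.cat([ids, next_id[:, None]], dim=-1)
    return ids

# Usage with the ToyLM from the earlier sketch:
# out = generate_greedy(ToyLM(), torch.randint(0, 128, (1, 4)))
```

Real systems usually sample from the predicted distribution (with temperature, top-k, or nucleus sampling) rather than always taking the argmax, but greedy decoding is the simplest way to see the next-token objective in action.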
