In this companion article, I'll show my implementation for training a GPT-like model from scratch in Rust. No GPUs, only CPUs, with performance 30 times better than the native C code.
In my last article, I introduced the problem of matrix multiplication, how the attention algorithm uses matrix multiplication to perform an averaging process, and how to implement a matrix multiplication function efficiently (or at least, efficiently for me) in Rust with BLAS.
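To recall the idea from that article, here is a minimal, dependency-free sketch (not the article's actual BLAS-backed code) of averaging via matrix multiplication: a lower-triangular weight matrix with normalized rows, multiplied against the token features, replaces each row with the running average of all rows up to and including it. The function and variable names here are illustrative, not taken from the repository.

```rust
// Naive matmul over row-major nested Vecs: c = a * b.
fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            for p in 0..k {
                c[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    c
}

fn main() {
    let t = 3; // sequence length
    // Lower-triangular averaging weights: row i holds 1/(i+1) in columns 0..=i,
    // so past positions are weighted equally and future positions are masked out.
    let wei: Vec<Vec<f64>> = (0..t)
        .map(|i| {
            (0..t)
                .map(|j| if j <= i { 1.0 / (i as f64 + 1.0) } else { 0.0 })
                .collect()
        })
        .collect();
    // (T, 1) toy "token features".
    let x = vec![vec![2.0], vec![4.0], vec![6.0]];
    let out = matmul(&wei, &x);
    // Each output row is the mean of all input rows so far: 2, 3, 4.
    println!("{:?}", out); // prints [[2.0], [3.0], [4.0]]
}
```

In real attention the triangular weights are replaced by learned, softmax-normalized scores, but the mixing mechanism is exactly this matrix product.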
In this new article, I want to show the first building block of my implementation of llm.c in Rust: training a GPT-like model from scratch in Rust. This has been my way of learning more about the Rust ecosystem and understanding how comparable it is with C. In particular, I want my code to be able to train a GPT-like model, starting from GPT weights, using only CPUs (no GPUs or TPUs). My aim is to understand how far we can push these models on simple laptops, and how far the Rust ecosystem can be used for this. Eventually, this code may also be useful for fine-tuning GPT models on a given input corpus.
All the relevant pieces of code can be found here.