Training LLM, from Scratch, in Rust | by Stefano Bosisio | Dec, 2024



In this companion article, I'll show my implementation for training a GPT-like model from scratch in Rust. No GPUs, only CPUs, and with performance 30 times better than the native C code.

Image by GoogleDeepMind on Unsplash

In my last article, I introduced the problem of matrix multiplication, how the attention algorithm uses matrix multiplication to perform an averaging process, and how to implement a matrix multiplication function efficiently (at least, efficiently for me) in Rust with BLAS.
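To recap the core idea from that article, here is a minimal sketch in plain Rust (no BLAS, unlike the implementation discussed there) of how attention's causal averaging reduces to a matrix multiplication: a lower-triangular weight matrix, where row t averages positions 0 through t, multiplied by a matrix of token values. The function names are illustrative, not taken from llm.c.

```rust
// Sketch: causal averaging in attention expressed as a matmul.
// All matrices are row-major, flattened into Vec<f64>.

fn matmul(a: &[f64], b: &[f64], n: usize, k: usize, m: usize) -> Vec<f64> {
    // a is n x k, b is k x m; returns the n x m product.
    let mut c = vec![0.0; n * m];
    for i in 0..n {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..m {
                c[i * m + j] += a_ip * b[p * m + j];
            }
        }
    }
    c
}

fn averaging_weights(t: usize) -> Vec<f64> {
    // Row i holds 1/(i+1) in columns 0..=i and zero elsewhere:
    // the causal "average of every token so far" pattern.
    let mut w = vec![0.0; t * t];
    for i in 0..t {
        for j in 0..=i {
            w[i * t + j] = 1.0 / (i as f64 + 1.0);
        }
    }
    w
}

fn main() {
    let t = 3; // sequence length
    let c = 2; // channels per token
    let x = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // t x c token values
    let w = averaging_weights(t);
    let out = matmul(&w, &x, t, t, c);
    // Row 2 is the average of all three rows of x: [3.0, 4.0]
    println!("{:?}", &out[2 * c..]);
}
```

Swapping this triple loop for a BLAS `gemm` call keeps the same semantics while delegating the inner loops to a tuned kernel, which is where the speedups discussed previously come from.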

In this new article, I want to show the first building block of my implementation of llm.c in Rust: training a GPT-like model from scratch using Rust. This has been my way of learning more about the Rust ecosystem and understanding how it compares with C. In particular, I want my code to be able to train a GPT-like model, starting from GPT weights, using only CPUs, so no GPUs or TPUs. My aim is to understand how far we can push these models on simple laptops, and how far the Rust ecosystem can be used for this. Eventually, this code may also be useful for fine-tuning GPT models on a given input corpus.

All the relevant pieces of code can be found here.
