Demystifying Mixtral of Experts

By Samuel Flender


Mistral AI’s open-source Mixtral 8x7B model made a lot of waves — here’s what’s under the hood

Image generated with GPT-4

Mixtral 8x7B, Mistral AI’s new sparse Mixture-of-Experts LLM, recently made a lot of waves, with dramatic headlines such as “Mistral AI Introduces Mixtral 8x7B: a Sparse Mixture of Experts (SMoE) Language Model Transforming Machine Learning” or “Mistral AI’s Mixtral 8x7B surpasses GPT-3.5, shaking up the AI world”.

Mistral AI is a French AI startup founded in 2023 by former engineers from Meta and Google. The company released Mixtral 8x7B — in what was perhaps the most unceremonious release in the history of LLMs — by simply dumping the torrent magnet link on their Twitter account on December 8th, 2023, sparking numerous memes about Mistral’s unconventional way to release models.

“Mixtral of Experts” (Jiang et al 2024), the accompanying research paper, was published about a month later, on January 8th, 2024, on arXiv. Let’s take a look, and see if the hype is warranted.

(Spoiler alert: under the hood, there’s not much that’s technically new.)

But first, for context, a little bit of history.

Sparse MoE in LLMs: a brief history

Mixture-of-Experts (MoE) models trace back to research from the early 90s (Jacobs et al 1991). The idea is to model a prediction y as a weighted sum of the outputs of several experts E, where the weights are determined by a gating network G. It’s a way to divide a large and complex problem into distinct, smaller sub-problems. Divide and conquer, if you will. For example, in the original work, the authors showed how different experts learn to specialize in different decision boundaries in a vowel discrimination problem.
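To make this concrete, here’s a minimal sketch of a classic dense MoE layer in PyTorch. The linear experts, the gate, and the dimensions are illustrative assumptions on my part, not details from any of the papers discussed here:

```python
import torch
import torch.nn as nn

class DenseMoE(nn.Module):
    """Dense MoE: the output is the gate-weighted sum over ALL experts.
    Illustrative sketch -- experts are plain linear layers here."""
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(d_model, num_experts)  # the gating network G

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        weights = torch.softmax(self.gate(x), dim=-1)   # (batch, num_experts)
        expert_outputs = torch.stack(
            [expert(x) for expert in self.experts], dim=1
        )                                               # (batch, num_experts, d_model)
        # y = sum_i G(x)_i * E_i(x)
        return (weights.unsqueeze(-1) * expert_outputs).sum(dim=1)
```

Note that every expert runs on every input: the gate only decides how much each expert’s output counts, not whether it is computed at all. That’s exactly the cost problem top-k routing later solved.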

However, what really made MoE fly was top-k routing, an idea first introduced in the 2017 paper “Outrageously large neural networks” (Shazeer et al 2017). The key idea is to compute the output of just the top k experts instead of all of them, which keeps FLOPs roughly constant even as the total number of experts, and hence the model’s parameter count, grows.
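Here’s a minimal sketch of top-k routing under the same illustrative assumptions as above: only the k highest-scoring experts are evaluated per input, and their gate weights are renormalized with a softmax. Real implementations batch tokens per expert rather than looping, but the loop keeps the mechanics visible:

```python
class TopKMoE(nn.Module):
    """Sparse MoE: only the top-k experts (by gate score) run per input.
    Illustrative sketch, not a production implementation."""
    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(d_model, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x)                                # (batch, num_experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)  # keep the k best experts
        weights = torch.softmax(topk_logits, dim=-1)         # renormalize over the top-k
        out = torch.zeros_like(x)
        for b in range(x.size(0)):           # naive per-sample loop, for clarity
            for slot in range(self.k):
                e = topk_idx[b, slot].item()
                out[b] += weights[b, slot] * self.experts[e](x[b])
        return out

layer = TopKMoE(d_model=16, num_experts=8, k=2)
y = layer(torch.randn(4, 16))  # only 2 of the 8 experts run per input
```

The compute cost now depends on k, not on the total number of experts, which is what decouples parameter count from FLOPs.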


