Beam Search: the Most Used Algorithm in Sequence Models | by Riccardo Andreoni | Dec, 2023

Editor
2 Min Read


Learn the working principles of the most famous algorithm for text translation and speech recognition.

Beam Search allows to consider multiple streams of candidates. Image source: unsplash.com.

Imagine you’re an AI language model, like ChatGPT, completing a sentence. How do you choose the next word to make it not just grammatically correct but also contextually relevant? This is where Beam Search steps in.

By efficiently exploring multiple possibilities in parallel and maintaining top candidates at each step, Beam Search plays a crucial role in the task of predicting subsequent elements. Being an effective and powerful algorithm, it ensures output aligns with grammatical constraints and the context.

To understand Beam Search’s impact, think about all the applications requiring precise sequence generation, like language translation, text completion, and chatbots. In all these applications, Beam Search plays a critical role.

In this article, I will introduce the theory and guide you through a practical step-by-step example of the Beam Search algorithm. I will also present several Beam Search variants, and detail all the pros and cons of this fundamental algorithm.

Imagine you need to translate the following sentence from Spanish to English:

Pablo estará en Nueva York la próxima semana.

We don’t only want to obtain a correct translation, we want to obtain the best one. For a language model, the best output coincides with the most probable one.

To achieve this task, most sequence-to-sequence models employ Beam Search. It serves as an heuristic algorithm, systematically exploring multiple possibilities in parallel. At each step, a defined “beam width” maintains a fixed number of top candidates. This allows the algorithm to explore several candidates.

This approach mimics decision-making processes, with the model evaluating and selecting the most promising options.

Consider a standard sequence-to-sequence model, represented by the simple network below:

Share this Article
Please enter CoinGecko Free Api Key to get this plugin works.