Courage to Learn ML: Unraveling L1 & L2 Regularization (part 2) | by Amy Ma | Nov, 2023



Unlocking the Intuition Behind L1 Sparsity with Lagrange multipliers

Welcome back to ‘Courage to Learn ML: Unraveling L1 & L2 Regularization,’ Part Two. In our previous discussion, we explored the benefits of smaller coefficients and the means to attain them through weight penalization techniques. Now, in this follow-up, our mentor and learner will delve even deeper into the realm of L1 and L2 regularization.

If you’ve been pondering questions like these, you’re in the right place:

  • What’s the reason behind the names L1 and L2 regularization?
  • How do we interpret the classic L1 and L2 regularization graph?
  • What are Lagrange multipliers, and how can we understand them intuitively?
  • How can we apply Lagrange multipliers to understand L1 sparsity?

Your engagement — likes, comments, and follows — does more than just boost morale; it powers our journey of discovery! So, let’s dive in.

Photo by Aarón Blanco Tejedor on Unsplash

The names L1 and L2 regularization come directly from the concept of Lp norms. Lp norms represent different ways to calculate the distance from a point to the origin in a space. For instance, the L1 norm, also known as Manhattan distance, calculates the distance using the absolute values of the coordinates, like ∣x∣+∣y∣. The L2 norm, or Euclidean distance, calculates it as the square root of the sum of the squared values, which is sqrt(x² + y²).
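As a quick sketch (not from the article itself; it assumes NumPy and an arbitrary example vector), here is how the two norms can be computed for a 2D point:

```python
import numpy as np

# An example point (or weight vector); the same idea extends to any dimension.
w = np.array([3.0, -4.0])

# L1 norm (Manhattan distance): sum of absolute coordinate values, |x| + |y|.
l1_norm = np.sum(np.abs(w))        # 3 + 4 = 7.0

# L2 norm (Euclidean distance): square root of the sum of squares, sqrt(x² + y²).
l2_norm = np.sqrt(np.sum(w ** 2))  # sqrt(9 + 16) = 5.0

print(l1_norm, l2_norm)  # 7.0 5.0
```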

In the context of regularization in machine learning, these norms are used to create penalty terms that are added to the loss function. You can think of Lp regularization as measuring the total distance of the model’s weights from the origin in a high-dimensional space. The choice of norm affects the nature of this penalty: the L1 norm tends to make some coefficients zero, effectively selecting more important features, while the L2 norm shrinks the coefficients towards zero, ensuring no single feature disproportionately influences the model.
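To make the penalty idea concrete, here is a minimal sketch, assuming NumPy and a hypothetical helper named regularized_loss with an illustrative regularization strength lam; it is not code from the article, just one way the penalty term might be added to a loss:

```python
import numpy as np

def regularized_loss(weights, base_loss, lam=0.01, norm="l2"):
    """Illustrative sketch: add an Lp penalty to an existing loss value.

    base_loss is whatever data-fit loss the model already computes
    (e.g. mean squared error); lam controls the regularization strength.
    """
    if norm == "l1":
        penalty = np.sum(np.abs(weights))  # L1 penalty: encourages exact zeros (sparsity)
    else:
        penalty = np.sum(weights ** 2)     # L2 penalty: in practice the squared norm is used
    return base_loss + lam * penalty

# Example usage with made-up numbers:
w = np.array([0.5, -1.2, 0.0, 2.3])
print(regularized_loss(w, base_loss=0.8, lam=0.1, norm="l1"))
print(regularized_loss(w, base_loss=0.8, lam=0.1, norm="l2"))
```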

Therefore, L1 and L2 regularization take their names from these mathematical norms, the L1 norm and the L2 norm, which define their respective penalty terms.
