After reading this article you’ll have an in-depth understanding of how the Earth Mover’s Distance (aka EMD or Wasserstein Distance) is calculated. From that knowledge, you’ll have a good idea of its benefits and drawbacks in various applications.
Contents
- Definition and intuition of Earth Mover’s Distance (EMD)
- Applications of EMD
- Calculating EMD from scratch
- Calculating EMD with the scipy package
- Conclusion
Definition and intuition of Earth Mover’s Distance
The Earth Mover’s Distance is a way to measure how different two distributions are. The name “Earth Mover’s Distance” comes from its intuitive interpretation. Imagine you have two piles of dirt (or earth) that are in different locations and have different shapes. The EMD is the minimum amount of work needed to reshape the first pile so that it looks like the second. Work here is the amount of earth moved times the distance it is moved, summed over all the earth.
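One common way to write this down assumes both piles contain the same total amount of earth. Let $a_i$ be the earth at location $i$ of the first pile and $b_j$ the earth at location $j$ of the second. Let $d_{ij}$ be the distance between those two locations and $f_{ij} \ge 0$ the amount of earth moved from $i$ to $j$. These symbols are introduced here for illustration. The EMD is the smallest total work over all valid ways of moving the earth:

$$
\mathrm{EMD}(a, b) = \min_{f_{ij} \ge 0} \sum_{i} \sum_{j} f_{ij}\, d_{ij}
\quad \text{subject to} \quad \sum_{j} f_{ij} = a_i, \qquad \sum_{i} f_{ij} = b_j
$$

The first constraint says every bit of earth in the first pile goes somewhere. The second says every location in the second pile ends up with exactly the right amount. Some formulations also divide by the total amount moved; when both distributions sum to 1, this makes no difference.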
I think this is best illustrated with an example. Let’s say we have two distributions, A and B, and we want to know how different they are. EMD answers this question by finding the most efficient way to transform A into B and measuring how much total work that transformation takes (i.e., number of units moved × distance moved). The example below illustrates calculating the EMD for two simple distributions:
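To make the “units moved × distance moved” idea concrete, here is a minimal pure-Python sketch for the simplest case. It assumes two histograms over the same equally spaced bins, with bin i sitting at position i and both histograms holding the same total amount of dirt. The function name `emd_1d` and the example histograms are hypothetical and are not taken from the figure.

In one dimension there is a convenient shortcut to the cheapest transformation. The amount of dirt that must cross the boundary between two adjacent bins equals the difference between the running totals of A and B up to that boundary. Summing those differences gives the minimum work without searching over moves.

```python
from itertools import accumulate


def emd_1d(a, b):
    """EMD between two 1D histograms defined on the same bins.

    Assumptions (for this sketch): bin i sits at position i, adjacent
    bins are 1 unit apart, and sum(a) == sum(b).
    """
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(a) - sum(b)) > 1e-9:
        raise ValueError("histograms must hold the same total amount of dirt")
    # Running totals: how much dirt lies at or to the left of each bin.
    cum_a = accumulate(a)
    cum_b = accumulate(b)
    # |cum_a - cum_b| at bin i is the dirt that must cross the boundary
    # between bin i and bin i + 1; each crossing costs a distance of 1.
    return sum(abs(x - y) for x, y in zip(cum_a, cum_b))


A = [3, 1, 0, 0]
B = [0, 0, 1, 3]
print(emd_1d(A, B))  # → 10
```

You can check the result by hand. Moving 3 units from bin 0 to bin 3 costs 3 × 3 = 9, and moving 1 unit from bin 1 to bin 2 costs 1 × 1 = 1, for the same total of 10. This shortcut only applies to 1D distributions. In higher dimensions, the general minimization has to be solved instead.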
The set of moves we make to transform one distribution into the other is called a ‘transport plan’ (think of transporting dirt or material from one location to another).
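A transport plan can be written down as a list of moves. The sketch below uses a hypothetical representation, not necessarily how the article’s figures encode it. Each move is a `(from_bin, to_bin, amount)` tuple, and bins are assumed to sit at integer positions. Dirt that stays where it is appears as a zero-distance move. The helper names `plan_is_valid` and `plan_work` and the example histograms are illustrative only.

The code checks that a plan really turns A into B and computes the plan’s total work. It shows why the EMD is the work of the cheapest valid plan rather than of any valid plan.

```python
def plan_is_valid(source, target, plan):
    """Check that `plan` moves exactly the dirt in `source` and
    produces exactly `target`. Each move is (from_bin, to_bin, amount)."""
    out_flow = [0] * len(source)  # total dirt taken from each source bin
    in_flow = [0] * len(target)   # total dirt delivered to each target bin
    for src, dst, amount in plan:
        if amount < 0:
            return False
        out_flow[src] += amount
        in_flow[dst] += amount
    return out_flow == list(source) and in_flow == list(target)


def plan_work(plan):
    """Total work of a plan: amount moved times distance moved, summed.
    Assumes bin i sits at position i, so distance is |to_bin - from_bin|."""
    return sum(amount * abs(dst - src) for src, dst, amount in plan)


A = [2, 0, 1]
B = [1, 0, 2]

# Keep 1 unit in bin 0 and 1 unit in bin 2; move 1 unit from bin 0 to bin 2.
cheap_plan = [(0, 0, 1), (0, 2, 1), (2, 2, 1)]
# Also valid, but moves dirt back and forth unnecessarily.
wasteful_plan = [(0, 2, 2), (2, 0, 1)]

for name, plan in [("cheap", cheap_plan), ("wasteful", wasteful_plan)]:
    print(name, plan_is_valid(A, B, plan), plan_work(plan))
# cheap True 2
# wasteful True 6
```

Both plans transform A into B. The EMD between these two histograms is the work of the cheapest possible plan, which is 2 here, not 6.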
The transport plan for the graphic above looks like this:
The transport plan shows us the most efficient way of transforming distribution A into…