Ali Gulum

Distance Formulas

Ali Gulum

Algorithms like K-Nearest Neighbor and most clustering methods need a way to answer one basic question: how far apart are two data points? The answer depends on which distance formula you pick, so it is worth knowing the three you will run into most often.

Euclidean Distance

The Euclidean distance, also called the L2 distance or L2 norm, is the "ordinary" straight-line distance between two points, exactly as you would measure it with a ruler in everyday geometric space. It is the default choice for most KNN implementations, including the one I use in the K-Nearest Neighbor article.

Manhattan Distance

Manhattan distance measures the distance you would travel along a grid, like a taxi driving along city blocks, instead of cutting a straight line through space. It sums the absolute differences across each dimension, which makes it more robust to outliers than Euclidean distance in some datasets.

Cosine Distance

Cosine Distance

Cosine distance does not measure distance in the geometric sense at all, it measures the angle between two vectors. Two points can be far apart in magnitude but still point in nearly the same direction, which is exactly the case cosine distance is built for: text similarity, recommendation systems, anywhere magnitude shouldn't dominate the comparison.

Picking the right formula matters more than people expect, the wrong distance metric can make an otherwise correct algorithm perform badly. Thanks for reading.