Mahalanobis Distance

Mahalanobis distance is a Euclidean distance that is transformed through a change of basis to normalize variance.

Contents

Mahalanobis Distance
1. Description
  1. Properties
  2. Geometry
2. Usage
  1. Normalized Euclidean distance

Description

Mahalanobis distance is equivalent to Euclidean distance with a change in basis.

Squared Euclidean distance is commonly formulated as...

given a vector x⃗ and the origin as a reference point, x⃗^Tx⃗.
given two vectors x⃗ and y⃗, (x⃗-y⃗)^T(x⃗-y⃗).
- Let z⃗ = x⃗ - y⃗, so (x⃗-y⃗)^T(x⃗-y⃗) = z⃗^Tz⃗.
given a column x and a column of population means as μ, (x-μ)^T(x-μ).

Never forget to take the square root!

Note that this is equivalent to x^TIx. A change of basis can be affected by swapping the identity matrix with some other A^-1 (so notated because the motivation is generally that there is some other linear transformation A that pre-exists, and needs to be undone).

The squared Mahalanobis distance is then calculated as...

given a vector x⃗ and the origin as a reference point, x⃗^TA^-1x⃗.
given two vectors x⃗ and y⃗, (x⃗-y⃗)^TA^-1(x⃗-y⃗).
- Let z⃗ = x⃗ - y⃗, so (x⃗-y⃗)^TA^-1(x⃗-y⃗) = z⃗^TA^-1z⃗.
given a column x and a column of population means as μ, (x-μ)^TA^-1(x-μ).

Again, never forget to take the square root!

Properties

Mahalanobis distance is invariant under non-singular linear transformations. Let Y₁ = a + bX₁ and Y₂ = a + bX₂, and suppose that b is non-singular. Then d_M(Y₁,Y₂) = d_M(X₁,X₂).

Geometry

In a two-dimensional graph, plotting the points with a Euclidean distance of 1 around the origin results in a unit circle. The change of basis described by A transforms the circle into an ellipsoid.

Note that if A is diagonal, the ellipsoid will be axis-aligned (i.e., appear to be stretched along the x or y axes).

Usage

Mahalanobis distances are appropriate for calculating variance-normalized distance under a multivariate distribution, as for test statistics. The change of basis is established by the inverse covariance matrix, notated as Σ^-1.

Normalized Euclidean distance

Using a diagonal matrix of variance terms ignores correlations between the terms. It is effectively an assumption of independence. Despite not being true Mahalanobis distance, there are still some utilities to this calculation.

The mahascore documentation calls this metric 'normalized Euclidean distance'.

CategoryRicottone