Mahalanobis Distance

Mahalanobis distance is a Euclidean distance computed after a change of basis that normalizes variance.


Definition

Squared Euclidean distance from the origin is typically written as x^T x, but an equivalent formulation is x^T I^T I x, where I is the identity matrix. In a two-dimensional graph, plotting the points with a Euclidean distance of 1 from the origin results in a unit circle.

The distance can be transformed to a different basis by swapping the identity matrix with some other invertible matrix A: x^T A^T A x. In the two-dimensional graph, the points at distance 1 will now appear as an ellipse (an ellipsoid in higher dimensions). This ellipse is axis-aligned (i.e. appears to be stretched along the x or y axes) if A is diagonal.

Of course, distance can be calculated from any arbitrary reference point m, not just the origin. Subtract the reference point from x before applying the formula, leading to (x-m)^T (x-m) or (x-m)^T A^T A (x-m).
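The quadratic forms above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name transformed_sq_distance and the example matrix A are hypothetical and chosen only for illustration.

```python
import numpy as np


def transformed_sq_distance(x, A, m=None):
    """Return (x - m)^T A^T A (x - m), the squared length of A (x - m).

    If m is omitted, the origin is used as the reference point.
    """
    x = np.asarray(x, dtype=float)
    m = np.zeros_like(x) if m is None else np.asarray(m, dtype=float)
    d = A @ (x - m)
    return float(d @ d)


I = np.eye(2)
A = np.diag([0.5, 2.0])  # hypothetical diagonal A: an axis-aligned ellipse

x = np.array([2.0, 0.0])
sq_euclid = transformed_sq_distance(x, I)  # identity: plain x^T x
sq_scaled = transformed_sq_distance(x, A)  # same point, measured through A

# Distance from a reference point m instead of the origin.
m = np.array([1.0, 1.0])
sq_from_m = transformed_sq_distance(np.array([3.0, 1.0]), A, m)
```

With the identity matrix the result is the ordinary squared Euclidean distance. With the diagonal A, the first coordinate is halved before the length is measured, so the same point lies closer.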


Application

To compute a variance-normalized distance between a measurement and the mean of its distribution, instead of using a simple squared Euclidean distance (i.e. x^T x), use a Mahalanobis distance with the mean (μ) and the covariance matrix (usually notated as Σ).

The measurement must be centered on the mean: (x-μ).

Given the centered measurement, the covariance matrix describes how unit variance was transformed into the observed variances. Therefore the inverse of the covariance matrix (Σ^-1) describes the inverse transformation. Specifically, A is substituted with Σ^(-1/2). A covariance matrix is always symmetric and positive semi-definite, so a square root can always be taken. It can be inverted only if it is also positive definite (i.e. has no zero eigenvalues). Because Σ^(-1/2) is symmetric, A^T A then evaluates to Σ^-1.
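One way to see that this choice of A gives A^T A = Σ^-1 is to build the inverse square root from an eigendecomposition. This is a minimal sketch, assuming NumPy and a positive definite covariance matrix; the helper name inv_sqrt and the example matrix are hypothetical.

```python
import numpy as np


def inv_sqrt(sigma):
    """Symmetric inverse square root of a symmetric positive definite matrix.

    Uses sigma = V diag(w) V^T, so sigma^(-1/2) = V diag(w^(-1/2)) V^T.
    """
    w, V = np.linalg.eigh(sigma)
    if np.any(w <= 0):
        # A zero (or negative) eigenvalue means the inverse does not exist.
        raise ValueError("covariance matrix is not positive definite")
    return V @ np.diag(w ** -0.5) @ V.T


sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])  # hypothetical covariance matrix

A = inv_sqrt(sigma)

# A is symmetric, so A^T A = A A = sigma^-1.
ata_matches_inverse = np.allclose(A.T @ A, np.linalg.inv(sigma))
```

The explicit eigenvalue check matters because a covariance matrix that is only positive semi-definite (for example, one estimated from perfectly collinear variables) has no inverse.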

The squared Mahalanobis distance is thus (x-μ)^T Σ^-1 (x-μ), and the distance itself is the square root of that quantity.
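The formula can be sketched in a few lines. This is an illustrative implementation, assuming NumPy and an invertible covariance matrix; the function name mahalanobis and the example values are hypothetical. It takes the square root of the quadratic form so that the result is a distance rather than a squared distance, and it solves a linear system instead of forming Σ^-1 explicitly.

```python
import numpy as np


def mahalanobis(x, mu, sigma):
    """Return sqrt((x - mu)^T sigma^-1 (x - mu))."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # solve(sigma, d) computes sigma^-1 d without building the inverse.
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))


mu = np.array([0.0, 0.0])
sigma = np.diag([4.0, 1.0])  # hypothetical: variance 4 along x, 1 along y

# Both points are a Euclidean distance of 2 from the mean.
d_along_x = mahalanobis([2.0, 0.0], mu, sigma)  # one standard deviation away
d_along_y = mahalanobis([0.0, 2.0], mu, sigma)  # two standard deviations away

# With an identity covariance the result is the Euclidean distance.
d_identity = mahalanobis([3.0, 4.0], mu, np.eye(2))
```

The two example points are equally far from the mean in Euclidean terms, but the one along the high-variance axis is closer in Mahalanobis terms.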



Statistics/MahalanobisDistance (last edited 2024-06-06 02:35:32 by DominicRicottone)