= Mahalanobis Distance =

'''Mahalanobis distance''' is a [[Calculus/Distance#Euclidean_distance|Euclidean distance]] that is transformed through a [[LinearAlgebra/Basis#Change_of_Basis|change of basis]] to normalize [[Statistics/Variance|variance]].

----

== Description ==

Mahalanobis distance is equivalent to [[Calculus/Distance#Euclidean_distance|Euclidean distance]] with a change in [[LinearAlgebra/Basis|basis]].

''Squared'' Euclidean distance is commonly formulated as...

 * given a vector x⃗ and the origin as a reference point, ''x⃗^T^x⃗''.
 * given two vectors x⃗ and y⃗, ''(x⃗-y⃗)^T^(x⃗-y⃗)''.
   * Let ''z⃗ = x⃗ - y⃗'', so ''(x⃗-y⃗)^T^(x⃗-y⃗) = z⃗^T^z⃗''.
 * given a column ''x'' and a column of population means as ''μ'', ''(x-μ)^T^(x-μ)''.

Never forget to take the square root!

Note that ''x^T^x'' is equivalent to ''x^T^'''I'''x''. A change of basis can be effected by swapping the [[LinearAlgebra/SpecialMatrices#Identity_Matrix|identity matrix]] with some other '''''A'''^-1^'' (so notated because the motivation is generally that some other linear transformation '''''A''''' pre-exists and needs to be undone).

The ''squared'' Mahalanobis distance is then calculated as...

 * given a vector x⃗ and the origin as a reference point, ''x⃗^T^'''A'''^-1^x⃗''.
 * given two vectors x⃗ and y⃗, ''(x⃗-y⃗)^T^'''A'''^-1^(x⃗-y⃗)''.
   * Let ''z⃗ = x⃗ - y⃗'', so ''(x⃗-y⃗)^T^'''A'''^-1^(x⃗-y⃗) = z⃗^T^'''A'''^-1^z⃗''.
 * given a column ''x'' and a column of population means as ''μ'', ''(x-μ)^T^'''A'''^-1^(x-μ)''.

Again, never forget to take the square root!

=== Properties ===

Mahalanobis distance is invariant under [[LinearAlgebra/Invertibility|non-singular]] linear transformations. Let ''Y,,1,, = a + '''b'''X,,1,,'' and ''Y,,2,, = a + '''b'''X,,2,,'', and suppose that '''''b''''' is non-singular. Then ''d,,M,,(Y,,1,,,Y,,2,,) = d,,M,,(X,,1,,,X,,2,,)'', provided that the matrix is transformed along with the data, i.e. '''''A''''' is replaced by '''''bAb'''^T^'' when measuring the distance between ''Y,,1,,'' and ''Y,,2,,''.

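The formulas and the invariance property above can be checked numerically. The following is a minimal sketch in plain Python, assuming two dimensions so that the matrix inverse can be written out by hand. All helper names (`mahalanobis`, `inverse`, `cov`, `shift`) and all numeric values are illustrative assumptions, not part of any library.

```python
import math

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def mat_mul(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    """Transpose a 2x2 matrix."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def inverse(m):
    """Invert a 2x2 matrix; raises if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mahalanobis(x, y, a):
    """Return sqrt((x-y)^T A^-1 (x-y)) for 2-vectors x, y and 2x2 matrix a."""
    z = [x[0] - y[0], x[1] - y[1]]
    w = mat_vec(inverse(a), z)
    return math.sqrt(z[0] * w[0] + z[1] * w[1])

identity = [[1.0, 0.0], [0.0, 1.0]]
cov = [[4.0, 0.0], [0.0, 1.0]]      # hypothetical matrix A (made-up values)
x1, x2 = [2.0, 0.0], [0.0, 0.0]

# With the identity matrix, the result is the ordinary Euclidean distance.
d_euclid = mahalanobis(x1, x2, identity)
d_x = mahalanobis(x1, x2, cov)

# Apply Y = shift + b X with a non-singular b (det = 1 here).
b = [[2.0, 1.0], [1.0, 1.0]]
shift = [3.0, -1.0]
y1 = [s + t for s, t in zip(shift, mat_vec(b, x1))]
y2 = [s + t for s, t in zip(shift, mat_vec(b, x2))]

# The matrix must be transformed too: A becomes b A b^T.
cov_y = mat_mul(mat_mul(b, cov), transpose(b))
d_y = mahalanobis(y1, y2, cov_y)

print(d_euclid, d_x, d_y)
```

The first value is the Euclidean distance and the last two agree, which illustrates the invariance. Reusing the untransformed `cov` for the transformed points would generally give a different value.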
=== Geometry ===

In a two-dimensional graph, plotting the points with a Euclidean distance of 1 around the origin results in a unit circle. The change of basis described by '''''A''''' transforms the circle into an ellipse (an ellipsoid in higher dimensions).

Note that if '''''A''''' is [[LinearAlgebra/Diagonalization|diagonal]], the ellipse will be '''axis-aligned''' (i.e., appear to be stretched along the ''x'' or ''y'' axes).

----

== Usage ==

Mahalanobis distances are appropriate for calculating [[Statistics/Variance|variance]]-normalized distance under a multivariate distribution, as for [[Statistics/TestStatistic|test statistics]]. The change of [[LinearAlgebra/Basis|basis]] is established by the [[Statistics/Covariance#Matrix|inverse covariance matrix]], notated as '''''Σ'''^-1^''.

=== Normalized Euclidean distance ===

Using a diagonal matrix of variance terms in place of the full covariance matrix ignores correlations between the variables. It is effectively an assumption of [[Statistics/JointProbability#Independence|independence]].

Although this is not a true Mahalanobis distance, the calculation is still useful. The [[Stata/Mahapick|mahascore]] documentation calls this metric 'normalized Euclidean distance'.

----
CategoryRicottone