Differences between revisions 3 and 9 (spanning 6 versions)

Mahalanobis Distance

Mahalanobis distance is a Euclidean distance that is transformed through a change of basis to normalize variance.

Contents

Mahalanobis Distance
1. Description
  1. Properties
  2. Geometry
2. Usage

Description

Mahalanobis distance is equivalent to Euclidean distance with a change in basis.

Squared Euclidean distance (the distance itself is the square root) is commonly formulated as...

given a vector x⃗ and the origin as a reference point, x⃗^Tx⃗.
given two vectors x⃗ and y⃗, (x⃗-y⃗)^T(x⃗-y⃗).
- Let z⃗ = x⃗ - y⃗, so (x⃗-y⃗)^T(x⃗-y⃗) = z⃗^Tz⃗.
given a column x and a column of population means as μ, (x-μ)^T(x-μ).

Note that this is equivalent to x^TI^TIx. A change of basis can be effected by swapping the identity matrix with some other A: x^TA^TAx.
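The swap of the identity matrix for some other A can be sketched numerically. This is a minimal illustration, not a reference implementation: the helper names and the example matrix `scaling` are hypothetical, and x^TA^TAx is evaluated as (Ax)^T(Ax).

```python
def matvec(a, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    """Inner product u^T v."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def quadratic_form(a, x):
    """Return x^T A^T A x, computed as (Ax)^T (Ax)."""
    ax = matvec(a, x)
    return dot(ax, ax)


identity = [[1.0, 0.0], [0.0, 1.0]]
scaling = [[0.5, 0.0], [0.0, 2.0]]  # hypothetical A, chosen for illustration

x = [3.0, 4.0]
euclid_sq = quadratic_form(identity, x)  # plain x^T x
scaled_sq = quadratic_form(scaling, x)   # x^T A^T A x after the change of basis
```

With the identity matrix the result is the ordinary squared length of x; with `scaling` the first coordinate is shrunk and the second stretched before the length is taken.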

Properties

Mahalanobis distance is invariant under non-singular affine transformations, provided that each distance is measured with the covariance matrix of its own variable. Let Y₁ = a + bX₁ and Y₂ = a + bX₂, and suppose that b is non-singular. Then d_M(Y₁,Y₂) = d_M(X₁,X₂).
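The invariance can be checked numerically for a two-dimensional case. This sketch assumes that the covariance of Y = a + bX is b·Cov(X)·b^T, and that each distance uses the covariance of its own variable. All helper names and numeric values are hypothetical.

```python
import math


def matmul(m, n):
    """Multiply two matrices given as lists of rows."""
    return [[sum(m[i][k] * n[k][j] for k in range(len(n)))
             for j in range(len(n[0]))] for i in range(len(m))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; assumes a non-zero determinant."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mahalanobis(u, v, cov):
    """sqrt((u-v)^T cov^-1 (u-v)) for 2-element vectors."""
    z = [u_i - v_i for u_i, v_i in zip(u, v)]
    w = matvec(inverse_2x2(cov), z)
    return math.sqrt(sum(z_i * w_i for z_i, w_i in zip(z, w)))


cov_x = [[4.0, 1.0], [1.0, 2.0]]   # assumed covariance of X
a = [10.0, -3.0]                   # shift
b = [[2.0, 1.0], [0.0, 3.0]]       # non-singular (determinant 6)
cov_y = matmul(matmul(b, cov_x), transpose(b))  # covariance of Y = a + bX

x1, x2 = [1.0, 2.0], [3.0, -1.0]
y1 = [a_i + t_i for a_i, t_i in zip(a, matvec(b, x1))]
y2 = [a_i + t_i for a_i, t_i in zip(a, matvec(b, x2))]

d_x = mahalanobis(x1, x2, cov_x)
d_y = mahalanobis(y1, y2, cov_y)
```

The shift a cancels in the difference Y₁ - Y₂, and b cancels against the transformed covariance, so `d_x` and `d_y` agree up to floating-point error.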

Geometry

In a two-dimensional graph, plotting the points with a Euclidean distance of 1 around the origin results in a unit circle. The change of basis described by A transforms the circle into an ellipse (an ellipsoid in higher dimensions).

Note that if A is diagonal, the ellipse will be axis-aligned (i.e., appear to be stretched along the x or y axes).
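The axis-aligned case can be sketched by generating the points whose transformed distance from the origin is 1. This is an illustration under assumptions: the diagonal entries of A are hypothetical, and the points are obtained as p = A^-1·u for u on the unit circle, so that (Ap)^T(Ap) = 1.

```python
import math

a_diag = (0.5, 2.0)  # hypothetical diagonal A = diag(0.5, 2.0)


def unit_distance_points(diag, n=360):
    """Points p with (Ap)^T (Ap) = 1, where A is diagonal with entries `diag`."""
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        u = (math.cos(theta), math.sin(theta))  # point on the unit circle
        points.append((u[0] / diag[0], u[1] / diag[1]))  # p = A^-1 u
    return points


points = unit_distance_points(a_diag)
max_x = max(abs(px) for px, _ in points)  # semi-axis along x
max_y = max(abs(py) for _, py in points)  # semi-axis along y
```

The semi-axes are the reciprocals of the diagonal entries, so the figure is stretched along x and squeezed along y with no rotation.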

Usage

Mahalanobis distances are appropriate for calculating variance-normalized distances, as for test statistics. The change of basis is established by the covariance matrix, notated as Σ; more specifically, by the standard deviation matrix (√Σ = Σ^0.5).

The variance-normalized distance from a distribution to an estimate in a single dimension can be calculated with, e.g., the Z-statistic: (x̂-μ_X)/σ_X. (Henceforward measurements are centered: x = x̂-μ_X.) This can be repeated for any number of dimensions. If the variances are unit and the dimensions are uncorrelated, then the joint distance from the multivariate distribution can be calculated (for two dimensions with centered measurements x and y) like: √(x² + y²), which is √(v^Tv) for the column v = (x, y)^T. But in general variances are not unit and dimensions do correlate, as described by the covariance matrix. The change of basis must 'undo' this scaling and correlation, ergo the inverse of the standard deviation matrix (√(Σ^-1) = Σ^-0.5) should be used for A.
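The uncorrelated case can be sketched by computing a Z-statistic per dimension and combining them as a Euclidean length. The function names and numbers are hypothetical, and the result is only meaningful when the dimensions do not correlate.

```python
import math


def z_score(estimate, mean, std_dev):
    """Single-dimension Z-statistic: (estimate - mean) / std_dev."""
    return (estimate - mean) / std_dev


def joint_distance(estimates, means, std_devs):
    """Euclidean length of the per-dimension Z-scores.

    Assumes the dimensions are uncorrelated; correlation is ignored.
    """
    return math.sqrt(sum(z_score(e, m, s) ** 2
                         for e, m, s in zip(estimates, means, std_devs)))


# Two dimensions with hypothetical means (10, 5) and standard deviations (1, 2).
z_first = z_score(13.0, 10.0, 1.0)
z_second = z_score(-3.0, 5.0, 2.0)
d = joint_distance([13.0, -3.0], [10.0, 5.0], [1.0, 2.0])
```

Dividing by each standard deviation is the diagonal special case of the change of basis; correlated dimensions need the full matrix form.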

Note that a covariance matrix is...

square
symmetric, so Σ^T = Σ
positive semi-definite, so...
- Σ^0.5 can always be evaluated
- the determinant is bound by |Σ| >= 0, so...
  - either |Σ| = 0 or Σ is invertible
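The properties listed above can be checked on a small sample. This sketch builds a 2x2 sample covariance matrix from hypothetical data (with the n-1 denominator, an assumption) and uses the closed-form eigenvalues of a symmetric 2x2 matrix to test positive semi-definiteness.

```python
import math


def covariance_2d(xs, ys):
    """Sample covariance matrix of two equally long columns (n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def eigenvalues_symmetric_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix [[p, q], [q, s]], smallest first."""
    p, q, s = m[0][0], m[0][1], m[1][1]
    half_gap = math.sqrt((p - s) ** 2 + 4.0 * q * q) / 2.0
    mid = (p + s) / 2.0
    return mid - half_gap, mid + half_gap


xs = [2.0, 4.0, 6.0, 8.0]  # hypothetical measurements
ys = [1.0, 3.0, 2.0, 6.0]

sigma = covariance_2d(xs, ys)
is_symmetric = sigma[0][1] == sigma[1][0]
smallest_eigenvalue, largest_eigenvalue = eigenvalues_symmetric_2x2(sigma)
determinant = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
```

Here the determinant is strictly positive, so this particular Σ is invertible; perfectly collinear data would give a determinant of 0 instead.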

Substituting A = Σ^-0.5, applying symmetry ((Σ^-0.5)^T = Σ^-0.5), and adding exponents through the product rule, A^TA = Σ^-0.5Σ^-0.5 = Σ^-1. This requires Σ to be invertible (|Σ| > 0). In summary, the variance-normalized distance between two columns x and y is calculated like: √((x-y)^TΣ^-1(x-y))
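The two routes to the distance can be compared numerically: transform with A = Σ^-0.5 and take a Euclidean length, or evaluate the Σ^-1 form directly. This is a sketch under assumptions: Σ is 2x2 and positive definite, the square root uses a closed form that only holds for 2x2 matrices (a convenience here, not something the page prescribes), and all names and values are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; assumes a non-zero determinant."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def sqrt_spd_2x2(m):
    """Square root of a symmetric positive-definite 2x2 matrix.

    Closed form: sqrt(M) = (M + d*I) / t, with d = sqrt(det M)
    and t = sqrt(trace M + 2*d).
    """
    (p, q), (r, s) = m
    d = math.sqrt(p * s - q * r)
    t = math.sqrt(p + s + 2.0 * d)
    return [[(p + d) / t, q / t], [r / t, (s + d) / t]]


sigma = [[4.0, 1.0], [1.0, 2.0]]  # assumed covariance matrix
x, y = [1.0, 2.0], [3.0, -1.0]
z = [x_i - y_i for x_i, y_i in zip(x, y)]

# Route 1: change of basis with A = sigma^-0.5, then Euclidean length.
a_matrix = inverse_2x2(sqrt_spd_2x2(sigma))
az = matvec(a_matrix, z)
d_via_a = math.sqrt(sum(c * c for c in az))

# Route 2: the summary formula sqrt(z^T sigma^-1 z).
w = matvec(inverse_2x2(sigma), z)
d_via_inverse = math.sqrt(sum(z_i * w_i for z_i, w_i in zip(z, w)))
```

Both routes give the same value up to floating-point error, which is the A^TA = Σ^-1 identity in numeric form.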

CategoryRicottone

Statistics/MahalanobisDistance (last edited 2025-11-03 01:46:25 by DominicRicottone)

-  ⇤ ← Revision 3 as of 2025-03-27 19:23:52 → 
  Size: 3105
  Editor: DominicRicottone
  Comment: Rewrite for clarity
+   ← Revision 9 as of 2025-10-06 16:01:53 → ⇥
  Size: 3626
  Editor: DominicRicottone
  Comment: More notes
 Line 3:
-'''Mahalanobis distance''' is a [[LinearAlgebra/Distance|Euclidean distance]] that is transformed through a [[LinearAlgebra/Basis#Change_of_Basis|change of basis]] to normalize variance.
+'''Mahalanobis distance''' is a [[Calculus/Distance#Euclidean_distance|Euclidean distance]] that is transformed through a [[LinearAlgebra/Basis#Change_of_Basis|change of basis]] to normalize variance.
 Line 13:
-Mahalanobis distance is equivalent to [[LinearAlgebra/Distance|Euclidean distance]] with a change in [[LinearAlgebra/Basis|basis]].
+Mahalanobis distance is equivalent to [[Calculus/Distance#Euclidean_distance|Euclidean distance]] with a change in [[LinearAlgebra/Basis|basis]].
 Line 15:
-Euclidean distance is commonly formulated as ''(x-y)^T^(x-y)'' (or if the reference point is the origin, just ''x^T^x''), but an equivalent formulation looks like ''x^T^'''I'''^T^'''I'''x''.
+Euclidean distance is commonly formulated as...
 * given a vector x⃗ and the origin as a reference point, ''x⃗^T^x⃗''.
 * given two vectors x⃗ and y⃗, ''(x⃗-y⃗)^T^(x⃗-y⃗)''.
   * Let ''z⃗ = x⃗ - y⃗'', so ''(x⃗-y⃗)^T^(x⃗-y⃗) = z⃗^T^z⃗''.
 * given a column ''x'' and a column of population means as ''μ'', ''(x-μ)^T^(x-μ)''.
-Line 17:
+Line 21:
-A change of basis can be effected by swapping the [[LinearAlgebra/SpecialMatrices#Identity_Matrix|identity matrix]] with some other '''''A''''': ''x^T^'''A'''^T^'''A'''x''.
+Note that this is equivalent to ''x^T^'''I'''^T^'''I'''x''. A change of basis can be effected by swapping the [[LinearAlgebra/SpecialMatrices#Identity_Matrix|identity matrix]] with some other '''''A''''': ''x^T^'''A'''^T^'''A'''x''.
-Line 21:
+Line 25:
-=== Graphing ===
+=== Properties ===

Mahalanobis distance is invariant under [[LinearAlgebra/Invertibility|non-singular]] linear transformations. Let ''Y,,1,, = a + '''b'''X,,1,,'' and ''Y,,2,, = a + '''b'''X,,2,,'', and suppose that '''''b''''' is non-singular. Then ''d,,M,,(Y,,1,,,Y,,2,,) = d,,M,,(X,,1,,,X,,2,,)''.



=== Geometry ===
-Line 25:
+Line 35:
-Note that if '''''A''''' is [[LinearAlgebra/SpecialMatrices#Diagonal_Matrices|diagonal]], the ellipsoid will be '''axis-aligned''' (i.e., appear to be stretched along the ''x'' or ''y'' axes).
+Note that if '''''A''''' is [[LinearAlgebra/Diagonalization|diagonal]], the ellipsoid will be '''axis-aligned''' (i.e., appear to be stretched along the ''x'' or ''y'' axes).
-Line 38:
+Line 48:
- * always square [[LinearAlgebra/MatrixProperties#Symmetry|symmetric]], so '''''Σ'''^T^ = '''Σ'''''
 * always [[LinearAlgebra/MatrixProperties#Positive_Semi-definite|positive semi-definite]], so...
+ * square
 * [[LinearAlgebra/SpecialMatrices#Symmetric_Matrices|symmetric]], so '''''Σ'''^T^ = '''Σ'''''
 * [[LinearAlgebra/PositiveDefiniteness|positive semi-definite]], so...
-Line 41:
+Line 52:
-   * the [[LinearAlgebra/Determinants|determinant]] is bound by ''|'''Σ'''| >= 0'', so...
     * either ''|'''Σ'''| = 0'' or '''''Σ''''' is [[LinearAlgebra/MatrixProperties#Invertible|invertible]]
+   * the [[LinearAlgebra/Determinant|determinant]] is bound by ''|'''Σ'''| >= 0'', so...
     * either ''|'''Σ'''| = 0'' or '''''Σ''''' is [[LinearAlgebra/Invertibility|invertible]]

Diff for "Statistics/MahalanobisDistance"
