Mahalanobis distance
In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on the correlations between variables, through which different patterns can be identified and analysed. It is a useful way of determining the similarity of an unknown sample set to a known one. Unlike Euclidean distance, it takes into account the correlations of the data set.
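Formally, the Mahalanobis distance from a group of values with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_N)^T$ and covariance matrix $\Sigma$ for a multivariate vector $x = (x_1, x_2, \ldots, x_N)^T$ is defined as:
$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}.$$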
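The definition $D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$ can be evaluated without explicitly inverting $\Sigma$. The sketch below, written in standard-library Python, assumes $\Sigma$ is symmetric positive definite and factors it as $\Sigma = L L^T$ (a Cholesky decomposition, chosen here as an implementation convenience, not as part of the definition). Then $(x - \mu)^T \Sigma^{-1} (x - \mu) = \lVert L^{-1}(x - \mu) \rVert^2$, which forward substitution computes directly. The function names `cholesky` and `mahalanobis` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(cov: Sequence[Sequence[float]]) -> Matrix:
    """Return lower-triangular L with cov = L L^T (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def mahalanobis(x: Sequence[float], mu: Sequence[float],
                cov: Sequence[Sequence[float]]) -> float:
    """D_M(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)), computed without forming Sigma^{-1}."""
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L z = (x - mu), so z = L^{-1}(x - mu)
    # and (x - mu)^T Sigma^{-1} (x - mu) = z^T z.
    z: List[float] = []
    for i in range(len(diff)):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return math.sqrt(sum(zi * zi for zi in z))


cov = [[1.0, 0.8], [0.8, 1.0]]  # two strongly positively correlated variables
mu = [0.0, 0.0]
print(mahalanobis([1.0, 1.0], mu, cov))   # about 1.054 (along the correlation)
print(mahalanobis([1.0, -1.0], mu, cov))  # about 3.162 (against the correlation)
```

Both test points lie at the same Euclidean distance $\sqrt{2}$ from the mean, yet the point that goes against the correlation between the two variables is three times as far away in the Mahalanobis sense.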
Mahalanobis distance can also be defined as a dissimilarity measure between two random vectors $x$ and $y$ of the same distribution with the covariance matrix $\Sigma$:
$$d(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}.$$
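As an illustration of the two-vector form $d(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$, the following standard-library Python sketch estimates $\Sigma$ from a small sample set and then evaluates the distance by solving $\Sigma w = x - y$ with Gaussian elimination instead of inverting $\Sigma$. The estimator with an $n - 1$ denominator and the helper names `sample_covariance`, `solve` and `mahalanobis_between` are assumptions made for this example.

```python
import math
from typing import List, Sequence


def sample_covariance(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of N-dimensional observations (assumed n - 1 denominator)."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    return [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / (n - 1)
             for b in range(dim)]
            for a in range(dim)]


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w


def mahalanobis_between(x: Sequence[float], y: Sequence[float],
                        cov: Sequence[Sequence[float]]) -> float:
    """d(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y)) for two vectors sharing covariance cov."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    w = solve(cov, diff)  # w = Sigma^{-1} (x - y)
    return math.sqrt(sum(d * wi for d, wi in zip(diff, w)))


samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
cov = sample_covariance(samples)  # [[5/3, 2/3], [2/3, 2/3]]
print(mahalanobis_between([0.0, 0.0], [1.0, 1.0], cov))   # sqrt(1.5), about 1.225
print(mahalanobis_between([0.0, 0.0], [1.0, -1.0], cov))  # sqrt(5.5), about 2.345
```

Like any distance, this measure is symmetric, $d(x, y) = d(y, x)$, and $d(x, x) = 0$.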
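If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, the resulting distance measure is called the normalized Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{N} \frac{(x_i - y_i)^2}{\sigma_i^2}},$$
where $\sigma_i$ is the standard deviation of $x_i$ over the sample set.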
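The two special cases can be checked numerically. In the standard-library Python sketch below, a diagonal covariance matrix $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$ turns the Mahalanobis distance into the normalized Euclidean distance, and setting every $\sigma_i = 1$ (the identity matrix) gives back the ordinary Euclidean distance. It assumes $\sigma_i$ is the sample standard deviation (`statistics.stdev`, $n - 1$ denominator); `statistics.pstdev` would give the population version. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def per_feature_stdev(samples: Sequence[Sequence[float]]) -> List[float]:
    """sigma_i for each coordinate i (assumed: sample standard deviation, n - 1)."""
    return [statistics.stdev(column) for column in zip(*samples)]


def normalized_euclidean(x: Sequence[float], y: Sequence[float],
                         sigma: Sequence[float]) -> float:
    """Mahalanobis distance for Sigma = diag(sigma_1^2, ..., sigma_N^2)."""
    return math.sqrt(sum(((xi - yi) / si) ** 2 for xi, yi, si in zip(x, y, sigma)))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# The second coordinate varies on a scale ten times larger than the first.
samples = [[1.0, 10.0], [3.0, 30.0], [5.0, 10.0], [3.0, -10.0]]
sigma = per_feature_stdev(samples)  # [sqrt(8/3), 10 * sqrt(8/3)]
x, y = [1.0, 10.0], [3.0, 30.0]
print(euclidean(x, y))                                   # sqrt(404), dominated by coordinate 2
print(normalized_euclidean(x, y, sigma))                 # sqrt(3), both coordinates weigh equally
print(normalized_euclidean(x, y, [1.0, 1.0]) == euclidean(x, y))  # True (identity covariance)
```

Dividing each coordinate by its standard deviation keeps a variable with a large spread from dominating the distance, which the plain Euclidean distance does not do.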
