Mahalanobis distance between two vectors


Problem description

I tried to apply mahal to calculate the Mahalanobis distance between 2 row vectors of 27 variables, i.e. mahal(X, Y), where X and Y are the two vectors. However, it comes up with an error:


"The number of rows of X must exceed the number of columns."

After a few minutes of research I found that I can't use it like this, but I'm still not sure why. Can someone explain it to me?

Also, here is an example of the mahal function:

>> mahal([1.55 5 32],[5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6;])

ans =    
   11.1706

Can someone clarify how MATLAB calculates the answer in this case?

Edit:

I found this code that calculates the Mahalanobis distance:

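S = cov(X);                      % covariance of the reference sample (rows = observations)
mu = mean(X);                    % mean of the reference sample
d = (Y-mu)*inv(S)*(Y-mu)'        % squared Mahalanobis distance of Y from mu
d = ((Y-mu)/S)*(Y-mu)';          % <-- MathWorks prefers this way (avoids forming inv(S))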

I tested it on [1.55 5 32] and [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6;] and it gave me the same result as the mahal function (11.1706). I also tried calculating the distance between the 2 vectors of 27 variables, and it works. What do you think about it? Can I count on this solution, since the mahal function can't do what I need?

Recommended answer


mahal(X,Y)... gave me this error:
"The number of rows of X must exceed the number of columns."

The documentation states that X must have more rows than columns (note that the documentation uses the signature mahal(Y,X), so its X is the second input parameter, not the first). For you this means that the second array that you're feeding into mahal must have more rows than columns.

Why is that so important? The purpose of this restriction is to make sure that mahal has enough data to build the covariance matrix used in the computation of the Mahalanobis distance. A covariance matrix estimated from too few observations is singular and cannot be inverted, so the output would be garbage.
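
To see concretely why too few rows is a problem, here is a minimal sketch with random, purely hypothetical data (the names Xfew, Xmany, Sfew and Smany are illustrative). A sample covariance estimated from n observations has rank at most n - 1, so with 2 observations of 27 variables the 27-by-27 estimate is singular; with many more observations than variables it is generically full rank.

```matlab
% Hypothetical random data; only the shapes matter here.
Xfew  = randn(2, 27);     % 2 observations (rows) of 27 variables (columns)
Sfew  = cov(Xfew);        % 27-by-27 sample covariance
rank(Sfew)                % at most 2 - 1 = 1, far below 27: Sfew is singular

Xmany = randn(100, 27);   % 100 observations of the same 27 variables
Smany = cov(Xmany);       % 27-by-27 sample covariance
rank(Smany)               % 27 for generic random data, so Smany is invertible
```

Inverting Sfew (or dividing by it) is exactly the step the Mahalanobis distance needs, which is why mahal refuses such input.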

In your case, your inputs are two vectors, each having 27 elements. Do the 27 elements correspond to different observations, or are they one observation of 27 variables? If it's the former, just make sure both vectors are column vectors:

mahal(X(:), Y(:))

and you're good to go. If each vector contains only one observation, your estimate of the covariance matrix will be entirely inaccurate. Again, the rows of the inputs should be the observations!
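
With the column-vector call, each input is treated as 27 observations of a single variable, so the covariance matrix is just the variance of the second input and the squared Mahalanobis distance reduces to a squared z-score. A minimal sketch, assuming hypothetical vectors x (the points to measure) and y (the reference sample); it uses base MATLAB only, and the cross-check against mahal runs only if the Statistics and Machine Learning Toolbox is on the path:

```matlab
% Hypothetical 27-element column vectors: 27 observations of ONE variable.
x = (0:26)';          % points whose distances we want
y = (1:27)';          % reference sample defining the distribution

% With one variable, cov(y) is var(y), so the squared Mahalanobis
% distance of each x(i) from the distribution of y is a squared z-score.
d2 = (x - mean(y)).^2 / var(y);

% Optional cross-check against mahal(Y,X) (Statistics and Machine Learning Toolbox).
if exist('mahal', 'file') == 2
    max(abs(mahal(x, y) - d2))   % expected to be at rounding-error level
end
```

The result has one squared distance per element of x, which is what mahal(X(:), Y(:)) returns in this one-variable interpretation.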


Can someone clarify how MATLAB calculated the answer in this case?

The Mahalanobis distance between two vectors $x$ and $y$ is $d_M(x, y) = \sqrt{(x - y)^\top S^{-1} (x - y)}$, where $S$ is the covariance matrix of the underlying distribution.

In MATLAB¹, mahal(Y,X) is implemented efficiently as follows:

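rx = size(X,1);                % number of reference observations (rows of X)
ry = size(Y,1);                % number of query points (rows of Y)
m = mean(X,1);                 % column means of X (1-by-p)
M = m(ones(ry,1),:);           % mean replicated for each row of Y
C = X - m(ones(rx,1),:);       % centered reference data
[Q,R] = qr(C,0);               % economy QR: C = Q*R, so C'*C = R'*R

ri = R'\(Y-M)';                % triangular solve instead of forming inv(cov(X))
d = sum(ri.*ri,1)'*(rx-1);     % squared Mahalanobis distance, one per row of Y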

You can verify that with:

type mahal
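
Why the QR route gives the same number as the formula: a short sketch of the standard linear-algebra argument (not taken from MathWorks' documentation). Write $C$ for the centered reference data with $r_x$ rows and $p$ columns, $m$ for the row vector of column means, and $y$ for one row of Y:

```latex
\begin{aligned}
C &= X - \mathbf{1}m = QR, \qquad Q^\top Q = I_p,\quad R \in \mathbb{R}^{p \times p} \text{ upper triangular},\\
S &= \frac{C^\top C}{r_x - 1} = \frac{R^\top R}{r_x - 1}
  \quad\Longrightarrow\quad S^{-1} = (r_x - 1)\,R^{-1}\left(R^\top\right)^{-1},\\
(y - m)\,S^{-1}\,(y - m)^\top &= (r_x - 1)\,r^\top r,
  \qquad r = \left(R^\top\right)^{-1}(y - m)^\top .
\end{aligned}
```

Here $r$ is one column of ri = R'\(Y-M)', and sum(ri.*ri,1) computes $r^\top r$ for every row of Y at once. The same argument shows where the row/column restriction comes from: the rows of $C$ sum to zero, so $\operatorname{rank}(C) \le r_x - 1$, and $R$ can only be invertible if $r_x - 1 \ge p$, i.e. if X has more rows than columns.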

Note that MATLAB returns the Mahalanobis distance in squared units, so in your example the Mahalanobis distance is actually the square root of 11.1706, i.e. 3.3422.
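
As a quick check, here is a minimal sketch in base MATLAB (no toolbox needed) that reproduces the question's example both with the slash form from the question's edit and with the QR form of the snippet above, then takes the square root. The variable names d2_direct, d2_qr and d are illustrative; the QR part uses repmat instead of the m(ones(...),:) indexing trick, which builds the same replicated mean.

```matlab
% Data from the question: one query point and 4 observations of 3 variables.
Y = [1.55 5 32];
X = [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6];

mu = mean(X, 1);          % column means of X
S  = cov(X);              % 3-by-3 sample covariance of X

% Direct form (the question's code): squared distance, as mahal returns it.
d2_direct = ((Y - mu) / S) * (Y - mu)';

% QR form, mirroring the mahal implementation.
rx = size(X, 1);
[~, R] = qr(X - repmat(mu, rx, 1), 0);   % economy QR of the centered data
ri = R' \ (Y - mu)';                     % triangular solve
d2_qr = sum(ri .^ 2) * (rx - 1);

% Convert from squared units to the distance itself.
d = sqrt(d2_qr);
fprintf('%.4f %.4f %.4f\n', d2_direct, d2_qr, d);
```

Both squared values should agree with the 11.1706 reported in the question, and the square root with 3.3422.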


Can I count on this [my] solution since the mahal function can't do what I need?

You're doing everything correctly, so it's safe to use. That said, note that MATLAB restricts the dimensions of the second input array for a good reason (stated above).

If X contains only one row, cov treats it as a single variable, exactly as if it were a column vector, which means that each value is treated as a different observation and S collapses to a scalar variance. The resulting S would be inaccurate (if not garbage).
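
A minimal sketch of what happens with a single row, using hypothetical data (Xrow and Yrow are illustrative names, not from the question): cov and mean both collapse to scalars, and the question's formula still runs, but it only measures how far the values of Yrow lie from one pooled mean, in units of one pooled variance.

```matlab
% Hypothetical data: one observation of 27 variables (a single row each).
Xrow = 1:27;
Yrow = zeros(1, 27);

S  = cov(Xrow);    % a vector is treated as ONE variable: S is the scalar variance (63 here)
mu = mean(Xrow);   % likewise a scalar (14 here), not a 1-by-27 vector of means

% The question's formula runs without error, but with scalar S it reduces to
% sum((Yrow - mu).^2) / S, which is 84 here: not a 27-variable Mahalanobis distance.
d = ((Yrow - mu) / S) * (Yrow - mu)';
```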

¹ Checked against MATLAB release R2007b.
