Scipy: distance correlation is higher than 1
Question
I'm trying to compute the distance correlation between columns; see the code below. Most of the time it returns a result higher than 1, which should not be possible, because distance correlation lies between 0 and 1. You can read about it in scipy's documentation.
import numpy as np
from scipy.spatial import distance
x = np.random.uniform(-1, 1, 10000)
print(distance.correlation(x, x**2))
1.00210811815
What is wrong here, or how should I measure it?
upd1: posted a link to the issue on GitHub
Recommended answer
According to the documentation, I don't see why this is a problem.
From the documentation:
The correlation distance between u and v is defined as
$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$
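To make the definition concrete, here is a minimal NumPy sketch of the formula from the documentation. The function name `correlation_distance` is my own (hypothetical) and this is not scipy's actual implementation, only a direct transcription of the expression.

```python
import numpy as np

def correlation_distance(u, v):
    """Hypothetical helper: 1 - (centered u . centered v) / (product of their 2-norms)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()  # u minus its mean
    vc = v - v.mean()  # v minus its mean
    ratio = np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))
    return 1.0 - ratio

# Small made-up vectors, for illustration only.
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([1.0, 3.0, 2.0, 5.0])
d = correlation_distance(u, v)
print(d)
```

The ratio being subtracted is the Pearson correlation coefficient of u and v, so the result should agree with `1 - np.corrcoef(u, v)[0, 1]` up to floating-point error.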
By the Cauchy-Schwarz inequality (https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality), the expression following the minus sign has an absolute value of at most 1. Nothing stipulates that it cannot be negative, though; in fact, it will be negative whenever the (mean-centered) vectors are anticorrelated.
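A quick way to see the resulting range is to evaluate "1 minus the Pearson correlation" on a few extreme cases. The sketch below uses `np.corrcoef` for the subtracted term rather than calling scipy, and the helper name `one_minus_pearson` and the seed are my own choices for illustration.

```python
import numpy as np

def one_minus_pearson(u, v):
    """Hypothetical helper: 1 minus the Pearson correlation of u and v."""
    return 1.0 - np.corrcoef(u, v)[0, 1]

rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable
x = rng.uniform(-1, 1, 10000)

d_same = one_minus_pearson(x, x)         # perfectly correlated: close to 0
d_anti = one_minus_pearson(x, -x)        # perfectly anticorrelated: close to 2
d_square = one_minus_pearson(x, x ** 2)  # nearly uncorrelated: close to 1

print(d_same, d_anti, d_square)
```

For `x` and `x**2` the sample correlation is a small number that can land on either side of zero, so a value slightly above 1, like the one in the question, is what one would expect from this definition.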
AFAICT, you should only be surprised if you get a value larger than 2 or smaller than 0. Given the comment by @Cleb and the fact that the range is [0, 2], I'm guessing that some other packages simply define the distance as half of this expression.