Scipy: distance correlation is higher than 1


Question

I'm trying to find the distance correlation between columns; see the code below. Most of the time it returns a result higher than 1, which should not be possible, because distance correlation lies between 0 and 1. The function I'm using is scipy's `scipy.spatial.distance.correlation`.

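import numpy as np
from scipy.spatial import distance

# Sample 10,000 points uniformly from [-1, 1]
x = np.random.uniform(-1, 1, 10000)
# print() as a function runs on both Python 2.7 and Python 3
print(distance.correlation(x, x**2))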




1.00210811815

What is wrong here, and how should I measure it?

upd1: posted a link to this on GitHub.

Recommended answer

I don't see why this is a problem according to the documentation.

From the documentation:


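The correlation distance between u and v is defined as

    1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{{\|(u - \bar{u})\|}_2 {\|(v - \bar{v})\|}_2}

where \bar{u} is the mean of the elements of u and x \cdot y is the dot product of x and y.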
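The formula above can be checked by hand. The following is a minimal sketch, not scipy's actual implementation: `correlation_distance` is a hypothetical helper written only for this illustration, and the seed and sample size are arbitrary choices. It computes the expression directly with NumPy and compares it with `distance.correlation` and with one minus the Pearson coefficient from `np.corrcoef`.

```python
import numpy as np
from scipy.spatial import distance


def correlation_distance(u, v):
    """Hypothetical helper: evaluate the documented formula directly."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()  # u minus its mean
    vc = v - v.mean()  # v minus its mean
    return 1.0 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))


rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
x = rng.uniform(-1, 1, 10000)

manual = correlation_distance(x, x**2)
library = distance.correlation(x, x**2)
pearson_r = np.corrcoef(x, x**2)[0, 1]

# All three values should agree up to floating-point error.
print(manual, library, 1.0 - pearson_r)
```

In other words, the quantity is one minus the ordinary (Pearson) correlation coefficient, so a slightly negative sample correlation gives a result slightly above 1.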

By the Cauchy-Schwarz inequality (https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality), the expression following the minus sign has an absolute value of at most 1. Nothing stipulates that it cannot be negative, though; in fact, it is negative whenever the (mean-centered) vectors are anticorrelated, and the distance then exceeds 1.
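The effect of the sign can be seen on three simple cases. This is an illustrative sketch only (the seed, sample size, and variable names are arbitrary): a vector compared with itself, with its negation, and with its square.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
u = rng.uniform(-1, 1, 1000)

d_same = distance.correlation(u, u)       # perfectly correlated: close to 0
d_opposite = distance.correlation(u, -u)  # perfectly anticorrelated: close to 2
d_square = distance.correlation(u, u**2)  # almost no linear correlation: near 1

print(d_same, d_opposite, d_square)
```

For `u` and `u**2` with `u` symmetric around zero, the sample correlation is small and its sign depends on the sample, so the distance lands slightly above or slightly below 1. Note also that this correlation distance is not the same statistic as the "distance correlation" of Székely et al., which is the one bounded by 0 and 1.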

AFAICT, you should only be surprised if you get a value larger than 2 or smaller than 0. Going by the comment by @Cleb and the fact that the range is [0, 2], I'm guessing that some other packages simply define the distance as half this expression.
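If a value in [0, 1] is wanted, halving the scipy result is one way to get it. The sketch below only illustrates that guess; `half_correlation_distance` is a hypothetical helper, not a function from scipy or any other package.

```python
import numpy as np
from scipy.spatial import distance


def half_correlation_distance(u, v):
    """Hypothetical helper: rescale scipy's [0, 2] range to [0, 1]."""
    return 0.5 * distance.correlation(u, v)


u = np.array([1.0, 2.0, 3.0, 4.0])

h_same = half_correlation_distance(u, u)       # correlated: close to 0
h_opposite = half_correlation_distance(u, -u)  # anticorrelated: close to 1

print(h_same, h_opposite)
```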
