Encountered invalid value when I use pearsonr
Question
Maybe I made a mistake. If so, I am sorry to ask this.
I want to calculate Pearson's correlation coefficient using scipy's pearsonr function.
from scipy.stats import pearsonr

X = [4, 4, 4, 4, 4, 4]
Y = [4, 5, 5, 4, 4, 4]
pearsonr(X, Y)
The following error appears:
RuntimeWarning: invalid value encountered in double_scalars
The reason I get the error is that E[X] = 4 (the expected value of X is 4).
I looked at the code of the pearsonr function in scipy/stats/stats.py. Part of the function is as follows.
mx = x.mean()                          # which is 4
my = y.mean()                          # not relevant here
xm, ym = x - mx, y - my                # xm = [0 0 0 0 0 0]
r_num = n * (np.add.reduce(xm * ym))   # r_num = 0, because xm * ym is a 1x6 zero vector
r_den = n * np.sqrt(ss(xm) * ss(ym))   # r_den = 0
r = (r_num / r_den)                    # invalid value encountered in double_scalars
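The steps quoted above can be reproduced by hand. This is a minimal sketch with NumPy, not scipy's actual implementation: it drops the factor n (which cancels between numerator and denominator) and uses np.sum in place of the internal ss helper. Both simplifications are assumptions made for illustration.

```python
import numpy as np

x = np.array([4, 4, 4, 4, 4, 4], dtype=float)
y = np.array([4, 5, 5, 4, 4, 4], dtype=float)

xm = x - x.mean()  # all zeros, because x is constant
ym = y - y.mean()  # not all zeros

r_num = np.sum(xm * ym)                             # sum of zeros
r_den = np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))  # sqrt(0 * something)

# Silence the RuntimeWarning locally so the 0/0 result can be inspected.
with np.errstate(invalid="ignore"):
    r = r_num / r_den

print(r_num, r_den, r)
```

Both the numerator and the denominator come out as zero, so the quotient is 0/0, which floating-point arithmetic reports as nan.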
In the end, pearsonr returns (nan, 1.0). Should pearsonr return (0, 1.0) instead?
I think that if a vector has the same value in every row/column, the covariance should be zero. Thus Pearson's correlation coefficient should also be zero, by the definition of PCC.
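Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
Is it a bug, or where did I make a mistake?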
Recommended answer
Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations.
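Written out as a formula, the definition reads as follows. The symbols ($r$, $\sigma_X$, $\sigma_Y$, $\bar{x}$, $\bar{y}$) are notation chosen here for illustration and do not come from the original post.

```latex
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i} (x_i - \bar{x})^2} \, \sqrt{\sum_{i} (y_i - \bar{y})^2}}
```

When every $x_i$ equals $\bar{x}$, both the numerator and the first factor of the denominator are zero.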
So it's the covariance over
- the standard deviation of [4, 5, 5, 4, 4, 4] times
- the standard deviation of [4, 4, 4, 4, 4, 4].
The standard deviation of [4, 4, 4, 4, 4, 4] is zero.
So it's the covariance over
- the standard deviation of [4, 5, 5, 4, 4, 4] times
- zero.
So it's the covariance over
- zero.
Anything divided by zero is undefined, and here the 0/0 comes out as nan. The value of the covariance is irrelevant.
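If a nan without the warning is preferred, one option is to check for a zero denominator before dividing. The helper below is a hypothetical sketch (the name pearson_or_nan is made up for this example and is not part of scipy); it returns only the coefficient, not the p-value.

```python
import numpy as np

def pearson_or_nan(x, y):
    """Hypothetical helper: Pearson's r, or nan if either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D, equal length, with at least 2 values")

    xm = x - x.mean()
    ym = y - y.mean()
    den = np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))

    # A zero denominator means at least one input has zero variance,
    # so the coefficient is undefined.
    if den == 0.0:
        return float("nan")
    return float(np.sum(xm * ym) / den)

print(pearson_or_nan([4, 4, 4, 4, 4, 4], [4, 5, 5, 4, 4, 4]))
print(pearson_or_nan([1, 2, 3], [2, 4, 6]))
```

The first call prints nan because X is constant; the second input pair is perfectly linear, so its coefficient is 1. The exact comparison with zero is a simplification: for constant non-integer data, rounding in the mean may leave a tiny nonzero denominator.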