Output values differ between R and Python?
Problem description
Perhaps I am doing something wrong while z-normalizing my array. Can someone take a look at this and suggest what's going on?
In R:
> data <- c(2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34)
> data.mean <- mean(data)
> data.sd <- sqrt(var(data))
> data.norm <- (data - data.mean) / data.sd
> print(data.norm)
[1] -0.9796808 -0.8622706 -0.6123005 0.8496459 1.7396910 1.5881940 1.0958286 0.5277147 0.4709033 -0.2865819
[11] 0.0921607 -0.2865819 -0.9039323 -1.1955641 -1.2372258
In Python, using NumPy:
>>> import string
>>> import numpy as np
>>> from scipy.stats import norm
>>> data = np.array([np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])])
>>> data -= np.split(np.mean(data, axis=1), data.shape[0])
>>> data *= np.split(1.0/data.std(axis=1), data.shape[0])
>>> print(data)
[[-1.01406602 -0.89253491 -0.63379126 0.87946705 1.80075126 1.64393692
1.13429034 0.54623659 0.48743122 -0.29664045 0.09539539 -0.29664045
-0.93565885 -1.23752644 -1.28065039]]
Am I using NumPy incorrectly?
Recommended answer
I believe that your NumPy result is correct. I would do the normalization in a simpler way, though:
>>> data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
>>> data -= data.mean()
>>> data /= data.std()
>>> data
array([-1.01406602, -0.89253491, -0.63379126, 0.87946705, 1.80075126,
1.64393692, 1.13429034, 0.54623659, 0.48743122, -0.29664045,
0.09539539, -0.29664045, -0.93565885, -1.23752644, -1.28065039])
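If you want to keep the question's 2-D, one-row layout, the np.split calls are not needed. Here is a minimal sketch, assuming you want one z-score per row: the keepdims argument of mean and std keeps the reduced axis, so the per-row statistics broadcast back over each row. It should give the same values as the 1-D version above.

```python
import numpy as np

# Same 15 values, kept in the 2-D, one-row layout used in the question.
data = np.array([[2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                  5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34]])

# keepdims=True keeps the reduced axis (shape (1, 1) here), so the row
# means and standard deviations broadcast back over each row directly.
row_mean = data.mean(axis=1, keepdims=True)
row_std = data.std(axis=1, keepdims=True)   # ddof=0, NumPy's default
z = (data - row_mean) / row_std
print(z)
```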
The difference between your two results lies in the normalization. With r as the R result:
>>> r / data
array([ 0.96609173, 0.96609173, 0.96609173, 0.96609179, 0.96609179, 0.96609181, 0.9660918 , 0.96609181,
0.96609179, 0.96609179, 0.9660918 , 0.96609179, 0.96609175, 0.96609176, 0.96609177])
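The ratio check can be reproduced in a self-contained script. This minimal sketch pastes the R output from the question into a NumPy array and divides it by the NumPy result. The R values are printed to only 7 decimals, so the ratios agree to about 7 digits rather than exactly.

```python
import numpy as np

values = [2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
          5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34]

# R output from the question, as printed (7 decimal places).
r = np.array([-0.9796808, -0.8622706, -0.6123005, 0.8496459, 1.7396910,
              1.5881940, 1.0958286, 0.5277147, 0.4709033, -0.2865819,
              0.0921607, -0.2865819, -0.9039323, -1.1955641, -1.2372258])

data = np.array(values)
data = (data - data.mean()) / data.std()  # NumPy default: ddof=0

ratio = r / data
print(ratio)                       # every entry is close to 0.966092
print(ratio.max() - ratio.min())   # tiny spread, due to R's printed rounding
```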
Thus, your two results are simply proportional to each other, up to the rounding of the printed R values. You may therefore want to compare the standard deviations obtained with R and with Python.
PS: Now that I think of it, the variance may not be defined the same way in NumPy and in R. For N elements, some tools normalize by N-1 instead of N when calculating the variance. You may want to check this.
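Both conventions can be compared directly in NumPy. Its var and std take a ddof ("delta degrees of freedom") argument and divide by N - ddof, with ddof=0 as the default. R's var() and sd() divide by N - 1. Here is a short sketch on the question's data:

```python
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
n = data.size  # 15

# Population variance: sum of squared deviations divided by N (NumPy's default).
var_n = data.var()             # same as data.var(ddof=0)
# Sample variance: divided by N - 1 (the convention of R's var() and sd()).
var_n1 = data.var(ddof=1)

print(var_n, var_n1)
print(var_n / var_n1, (n - 1) / n)   # both are 14/15, up to float rounding
```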
PPS: Here is the reason for the discrepancy. The factor comes from two different normalization conventions: R's var() divides by N-1, while NumPy's std() divides by N by default. The observed factor is simply sqrt(14/15) = 0.9660917… because the data has 15 elements. Thus, to obtain in R the same result as in Python, you need to divide the R result by this factor.
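The conventions can also be reconciled from the NumPy side. Passing ddof=1 makes NumPy divide by N - 1, as R's sd() does. Going the other way, R-style scores can be divided by sqrt(14/15), as described above. A minimal sketch of both, on the question's data:

```python
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
n = data.size

# Option 1: make NumPy use R's N - 1 convention, so it matches R directly.
z_like_r = (data - data.mean()) / data.std(ddof=1)

# Option 2: convert R-style scores to NumPy's default N convention
# by dividing by sqrt((N - 1) / N) = sqrt(14/15).
z_like_numpy = z_like_r / np.sqrt((n - 1) / n)

print(z_like_r)      # should agree with the R output to its printed precision
print(z_like_numpy)  # should agree with the data.std()-based NumPy output
```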