Why does numpy std() give a different result to matlab std()?
Problem description
I am trying to convert MATLAB code to NumPy and noticed that NumPy gives a different result for the std function.

In MATLAB:

```matlab
std([1,3,4,6])
ans = 2.0817
```

In NumPy:

```python
np.std([1,3,4,6])
1.8027756377319946
```

Is this normal? And how should I handle it?
The NumPy function `np.std` takes an optional parameter `ddof`: "Delta Degrees of Freedom". By default, this is `0`. Set it to `1` to get the MATLAB result:

```python
>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326
```
To add a little more context: in the calculation of the variance (of which the standard deviation is the square root), we typically divide by the number of values we have. But if we select a random sample of N elements from a larger distribution and calculate the variance, dividing by N can lead to an underestimate of the actual variance. To correct for this, we can lower the number we divide by (the degrees of freedom) to something less than N (usually N-1). The `ddof` parameter allows us to change the divisor by the amount we specify.
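To make the divisor concrete, here is a small sketch that computes the variance by hand for the sample from the question, with divisor N and with divisor N-1, and checks that NumPy's `ddof` parameter selects between them:

```python
import numpy as np

x = np.array([1, 3, 4, 6], dtype=float)
n = len(x)

# Sum of squared deviations from the mean
ss = ((x - x.mean()) ** 2).sum()

var_n = ss / n        # divide by N   (biased, NumPy's default)
var_n1 = ss / (n - 1) # divide by N-1 (sample variance, MATLAB's default)

# np.var divides by N - ddof, so ddof picks the divisor
assert np.isclose(var_n, np.var(x, ddof=0))
assert np.isclose(var_n1, np.var(x, ddof=1))
```

`np.std` is simply the square root of this variance, so the same `ddof` logic applies to it.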
Unless told otherwise, NumPy will calculate the biased estimator for the variance (`ddof=0`, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values randomly picked from a larger distribution). If the `ddof` parameter is given, NumPy divides by N - ddof instead.
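Putting the two defaults side by side on the sample from the question reproduces both of the numbers above:

```python
import numpy as np

x = [1, 3, 4, 6]

# NumPy's default: ddof=0, divides by N
print(np.std(x))          # 1.8027756377319946
# MATLAB's default behaviour: divides by N-1
print(np.std(x, ddof=1))  # 2.0816659994661326
```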
The default behaviour of MATLAB's `std` is to correct the bias for sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
The nice answer by @hbaderts gives further mathematical details.