Why does numpy std() give a different result to matlab std()?


Problem description

I am trying to convert MATLAB code to NumPy and found that NumPy's std function gives a different result.

In MATLAB:

std([1,3,4,6])
ans =  2.0817

In NumPy:

np.std([1,3,4,6])
1.8027756377319946

Is this normal? And how should I handle this?

Solution

The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:

>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326

To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.

But if we select a random sample of N elements from a larger distribution and calculate the variance, division by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.

Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
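To make the divisor concrete, here is a minimal sketch that computes the standard deviation by hand and divides by N - ddof. The helper name std_with_ddof is made up for illustration and is not part of NumPy; it only mirrors the behaviour described above.

```python
import math

import numpy as np


def std_with_ddof(values, ddof=0):
    """Standard deviation using the divisor N - ddof (illustrative helper, not a NumPy function)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


data = [1, 3, 4, 6]

population_std = std_with_ddof(data, ddof=0)  # divides by N, like np.std(data)
sample_std = std_with_ddof(data, ddof=1)      # divides by N - 1, like MATLAB's std

print(population_std, np.std(data))
print(sample_std, np.std(data, ddof=1))
```

For the data in the question the sum of squared deviations is 13, so the two results are sqrt(13/4) and sqrt(13/3), which match the NumPy and MATLAB outputs respectively.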

The default behaviour of MATLAB's std is to correct the bias for sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
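A quick simulation can illustrate the bias. This is only a sketch under assumed settings: the sample size of 4, the number of trials, the seed and the standard normal distribution (true variance 1) are arbitrary choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n = 4             # elements per sample (assumed for the demo)
trials = 100_000  # number of independent samples (assumed for the demo)

# Each row is one random sample drawn from a distribution with variance 1.
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))

mean_var_ddof0 = samples.var(axis=1, ddof=0).mean()  # divide by N
mean_var_ddof1 = samples.var(axis=1, ddof=1).mean()  # divide by N - 1
mean_std_ddof1 = samples.std(axis=1, ddof=1).mean()  # square root of the above, per sample

print("average variance, ddof=0:", mean_var_ddof0)
print("average variance, ddof=1:", mean_var_ddof1)
print("average std,      ddof=1:", mean_std_ddof1)
```

With these settings the ddof=0 variance should average close to (N-1)/N = 0.75, and the ddof=1 variance close to the true value of 1. The average standard deviation with ddof=1 still comes out somewhat below 1, which is the remaining bias mentioned above.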

The nice answer by @hbaderts gives further mathematical details.
