Pandas: compute mean or std (standard deviation) over entire dataframe

Question

Here is my problem: I have a dataframe like this:

    Depr_1  Depr_2  Depr_3
S3       0       5       9
S2       4      11       8
S1       6      11      12
S5       0       4      11
S4       4       8       8

I just want to calculate the mean over the full dataframe. The following doesn't work, because it returns one mean per column rather than a single value:

df.mean()

Then I came up with:

df.mean().mean()

But this trick won't work for computing the standard deviation. My final attempts were:

df.get_values().mean()   # get_values() was removed in pandas 1.0; df.values or df.to_numpy() is the current equivalent
df.get_values().std()

Except that in the latter case, it uses the mean() and std() functions from numpy. That's not a problem for the mean, but it is for std, because the pandas function uses ddof=1 by default, whereas the numpy one uses ddof=0.

Recommended answer

You could convert the dataframe to a single column with stack (this changes the shape from 5x3 to 15x1) and then take the standard deviation:

df.stack().std()         # pandas default degrees of freedom is one

Alternatively, you can use values to convert the pandas dataframe to a numpy array before taking the standard deviation:

df.values.std(ddof=1)    # numpy default degrees of freedom is zero

Unlike pandas, numpy gives the standard deviation of the entire array by default, so there is no need to reshape first.
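As a rough sketch of the two approaches above, the snippet below rebuilds the dataframe from the question and computes the overall mean and standard deviation both ways. The variable names (overall_mean, std_pandas, and so on) are illustrative only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame(
    {
        "Depr_1": [0, 4, 6, 0, 4],
        "Depr_2": [5, 11, 11, 4, 8],
        "Depr_3": [9, 8, 12, 11, 8],
    },
    index=["S3", "S2", "S1", "S5", "S4"],
)

# One mean over all 15 values (numpy reduces over the whole array by default).
overall_mean = df.values.mean()

# pandas: flatten to a single Series first; Series.std() defaults to ddof=1.
std_pandas = df.stack().std()

# numpy: no reshape needed, but ddof must be set to 1 to match pandas.
std_numpy_ddof1 = df.values.std(ddof=1)

# numpy default (ddof=0) divides by n instead of n - 1, so it is slightly smaller.
std_numpy_ddof0 = df.values.std()

print(overall_mean, std_pandas, std_numpy_ddof1, std_numpy_ddof0)
```

Here df.mean().mean() happens to give the same overall mean, but only because every column has the same number of non-missing values; the flattened computation does not rely on that.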

A couple of additional notes:

  • The numpy approach here is a bit faster than the pandas one, which is generally true when you can accomplish the same thing with either numpy or pandas. The speed difference will depend on the size of your data, but numpy was roughly 10x faster when I tested a few different-sized dataframes on my laptop (numpy version 1.15.4 and pandas version 0.23.4).

  • The numpy and pandas approaches here will not give exactly the same answers, but they will be extremely close (identical to several digits of precision). The discrepancy is due to slight differences in implementation behind the scenes that affect how the floating point values get rounded.
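Both notes can be checked with a small experiment such as the one below. This is only a sketch: the random test data, its size, and the repeat count are arbitrary assumptions, and the timings will vary by machine and library version, so no particular speedup should be expected from it.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows of random floats in three columns.
rng = np.random.default_rng(0)
big = pd.DataFrame(
    rng.normal(size=(10_000, 3)),
    columns=["Depr_1", "Depr_2", "Depr_3"],
)

via_pandas = big.stack().std()
via_numpy = big.values.std(ddof=1)

# Exact equality may or may not hold; compare with a tolerance instead.
print("exactly equal:", via_pandas == via_numpy)
print("close:", np.isclose(via_pandas, via_numpy))

# Rough timing of each approach over 20 repetitions.
t_pandas = timeit.timeit(lambda: big.stack().std(), number=20)
t_numpy = timeit.timeit(lambda: big.values.std(ddof=1), number=20)
print(f"pandas: {t_pandas:.4f}s  numpy: {t_numpy:.4f}s")
```

Comparing with np.isclose rather than == avoids treating the last-digit rounding differences as a real disagreement.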
