Pandas Standard Deviation returns NaN
Question
I have the following Pandas Dataframe in Python 2.7.
Code:
import pandas as pd
import numpy as np

# Random data, so the values will differ from the table shown below
df = pd.DataFrame(np.random.rand(10, 6), columns=list('ABCDEF'))
df.insert(0, 'Category', ['A', 'C', 'D', 'D', 'B', 'E', 'F', 'F', 'G', 'H'])
print(df.groupby('Category').std())
Here is df:
Category A B C D E F
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.912966 0.051288 0.055766 0.906490 0.078384 0.928538
D 0.416582 0.441684 0.605967 0.516580 0.458814 0.823692
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.375279 0.555228 0.020599 0.120947 0.896505 0.424233
F 0.952112 0.299520 0.150623 0.341139 0.186734 0.807519
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924
I am looking to perform a groupby and then calculate the mean and standard deviation. Sometimes a group contains only 1 row; because the sample standard deviation divides by N-1, that means dividing by 0, which shows up as NaN.
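To see where the NaN comes from, here is a minimal sketch on a hypothetical three-row frame (the names df, Category and A are illustrative only). GroupBy.std uses ddof=1 by default, i.e. it divides by N-1, so a one-row group has no defined sample standard deviation. Passing ddof=0 divides by N instead and gives 0.0 for that group, but it also changes the result for every other group, so it is not a drop-in answer to the question.

```python
import pandas as pd

# Hypothetical toy data: group "x" has two rows, group "y" has only one
df = pd.DataFrame({"Category": ["x", "x", "y"], "A": [1.0, 3.0, 5.0]})

# Default ddof=1 (sample std, divides by N-1): the one-row group gives NaN
sample_std = df.groupby("Category")["A"].std()

# ddof=0 (population std, divides by N): the one-row group gives 0.0
population_std = df.groupby("Category")["A"].std(ddof=0)

print(sample_std)
print(population_std)
```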
Here is the output of the above code:
A B C D E F
Category
A NaN NaN NaN NaN NaN NaN
B NaN NaN NaN NaN NaN NaN
C NaN NaN NaN NaN NaN NaN
D 0.350996 0.276052 0.389051 0.275708 0.269004 0.074137
E NaN NaN NaN NaN NaN NaN
F 0.407882 0.180813 0.091941 0.155699 0.501884 0.271025
G NaN NaN NaN NaN NaN NaN
H NaN NaN NaN NaN NaN NaN
For the groups that contain only 1 row, is there a way to skip the standard deviation and just return the value itself? For example, I am looking to get this:
Desired output:
A B C D E F
Category
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.350996 0.276052 0.389051 0.275708 0.269004 0.074137
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.407882 0.180813 0.091941 0.155699 0.501884 0.271025
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924
Is it possible to do this with Pandas?
To recreate the exact DataFrame above, select it, copy it to the clipboard and then use this:
import pandas as pd

# Parse the copied table, using the Category column as the index
df = pd.read_clipboard(index_col='Category')
print(df)
print(df.groupby('Category').std())
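If no clipboard is available, a similar sketch can rebuild the same frame from an inline string with io.StringIO and read_csv; the data string is simply the table from the question pasted in. The sep=r"\s+" argument splits on runs of whitespace, which is also read_clipboard's default separator, and groupby(level="Category") groups on the index explicitly.

```python
import io
import pandas as pd

# The DataFrame from the question, pasted as whitespace-separated text
data = """Category A B C D E F
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.912966 0.051288 0.055766 0.906490 0.078384 0.928538
D 0.416582 0.441684 0.605967 0.516580 0.458814 0.823692
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.375279 0.555228 0.020599 0.120947 0.896505 0.424233
F 0.952112 0.299520 0.150623 0.341139 0.186734 0.807519
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924"""

# Split on whitespace and use the Category column as the index
df = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col="Category")

print(df)
# One-row groups (A, B, C, E, G, H) come out as NaN here
print(df.groupby(level="Category").std())
```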
Recommended answer
You could use fillna to replace the missing values, passing in a DataFrame with the last value of each group.
In [86]: (df.groupby('Category').std()
...: .fillna(df.groupby('Category').last()))
Out[86]:
A B C D E F
Category
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.350996 0.276052 0.389051 0.275708 0.269005 0.074137
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.407883 0.180813 0.091941 0.155699 0.501884 0.271024
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924
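Here is a self-contained sketch of that fillna approach on a hypothetical four-row frame (the labels x, y, z are made up). fillna aligns the replacement DataFrame on both row and column labels, so each NaN is filled from the same group and column. It relies on std producing NaN only for one-row groups, which holds when the data itself has no missing values. The second part is an alternative that is not from the answer: it uses the group sizes so that only one-row groups are overwritten.

```python
import pandas as pd

# Hypothetical toy data: group "x" has two rows, "y" and "z" have one each
df = pd.DataFrame(
    {"A": [1.0, 3.0, 5.0, 7.0], "B": [2.0, 6.0, 4.0, 8.0]},
    index=pd.Index(["x", "x", "y", "z"], name="Category"),
)
grouped = df.groupby(level="Category")

# The answer's approach: every NaN in the std table is filled with the
# value from the last row of the same group (same row label, same column)
filled = grouped.std().fillna(grouped.last())

# Alternative (an assumption, not part of the answer): overwrite only the
# groups that contain exactly one row, so NaNs caused by missing data in
# larger groups are left as NaN
masked = grouped.std()
sizes = grouped.size()
single = sizes[sizes == 1].index
masked.loc[single] = grouped.first().loc[single]

print(filled)
print(masked)
```

On data without missing values both versions give the same table; they only differ when a multi-row group has too few non-null values in some column.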