Pandas groupby count non-null values as percentage

Question

Given this dataset, I would like to count the missing (NaN) values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : [1, np.nan, 2 , 55, 6, np.nan, -17, np.nan],
                   'Team' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
                   'C' : [4, 14, 3 , 8, 8, 7, np.nan, 11],
                   'D' : [np.nan, np.nan, -12 , 12, 12, -12, np.nan, np.nan]})

Specifically, I want to count them as a percentage per group in the 'Team' column. I can get the raw count like this:

df.groupby('Team').count()

This gives the number of non-missing values. What I would like instead is a percentage: rather than the raw number, the share of the total entries in each group (the groups are of uneven size, and I don't know their sizes in advance). I've tried using .agg(), but I can't seem to get what I want. How can I do this?

Recommended answer

You can take the mean of the Boolean DataFrame returned by notnull(). Since True counts as 1 and False as 0, the mean of each column is the fraction of non-null values:

In [11]: df.notnull()
Out[11]:
       A      C      D  Team
0   True   True  False  True
1  False   True  False  True
2   True   True   True  True
3   True   True   True  True
4   True   True   True  True
5  False   True   True  True
6   True  False  False  True
7  False   True  False  True

In [12]: df.notnull().mean()
Out[12]:
A       0.625
C       0.875
D       0.500
Team    1.000
dtype: float64
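
As a quick sanity check on why the mean works, the sketch below rebuilds the question's DataFrame and compares the mean of the Boolean mask with the non-null count divided by the number of rows. The names frac_mean and frac_count are illustrative only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# Mean of the Boolean mask: True is treated as 1, False as 0.
frac_mean = df.notnull().mean()

# Equivalent by definition: non-null count per column over the row count.
frac_count = df.count() / len(df)

print(frac_mean)
print(frac_count)
```

Both Series should agree column by column, which is why no explicit group size is needed once the same idea is applied per group.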

And with a groupby:

In [13]: df.groupby("Team").apply(lambda x: x.notnull().mean())
Out[13]:
              A         C    D  Team
Team
one    0.666667  0.666667  0.0   1.0
three  0.500000  1.000000  0.5   1.0
two    0.666667  1.000000  1.0   1.0

It may be faster to do this without an apply, by calling set_index first:

In [14]: df.set_index("Team").notnull().groupby(level=0).mean()
Out[14]:
              A         C    D
Team
one    0.666667  0.666667  0.0
three  0.500000  1.000000  0.5
two    0.666667  1.000000  1.0
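
The results above are fractions between 0 and 1. Since the question asks for percentages, one option is to scale by 100. The following is a minimal, self-contained sketch under that assumption; the rounding to one decimal place and the names frac, pct, and pct_check are illustrative choices, not part of the original answer. It also cross-checks the result against the non-null count divided by the group size.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# Fraction of non-null values per team, using set_index and no apply.
frac = df.set_index('Team').notnull().groupby(level=0).mean()

# Scale to percentages; rounding is optional and only for display.
pct = (frac * 100).round(1)

# Cross-check: non-null count per group divided by the group size.
counts = df.groupby('Team').count()   # non-null counts for A, C, D
sizes = df.groupby('Team').size()     # number of rows in each group
pct_check = counts.div(sizes, axis=0) * 100

print(pct)
```

The count/size version makes the "percentage of total entries in each group" explicit, while the mean-of-mask version gets the same numbers in a single aggregation.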
