How to get percentage of counts of a column after groupby in Pandas


Question

I'm trying to get the distribution of grades for each rank, per name, in a list of data. However, I can't figure out how to get the proportion (percentage) of each grade count within its rank group. Here's an example:

df

name    rank    grade
Bob     1       A
Bob     1       A
Bob     1       B
Bob     1       C
Bob     2       B
Bob     3       C
Joe     1       C
Joe     2       B
Joe     2       B
Joe     3       A
Joe     3       B
Joe     3       B
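
To reproduce the examples in this thread, the table above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the rows shown are the complete dataset and that rank is stored as an integer:

```python
import pandas as pd

# Rebuild the example data from the question (assumed to be the full dataset)
df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
print(df)
```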

I use grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size() to get the count of each grade within its (name, rank) group:

name    rank    grade
Bob     1       A     2
                B     1
                C     1
        2       B     1
        3       C     1
Joe     1       C     1
        2       B     2
        3       A     1
                B     2
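
The counting step can be written out as a self-contained sketch, using the same assumed data as the table above. Grouping on all three columns and calling size() counts the rows in each group; selecting ['grade'] first, as in the question, gives the same Series:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})

# Count rows per (name, rank, grade); the result has a 3-level MultiIndex
grade_count = df.groupby(["name", "rank", "grade"]).size()
print(grade_count)
```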

Now, for each of these counts, I'd like its proportion of the (name, rank) group total (i.e. what share of a name's grades at a given rank each grade accounts for). This is the output I'd like:

name    rank    grade
Bob     1       A     2    0.5   (Bob @ rank 1 had 4 grades, and 50% of them are A's)
                B     1    0.25
                C     1    0.25
        2       B     1    1
        3       C     1    1
Joe     1       C     1    1
        2       B     2    1
        3       A     1    0.33
                B     2    0.67

I've managed to get the totals of each rank group by using rank_totals = grade_count.groupby(level=[0, 1]).sum(), which results in:

name    rank    
Bob     1       4
        2       1
        3       1
Joe     1       1
        2       2
        3       3
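
As a self-contained sketch of that step (same assumed data as above): summing over index levels 0 and 1 collapses the grade level, so the result has one row per (name, rank) pair and matches simply counting rows per (name, rank) in the original DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
grade_count = df.groupby(["name", "rank", "grade"]).size()

# Sum counts over the (name, rank) levels; the 'grade' level is dropped
rank_totals = grade_count.groupby(level=[0, 1]).sum()
print(rank_totals)
```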

How can I divide the numbers from grade_count by their corresponding rank totals in rank_totals?
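
One way to use rank_totals directly (an editorial sketch, not from the original thread) is to look up each row's total by dropping the grade level from grade_count's index and reindexing rank_totals with it. The names totals_per_row and proportions are hypothetical, chosen for illustration; the data is the same assumed example:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
grade_count = df.groupby(["name", "rank", "grade"]).size()
rank_totals = grade_count.groupby(level=[0, 1]).sum()

# Hypothetical helper: one (name, rank) total per row of grade_count.
# reindex allows repeated target labels because rank_totals' index is unique.
totals_per_row = rank_totals.reindex(grade_count.index.droplevel("grade"))

# Divide positionally; both sides have one entry per row of grade_count
proportions = grade_count / totals_per_row.to_numpy()
print(proportions)
```

Dividing by the NumPy array sidesteps index alignment, which is safe here only because totals_per_row was built in the same row order as grade_count.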

Answer

Group your data by the name and rank index levels, and use transform to compute each group's total and broadcast it back to every row of that group. Then divide the original series by the result:

grade_count.groupby(level=[0, 1]).transform('sum')
Out[19]: 
name  rank  grade
Bob   1     A        4
            B        4
            C        4
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        3
            B        3
dtype: int64
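
The difference between transform('sum') and a plain .sum() is what makes the division line up. A minimal sketch on the same assumed data (broadcast_totals and collapsed_totals are illustrative names): transform returns a Series with exactly grade_count's index, while .sum() returns one row per group:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
grade_count = df.groupby(["name", "rank", "grade"]).size()

# transform: one total per row of grade_count, same index as grade_count
broadcast_totals = grade_count.groupby(level=[0, 1]).transform("sum")
# sum: one total per (name, rank) group
collapsed_totals = grade_count.groupby(level=[0, 1]).sum()
print(len(grade_count), len(broadcast_totals), len(collapsed_totals))  # 9 9 6

# Indexes match exactly, so element-wise division aligns row by row
proportions = grade_count / broadcast_totals
```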

grade_count / grade_count.groupby(level=[0, 1]).transform('sum')
Out[20]: 
name  rank  grade
Bob   1     A        0.500000
            B        0.250000
            C        0.250000
      2     B        1.000000
      3     C        1.000000
Joe   1     C        1.000000
      2     B        1.000000
      3     A        0.333333
            B        0.666667
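
If the intermediate counts are not needed, a different technique (not the one in the answer above) gets the proportions in one step from the original DataFrame: SeriesGroupBy.value_counts with normalize=True divides each grade's count by its (name, rank) group size. This is a sketch on the same assumed data; shares is an illustrative name, and the resulting Series' name differs across pandas versions ('proportion' in recent releases), so only values and index are relied on here:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})

# Share of each grade within its (name, rank) group
shares = df.groupby(["name", "rank"])["grade"].value_counts(normalize=True)
# value_counts orders by frequency within each group; sort to (name, rank, grade) order
shares = shares.sort_index()
print(shares)
```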
