How to get percentage of counts of a column after groupby in Pandas
Question
I'm trying to get the distribution of grades for each rank for names in a list of data. However, I can't figure out how to get the proportion/percentage of each grade count over its rank group. Here's an example:
df.head()
name rank grade
Bob 1 A
Bob 1 A
Bob 1 B
Bob 1 C
Bob 2 B
Bob 3 C
Joe 1 C
Joe 2 B
Joe 2 B
Joe 3 A
Joe 3 B
Joe 3 B
I use grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size()
to get the count of each grade within its (name, rank) group:
name  rank  grade
Bob   1     A        2
            B        1
            C        1
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        1
            B        2
Now, for each size calculated, I'd like to get its proportion of the (name, rank) group total (i.e. what share of the grades at a given rank, for a given name, is each grade). This is the output I'd like:
name  rank  grade
Bob   1     A        2    0.5   (Bob @ rank 1 had 4 grades, and 50% of them are A's)
            B        1    0.25
            C        1    0.25
      2     B        1    1
      3     C        1    1
Joe   1     C        1    1
      2     B        2    1
      3     A        1    0.33
            B        2    0.66
I've managed to get the totals of each rank group by using rank_totals = grade_count.groupby(level=[0, 1]).sum()
which results in:
name  rank
Bob   1       4
      2       1
      3       1
Joe   1       1
      2       2
      3       3
How can I divide the numbers from grade_count by their corresponding rank totals in rank_totals?
Recommended answer
Group your data by the name and rank levels, and use transform to compute each group's total and broadcast it back onto the index of the original Series. Then divide the original Series by that broadcast Series:
grade_count.groupby(level=[0, 1]).transform('sum')
Out[19]:
name  rank  grade
Bob   1     A        4
            B        4
            C        4
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        3
            B        3
dtype: int64
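To see why transform is used here instead of a plain aggregation, the following minimal sketch rebuilds the question's sample data and compares the two. The variable names by_rank, totals, and broadcast are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Bob"] * 6 + ["Joe"] * 6,
    "rank": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": list("AABCBC") + list("CBBABB"),
})

# Count of each grade within each (name, rank) group.
grade_count = df.groupby(["name", "rank", "grade"]).size()

# Regroup the counts by the first two index levels.
by_rank = grade_count.groupby(level=["name", "rank"])

# sum() collapses each group to a single row, so the grade level is lost.
totals = by_rank.sum()

# transform("sum") repeats each group's total on every row of that group,
# so the result keeps the same index as grade_count.
broadcast = by_rank.transform("sum")

print(len(totals), len(broadcast))
print(broadcast.index.equals(grade_count.index))
```

Because the broadcast Series shares its index with grade_count, the two can be divided element by element without any manual alignment.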
grade_count / grade_count.groupby(level=[0, 1]).transform('sum')
Out[20]:
name  rank  grade
Bob   1     A        0.500000
            B        0.250000
            C        0.250000
      2     B        1.000000
      3     C        1.000000
Joe   1     C        1.000000
      2     B        1.000000
      3     A        0.333333
            B        0.666667
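Putting the steps together, here is a self-contained sketch that reproduces the layout requested in the question, with the count and the proportion side by side. The column names count and proportion and the variable names result and alt are assumptions chosen for illustration. The last statement shows an alternative that is not part of the original answer: value_counts(normalize=True) on the grouped grade column, which should yield the same proportions in one call.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Bob"] * 6 + ["Joe"] * 6,
    "rank": [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": list("AABCBC") + list("CBBABB"),
})

# Count of each grade within each (name, rank) group.
grade_count = df.groupby(["name", "rank", "grade"]).size()

# Divide each count by the total of its (name, rank) group.
group_total = grade_count.groupby(level=["name", "rank"]).transform("sum")
proportion = grade_count / group_total

# Combine both Series into one table (column names are illustrative).
result = pd.DataFrame({"count": grade_count, "proportion": proportion})
print(result)

# Alternative, not from the original answer: normalized value counts per group.
# sort_index() puts the rows in the same order as grade_count.
alt = (
    df.groupby(["name", "rank"])["grade"]
    .value_counts(normalize=True)
    .sort_index()
)
```

Multiplying proportion by 100 gives percentages if that form is preferred.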