Understanding groupby in pandas
Question
I'm looking to get the sum of some values in a DataFrame after it has been grouped.
Some sample data:
Race officeID CandidateId total_votes precinct
Mayor 10 705 20 Bell
Mayor 10 805 30 Bell
Treasurer 12 505 10 Bell
Treasurer 12 506 40 Bell
Treasurer 12 507 30 Bell
Mayor 10 705 50 Park
Mayor 10 805 10 Park
Treasurer 12 505 5 Park
Treasurer 12 506 13 Park
Treasurer 12 507 16 Park
To get the sum of the votes for each candidate, I can do:
cand_votes = df.groupby('CandidateId')['total_votes'].sum()
print(cand_votes)
CandidateId
505 15
506 53
507 46
705 70
805 40
To get total votes per office:
total_votes = df.groupby('officeID')['total_votes'].sum()
print(total_votes)
officeID
10 110
12 114
But what if I want the percentage of the vote each candidate got within their office? Would I have to apply some sort of function on each data object? Ideally, I would like the final data object to look like this:
officeID CandidateId total_votes vote_pct
10 705 70 .6364
10 805 40 .3636
Recommended answer
First, create a frame that has the votes by candidate and office.
gb = df.groupby(['officeID','CandidateId'], as_index=False)['total_votes'].sum()
Then you can group by office and use a transform, which returns data indexed like the original frame rather than one row per group, to divide each candidate's votes by the total for their office.
gb['vote_pct'] = gb['total_votes'] / gb.groupby('officeID')['total_votes'].transform('sum')
In [146]: gb
Out[146]:
officeID CandidateId total_votes vote_pct
0 10 705 70 0.636364
1 10 805 40 0.363636
2 12 505 15 0.131579
3 12 506 53 0.464912
4 12 507 46 0.403509
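For anyone who wants to reproduce this end to end, here is a minimal sketch. It rebuilds the sample data from the question by hand, and the variable names office_totals and office_total_per_row are only illustrative, not part of the original answer. It also contrasts a plain aggregation with transform to show why the division lines up row by row.

```python
import pandas as pd

# Sample data copied from the question (two precincts, five candidates).
df = pd.DataFrame({
    "Race": ["Mayor", "Mayor", "Treasurer", "Treasurer", "Treasurer"] * 2,
    "officeID": [10, 10, 12, 12, 12] * 2,
    "CandidateId": [705, 805, 505, 506, 507] * 2,
    "total_votes": [20, 30, 10, 40, 30, 50, 10, 5, 13, 16],
    "precinct": ["Bell"] * 5 + ["Park"] * 5,
})

# Step 1: one row per (office, candidate) with the votes summed over precincts.
gb = df.groupby(["officeID", "CandidateId"], as_index=False)["total_votes"].sum()

# A plain aggregation collapses to one value per office (indexed by officeID).
office_totals = gb.groupby("officeID")["total_votes"].sum()

# transform("sum") instead repeats each office total on every row of gb,
# so the result shares gb's index and can be divided element-wise.
office_total_per_row = gb.groupby("officeID")["total_votes"].transform("sum")

# Step 2: each candidate's share of their office's total.
gb["vote_pct"] = gb["total_votes"] / office_total_per_row
print(gb)
```

Because the transformed Series has the same index as gb, no merge or apply is needed; the shares within each office should sum to 1.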