Understanding groupby in pandas


Question

I'm looking to get the sum of some values in a DataFrame after it has been grouped.

Some sample data:

Race          officeID   CandidateId  total_votes   precinct
Mayor         10         705            20           Bell
Mayor         10         805            30           Bell
Treasurer     12         505            10           Bell
Treasurer     12         506            40           Bell
Treasurer     12         507            30           Bell
Mayor         10         705            50           Park
Mayor         10         805            10           Park
Treasurer     12         505            5            Park
Treasurer     12         506            13           Park
Treasurer     12         507            16           Park
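
To reproduce the examples that follow, the table above can be loaded into a DataFrame. This is only a sketch: it assumes the data is whitespace-delimited text, and the names `raw` and `df` are chosen here for illustration (the question only shows that the frame is called `df`).

```python
import io
import pandas as pd

# Sample data copied from the table above (whitespace-delimited).
raw = """\
Race          officeID   CandidateId  total_votes   precinct
Mayor         10         705            20           Bell
Mayor         10         805            30           Bell
Treasurer     12         505            10           Bell
Treasurer     12         506            40           Bell
Treasurer     12         507            30           Bell
Mayor         10         705            50           Park
Mayor         10         805            10           Park
Treasurer     12         505            5            Park
Treasurer     12         506            13           Park
Treasurer     12         507            16           Park
"""

# Split columns on runs of whitespace; the first line supplies the headers.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

With the frame built this way, `officeID`, `CandidateId` and `total_votes` are parsed as integers, while `Race` and `precinct` remain strings.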

To get the sum of the votes for each candidate, I can do:

# Select the column before summing so only total_votes is aggregated
cand_votes = df.groupby('CandidateId')['total_votes'].sum()
print(cand_votes)

CandidateId
505    15
506    53
507    46
705    70
805    40

To get total votes per office:

# Select the column before summing so only total_votes is aggregated
total_votes = df.groupby('officeID')['total_votes'].sum()
print(total_votes)

officeID
10    110
12    114

But what if I want to get the percentage of the vote each candidate got within their office? Would I have to apply some sort of function to each grouped object? Ideally, I would like the final data object to look like:

officeID    CandidateId    total_votes    vote_pct
10          705            70             .6363
10          805            40             .3636

Recommended answer

First, create a frame that has the total votes by candidate and office.

gb = df.groupby(['officeID','CandidateId'], as_index=False)['total_votes'].sum()

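Then you can group that frame by office and use a transform to calculate each candidate's share of the office total. Unlike a plain aggregation, a transform returns like-indexed data: one value per row of the original frame rather than one per group, so the division lines up row by row.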

gb['vote_pct'] = gb['total_votes'] / gb.groupby('officeID')['total_votes'].transform('sum')


In [146]: gb
Out[146]: 
   officeID  CandidateId  total_votes  vote_pct
0        10          705           70  0.636364
1        10          805           40  0.363636
2        12          505           15  0.131579
3        12          506           53  0.464912
4        12          507           46  0.403509
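
To make the difference between aggregating and transforming concrete, here is a self-contained sketch of the same steps. The variable names `per_office`, `office_total` and `check` are introduced here for illustration and are not part of the original answer; the frame is rebuilt inline from the question's sample data so the block runs on its own.

```python
import pandas as pd

# Numeric columns from the question's sample data.
df = pd.DataFrame({
    "officeID":    [10, 10, 12, 12, 12, 10, 10, 12, 12, 12],
    "CandidateId": [705, 805, 505, 506, 507, 705, 805, 505, 506, 507],
    "total_votes": [20, 30, 10, 40, 30, 50, 10, 5, 13, 16],
})

# Step 1: one row per (office, candidate) with summed votes.
gb = df.groupby(["officeID", "CandidateId"], as_index=False)["total_votes"].sum()

# A plain aggregation collapses to one value per office (2 rows here).
per_office = gb.groupby("officeID")["total_votes"].sum()

# transform("sum") broadcasts each office total back onto every row of gb,
# so the result has the same length and index as gb (5 rows here).
office_total = gb.groupby("officeID")["total_votes"].transform("sum")

# Step 2: row-wise division gives each candidate's share of their office.
gb["vote_pct"] = gb["total_votes"] / office_total

# Sanity check: shares within each office should add up to 1.
check = gb.groupby("officeID")["vote_pct"].sum()

print(per_office)
print(office_total)
print(gb)
```

Because `office_total` shares its index with `gb`, the division needs no merge or join. Dividing by `per_office` directly would not align, since that Series is indexed by `officeID` rather than by row position.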
