Adding additional rows to a pandas DataFrame to capture the residual value while retaining the top 2 for each group


Problem description

I have a pandas DataFrame as follows:

import pandas as pd

df = pd.DataFrame({
'State':['am','am','am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','A','B','B','B','B','B','C','C','C','D','D','D','D'],
'Party':['alpha','beta','delta','yellow','alpha','beta','blue','pink','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,15,50,5,11,2,5,4,60,3,1,70,12,34,52,43]
})

I want to add a Total column containing the sum of the votes for each PC. Note that PC names are only unique within a state: the same name (e.g. 'A') could appear under two different states such as 'am' and 'fg', and those would be different PCs whose votes must be summed separately. I do this as follows:

df['Total'] = df.groupby(['State','PC']).Votes.transform('sum')
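As a small illustration of why the grouping uses both columns, here is a minimal sketch on made-up toy data (the `toy` frame and its values are assumptions for demonstration, not part of the question). It shows that `transform('sum')` returns one value per original row, so two PCs that share a name but sit in different states get separate totals.

```python
import pandas as pd

# Toy data (illustrative only): the PC name 'A' exists in two different states.
toy = pd.DataFrame({
    'State': ['am', 'am', 'fg', 'fg'],
    'PC':    ['A',  'A',  'A',  'A'],
    'Votes': [10,   15,   3,    1],
})

# transform('sum') keeps the original row count and index alignment,
# so the result can be assigned directly as a new column.
toy['Total'] = toy.groupby(['State', 'PC'])['Votes'].transform('sum')
print(toy['Total'].tolist())  # [25, 25, 4, 4]
```

Grouping by 'PC' alone would instead give 29 on every row here, which is the mistake the two-column grouping avoids.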

After that, I want to retain only the top two 'Party' rows by 'Votes' for each combination of 'State' and 'PC'. If the top two do not include 'beta', I also want a third row for 'beta'. Finally, I want to capture any remaining 'Votes' in a new row with 'Party' set to 'REST', whenever there is a remainder.

In sum, I want the output to be as follows:

df_out = pd.DataFrame({
'State':['am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','B','C','C','C','D','D','D'],
'Party':['delta','beta','REST','gamma','alpha','REST','beta','kappa','alpha','beta','kappa','lambda','REST'],
'Votes':[50,15,15,60,11,9,2,70,3,1,52,43,46],
'Total':[80,80,80,82,82,82,82,74,74,74,141,141,141]
})

How can I do this?

Recommended answer

Here is one way: sort by 'Votes' and take the top two rows of each group with groupby + tail(2), add back any 'beta' row that did not make the top two, combine the remaining rows into a 'REST' row with groupby + agg, and then concat everything back together. This assumes the 'Total' column from the question has already been added.

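# Top two rows by Votes within each (PC, State) group
s1 = df.sort_values('Votes').groupby(['PC', 'State']).tail(2)
# Everything that did not make the top two
s2 = df[~df.index.isin(s1.index)]
# Add back any 'beta' row that is missing from the top two
s1 = pd.concat([s1, s2.loc[s2.Party == 'beta']])
# Sum the remaining rows of each group into a single 'REST' row
s2 = (s2[~s2.index.isin(s1.index)]
      .groupby(['PC', 'State'])
      .agg({'Votes': 'sum', 'Total': 'first'})
      .assign(Party='REST'))
yourdf = pd.concat([s1, s2.reset_index()], sort=True).sort_values(['PC', 'State'])
print(yourdf)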
   PC   Party State  Total  Votes
1   A    beta    am     80     15
2   A   delta    am     80     50
0   A    REST    am     80     15
4   B   alpha    am     82     11
8   B   gamma    am     82     60
5   B    beta    am     82      2
1   B    REST    am     82      9
9   C   alpha    fg     74      3
11  C   kappa    fg     74     70
10  C    beta    fg     74      1
15  D  lambda    fg    141     43
14  D   kappa    fg    141     52
2   D    REST    fg    141     46
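For comparison, the same rule can also be sketched with a per-group rank and a boolean mask instead of sort + tail. This is an alternative technique, not the answer's method, and the helper name `top_n_with_rest` and its parameters `n`, `always_keep`, and `keys` are hypothetical names introduced here for illustration. The final check confirms that the votes of every group still add up to that group's total.

```python
import pandas as pd

def top_n_with_rest(df, n=2, always_keep='beta', keys=('State', 'PC')):
    """Hypothetical helper: keep the top-n parties per group, always keep
    one named party, and fold all other rows into a single 'REST' row."""
    keys = list(keys)
    out = df.copy()
    out['Total'] = out.groupby(keys)['Votes'].transform('sum')
    # Rank within each group; method='first' breaks ties by row order.
    rank = out.groupby(keys)['Votes'].rank(method='first', ascending=False)
    keep = (rank <= n) | (out['Party'] == always_keep)
    # Groups with nothing left over simply produce no 'REST' row.
    rest = (out[~keep]
            .groupby(keys, as_index=False)
            .agg(Votes=('Votes', 'sum'), Total=('Total', 'first'))
            .assign(Party='REST'))
    result = pd.concat([out[keep], rest], ignore_index=True)
    return result.sort_values(keys + ['Votes'],
                              ascending=[True] * len(keys) + [False],
                              ignore_index=True)

df = pd.DataFrame({
    'State': ['am'] * 9 + ['fg'] * 7,
    'PC': ['A'] * 4 + ['B'] * 5 + ['C'] * 3 + ['D'] * 4,
    'Party': ['alpha', 'beta', 'delta', 'yellow', 'alpha', 'beta', 'blue', 'pink',
              'gamma', 'alpha', 'beta', 'kappa', 'alpha', 'gamma', 'kappa', 'lambda'],
    'Votes': [10, 15, 50, 5, 11, 2, 5, 4, 60, 3, 1, 70, 12, 34, 52, 43],
})

result = top_n_with_rest(df)

# Sanity check: each group's rows should add back up to the group's total.
check = result.groupby(['State', 'PC']).agg(Votes=('Votes', 'sum'),
                                            Total=('Total', 'first'))
assert (check['Votes'] == check['Total']).all()
print(result)
```

With this approach, ties in 'Votes' are resolved by original row order, which may or may not match how the sort-based version above resolves them.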
