Adding additional rows to a pandas DataFrame to capture the residual value while retaining the top 2 for each group
Question
I have a pandas DataFrame as follows:
import pandas as pd
df = pd.DataFrame({
'State':['am','am','am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','A','B','B','B','B','B','C','C','C','D','D','D','D'],
'Party':['alpha','beta','delta','yellow','alpha','beta','blue','pink','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,15,50,5,11,2,5,4,60,3,1,70,12,34,52,43]
})
I want to add a Total column containing the sum of the votes for each PC. Note that a PC name may repeat across states (e.g. an 'A' could appear under both 'am' and 'fg'); such rows belong to different PCs, so the votes must be summed per combination of 'State' and 'PC'. I do this as follows:
df['Total'] = df.groupby(['State','PC']).Votes.transform('sum')
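To illustrate why transform is used here rather than a plain aggregation, here is a minimal sketch on a toy frame. The name toy and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data: the same PC name 'A' appears in two states.
toy = pd.DataFrame({
    'State': ['x', 'x', 'y', 'y'],
    'PC':    ['A', 'A', 'A', 'A'],
    'Votes': [1, 2, 10, 20],
})

# transform('sum') returns one value per original row, aligned to toy's index,
# so the result can be assigned directly as a new column.
toy['Total'] = toy.groupby(['State', 'PC'])['Votes'].transform('sum')

# A plain aggregation, by contrast, returns one row per (State, PC) group.
per_group = toy.groupby(['State', 'PC'])['Votes'].sum()

print(toy['Total'].tolist())   # [3, 3, 30, 30]
print(per_group.to_dict())     # {('x', 'A'): 3, ('y', 'A'): 30}
```

Grouping on both 'State' and 'PC' keeps the two 'A' groups apart, which is the behavior the question asks for.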
After that, I want to retain only the top two 'Party' values by 'Votes' for each combination of 'State' and 'PC'. If the top two do not include 'beta', I want a third row for 'beta'. Finally, I want to capture any remaining 'Votes' in a new row with 'Party' set to 'REST', where needed.
In short, I want the output to look like this:
df_out = pd.DataFrame({
'State':['am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','B','C','C','C','D','D','D'],
'Party':['delta','beta','REST','gamma','alpha','REST','beta','kappa','alpha','beta','kappa','lambda','REST'],
'Votes':[50,15,15,60,11,9,2,70,3,1,52,43,46],
'Total':[80,80,80,82,82,82,82,74,74,74,141,141,141]
})
How can I do this?
Recommended answer
Here is one way: sort by Votes and take the top two rows of each group with groupby + tail, add the beta row back to s1 if it is not among the top two, combine the remaining rows with groupby + agg, then concat the pieces back together.
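# Top two parties by Votes within each (PC, State) group
s1 = df.sort_values('Votes').groupby(['PC', 'State']).tail(2)
# Rows that fall outside the top two
s2 = df[~df.index.isin(s1.index)]
# Add back any 'beta' row that missed the top two
s1 = pd.concat([s1, s2.loc[s2.Party == 'beta']])
# Sum the remaining rows of each group into a single 'REST' row
s2 = s2[~s2.index.isin(s1.index)].groupby(['PC', 'State']).agg({'Votes': 'sum', 'Total': 'first'}).assign(Party='REST')
yourdf = pd.concat([s1, s2.reset_index()], sort=True).sort_values(['PC', 'State'])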
yourdf
Out[517]:
   PC   Party State  Total  Votes
1   A    beta    am     80     15
2   A   delta    am     80     50
0   A    REST    am     80     15
4   B   alpha    am     82     11
8   B   gamma    am     82     60
5   B    beta    am     82      2
1   B    REST    am     82      9
9   C   alpha    fg     74      3
11  C   kappa    fg     74     70
10  C    beta    fg     74      1
15  D  lambda    fg    141     43
14  D   kappa    fg    141     52
2   D    REST    fg    141     46
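One way to sanity-check a result like this is to confirm that the votes in each group still add up to Total once the REST rows are included. The sketch below wraps the same steps in a helper; the function name top_two_plus_rest and its keep and n parameters are hypothetical names introduced for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['am'] * 9 + ['fg'] * 7,
    'PC': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B',
           'C', 'C', 'C', 'D', 'D', 'D', 'D'],
    'Party': ['alpha', 'beta', 'delta', 'yellow', 'alpha', 'beta', 'blue',
              'pink', 'gamma', 'alpha', 'beta', 'kappa', 'alpha', 'gamma',
              'kappa', 'lambda'],
    'Votes': [10, 15, 50, 5, 11, 2, 5, 4, 60, 3, 1, 70, 12, 34, 52, 43],
})
df['Total'] = df.groupby(['State', 'PC'])['Votes'].transform('sum')


def top_two_plus_rest(frame, keep='beta', n=2):
    """Hypothetical helper: keep the top n parties per group plus `keep`,
    and fold everything else into one 'REST' row per group."""
    keys = ['State', 'PC']
    # Ascending sort, so the last n rows of each group have the most votes.
    top = frame.sort_values('Votes').groupby(keys).tail(n)
    others = frame.drop(top.index)
    # Re-add the `keep` party where it missed the top n.
    kept = pd.concat([top, others[others['Party'] == keep]])
    # Sum whatever is still left over into a single row per group.
    rest = (frame.drop(kept.index)
                 .groupby(keys, as_index=False)
                 .agg({'Votes': 'sum', 'Total': 'first'})
                 .assign(Party='REST'))
    return (pd.concat([kept, rest], ignore_index=True, sort=True)
              .sort_values(keys))


result = top_two_plus_rest(df)

# Votes per group, including REST, should equal that group's Total.
votes_sum = result.groupby(['State', 'PC'])['Votes'].sum()
totals = result.groupby(['State', 'PC'])['Total'].first()
print((votes_sum == totals).all())
```

Groups with nothing left over (here 'C' in 'fg') simply get no REST row, since the aggregation only produces rows for groups that still have data.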