Pandas count over groups

Problem description

I have a pandas dataframe that looks as follows:

ID  round   player1     player2 
1   1       A           B           
1   2       A           C
1   3       B           D
2   1       B           C           
2   2       C           D
2   3       C           E
3   1       B           C           
3   2       C           D
3   3       C           A

The dataframe contains sports match results: the ID column identifies a tournament, the round column gives the round within that tournament, and the player1 and player2 columns contain the names of the players who played against each other in that round.

I now want to cumulatively count the tournament participations for, say, player A. In pseudocode this means: If the player with name A comes up in either the player1 or player2 column per tournament ID, increment the counter by 1.

The result should look like this (note: in my example player A did participate in tournaments with the IDs 1 and 3):

ID  round   player1     player2     playerAparticipated
1   1       A           B           1
1   2       A           C           1
1   3       B           D           1
2   1       B           C           0
2   2       C           D           0
2   3       C           E           0
3   1       B           C           2
3   2       C           D           2
3   3       C           A           2

So far, I have added a "helper" column containing 1 or 0 to indicate whether the player participated in the respective tournament:

ID  round   player1     player2     helper
1   1       A           B           1
1   2       A           C           1
1   3       B           D           1
2   1       B           C           0
2   2       C           D           0
2   3       C           E           0
3   1       B           C           1
3   2       C           D           1
3   3       C           A           1
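
For readers following along, here is a minimal sketch of one way such a helper column could be built. The question does not show how it was actually computed, so the use of `groupby(...).transform("any")` and the variable name `played` are assumptions made for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "round":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "player1": list("AABBCCBCC"),
    "player2": list("BCDCDECDA"),
})

# True on rows where player A appears in either column.
played = df["player1"].eq("A") | df["player2"].eq("A")

# Spread the flag to every row of a tournament: 1 if A played in any round of it.
df["helper"] = played.groupby(df["ID"]).transform("any").astype(int)

print(df)
```

The `transform("any")` step is what turns a per-match flag into a per-tournament flag, which is why tournament 3 gets 1 on all three rows even though A only appears in round 3.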

I think that I just need one final step, e.g., a smart use of cumsum() that counts the helper column in the desired way. However, I could not come up with the solution yet.

Recommended answer

I think you need:

  • First drop_duplicates on the ID column, then set_index('ID'), so that there is one row per tournament.
  • Filter out the 0 values with boolean indexing, take the cumsum, and finally reindex with fill_value=0 to add 0 for the missing index values.
  • Create the new column with map.
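# One row per tournament, indexed by ID
df1 = df.drop_duplicates('ID').set_index('ID')
# Running count over the tournaments the player joined; 0 for all others
s = df1.loc[df1['helper'] != 0, 'helper'].cumsum().reindex(index=df1.index, fill_value=0)
# Broadcast the per-tournament count back to every row
df['playerAparticipated'] = df['ID'].map(s)
print(df)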
   ID  round player1 player2  helper  playerAparticipated
0   1      1       A       B       1                    1
1   1      2       A       C       1                    1
2   1      3       B       D       1                    1
3   2      1       B       C       0                    0
4   2      2       C       D       0                    0
5   2      3       C       E       0                    0
6   3      1       B       C       1                    2
7   3      2       C       D       1                    2
8   3      3       C       A       1                    2
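
As a cross-check, the same result can also be sketched with a groupby in place of drop_duplicates. This is not the method from the answer above but a variant written for illustration. It assumes that helper is constant within each ID and that tournaments should be counted in ascending ID order. The names `per_tournament` and `running` are made up for this example.

```python
import pandas as pd

# Sample data from the question, including the helper column.
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "round":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "player1": list("AABBCCBCC"),
    "player2": list("BCDCDECDA"),
    "helper":  [1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# One participation flag per tournament (groupby sorts by ID by default).
per_tournament = df.groupby("ID")["helper"].max()

# Running count of participations, forced to 0 where the player did not take part.
running = per_tournament.cumsum().where(per_tournament.eq(1), 0)

# Map the per-tournament value back onto every row of the original frame.
df["playerAparticipated"] = df["ID"].map(running)

print(df)
```

Masking with `where` after the cumsum plays the same role as filtering before the cumsum and reindexing afterwards: tournaments without the player show 0 rather than the carried-over count.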
