Pandas count over groups

Views: 68 Published: 2021/5/13 19:45:47 python pandas count group-by cumsum

Problem description

I have a pandas dataframe that looks as follows:

ID  round   player1     player2 
1   1       A           B           
1   2       A           C
1   3       B           D
2   1       B           C           
2   2       C           D
2   3       C           E
3   1       B           C           
3   2       C           D
3   3       C           A

The dataframe contains sport match results, where the ID column denotes one tournament, the round column denotes the round within each tournament, and the player1 and player2 columns contain the names of the players who played against each other in the respective round.

I now want to cumulatively count the tournament participations for, say, player A. In pseudocode this means: If the player with name A comes up in either the player1 or player2 column per tournament ID, increment the counter by 1.

The result should look like this (note: in my example player A did participate in tournaments with the IDs 1 and 3):

ID  round   player1     player2     playerAparticipated
1   1       A           B           1
1   2       A           C           1
1   3       B           D           1
2   1       B           C           0
2   2       C           D           0
2   3       C           E           0
3   1       B           C           2
3   2       C           D           2
3   3       C           A           2
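
The pseudocode above can be spelled out as a plain Python loop. This is only a reference sketch of the desired semantics, not a pandas solution; the function name `cumulative_participation` and the tuple layout of `rows` are assumptions made for illustration.

```python
# Rows of the example table: (ID, round, player1, player2)
rows = [
    (1, 1, "A", "B"), (1, 2, "A", "C"), (1, 3, "B", "D"),
    (2, 1, "B", "C"), (2, 2, "C", "D"), (2, 3, "C", "E"),
    (3, 1, "B", "C"), (3, 2, "C", "D"), (3, 3, "C", "A"),
]


def cumulative_participation(rows, player):
    """Return one value per row: the running tournament count for `player`,
    or 0 for tournaments the player did not take part in."""
    # First pass: collect the tournaments in which the player appears
    played = {tid for tid, _, p1, p2 in rows if player in (p1, p2)}

    counter = 0
    count_by_id = {}
    # Second pass: visit tournaments in order of first appearance
    for tid, _, _, _ in rows:
        if tid not in count_by_id:
            if tid in played:
                counter += 1
                count_by_id[tid] = counter
            else:
                count_by_id[tid] = 0

    # Give every row the value of its tournament
    return [count_by_id[tid] for tid, _, _, _ in rows]


player_a_participated = cumulative_participation(rows, "A")
```

For the example data, the list matches the playerAparticipated column shown above.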

So far, I have added a "helper" column containing the value 1 or 0, denoting whether the respective player participated in the tournament:

ID  round   player1     player2     helper
1   1       A           B           1
1   2       A           C           1
1   3       B           D           1
2   1       B           C           0
2   2       C           D           0
2   3       C           E           0
3   1       B           C           1
3   2       C           D           1
3   3       C           A           1
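
The question does not show how the helper column was built. One possible way, given here as a sketch under the assumption that the dataframe is called `df` and the player name is held in a variable `player`, is to flag the rows where the player appears and then spread that flag over the whole tournament:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "round":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "player1": list("AABBCCBCC"),
    "player2": list("BCDCDECDA"),
})

player = "A"  # assumed variable name, not from the question

# True for every row in which the player appears in either column
in_row = df["player1"].eq(player) | df["player2"].eq(player)

# Spread "appeared at least once" to every row of the same tournament
df["helper"] = in_row.groupby(df["ID"]).transform("any").astype(int)
```

Grouping the boolean mask by ID and using `transform("any")` keeps the result aligned with the original rows, so it can be assigned directly as a new column.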

I think that I just need one final step, e.g., a smart use of cumsum() that counts the helper column in the desired way. However, I could not come up with the solution yet.

Recommended answer

I think you need to:

drop_duplicates by the ID column first, then set_index on ID
filter out the 0 values with boolean indexing, apply cumsum, and finally reindex to add 0 for the missing index values
create the new column with map

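# Keep one row per tournament and index it by tournament ID
df1 = df.drop_duplicates('ID').set_index('ID')
# Cumulative count over the tournaments played; 0 for all other tournaments
s = df1.loc[df1['helper'] != 0, 'helper'].cumsum().reindex(index=df1.index, fill_value=0)
# Map the per-tournament count back onto every row
df['playerAparticipated'] = df['ID'].map(s)
print(df)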
   ID  round player1 player2  helper  playerAparticipated
0   1      1       A       B       1                    1
1   1      2       A       C       1                    1
2   1      3       B       D       1                    1
3   2      1       B       C       0                    0
4   2      2       C       D       0                    0
5   2      3       C       E       0                    0
6   3      1       B       C       1                    2
7   3      2       C       D       1                    2
8   3      3       C       A       1                    2
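
The same result can probably also be reached without the boolean filter and reindex. The following is a sketch that is not taken from the original answer: it assumes the helper column is constant within each tournament and multiplies the running count by the per-tournament flag to zero out tournaments the player skipped.

```python
import pandas as pd

# Example data with the helper column from the question
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "helper": [1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# One flag per tournament (1 if the player took part, else 0),
# kept in order of first appearance
flag = df.groupby("ID", sort=False)["helper"].max()

# Running count of tournaments played, set to 0 where the player was absent
counts = flag.cumsum() * flag

# Map the per-tournament count back onto every row
df["playerAparticipated"] = df["ID"].map(counts)
```

Here `flag.cumsum()` alone would give tournament 2 the value 1, so the multiplication by `flag` is what restores the 0 for tournaments without the player.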

Alternatively, it is possible to use ...