Pandas count over groups
Problem description
I have a pandas dataframe that looks as follows:
ID round player1 player2
1 1 A B
1 2 A C
1 3 B D
2 1 B C
2 2 C D
2 3 C E
3 1 B C
3 2 C D
3 3 C A
The dataframe contains sport match results. The ID column denotes a tournament, the round column denotes the round within that tournament, and the player1 and player2 columns contain the names of the players who played against each other in that round.
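To make the data reproducible, here is a minimal sketch that builds the sample table above as a DataFrame. The values are copied from the question; only the variable name df is assumed.

```python
import pandas as pd

# Example data from the question: one row per match
df = pd.DataFrame({
    'ID':      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'round':   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'player1': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C'],
    'player2': ['B', 'C', 'D', 'C', 'D', 'E', 'C', 'D', 'A'],
})
print(df)
```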
I now want to cumulatively count the tournament participations of, say, player A. In pseudocode: for each tournament ID, if the player named A appears in either the player1 or the player2 column, increment the counter by 1.
The result should look like this (note: in my example, player A participated in the tournaments with IDs 1 and 3):
ID round player1 player2 playerAparticipated
1 1 A B 1
1 2 A C 1
1 3 B D 1
2 1 B C 0
2 2 C D 0
2 3 C E 0
3 1 B C 2
3 2 C D 2
3 3 C A 2
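A direct translation of the pseudocode, sketched without any helper column, could look like this. It assumes that ascending tournament ID reflects chronological order (groupby sorts by key, so the running count follows ID order). The names player, in_match, participated and counts are illustrative, not from the question. As in the expected output, tournaments the player skipped get 0 rather than the running total.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'round':   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'player1': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C'],
    'player2': ['B', 'C', 'D', 'C', 'D', 'E', 'C', 'D', 'A'],
})

player = 'A'  # hypothetical parameter: the player whose participations are counted

# True on every row (match) where the player appears in either column
in_match = df['player1'].eq(player) | df['player2'].eq(player)

# One boolean per tournament ID (sorted): did the player play at least one match?
participated = in_match.groupby(df['ID']).any()

# Running count over tournaments in ID order; 0 where the player did not take part
counts = participated.astype(int).cumsum().where(participated, 0)

# Broadcast each tournament's count back to all of its rows
df['playerAparticipated'] = df['ID'].map(counts)
print(df)
```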
So far, I have added a "helper" column containing the value 1 or 0, denoting whether the respective player participated in the tournament:
ID round player1 player2 helper
1 1 A B 1
1 2 A C 1
1 3 B D 1
2 1 B C 0
2 2 C D 0
2 3 C E 0
3 1 B C 1
3 2 C D 1
3 3 C A 1
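The question does not show how the helper column was built. One possible way, sketched here as an assumption, is to flag each match the player appears in and then take the per-tournament maximum with groupby(...).transform('max'), so every row of a tournament gets the same 0/1 value. The player and in_match names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'round':   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'player1': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C'],
    'player2': ['B', 'C', 'D', 'C', 'D', 'E', 'C', 'D', 'A'],
})
player = 'A'  # hypothetical parameter

# 1 on rows where the player appears in the match, else 0
in_match = (df['player1'].eq(player) | df['player2'].eq(player)).astype(int)

# Spread the per-tournament maximum to every row: 1 if the player
# played any match in that tournament, else 0
df['helper'] = in_match.groupby(df['ID']).transform('max')
print(df)
```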
I think I just need one final step, e.g., a smart use of cumsum() that counts the helper column in the desired way. However, I have not been able to come up with a solution yet.
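One way the cumsum() idea could work, as a sketch: let each tournament contribute a single increment on its first row (found with duplicated()), take the cumulative sum over the rows, then multiply by helper to zero out tournaments the player skipped. This assumes the rows are sorted by tournament ID, since cumsum follows row order; first_row and running are illustrative names.

```python
import pandas as pd

# The question's data including the helper column (rows sorted by ID)
df = pd.DataFrame({
    'ID':      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'round':   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'player1': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C'],
    'player2': ['B', 'C', 'D', 'C', 'D', 'E', 'C', 'D', 'A'],
    'helper':  [1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# True only on the first row of each tournament
first_row = ~df['ID'].duplicated()

# Each tournament with helper == 1 adds exactly one increment (on its first row)
running = (df['helper'] * first_row).cumsum()

# Keep the running count only for tournaments the player took part in
df['playerAparticipated'] = running * df['helper']
print(df)
```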
Recommended answer
I think you need:
- drop_duplicates by column ID first, and then set_index
- filter out 0 values by boolean indexing, then cumsum, and lastly reindex to add 0 for missing index values
- create the new column with map
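import pandas as pd
# df is the question's DataFrame, including the helper column
df1 = df.drop_duplicates('ID').set_index('ID')  # one row per tournament
s = df1.loc[df1['helper'] != 0, 'helper'].cumsum().reindex(index=df1.index, fill_value=0)  # 0 for skipped tournaments
df['playerAparticipated'] = df['ID'].map(s)  # broadcast counts back to every row
print(df)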
ID round player1 player2 helper playerAparticipated
0 1 1 A B 1 1
1 1 2 A C 1 1
2 1 3 B D 1 1
3 2 1 B C 0 0
4 2 2 C D 0 0
5 2 3 C E 0 0
6 3 1 B C 1 2
7 3 2 C D 1 2
8 3 3 C A 1 2
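To see what the one-line s = ... expression does, here is the same chain split into steps, sketched on a trimmed frame that keeps only the ID and helper columns (the other columns are not used by these steps). The intermediate names kept and running are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'helper': [1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# One row per tournament, indexed by ID
df1 = df.drop_duplicates('ID').set_index('ID')

# Keep only tournaments with helper == 1 (IDs 1 and 3)
kept = df1.loc[df1['helper'] != 0, 'helper']

# Running count over the kept tournaments: ID 1 -> 1, ID 3 -> 2
running = kept.cumsum()

# Restore the dropped IDs (here ID 2) with a count of 0
s = running.reindex(index=df1.index, fill_value=0)
print(s)
```

Dropping the zero rows before cumsum is what keeps the count from advancing on skipped tournaments, and reindex then puts those tournaments back with 0.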
Alternatively, you can use …