Finding streaks in pandas dataframe
Problem description
I have a pandas dataframe as follows:

    time  winner  loser  stat
    1     A       B      0
    2     C       B      0
    3     D       B      1
    4     E       B      0
    5     F       A      0
    6     G       A      0
    7     H       A      0
    8     I       A      1

Each row is a match result. The first column is the time of the match, the second and third columns contain the winner and the loser, and the fourth column is one stat from the match.
I want to detect streaks of zeros for this stat per loser.
The expected result should look like this:

    time  winner  loser  stat  streak
    1     A       B      0     1
    2     C       B      0     2
    3     D       B      1     0
    4     E       B      0     1
    5     F       A      0     1
    6     G       A      0     2
    7     H       A      0     3
    8     I       A      1     0

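To experiment with the question, the sample input and the expected `streak` column can be rebuilt directly. This is a minimal sketch; the variable names `df` and `expected` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample match results from the question: one row per match, sorted by time
df = pd.DataFrame({
    'time':   [1, 2, 3, 4, 5, 6, 7, 8],
    'winner': ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'loser':  ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
    'stat':   [0, 0, 1, 0, 0, 0, 0, 1],
})

# Expected streak of zero-stat matches per loser, copied from the expected output
expected = pd.Series([1, 2, 0, 1, 1, 2, 3, 0], name='streak')

print(df.assign(streak=expected))
```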
In pseudocode the algorithm should work like this:
- `.groupby` the `loser` column.
- Then iterate over each row of each `loser` group.
- In each row, look at the `stat` column: if it contains `0`, then increment the `streak` value from the previous row by `1`. If it is not `0`, then start a new `streak`, that is, put `0` into the `streak` column.
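A direct, unvectorized translation of that pseudocode is sketched below, assuming the frame is already sorted by `time` and `stat` is numeric. The helper name `streak_loop` is hypothetical. "Looking at the previous row" simply becomes a local variable that holds the previous row's streak while walking each loser's rows in order; each zero adds 1, which matches the expected output.

```python
import pandas as pd

def streak_loop(df):
    """Hypothetical helper: literal loop version of the pseudocode."""
    streak = pd.Series(0, index=df.index, name='streak')
    for _, group in df.groupby('loser', sort=False):
        prev = 0  # streak value of the previous row in this loser's group
        for idx, stat in group['stat'].items():
            # a zero stat extends the streak by 1; anything else resets it to 0
            prev = prev + 1 if stat == 0 else 0
            streak.loc[idx] = prev
    return df.assign(streak=streak)

df = pd.DataFrame({
    'time':   [1, 2, 3, 4, 5, 6, 7, 8],
    'winner': ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'loser':  ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
    'stat':   [0, 0, 1, 0, 0, 0, 0, 1],
})
print(streak_loop(df))
```

This is easy to read but runs a Python loop per row, so it gets slow on large frames.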
So the `.groupby` is clear. But then I would need some sort of `.apply` where I can look at the previous row. This is where I am stuck.
Not as elegant as jezrael's answer, but for me easier to understand...
First, define a function that works with a single loser:
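
    import numpy as np

    def f(df):
        df = df.copy()  # avoid modifying the group frame that apply passes in
        # running count of zero-stat matches for this loser
        df['streak2'] = (df['stat'] == 0).cumsum()
        # freeze that running count at every row where stat is 1 ...
        df['cumsum'] = np.nan
        df.loc[df['stat'] == 1, 'cumsum'] = df['streak2']
        # ... and carry it forward; rows before the first 1 get 0
        df['cumsum'] = df['cumsum'].ffill().fillna(0)
        # subtracting the frozen count resets the streak after each 1
        df['streak'] = (df['streak2'] - df['cumsum']).astype(int)
        df.drop(['streak2', 'cumsum'], axis=1, inplace=True)
        return df
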
The streak is essentially a `cumsum`, but we need to reset it each time `stat` is 1. We therefore subtract the value of the `cumsum` where `stat` is 1, carried forward until the next 1.
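To make the subtraction concrete, here are the intermediate values for loser `B` (stats 0, 0, 1, 0), computed with the same steps on a standalone Series. The names `zeros_so_far`, `at_reset` and `carried` are illustrative stand-ins for the `streak2` and `cumsum` helper columns, not names from the answer.

```python
import pandas as pd

stat = pd.Series([0, 0, 1, 0])                 # loser B's stats in time order

zeros_so_far = (stat == 0).cumsum()            # role of 'streak2': 1, 2, 2, 3
at_reset = zeros_so_far.where(stat == 1)       # count frozen at each 1: NaN, NaN, 2, NaN
carried = at_reset.ffill().fillna(0)           # carried forward: 0, 0, 2, 2
streak = (zeros_so_far - carried).astype(int)  # difference: 1, 2, 0, 1

print(pd.DataFrame({'stat': stat, 'streak2': zeros_so_far,
                    'carried': carried, 'streak': streak}))
```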
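Then `groupby` and `apply` by `loser`. Passing `group_keys=False` keeps the original row index; in pandas 2.0 and later the group keys are otherwise prepended to it:

    df.groupby('loser', group_keys=False).apply(f)
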
The result is as expected.
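For larger frames, a common vectorized pattern avoids `apply` and per-row lookups entirely: every nonzero `stat` opens a new block within a loser, so a per-loser cumulative count of nonzero values labels the blocks, and a cumulative sum of the zero indicator within each (loser, block) pair is the streak. This is a sketch of that general technique, not necessarily the approach of any answer mentioned here; it assumes rows are sorted by `time`, and the names `is_zero` and `block` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'time':   [1, 2, 3, 4, 5, 6, 7, 8],
    'winner': ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'loser':  ['B', 'B', 'B', 'B', 'A', 'A', 'A', 'A'],
    'stat':   [0, 0, 1, 0, 0, 0, 0, 1],
})

is_zero = df['stat'].eq(0).astype(int)   # 1 where the stat is zero, else 0
# each nonzero stat opens a new block; blocks are numbered separately per loser
block = (1 - is_zero).groupby(df['loser']).cumsum()
# running count of zeros inside each (loser, block); the nonzero row itself gets 0
df['streak'] = is_zero.groupby([df['loser'], block]).cumsum()
print(df)
```

Unlike the helper-column approach, this also treats any nonzero `stat` (not only 1) as a reset, matching the "if it is not 0" rule in the question.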