Finding streaks in a pandas DataFrame

Question

I have a pandas dataframe as follows:

time    winner  loser   stat
1       A       B       0
2       C       B       0
3       D       B       1
4       E       B       0
5       F       A       0
6       G       A       0
7       H       A       0
8       I       A       1

Each row is a match result. The first column is the time of the match, the second and third columns contain the winner and the loser, and the fourth column is one stat from the match.

I want to detect streaks of zeros for this stat per loser.

The expected result should look like this:

time    winner  loser   stat    streak
1       A       B       0       1
2       C       B       0       2
3       D       B       1       0
4       E       B       0       1
5       F       A       0       1
6       G       A       0       2
7       H       A       0       3
8       I       A       1       0

In pseudocode the algorithm should work like this:

  • .groupby loser column.
  • Then iterate over the rows of each loser group.
  • In each row, look at the stat column: if it contains 0, increment the streak value from the previous row by 1. If it is not 0, start a new streak, that is, put 0 into the streak column.

So the .groupby part is clear. But then I would need some sort of .apply in which I can look at the previous row. This is where I am stuck.
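
As a reference point, the pseudocode above can be sketched literally as a plain Python loop. This is only an illustration, not part of the original question: the names `current` and `streaks` are made up here, and the sketch assumes the rows are already sorted by time.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'winner': list('ACDEFGHI'),
    'loser': list('BBBBAAAA'),
    'stat': [0, 0, 1, 0, 0, 0, 0, 1],
})

current = {}   # hypothetical helper: loser -> length of that loser's running streak
streaks = []
for loser, stat in zip(df['loser'], df['stat']):
    if stat == 0:
        # extend the streak carried over from this loser's previous row
        current[loser] = current.get(loser, 0) + 1
    else:
        # any non-zero stat starts a new streak at 0
        current[loser] = 0
    streaks.append(current[loser])

df['streak'] = streaks
print(df)
```

A loop like this is slow on large frames, but it is handy for checking a vectorized version against.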

Solution

Not as elegant as jezrael's answer, but for me easier to understand...

First, define a function that works with a single loser:

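import numpy as np

def f(df):
    df = df.copy()  # work on a copy instead of mutating the group
    # running count of zeros within this loser's rows
    df['streak2'] = (df['stat'] == 0).cumsum()
    # remember the running count at each reset row (stat == 1)
    df['cumsum'] = np.nan
    df.loc[df['stat'] == 1, 'cumsum'] = df['streak2']
    # carry it forward to the next reset; use 0 before the first reset
    df['cumsum'] = df['cumsum'].ffill().fillna(0)
    # zeros seen since the last reset
    df['streak'] = (df['streak2'] - df['cumsum']).astype(int)
    return df.drop(['streak2', 'cumsum'], axis=1)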

The streak is essentially a cumsum, but we need to reset it each time stat is 1. We therefore subtract the value of the cumsum where stat is 1, carried forward until the next 1.
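
To make the subtraction concrete, here is a small worked sketch for the rows of loser B only. The variable names `zeros_so_far` and `at_last_reset` are illustrative and do not appear in the answer's code; the intermediate values in the comments follow from the sample data.

```python
import pandas as pd

# stat values for loser B, taken from the question
stat = pd.Series([0, 0, 1, 0])

# running count of zeros: 1, 2, 2, 3
zeros_so_far = (stat == 0).cumsum()

# the running count at each reset row (stat == 1), carried forward,
# with 0 before the first reset: 0, 0, 2, 2
at_last_reset = zeros_so_far.where(stat == 1).ffill().fillna(0)

# zeros since the last reset: 1, 2, 0, 1
streak = (zeros_so_far - at_last_reset).astype(int)
print(streak.tolist())
```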

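Then group by loser and apply f to each group (passing group_keys=False so that pandas 2.0 and later does not add loser as an extra index level):

df.groupby('loser', group_keys=False).apply(f)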

The result is as expected.
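
For large frames, the same result can probably be obtained without .apply at all. The following is a sketch of one such variant, not necessarily the approach taken in jezrael's answer (which is not reproduced here); the helper names `is_zero` and `block` are assumptions made for this example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'winner': list('ACDEFGHI'),
    'loser': list('BBBBAAAA'),
    'stat': [0, 0, 1, 0, 0, 0, 0, 1],
})

# 1 where the stat is zero, 0 otherwise
is_zero = df['stat'].eq(0).astype(int)

# Number of resets (non-zero stats) seen so far for each loser. Rows sharing
# a (loser, block) pair lie in the same run between two resets.
block = (
    df['stat'].ne(0).astype(int)
    .groupby(df['loser']).cumsum()
    .rename('block')
)

# Count the zeros inside each run; the reset row itself contributes 0.
df['streak'] = is_zero.groupby([df['loser'], block]).cumsum()
print(df)
```

Because both steps are grouped cumulative sums, the rows keep their original order and index, and no per-group Python function is called.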
