Compute winning streak with pandas

Question

I thought I knew how to do this, but I'm pulling my hair out over it. I'm trying to use a function to create a new column. The function looks at the value of the win column in the current row and needs to compare it to the previous value in the win column, as laid out in the if statements below. The win column will only ever be 0 or 1.

import pandas as pd
data = pd.DataFrame({'win': [0, 0, 1, 1, 1, 0, 1]})
print (data)

   win
0    0
1    0
2    1
3    1
4    1
5    0
6    1

def streak(row):
    win_current_row = row['win']
    win_row_above = row['win'].shift(-1)
    streak_row_above = row['streak'].shift(-1)

    if (win_row_above == 0) & (win_current_row == 0):
        return 0
    elif (win_row_above == 0) & (win_current_row ==1):
        return 1
    elif (win_row_above ==1) & (win_current_row == 1):
        return streak_row_above + 1
    else:
        return 0

data['streak'] = data.apply(streak, axis=1)

All of this ends with this error:

AttributeError: ("'numpy.int64' object has no attribute 'shift'", 'occurred at index 0')

In other examples I see functions that refer to df['column'].shift(1), so I'm confused about why I can't seem to do the same here.

The output I'm trying to get is:

result = pd.DataFrame({'win': [0, 0, 1, 1, 1, 0, 1], 'streak': ['NaN', 0 , 1, 2, 3, 0, 1]})
print(result)

   win streak
0    0    NaN
1    0      0 
2    1      1
3    1      2
4    1      3
5    0      0
6    1      1

Thanks for helping to get me unstuck.

Recommended answer

A fairly common trick when using pandas is grouping by consecutive values. This trick is well-described here.

To solve your particular problem, we want to group by runs of consecutive equal values and then take the cumulative sum of win within each run. Runs of losses (groups of 0) will have a cumulative sum of 0, while runs of wins (groups of 1) will count up and so track the winning streak.

# df is the DataFrame from the question (named `data` there)
grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(grouper)['win'].cumsum()

   win  streak
0    0       0
1    0       0
2    1       1
3    1       2
4    1       3
5    0       0
6    1       1
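
On the AttributeError in the question: with axis=1, apply hands the function one row at a time, so row['win'] is a single NumPy integer rather than a column, and shift only exists on a whole Series or DataFrame. Two other things would also have tripped up that function: the streak column does not exist yet while apply is running, and shift(-1) pulls from the row below, not the row above. A minimal sketch of the difference, reusing the question's data (the names first_row and previous_win are only illustrative):

```python
import pandas as pd

data = pd.DataFrame({'win': [0, 0, 1, 1, 1, 0, 1]})

# With axis=1, apply passes each row as a Series, so row['win'] is a scalar.
first_row = data.iloc[0]
print(type(first_row['win']))        # a NumPy integer type, which has no .shift
print(hasattr(first_row['win'], 'shift'))

# shift is a column-level operation: call it on the whole Series instead.
# shift(1) moves values down one row, so each row sees the value above it.
previous_win = data['win'].shift(1)  # NaN in row 0, which has no row above
print(previous_win.tolist())
```

This is why examples that call df['column'].shift(1) work: they operate on the full column, outside of any row-wise apply.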


For the sake of explanation, here is our grouper Series, which lets us group by contiguous runs of 1s and 0s:

print(grouper)

0    1
1    1
2    2
3    2
4    2
5    3
6    4
Name: win, dtype: int64
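
To see where those group numbers come from, it can help to lay the intermediate steps side by side. The following is only an illustrative breakdown on the question's data; the steps frame and its prev and changed columns are names made up for this sketch and are not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'win': [0, 0, 1, 1, 1, 0, 1]})

steps = pd.DataFrame({
    'win': df['win'],
    # Value from the previous row (NaN in row 0).
    'prev': df['win'].shift(),
    # True wherever the value differs from the previous row, i.e. a new run starts.
    'changed': df['win'] != df['win'].shift(),
})

# Counting the run starts so far gives every run its own id.
steps['grouper'] = steps['changed'].cumsum()

# Summing win inside each run: runs of 0 stay at 0, runs of 1 count up.
steps['streak'] = steps.groupby('grouper')['win'].cumsum()

print(steps)
```

Row 0 counts as a run start because comparing 0 with the NaN produced by shift is always "not equal", which is why the group ids begin at 1.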
