python pandas - creating a column which keeps a running count of consecutive values

Problem description

I am trying to create a column ("consec") that keeps a running count of consecutive values in another column ("binary") without using a loop. This is what the desired outcome would look like:

.    binary consec
1       0      0
2       1      1
3       1      2
4       1      3
5       1      4
5       0      0
6       1      1
7       1      2
8       0      0

However, this...

df['consec'][df['binary']==1] = df['consec'].shift(1) + df['binary']

results in...

.  binary   consec
0     1       NaN
1     1       1
2     1       1
3     0       0
4     1       1
5     0       0
6     1       1
7     1       1
8     1       1
9     0       0

I have seen other posts that use grouping or sorting, but unfortunately I don't see how that could work for me. Thanks in advance for your help.

Recommended answer

You can use the compare-cumsum-groupby pattern (which I really need to get around to writing up for the documentation), with a final cumcount:

>>> import pandas as pd
>>> df = pd.DataFrame({"binary": [0,1,1,1,0,0,1,1,0]})
>>> df["consec"] = df["binary"].groupby((df["binary"] == 0).cumsum()).cumcount()
>>> df
   binary  consec
0       0       0
1       1       1
2       1       2
3       1       3
4       0       0
5       0       0
6       1       1
7       1       2
8       0       0

This works because first we get the positions where we want to reset the counter:

>>> (df["binary"] == 0)
0     True
1    False
2    False
3    False
4     True
5     True
6    False
7    False
8     True
Name: binary, dtype: bool

The cumulative sum of these gives us a different id for each group:

>>> (df["binary"] == 0).cumsum()
0    1
1    1
2    1
3    1
4    2
5    3
6    3
7    3
8    4
Name: binary, dtype: int64

We can then pass this to groupby and use cumcount to get an increasing index within each group.
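
One caveat worth sketching: cumcount numbers the rows of each group starting from 0, so the result only matches the desired output when every group begins with a 0. If the column starts with a run of 1s, that first run is counted as 0, 1, 2 rather than 1, 2, 3. A possible variant, not part of the answer above, keeps the same compare-cumsum-groupby grouping but finishes with a grouped cumsum of the 0/1 values instead. The helper name `consecutive_ones` is hypothetical and the sketch assumes the column contains only 0 and 1.

```python
import pandas as pd


def consecutive_ones(binary: pd.Series) -> pd.Series:
    """Running count of consecutive 1s, resetting to 0 at every 0.

    Hypothetical helper; assumes `binary` holds only 0s and 1s.
    """
    # Each 0 marks a reset point; the cumulative sum of those flags
    # gives every stretch between resets its own group id.
    group_id = (binary == 0).cumsum()
    # Summing the 0/1 values inside each group counts the 1s seen
    # since the last 0, and also works for a leading run of 1s.
    return binary.groupby(group_id).cumsum()


# Same data as in the answer: both approaches agree here.
df = pd.DataFrame({"binary": [0, 1, 1, 1, 0, 0, 1, 1, 0]})
df["consec"] = consecutive_ones(df["binary"])
print(df["consec"].tolist())  # [0, 1, 2, 3, 0, 0, 1, 2, 0]

# A column that starts with 1s shows where the two approaches differ.
leading = pd.Series([1, 1, 0, 1])
print(consecutive_ones(leading).tolist())  # [1, 2, 0, 1]

cumcount_version = leading.groupby((leading == 0).cumsum()).cumcount()
print(cumcount_version.tolist())  # [0, 1, 0, 1]
```

When the data is guaranteed to start with a 0, as in the question, the cumcount version from the answer is enough and the two give identical results.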
