python pandas - creating a column which keeps a running count of consecutive values
Problem description
I am trying to create a column ("consec") which will keep a running count of consecutive values in another column ("binary") without using a loop. This is what the desired outcome would look like:
. binary consec
1 0 0
2 1 1
3 1 2
4 1 3
5 1 4
5 0 0
6 1 1
7 1 2
8 0 0
However, this...
df['consec'][df['binary']==1] = df['consec'].shift(1) + df['binary']
results in...
. binary consec
0 1 NaN
1 1 1
2 1 1
3 0 0
4 1 1
5 0 0
6 1 1
7 1 1
8 1 1
9 0 0
I see other posts which use grouping or sorting, but unfortunately, I don't see how that could work for me. Thanks in advance for your help.
Recommended answer
You can use the compare-cumsum-groupby pattern (which I really need to get around to writing up for the documentation), with a final cumcount:
>>> import pandas as pd
>>> df = pd.DataFrame({"binary": [0,1,1,1,0,0,1,1,0]})
>>> df["consec"] = df["binary"].groupby((df["binary"] == 0).cumsum()).cumcount()
>>> df
binary consec
0 0 0
1 1 1
2 1 2
3 1 3
4 0 0
5 0 0
6 1 1
7 1 2
8 0 0
This works because first we get the positions where we want to reset the counter:
>>> (df["binary"] == 0)
0 True
1 False
2 False
3 False
4 True
5 True
6 False
7 False
8 True
Name: binary, dtype: bool
The cumulative sum of these gives us a different id for each group:
>>> (df["binary"] == 0).cumsum()
0 1
1 1
2 1
3 1
4 2
5 3
6 3
7 3
8 4
Name: binary, dtype: int64
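To see how these ids carve the column into groups, it can help to put them next to the original values. A minimal sketch, assuming the same data as above; the `group_id` column name is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"binary": [0, 1, 1, 1, 0, 0, 1, 1, 0]})

# Every 0 bumps the cumulative sum, so each 0 opens a new group
# and the 1s that follow it inherit the same id.
df["group_id"] = (df["binary"] == 0).cumsum()

# Number of rows in each group: one 0 plus the 1s that follow it.
group_sizes = df.groupby("group_id").size()

print(df)
print(group_sizes)
```

Each group starts with a 0 and is followed by its run of 1s, which is why a position counter that starts at 0 inside each group lines up with the desired running count.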
And then we can pass this to groupby and use cumcount to get an increasing index in each group.
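One thing to watch: when the column starts with 1s, as in the second table of the question, there is no leading 0 to anchor the first group, so cumcount numbers that first run 0, 1, 2 rather than 1, 2, 3. The sketch below is a variant of the same pattern, not the exact method above: it starts a new group whenever the value changes and takes a cumulative sum inside each group. It assumes the column contains only 0s and 1s, and `running_count` is a hypothetical helper name.

```python
import pandas as pd


def running_count(binary: pd.Series) -> pd.Series:
    """Count consecutive 1s, resetting to 0 at every 0 (illustrative helper)."""
    # A new run starts wherever the value differs from the previous row;
    # the first row compares against NaN and therefore always starts a run.
    run_id = (binary != binary.shift()).cumsum()
    # Within a run of 1s the cumulative sum counts 1, 2, 3, ...;
    # within a run of 0s it stays at 0. This relies on 0/1 values only.
    return binary.groupby(run_id).cumsum()


# Data taken from the question's second table, which starts with a run of 1s.
df = pd.DataFrame({"binary": [1, 1, 1, 0, 1, 0, 1, 1, 1, 0]})
df["consec"] = running_count(df["binary"])

# The same helper applied to the data used in the answer above.
answer_data = pd.Series([0, 1, 1, 1, 0, 0, 1, 1, 0])
answer_consec = running_count(answer_data)

print(df)
```

On the data used in the answer, this variant gives the same "consec" column as the cumcount version; the two differ only when the column begins with a 1.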