Identifying consecutive occurrences of a value in a column of a pandas DataFrame

Views: 149 · Published: 2021/6/13 20:02:03 · Tags: python, pandas

Problem description

I have a df like so:

I want to return a 1 in a new column if there are two or more consecutive occurrences of 1 in Count, and a 0 if there are not. Each row of the new column gets its value based on whether this criterion is met in the Count column. My desired output would be:

Count  New_Value
1      0 
0      0
1      1
1      1
0      0
0      0
1      1
1      1 
1      1
0      0

I think I may need to use itertools, but I have been reading about it and haven't come across what I need yet. I would also like to be able to use this method for any number of consecutive occurrences, not just 2. For example, sometimes I need to detect 10 consecutive occurrences; I only use 2 in the example here.

Recommended answer

You could:

df['consecutive'] = df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count

to get:

   Count  consecutive
0      1            1
1      0            0
2      1            2
3      1            2
4      0            0
5      0            0
6      1            3
7      1            3
8      1            3
9      0            0
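The one-liner packs several steps together, so it may help to see them separately. The following is a minimal sketch, assuming the Count column holds the values shown in the question's desired output; the intermediate names `starts`, `run_id` and `run_len` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Sample data, assumed to match the Count column shown in the question.
df = pd.DataFrame({"Count": [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})

# True wherever the value differs from the previous row, i.e. a new run starts.
# The first row is compared against NaN, so it always starts a run.
starts = df.Count != df.Count.shift()

# The cumulative sum of the start flags gives every run its own integer id.
run_id = starts.cumsum()

# Size of each run, broadcast back onto every row belonging to that run.
run_len = df.Count.groupby(run_id).transform("size")

# Multiplying by Count (0 or 1) zeroes out the runs of 0.
consecutive = run_len * df.Count

print(run_id.tolist())
print(consecutive.tolist())
```

Note that the final multiplication relies on Count containing only 0 and 1; for other values the product would no longer be a plain run length.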

From here you can, for any threshold:

threshold = 2
# >= so that a run of exactly `threshold` rows qualifies, matching the output below
df['consecutive'] = (df.consecutive >= threshold).astype(int)

to get:

   Count  consecutive
0      1            0
1      0            0
2      1            1
3      1            1
4      0            0
5      0            0
6      1            1
7      1            1
8      1            1
9      0            0

or, in a single step:

(df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
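Since the question asks for a method that works for any number of consecutive occurrences, the single-step expression can be wrapped in a small helper. This is a sketch only: the function name `flag_runs` and its `value` parameter are hypothetical and not from the original answer. It also swaps the multiplication by Count for an explicit equality test, so it does not depend on the column containing only 0 and 1.

```python
import pandas as pd


def flag_runs(s, value=1, threshold=2):
    """Return 1 where `s` equals `value` within a run of at least `threshold` rows, else 0.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Give every run of identical consecutive values its own id.
    run_id = (s != s.shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = s.groupby(run_id).transform("size")
    # Keep only rows that hold `value` and sit in a long enough run.
    return ((s == value) & (run_len >= threshold)).astype(int)


# Sample data, assumed to match the Count column shown in the question.
df = pd.DataFrame({"Count": [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})
df["New_Value"] = flag_runs(df["Count"], threshold=2)
print(df)
```

Calling `flag_runs(df["Count"], threshold=10)` would cover the 10-occurrence case mentioned in the question in the same way.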

In terms of efficiency, using pandas methods provides a significant speedup as the problem size grows:

df = pd.concat([df for _ in range(1000)])

%timeit (df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
1000 loops, best of 3: 1.47 ms per loop

compared to:

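%%timeit
# requires `from itertools import groupby` to have been run in an earlier cell
l = []
for k, g in groupby(df.Count):
    size = sum(1 for _ in g)        # length of the current run
    if k == 1 and size >= 2:
        l = l + [1]*size
    else:
        l = l + [0]*size
pd.Series(l)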

10 loops, best of 3: 76.7 ms per loop
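To confirm that the two approaches compute the same flags, the loop can be written as a function and compared against the pandas expression. This is a sketch under stated assumptions: the name `flag_runs_itertools` is hypothetical, the sample values are taken from the question's Count column, and `list.extend` replaces the `l = l + [...]` concatenation used above. That concatenation copies the whole list on every run, which may account for part of the timing gap; no timings are claimed here.

```python
from itertools import groupby

import pandas as pd


def flag_runs_itertools(values, value=1, threshold=2):
    """Pure-Python equivalent of the loop above (hypothetical helper name)."""
    flags = []
    for key, group in groupby(values):
        size = sum(1 for _ in group)  # length of the current run
        hit = int(key == value and size >= threshold)
        flags.extend([hit] * size)    # append in place instead of rebuilding the list
    return pd.Series(flags)


# Sample data, assumed to match the Count column shown in the question.
counts = pd.Series([1, 0, 1, 1, 0, 0, 1, 1, 1, 0])

loop_result = flag_runs_itertools(counts)

run_len = counts.groupby((counts != counts.shift()).cumsum()).transform("size")
pandas_result = (run_len * counts >= 2).astype(int)

print(loop_result.tolist() == pandas_result.tolist())
```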
