How to count the number of time intervals that meet a boolean condition within a pandas dataframe?

Problem description

I have a pandas DataFrame with a time series in column1 and a boolean condition in column2. Consecutive rows where the condition is 1 form a continuous time interval that meets the condition. Note that the time intervals are of unequal length.

Timestamp   Boolean_condition
1           1
2           1
3           0
4           1
5           1
6           1
7           0
8           0
9           1
10          0

How can I count the total number of time intervals in the whole series that meet this condition?

The desired output should look like this:

Timestamp   Boolean_condition   Event_number
1           1                   1
2           1                   1
3           0                   NaN
4           1                   2
5           1                   2
6           1                   2
7           0                   NaN
8           0                   NaN
9           1                   3
10          0                   NaN

Recommended answer

You can build a Series from the cumsum of two combined masks, then set the rows where the condition is 0 to NaN with Series.mask:

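# True where the condition is not met
mask0 = df.Boolean_condition.eq(0)
# True where the value differs from the previous row (the first row is always True)
mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
# Each start of a run of 0s increments the counter
print((mask2 & mask0).cumsum().add(1))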
0    1
1    1
2    2
3    2
4    2
5    2
6    3
7    3
8    3
9    4
Name: Boolean_condition, dtype: int32

# Replace the counter with NaN on rows where the condition is 0
df['Event_number'] = (mask2 & mask0).cumsum().add(1).mask(mask0)
print(df)
   Timestamp  Boolean_condition  Event_number
0          1                  1           1.0
1          2                  1           1.0
2          3                  0           NaN
3          4                  1           2.0
4          5                  1           2.0
5          6                  1           2.0
6          7                  0           NaN
7          8                  0           NaN
8          9                  1           3.0
9         10                  0           NaN
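
The question also asks for the total number of intervals, which the labelled column above does not print directly. The following is a minimal, self-contained sketch of one way to get both, assuming the sample data from the question. It is a variant rather than the answer's exact code: it counts the starts of runs of 1s instead of runs of 0s, and the names cond, starts, event_number and n_intervals are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Timestamp": range(1, 11),
    "Boolean_condition": [1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
})

# True on rows where the condition is met
cond = df["Boolean_condition"].eq(1)

# An interval starts where the condition is True and the previous row was not
# (the row before the first one is treated as False)
starts = cond & ~cond.shift(fill_value=False)

# Running count of interval starts, kept only on rows inside an interval
event_number = starts.cumsum().where(cond)

# Total number of intervals that meet the condition
n_intervals = int(starts.sum())
print(n_intervals)
```

For the sample data this gives 3 intervals, matching the largest value in Event_number. Because the counter is driven by the starts of runs of 1s, numbering begins at 1 even when the series opens with a 0; with the cumsum of the 0-run mask shown above, a leading 0 would shift the first label to 2.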

Timings:

#[100000 rows x 2 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()

def nick(df):
    isone = df.Boolean_condition[df.Boolean_condition.eq(1)]
    idx = isone.index
    grp = (isone != idx.to_series().diff().eq(1)).cumsum()
    df.loc[idx, 'Event_number'] = pd.Categorical(grp).codes + 1
    return df

def jez(df):
    mask0 = df.Boolean_condition.eq(0)
    mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
    df['Event_number'] = (mask2 & mask0).cumsum().add(1).mask(mask0)
    return (df)

def jez1(df):
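    # ~ is a logical NOT only if Boolean_condition has bool dtype; on an int column it is a bitwise NOT
    mask0 = ~df.Boolean_condition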
    mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
    df['Event_number'] = (mask2 & mask0).cumsum().add(1).mask(mask0)
    return (df)

In [68]: %timeit (jez1(df))
100 loops, best of 3: 6.45 ms per loop

In [69]: %timeit (nick(df1))
100 loops, best of 3: 12 ms per loop

In [70]: %timeit (jez(df2))
100 loops, best of 3: 5.34 ms per loop
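
The %timeit lines above need an IPython session. As a rough sketch of how a similar measurement could be reproduced in a plain Python script, the standard timeit module can be used. The helper name label_runs is hypothetical, and it returns the labelled Series instead of writing a column so that every repetition does the same work. Absolute numbers depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated to 100000 rows as in the timings above
base = pd.DataFrame({
    "Timestamp": range(1, 11),
    "Boolean_condition": [1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
})
big = pd.concat([base] * 10000, ignore_index=True)


def label_runs(frame):
    """Label runs of 1s using the two-mask cumsum approach (hypothetical helper)."""
    mask0 = frame["Boolean_condition"].eq(0)
    mask2 = frame["Boolean_condition"].ne(frame["Boolean_condition"].shift(1))
    return (mask2 & mask0).cumsum().add(1).mask(mask0)


# Best of 3 repeats, 10 calls each, reported per call
best = min(timeit.repeat(lambda: label_runs(big), number=10, repeat=3)) / 10
print(f"{best * 1e3:.2f} ms per call")
```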
