Pandas measure elapsed time when condition is true

Question

I have the following dataframe:

                 dt binary
2016-01-01 00:00:00  False
2016-01-01 00:00:01  False
2016-01-01 00:00:02  False
2016-01-01 00:00:03  False
2016-01-01 00:00:04   True
2016-01-01 00:00:05   True
2016-01-01 00:00:06   True
2016-01-01 00:00:07  False
2016-01-01 00:00:08  False
2016-01-01 00:00:09   True
2016-01-01 00:00:10   True

I would like to sum the elapsed time when binary is True. I'm sharing my solution, which implements it, but something tells me there should be an easier way since it is a pretty basic feature of time series data. Note that the data is most probably equidistant, but I can't rely on that.

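import pandas as pd

def minutes_true(df):  # function name is illustrative
    # Number each run of consecutive equal values (1, 2, 3, ...)
    df['binary_grp'] = (df.binary.diff(1) != False).astype(int).cumsum()
    # Throw away False values
    df = df[df.binary]
    groupby = df.groupby('binary_grp')
    df = pd.DataFrame({'timespan': groupby.dt.last() - groupby.dt.first()})
    # total_seconds() keeps whole days, which .seconds would drop
    return df.timespan.sum().total_seconds() / 60.0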

The trickiest part is probably the first line. It assigns an incrementing number to each block of consecutive equal values. Here is how the data looks after that:

                 dt binary  binary_grp
2016-01-01 00:00:00  False           1
2016-01-01 00:00:01  False           1
2016-01-01 00:00:02  False           1
2016-01-01 00:00:03  False           1
2016-01-01 00:00:04   True           2
2016-01-01 00:00:05   True           2
2016-01-01 00:00:06   True           2
2016-01-01 00:00:07  False           3
2016-01-01 00:00:08  False           3
2016-01-01 00:00:09   True           4
2016-01-01 00:00:10   True           4

Is there a better way to accomplish this? I expect this code performs well; my concern is readability.

Recommended answer

IIUC:

You want to find the sum of time spanned across the entire series where binary is True.

However, we have to make some choices or assumptions:

                    dt  binary
0  2016-01-01 00:00:00   False
1  2016-01-01 00:00:01   False
2  2016-01-01 00:00:02   False
3  2016-01-01 00:00:03   False
4  2016-01-01 00:00:04    True # <- This is where time starts
5  2016-01-01 00:00:05    True
6  2016-01-01 00:00:06    True
7  2016-01-01 00:00:07   False # <- And ends here. So this would
8  2016-01-01 00:00:08   False # be 00:00:07 - 00:00:04 or 3 seconds
9  2016-01-01 00:00:09    True # <- Starts again
10 2016-01-01 00:00:10    True # <- But ends here because
                               # I don't have another Timestamp
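
To make the assumption in the annotated table concrete, here is a minimal sketch comparing two ways of counting on the question's sample data. The variable names (`runs`, `first_last_total`, `forward_total`) are illustrative, and the "first to last row of each run" variant is my reading of the question's code, not something the answer states. It uses boolean indexing rather than multiplication to pick the True rows.

```python
import pandas as pd

# Sample data rebuilt from the question: 11 rows, one second apart
df = pd.DataFrame({
    "dt": pd.Timestamp("2016-01-01") + pd.to_timedelta(list(range(11)), unit="s"),
    "binary": [False] * 4 + [True] * 3 + [False] * 2 + [True] * 2,
})

# Label runs of consecutive equal values: 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4
runs = df["binary"].ne(df["binary"].shift()).cumsum()

# Convention 1: last True timestamp minus first True timestamp of each run
true_dt = df.loc[df["binary"], "dt"].groupby(runs[df["binary"]])
first_last_total = (true_dt.last() - true_dt.first()).sum()

# Convention 2: each True row counts until the next timestamp,
# so a run ends at the first False row that follows it
gaps = df["dt"].shift(-1) - df["dt"]
forward_total = gaps[df["binary"]].sum()

print(first_last_total)  # 0 days 00:00:03
print(forward_total)     # 0 days 00:00:04
```

The one-second difference comes from the first run: under the second convention it extends to 00:00:07, the first False row after it. The final run contributes the same under both, because there is no later timestamp to extend it to.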

With those assumptions, we can use diff, multiply, and sum:

df.dt.diff().shift(-1).mul(df.binary).sum()

Timedelta('0 days 00:00:04')
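
Since the question says the timestamps may not be equidistant, here is a small sketch of the same idea on irregular data. The timestamps and flags below are hypothetical, made up for illustration, and the True rows are selected with boolean indexing instead of `mul`, which should be equivalent for this purpose.

```python
import pandas as pd

# Hypothetical, unevenly spaced samples (offsets in seconds from midnight)
offsets = [0, 1, 4, 5, 10, 12]
df = pd.DataFrame({
    "dt": pd.Timestamp("2016-01-01") + pd.to_timedelta(offsets, unit="s"),
    "binary": [False, True, True, False, True, True],
})

# Time from each row to the next one; the last row has no successor (NaT)
gaps = df["dt"].diff().shift(-1)

# Keep only the gaps that start on a True row; sum() skips the trailing NaT
total = gaps[df["binary"]].sum()

print(total)  # 0 days 00:00:06
```

The True rows at offsets 1, 4 and 10 contribute 3, 1 and 2 seconds respectively, and the final True row contributes nothing because no later timestamp exists.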


We can then combine this concept with groupby:

# Use xor and cumsum to identify change in True to False and False to True
grps = (df.binary ^ df.binary.shift()).cumsum()
mask = df.binary.groupby(grps).first()
df.dt.diff().shift(-1).groupby(grps).sum()[mask]

binary
1   00:00:03
3   00:00:01
Name: dt, dtype: timedelta64[ns]

Or without the mask:

pd.concat([df.dt.diff().shift(-1).groupby(grps).sum(), mask], axis=1)

             dt  binary
binary                 
0      00:00:04   False
1      00:00:03    True
2      00:00:02   False
3      00:00:01    True
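
For readers who want the per-run result as one labeled table, the following is a sketch of the same grouping written with named aggregation. The column names `gap`, `run`, `start` and `duration` are my own, and the run labels start at 1 here rather than 0 because the runs are built with `ne`/`shift` instead of xor.

```python
import pandas as pd

# Sample data rebuilt from the question: 11 rows, one second apart
df = pd.DataFrame({
    "dt": pd.Timestamp("2016-01-01") + pd.to_timedelta(list(range(11)), unit="s"),
    "binary": [False] * 4 + [True] * 3 + [False] * 2 + [True] * 2,
})

per_run = (
    df.assign(
        gap=df["dt"].diff().shift(-1),                       # time until the next row
        run=df["binary"].ne(df["binary"].shift()).cumsum(),  # run label per row
    )
    .groupby("run")
    .agg(
        start=("dt", "first"),        # first timestamp of the run
        binary=("binary", "first"),   # the value shared by the whole run
        duration=("gap", "sum"),      # total forward-looking time in the run
    )
)

# Only the runs where the condition holds
true_runs = per_run[per_run["binary"].astype(bool)]
print(true_runs)
```

Summing `true_runs["duration"]` should give the same 4-second total as the one-liner above.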
