Calculate duration between events with pandas

Question

I have a DataFrame:

import pandas as pd

df = pd.DataFrame([['2018-07-02', 'B'],
 ['2018-07-03', 'A'],
 ['2018-07-06', 'B'],
 ['2018-07-08', 'B'],
 ['2018-07-09', 'A'],
 ['2018-07-09', 'A'],
 ['2018-07-10', 'A'],
 ['2018-07-12', 'B'],
 ['2018-07-15', 'A'],
 ['2018-07-16', 'A'],
 ['2018-07-18', 'B'],
 ['2018-07-22', 'A'],
 ['2018-07-25', 'B'],
 ['2018-07-25', 'B'],
 ['2018-07-27', 'A'],
 ['2018-07-28', 'A']], columns = ['DateEvent','Event'])

where counting starts with event A and ends with event B. A start (A) can be recorded on more than one day before the matching end, and an end (B) can likewise be recorded on more than one day.

I have calculated the differences:

df = df.set_index('DateEvent')
begin = df.loc[df['Event'] == 'A'].index
cutoffs = df.loc[df['Event'] == 'B'].index

idx = cutoffs.searchsorted(begin)
mask = idx < len(cutoffs)
idx = idx[mask]
begin = begin[mask]
end = cutoffs[idx]

pd.DataFrame({'begin':begin, 'end':end})

but I also get a separate row for each of the repeated starts and ends:

        begin         end
0  2018-07-03  2018-07-06
1  2018-07-09  2018-07-12
2  2018-07-09  2018-07-12
3  2018-07-10  2018-07-12
4  2018-07-15  2018-07-18
5  2018-07-16  2018-07-18
6  2018-07-22  2018-07-25

The desired output pairs the first occurrence of event A with the last occurrence of event B, i.e. the maximum duration of each episode.

I could loop before or after to delete the unnecessary events, but is there a nicer, more pythonic way?

Thanks

Aleš

I have been using this code successfully as a function inside a groupby, but it is not clean and it is slow. How can I rewrite the code so that the group is handled directly in the DataFrame?

df = pd.DataFrame([['2.07.2018', 1, 'B'],
['3.07.2018', 1, 'A'],
['3.07.2018', 2, 'A'],
['6.07.2018', 2, 'B'],
['8.07.2018', 2, 'B'],
['9.07.2018', 2, 'A'],
['9.07.2018', 2, 'A'],
['9.07.2018', 2, 'B'],
['9.07.2018', 3, 'A'],
['10.07.2018', 3, 'A'],
['10.07.2018', 3, 'B'],
['12.07.2018', 3, 'B'],
['15.07.2018', 3, 'A'],
['16.07.2018', 4, 'A'],
['16.07.2018', 4, 'B'],
['18.07.2018', 4, 'B'],
['18.07.2018', 4, 'A'],
['22.07.2018', 5, 'A'],
['25.07.2018', 5, 'B'],
['25.07.2018', 7, 'B'],
['25.07.2018', 7, 'A'],
['25.07.2018', 7, 'B'],
['27.07.2018', 9, 'A'],
['28.07.2018', 9, 'A'],
['28.07.2018', 9, 'B']], columns = ['DateEvent','Group','Event'])

I am trying to combine cumsum with a groupby somehow, but cannot get the desired result.

Thanks!

Recommended answer

Let's try:

import pandas as pd

df = pd.DataFrame([['2018-07-02', 'B'],
 ['2018-07-03', 'A'],
 ['2018-07-06', 'B'],
 ['2018-07-08', 'B'],
 ['2018-07-09', 'A'],
 ['2018-07-09', 'A'],
 ['2018-07-10', 'A'],
 ['2018-07-12', 'B'],
 ['2018-07-15', 'A'],
 ['2018-07-16', 'A'],
 ['2018-07-18', 'B'],
 ['2018-07-22', 'A'],
 ['2018-07-25', 'B'],
 ['2018-07-25', 'B'],
 ['2018-07-27', 'A'],
 ['2018-07-28', 'A']], columns = ['DateEvent','Event'])

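# Running count of non-A rows: each B opens a new block, and the A rows after it share its number
a = (df['Event'] != 'A').cumsum()
# Position inside each block: 0 for the B row, 1 for the first A after it, 2, 3, ... for repeated A rows
a = a.groupby(a).cumcount()
# A new event group begins at every first A that follows a B
df['Event Group'] = (a == 1).cumsum()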

df_out = df.groupby('Event Group').filter(lambda x: set(x['Event']) == set(['A','B']))\
           .groupby('Event Group')['DateEvent'].agg(['first','last'])\
           .rename(columns={'first':'start','last':'end'})\
           .reset_index()

print(df_out)

Output:

   Event Group       start         end
0            1  2018-07-03  2018-07-08
1            2  2018-07-09  2018-07-12
2            3  2018-07-15  2018-07-18
3            4  2018-07-22  2018-07-25
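
The output above still holds the dates as strings, so the durations themselves are not computed yet. A minimal sketch of that last step, assuming the `start` and `end` values printed above and a hypothetical column name `duration_days` (not part of the original answer):

```python
import pandas as pd

# start/end pairs copied from the printed output of the answer
df_out = pd.DataFrame({
    'start': ['2018-07-03', '2018-07-09', '2018-07-15', '2018-07-22'],
    'end':   ['2018-07-08', '2018-07-12', '2018-07-18', '2018-07-25'],
})

# Convert the ISO date strings to timestamps so they can be subtracted
start = pd.to_datetime(df_out['start'])
end = pd.to_datetime(df_out['end'])

# Timedelta -> whole days ('duration_days' is an illustrative name)
df_out['duration_days'] = (end - start).dt.days
print(df_out)
```

Converting `DateEvent` with `pd.to_datetime` before the grouping step would work just as well; it is done afterwards here only to leave the answer's code untouched.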

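Edit

A shorter way to build the group labels: mask the B rows and forward-fill the label of the preceding A run.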

a = (df['Event'] != 'A').cumsum().mask(df['Event'] != 'A')
df['Event Group'] = a.ffill()
df_out = df.groupby('Event Group').filter(lambda x: set(x['Event']) == set(['A','B']))\
           .groupby('Event Group')['DateEvent'].agg(['first','last'])\
           .rename(columns={'first':'start','last':'end'})\
           .reset_index()
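
The answer does not cover the follow-up with the `Group` column. One possible way to extend the same idea is sketched below. It is not from the original answer and rests on several assumptions: the rows are already in chronological order within each `Group`, cycles never span two groups, and the names `Cycle`, `is_B`, `has_b` and `duration_days` are made up for illustration.

```python
import pandas as pd

rows = [['2.07.2018', 1, 'B'], ['3.07.2018', 1, 'A'], ['3.07.2018', 2, 'A'],
        ['6.07.2018', 2, 'B'], ['8.07.2018', 2, 'B'], ['9.07.2018', 2, 'A'],
        ['9.07.2018', 2, 'A'], ['9.07.2018', 2, 'B'], ['9.07.2018', 3, 'A'],
        ['10.07.2018', 3, 'A'], ['10.07.2018', 3, 'B'], ['12.07.2018', 3, 'B'],
        ['15.07.2018', 3, 'A'], ['16.07.2018', 4, 'A'], ['16.07.2018', 4, 'B'],
        ['18.07.2018', 4, 'B'], ['18.07.2018', 4, 'A'], ['22.07.2018', 5, 'A'],
        ['25.07.2018', 5, 'B'], ['25.07.2018', 7, 'B'], ['25.07.2018', 7, 'A'],
        ['25.07.2018', 7, 'B'], ['27.07.2018', 9, 'A'], ['28.07.2018', 9, 'A'],
        ['28.07.2018', 9, 'B']]
df = pd.DataFrame(rows, columns=['DateEvent', 'Group', 'Event'])
# The dates in this sample are day.month.year
df['DateEvent'] = pd.to_datetime(df['DateEvent'], format='%d.%m.%Y')

# Previous event inside the same Group (missing on the first row of each Group)
prev = df.groupby('Group')['Event'].shift()
# A cycle starts at an A that does not directly follow another A in its Group
new_cycle = (df['Event'] == 'A') & (prev != 'A')
# Number the cycles per Group; B rows before the first A stay in cycle 0
df['Cycle'] = new_cycle.astype(int).groupby(df['Group']).cumsum()
df['is_B'] = df['Event'] == 'B'

out = (df[df['Cycle'] > 0]
       .groupby(['Group', 'Cycle'])
       .agg(start=('DateEvent', 'first'),
            end=('DateEvent', 'last'),
            has_b=('is_B', 'any')))
# Keep only cycles that were actually closed by at least one B
out = out[out['has_b']].drop(columns='has_b').reset_index()
out['duration_days'] = (out['end'] - out['start']).dt.days
print(out)
```

Because every cycle numbered above 0 begins with an A, checking for at least one B is enough to reproduce the `set(...) == {'A', 'B'}` filter, and it avoids calling a Python lambda once per group.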
