Pandas time series time between events

Question

How can I calculate the time (number of days) between "events" in a Pandas time series? For example, given the time series below, I'd like to know, for each day in the series, how many days have passed since the last True.

            event
2010-01-01  False
2010-01-02   True
2010-01-03  False
2010-01-04  False
2010-01-05   True
2010-01-06  False

The way I've done it seems overcomplicated, so I'm hoping for something more elegant. Obviously a for loop iterating over the rows would work, but ideally I'm looking for a vectorized (scalable) solution. My current attempt is below:

import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
df = pd.DataFrame([False, True, False, False, True, False], index=date_range, columns=['event'])
# dates on which an event occurred
event_dates = df.index[df['event']]
df2 = pd.DataFrame(event_dates, index=event_dates, columns=['max_event_date'])
df = df.join(df2)
# carry the most recent event date forward; with a sorted index this is the running maximum
df['max_event_date'] = df['max_event_date'].ffill()
df['days_since_event'] = df.index - df['max_event_date']

            event max_event_date  days_since_event
2010-01-01  False            NaT               NaT
2010-01-02   True     2010-01-02            0 days
2010-01-03  False     2010-01-02            1 days
2010-01-04  False     2010-01-02            2 days
2010-01-05   True     2010-01-05            0 days
2010-01-06  False     2010-01-05            1 days

Recommended answer

I'm continuing to improve on this answer, and hoping that someone comes in with "the" pythonic way. Until then, I think this final update works best.

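import pandas as pd

last = pd.NaT  # date of the most recent event seen so far (none yet)

def elapsed(row):
    global last  # must be declared before `last` is read inside the function
    if row.event:
        last = row.name  # row.name is the row's index label, i.e. its date
    return row.name - last  # NaT until the first event, 0 days on an event row

df['elapsed'] = df.apply(elapsed, axis=1)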

df
            event  elapsed
2010-01-01  False      NaT
2010-01-02   True   0 days
2010-01-03  False   1 days
2010-01-04  False   2 days
2010-01-05   True   0 days
2010-01-06  False   1 days
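
For comparison, here is a minimal vectorized sketch of the same idea. It is not part of the original answer; it assumes the sample frame from the question, a sorted DatetimeIndex, and a boolean event column. The names dates and last_event are illustrative only.

```python
import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
df = pd.DataFrame({'event': [False, True, False, False, True, False]}, index=date_range)

# The index as a Series, so it can be masked and forward-filled.
dates = df.index.to_series()
# Keep the date on event rows, NaT elsewhere, then carry the last event date forward.
last_event = dates.where(df['event']).ffill()
# Rows before the first event have no previous event, so they stay NaT.
df['elapsed'] = dates - last_event
```

Because the subtraction is done on dates rather than on row counts, this should also cope with an index that has gaps between days.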

:::::::::::::

I'm leaving the previous answers below, although they are sub-optimal.

:::::::::

Instead of making multiple passes through the data, it seems easier to just loop through the index.

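import numpy as np

events = df['event'].to_numpy()
elapsed = np.zeros(len(df), dtype=int)
# counts rows since the last event; this equals days only for a gap-free daily index
for i in range(1, len(df)):
    if not events[i]:
        elapsed[i] = elapsed[i - 1] + 1
df['elapsed'] = elapsed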
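
The row-counting loop above can probably be expressed without an explicit loop. The following is a sketch rather than part of the original answer, assuming the sample frame from the question. The variable block is a hypothetical helper name.

```python
import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
df = pd.DataFrame({'event': [False, True, False, False, True, False]}, index=date_range)

# Each True starts a new block; cumsum() gives every row its block number.
block = df['event'].cumsum()
# cumcount() numbers the rows inside each block 0, 1, 2, ...
df['elapsed'] = df.groupby(block).cumcount()
```

Like the loop, this counts rows rather than calendar days, and rows before the first event are counted from the start of the frame.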

::::::::::::

Let's say the True values are your events of interest.

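trues = df[df['event']].copy()  # event rows only; copy so new columns can be added safely
trues['Dates'] = trues.index  # copied into a column because, per the original answer, .diff() doesn't work on the index
trues['Elapsed'] = trues['Dates'].diff()  # time between consecutive events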
