Pandas time series time between events
Problem description
How can I calculate the time (number of days) between "events" in a Pandas time series? For example, given the time series below, I'd like to know, for each day in the series, how many days have passed since the last True:
            event
2010-01-01  False
2010-01-02   True
2010-01-03  False
2010-01-04  False
2010-01-05   True
2010-01-06  False
The way I've done it seems overcomplicated, so I'm hoping for something more elegant. Obviously a for loop iterating over the rows would work, but ideally I'm looking for a vectorized (scalable) solution. My current attempt is below:
import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
df = pd.DataFrame([False, True, False, False, True, False], index=date_range, columns=['event'])
event_dates = df.index[df['event']]
df2 = pd.DataFrame(event_dates, index=event_dates, columns=['max_event_date'])
df = df.join(df2)
df['max_event_date'] = df['max_event_date'].cummax(axis=0, skipna=False)
df['days_since_event'] = df.index - df['max_event_date']
            event max_event_date days_since_event
2010-01-01  False            NaT              NaT
2010-01-02   True     2010-01-02           0 days
2010-01-03  False     2010-01-02           1 days
2010-01-04  False     2010-01-02           2 days
2010-01-05   True     2010-01-05           0 days
2010-01-06  False     2010-01-05           1 days
Recommended answer
Continuing to improve on this answer, and hoping that someone comes in with 'the' pythonic way. Until then, I think this final update works best.
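import pandas as pd

last = pd.NaT  # date of the most recent event seen so far (none yet)

def elapsed(row):
    # 'global' must be declared before 'last' is used in the function
    global last
    if row.event:
        last = row.name  # an event resets the reference date
    return row.name - last

df['elapsed'] = df.apply(elapsed, axis=1)
df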
            event elapsed
2010-01-01  False     NaT
2010-01-02   True  0 days
2010-01-03  False  1 days
2010-01-04  False  2 days
2010-01-05   True  0 days
2010-01-06  False  1 days
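The question asks for a vectorized solution, and the apply-based version above still visits the rows one by one in Python. Below is a minimal sketch of one possible vectorized alternative. It is not part of the original answer, and it assumes the index is a DatetimeIndex sorted in ascending order. The names last_event and days_since_event are chosen here for illustration only.

```python
import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
df = pd.DataFrame({'event': [False, True, False, False, True, False]},
                  index=date_range)

# Turn the index into a Series so that Series methods can be used on the dates.
dates = df.index.to_series()

# Keep the date only on event rows (NaT elsewhere), then carry the most
# recent event date forward to the rows that follow it.
last_event = dates.where(df['event']).ffill()

# Rows before the first event have no reference date and stay NaT.
df['days_since_event'] = dates - last_event
```

On the sample data this should reproduce the elapsed column shown above, without the global variable and without a Python-level loop.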
:::::::::::::
Leaving the previous answers below, although they are sub-optimal.
:::::::::
Instead of making multiple passes through the data, it seems easier to just loop through the rows:
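df['elapsed'] = 0
col = df.columns.get_loc('elapsed')
# Use positions rather than labels: the index holds Timestamps, so i-1 is not the previous row
for pos in range(1, len(df)):
    if not df['event'].iloc[pos]:
        df.iloc[pos, col] = df.iloc[pos - 1, col] + 1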
::::::::::::
Let's say the True values are your events of interest.
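trues = df[df['event']].copy()  # copy so that new columns can be added safely
trues['Dates'] = trues.index  # store the dates as a column so that .diff() can be applied to them
trues['Elapsed'] = trues['Dates'].diff()  # time between consecutive events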