Calculate time difference between Pandas Dataframe indices
Problem description
I am trying to add a deltaT column to a dataframe, where deltaT is the time difference between successive rows (as indexed in the time series).
                     value
time
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7
The desired result is something like the following (deltaT shown in minutes):
                     value  deltaT
time
2012-03-16 23:50:00      1       0
2012-03-16 23:56:00      2       6
2012-03-17 00:08:00      3      12
2012-03-17 00:10:00      4       2
2012-03-17 00:12:00      5       2
2012-03-17 00:20:00      6       8
2012-03-20 00:43:00      7      23
Note that this uses numpy >= 1.7; for numpy < 1.7, see the conversion here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas
Your original frame, with a datetime index:
In [196]: df
Out[196]:
                     value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7
In [199]: df.index
Out[199]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
Length: 7, Freq: None, Timezone: None
Here is what you want, as a timedelta64 column:
In [200]: df['tvalue'] = df.index
In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)
In [202]: df
Out[202]:
                     value               tvalue             delta
2012-03-16 23:50:00      1  2012-03-16 23:50:00          00:00:00
2012-03-16 23:56:00      2  2012-03-16 23:56:00          00:06:00
2012-03-17 00:08:00      3  2012-03-17 00:08:00          00:12:00
2012-03-17 00:10:00      4  2012-03-17 00:10:00          00:02:00
2012-03-17 00:12:00      5  2012-03-17 00:12:00          00:02:00
2012-03-17 00:20:00      6  2012-03-17 00:20:00          00:08:00
2012-03-20 00:43:00      7  2012-03-20 00:43:00  3 days, 00:23:00
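On recent pandas versions the same column can probably be computed more directly. The sketch below is not part of the original answer: it assumes a pandas release that provides `Index.to_series`, `Series.diff`, and `pd.Timedelta`, and it rebuilds the example frame by hand so that it runs on its own.

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-17 00:10:00",
    "2012-03-17 00:12:00",
    "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=index)

# diff() on the index viewed as a Series gives the gap to the previous row.
# The first row has no predecessor (NaT), so fill it with a zero Timedelta
# rather than the integer 0 to keep the column's timedelta dtype.
df["delta"] = df.index.to_series().diff().fillna(pd.Timedelta(0))
```

This avoids the helper `tvalue` column; whether that is preferable depends on whether the timestamps are needed as a regular column elsewhere.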
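Getting the answer in minutes while disregarding the day difference (your last row is on 3/20, the one before it on 3/17) is actually trickier: divide each delta by one minute, cast to an integer, and take the result modulo the number of minutes in a day (24*60).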
In [204]: df['ans'] = df['delta'].apply(lambda x: x / np.timedelta64(1,'m')).astype('int64') % (24*60)
In [205]: df
Out[205]:
                     value               tvalue             delta  ans
2012-03-16 23:50:00      1  2012-03-16 23:50:00          00:00:00    0
2012-03-16 23:56:00      2  2012-03-16 23:56:00          00:06:00    6
2012-03-17 00:08:00      3  2012-03-17 00:08:00          00:12:00   12
2012-03-17 00:10:00      4  2012-03-17 00:10:00          00:02:00    2
2012-03-17 00:12:00      5  2012-03-17 00:12:00          00:02:00    2
2012-03-17 00:20:00      6  2012-03-17 00:20:00          00:08:00    8
2012-03-20 00:43:00      7  2012-03-20 00:43:00  3 days, 00:23:00   23
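The minute conversion can also be sketched without `apply`. The example below is an illustration rather than the original answer's method: it assumes a pandas version with the `.dt` accessor on timedelta Series, and the names `delta`, `total_minutes`, and `minutes_in_day` are made up for this sketch. It shows both the full elapsed minutes and the day-discarding variant that matches the desired output.

```python
import pandas as pd

# The deltas from the example, built by hand so the snippet is standalone.
delta = pd.Series(
    [pd.Timedelta(minutes=m) for m in (0, 6, 12, 2, 2, 8)]
    + [pd.Timedelta(days=3, minutes=23)]
)

# Total elapsed minutes, including whole days.
total_minutes = (delta.dt.total_seconds() // 60).astype("int64")

# Minutes with whole days discarded, mirroring the `% (24*60)` step.
minutes_in_day = total_minutes % (24 * 60)
```

For the last row the two results differ: the full gap is 3 days and 23 minutes, while the modulo keeps only the 23 minutes shown in the desired output.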