Aggregating an hourly time series by day via pd.TimeGrouper('D'): issue with the 00:00:00 timestamp (hour 24)
Problem description
df:
hour rev
datetime
2016-05-01 01:00:00 1 -0.02
2016-05-01 02:00:00 2 -0.01
2016-05-01 03:00:00 3 -0.02
2016-05-01 04:00:00 4 -0.02
2016-05-01 05:00:00 5 -0.01
2016-05-01 06:00:00 6 -0.03
2016-05-01 07:00:00 7 -0.10
2016-05-01 08:00:00 8 -0.09
2016-05-01 09:00:00 9 -0.08
2016-05-01 10:00:00 10 -0.10
2016-05-01 11:00:00 11 -0.12
2016-05-01 12:00:00 12 -0.14
2016-05-01 13:00:00 13 -0.17
2016-05-01 14:00:00 14 -0.16
2016-05-01 15:00:00 15 -0.15
2016-05-01 16:00:00 16 -0.15
2016-05-01 17:00:00 17 -0.17
2016-05-01 18:00:00 18 -0.16
2016-05-01 19:00:00 19 -0.18
2016-05-01 20:00:00 20 -0.17
2016-05-01 21:00:00 21 -0.14
2016-05-01 22:00:00 22 -0.16
2016-05-01 23:00:00 23 -0.08
2016-05-02 00:00:00 24 -0.06
df.reset_index().to_dict('records'):
[{'datetime': Timestamp('2016-05-01 01:00:00'), 'hour': 1L, 'rev': -0.02},
{'datetime': Timestamp('2016-05-01 02:00:00'), 'hour': 2L, 'rev': -0.01},
{'datetime': Timestamp('2016-05-01 03:00:00'), 'hour': 3L, 'rev': -0.02},
{'datetime': Timestamp('2016-05-01 04:00:00'), 'hour': 4L, 'rev': -0.02},
{'datetime': Timestamp('2016-05-01 05:00:00'), 'hour': 5L, 'rev': -0.01},
{'datetime': Timestamp('2016-05-01 06:00:00'), 'hour': 6L, 'rev': -0.03},
{'datetime': Timestamp('2016-05-01 07:00:00'), 'hour': 7L, 'rev': -0.1},
{'datetime': Timestamp('2016-05-01 08:00:00'), 'hour': 8L, 'rev': -0.09},
{'datetime': Timestamp('2016-05-01 09:00:00'), 'hour': 9L, 'rev': -0.08},
{'datetime': Timestamp('2016-05-01 10:00:00'), 'hour': 10L, 'rev': -0.1},
{'datetime': Timestamp('2016-05-01 11:00:00'), 'hour': 11L, 'rev': -0.12},
{'datetime': Timestamp('2016-05-01 12:00:00'), 'hour': 12L, 'rev': -0.14},
{'datetime': Timestamp('2016-05-01 13:00:00'), 'hour': 13L, 'rev': -0.17},
{'datetime': Timestamp('2016-05-01 14:00:00'), 'hour': 14L, 'rev': -0.16},
{'datetime': Timestamp('2016-05-01 15:00:00'), 'hour': 15L, 'rev': -0.15},
{'datetime': Timestamp('2016-05-01 16:00:00'), 'hour': 16L, 'rev': -0.15},
{'datetime': Timestamp('2016-05-01 17:00:00'), 'hour': 17L, 'rev': -0.17},
{'datetime': Timestamp('2016-05-01 18:00:00'), 'hour': 18L, 'rev': -0.16},
{'datetime': Timestamp('2016-05-01 19:00:00'), 'hour': 19L, 'rev': -0.18},
{'datetime': Timestamp('2016-05-01 20:00:00'), 'hour': 20L, 'rev': -0.17},
{'datetime': Timestamp('2016-05-01 21:00:00'), 'hour': 21L, 'rev': -0.14},
{'datetime': Timestamp('2016-05-01 22:00:00'), 'hour': 22L, 'rev': -0.16},
{'datetime': Timestamp('2016-05-01 23:00:00'), 'hour': 23L, 'rev': -0.08},
{'datetime': Timestamp('2016-05-02 00:00:00'), 'hour': 24L, 'rev': -0.06}]
df.set_index('datetime', inplace=True)
I want to aggregate the data by day, so I do:
dfgrped = df.groupby([pd.TimeGrouper('D')])
I want to compute stats like the sum:
dfgrped.agg(sum)
hour rev
datetime
2016-05-01 276 -2.43
2016-05-02 24 -0.06
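The behaviour above can be reproduced with a short, self-contained sketch. It rebuilds the frame from the values in the table (the names rev, idx and daily are illustrative) and uses `pd.Grouper(freq='D')`, the replacement for `pd.TimeGrouper`, which was deprecated in pandas 0.21 and removed in 1.0. Daily bins are closed on the left, so the reading stamped 2016-05-02 00:00:00 opens a new day instead of closing the previous one.

```python
import pandas as pd

# Hourly values from the question; each is stamped at the END of its hour.
rev = [-0.02, -0.01, -0.02, -0.02, -0.01, -0.03, -0.10, -0.09, -0.08, -0.10,
       -0.12, -0.14, -0.17, -0.16, -0.15, -0.15, -0.17, -0.16, -0.18, -0.17,
       -0.14, -0.16, -0.08, -0.06]
idx = pd.date_range("2016-05-01 01:00", periods=24, freq="h", name="datetime")
df = pd.DataFrame({"hour": range(1, 25), "rev": rev}, index=idx)

# pd.Grouper(freq="D") is the current spelling of pd.TimeGrouper("D").
# Default daily bins are [00:00, next 00:00), so the 2016-05-02 00:00 reading
# (hour 24) falls into the 2016-05-02 bin.
daily = df.groupby(pd.Grouper(freq="D")).sum()
print(daily)
```

The two rows carry 276 / -2.43 and 24 / -0.06, matching the split shown above.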
As you can see, the aggregation occurs for both 2016-05-01 and 2016-05-02.
Notice that the last hourly data entry in df occurs at 2016-05-02 00:00:00, which is meant to be the data for the last hour of the previous day, i.e. there are 24 hourly data points for each day.
However, given the datetime stamp, things don't work out the way I intended. I want all 24 hours to be aggregated for 2016-05-01.
I imagine this sort of issue must arise often in various applications where a measurement is taken at the end of the hour. It isn't a problem until the last hour, whose timestamp is 00:00:00 of the following day.
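A reading stamped at the end of an hour describes the interval (t - 1h, t], which corresponds to a right-closed bin in pandas. A minimal sketch of that idea (an alternative, not the method used in the answer below) passes `closed='right'` and `label='left'` to `resample`, so 2016-05-02 00:00:00 closes the 2016-05-01 bin and the bin is named after the day it covers. The fixed '24h' rule is an assumption chosen to keep the bin edges plain 24-hour steps; for tz-naive data like this it spans the same days as 'D'.

```python
import pandas as pd

# Same hourly data as in the question, stamped at the end of each hour.
rev = [-0.02, -0.01, -0.02, -0.02, -0.01, -0.03, -0.10, -0.09, -0.08, -0.10,
       -0.12, -0.14, -0.17, -0.16, -0.15, -0.15, -0.17, -0.16, -0.18, -0.17,
       -0.14, -0.16, -0.08, -0.06]
idx = pd.date_range("2016-05-01 01:00", periods=24, freq="h", name="datetime")
df = pd.DataFrame({"hour": range(1, 25), "rev": rev}, index=idx)

# closed="right": each bin is (00:00, next 00:00], so the midnight reading is
# the LAST point of the previous day rather than the first point of the next.
# label="left": name each bin after its left edge, i.e. the day it covers.
daily = df.resample("24h", closed="right", label="left").sum()
print(daily)
```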
How can I address this issue in pandas?
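It looks like another hack, but it should do the job. Note that it assumes datetime is still an ordinary column (df.datetime fails once datetime has been moved into the index with set_index):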
In [79]: df.assign(t=df.datetime - pd.Timedelta(hours=1)).drop('datetime',1).groupby(pd.TimeGrouper('D', key='t')).sum()
Out[79]:
hour rev
t
2016-05-01 300 -2.49
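The In [79] line relies on APIs that current pandas no longer accepts: `pd.TimeGrouper` was removed in pandas 1.0, and passing the axis positionally, as in `drop('datetime', 1)`, stopped working in pandas 2.0. A sketch of the same shift-by-one-hour idea for a recent pandas version, assuming datetime is the index as in the question (the names shifted and daily are illustrative), could look like this:

```python
import pandas as pd

# Hourly data from the question, indexed by end-of-hour timestamps.
rev = [-0.02, -0.01, -0.02, -0.02, -0.01, -0.03, -0.10, -0.09, -0.08, -0.10,
       -0.12, -0.14, -0.17, -0.16, -0.15, -0.15, -0.17, -0.16, -0.18, -0.17,
       -0.14, -0.16, -0.08, -0.06]
idx = pd.date_range("2016-05-01 01:00", periods=24, freq="h", name="datetime")
df = pd.DataFrame({"hour": range(1, 25), "rev": rev}, index=idx)

# Move every end-of-hour stamp back to the start of its hour:
# 2016-05-01 01:00 -> 00:00, ..., 2016-05-02 00:00 -> 2016-05-01 23:00.
shifted = df.copy()
shifted.index = df.index - pd.Timedelta(hours=1)

# Ordinary left-closed daily bins now hold all 24 readings of 2016-05-01.
daily = shifted.groupby(pd.Grouper(freq="D")).sum()
print(daily)
```

This yields the same single 2016-05-01 row (hour 300, rev -2.49) as Out[79]; working on a copy leaves the original end-of-hour timestamps in df unchanged.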