Aggregating an hourly time series by day via pd.TimeGrouper('D'); issue at timestamp 00:00:00 (hour 24)


Problem description

df:

                    hour    rev
datetime        
2016-05-01 01:00:00 1   -0.02
2016-05-01 02:00:00 2   -0.01
2016-05-01 03:00:00 3   -0.02
2016-05-01 04:00:00 4   -0.02
2016-05-01 05:00:00 5   -0.01
2016-05-01 06:00:00 6   -0.03
2016-05-01 07:00:00 7   -0.10
2016-05-01 08:00:00 8   -0.09
2016-05-01 09:00:00 9   -0.08
2016-05-01 10:00:00 10  -0.10
2016-05-01 11:00:00 11  -0.12
2016-05-01 12:00:00 12  -0.14
2016-05-01 13:00:00 13  -0.17
2016-05-01 14:00:00 14  -0.16
2016-05-01 15:00:00 15  -0.15
2016-05-01 16:00:00 16  -0.15
2016-05-01 17:00:00 17  -0.17
2016-05-01 18:00:00 18  -0.16
2016-05-01 19:00:00 19  -0.18
2016-05-01 20:00:00 20  -0.17
2016-05-01 21:00:00 21  -0.14
2016-05-01 22:00:00 22  -0.16
2016-05-01 23:00:00 23  -0.08
2016-05-02 00:00:00 24  -0.06

df.reset_index().to_dict('records'):

[{'datetime': Timestamp('2016-05-01 01:00:00'), 'hour': 1L, 'rev': -0.02},
 {'datetime': Timestamp('2016-05-01 02:00:00'), 'hour': 2L, 'rev': -0.01},
 {'datetime': Timestamp('2016-05-01 03:00:00'), 'hour': 3L, 'rev': -0.02},
 {'datetime': Timestamp('2016-05-01 04:00:00'), 'hour': 4L, 'rev': -0.02},
 {'datetime': Timestamp('2016-05-01 05:00:00'), 'hour': 5L, 'rev': -0.01},
 {'datetime': Timestamp('2016-05-01 06:00:00'), 'hour': 6L, 'rev': -0.03},
 {'datetime': Timestamp('2016-05-01 07:00:00'), 'hour': 7L, 'rev': -0.1},
 {'datetime': Timestamp('2016-05-01 08:00:00'), 'hour': 8L, 'rev': -0.09},
 {'datetime': Timestamp('2016-05-01 09:00:00'), 'hour': 9L, 'rev': -0.08},
 {'datetime': Timestamp('2016-05-01 10:00:00'), 'hour': 10L, 'rev': -0.1},
 {'datetime': Timestamp('2016-05-01 11:00:00'), 'hour': 11L, 'rev': -0.12},
 {'datetime': Timestamp('2016-05-01 12:00:00'), 'hour': 12L, 'rev': -0.14},
 {'datetime': Timestamp('2016-05-01 13:00:00'), 'hour': 13L, 'rev': -0.17},
 {'datetime': Timestamp('2016-05-01 14:00:00'), 'hour': 14L, 'rev': -0.16},
 {'datetime': Timestamp('2016-05-01 15:00:00'), 'hour': 15L, 'rev': -0.15},
 {'datetime': Timestamp('2016-05-01 16:00:00'), 'hour': 16L, 'rev': -0.15},
 {'datetime': Timestamp('2016-05-01 17:00:00'), 'hour': 17L, 'rev': -0.17},
 {'datetime': Timestamp('2016-05-01 18:00:00'), 'hour': 18L, 'rev': -0.16},
 {'datetime': Timestamp('2016-05-01 19:00:00'), 'hour': 19L, 'rev': -0.18},
 {'datetime': Timestamp('2016-05-01 20:00:00'), 'hour': 20L, 'rev': -0.17},
 {'datetime': Timestamp('2016-05-01 21:00:00'), 'hour': 21L, 'rev': -0.14},
 {'datetime': Timestamp('2016-05-01 22:00:00'), 'hour': 22L, 'rev': -0.16},
 {'datetime': Timestamp('2016-05-01 23:00:00'), 'hour': 23L, 'rev': -0.08},
 {'datetime': Timestamp('2016-05-02 00:00:00'), 'hour': 24L, 'rev': -0.06}]

df.set_index('datetime', inplace=True)

I want to aggregate the data by DAY. So I do:

dfgrped = df.groupby([pd.TimeGrouper('D')])

I want to compute stats like the sum:

dfgrped.agg(sum)

            hour    rev
datetime        
2016-05-01  276 -2.43
2016-05-02  24  -0.06

As you can see, the aggregation produces two groups: one for 2016-05-01 and one for 2016-05-02.
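The split can be reproduced on a current pandas release with the sketch below. It assumes pd.Grouper(freq='D') as a stand-in for pd.TimeGrouper('D'), which is no longer available in recent pandas versions; the way the frame is rebuilt here is only for illustration. Daily bins are closed on the left, so each bin covers 00:00:00 up to (but not including) the next midnight, and the 2016-05-02 00:00:00 row therefore falls into the bin for 2016-05-02.

```python
import pandas as pd

# Rebuild the sample data: 24 readings, each stamped at the END of its hour.
hours = list(range(1, 25))
rev = [-0.02, -0.01, -0.02, -0.02, -0.01, -0.03, -0.10, -0.09,
       -0.08, -0.10, -0.12, -0.14, -0.17, -0.16, -0.15, -0.15,
       -0.17, -0.16, -0.18, -0.17, -0.14, -0.16, -0.08, -0.06]
stamps = pd.Timestamp("2016-05-01") + pd.to_timedelta(hours, unit="h")
df = pd.DataFrame(
    {"hour": hours, "rev": rev},
    index=pd.DatetimeIndex(stamps, name="datetime"),
)

# Group on the index by calendar day (assumed replacement for pd.TimeGrouper('D')).
daily = df.groupby(pd.Grouper(freq="D")).sum()
print(daily)
# Two rows come back: hours 1..23 land on 2016-05-01,
# and the midnight reading (hour 24) lands on 2016-05-02.
```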

Notice that the last hourly entry in df occurs at 2016-05-02 00:00:00. It is meant to be the data for the last hour of the previous day, so that each day has 24 hourly data points.

However, given the datetime stamp, things don't work out the way I intended. I want all 24 hours to be aggregated for 2016-05-01.

I imagine this sort of issue must arise often in various applications when a measurement is taken at the end of the hour. This isn't a problem until the last hour, which occurs at the 00:00:00 timestamp of the following day.

How can I address this issue in pandas?

Solution

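It looks like another hack, but it should do the job. Note that it expects datetime to be a regular column, i.e. df before set_index('datetime') is applied. Shifting every timestamp back by one hour moves the midnight reading into the day it belongs to: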

In [79]: df.assign(t=df.datetime - pd.Timedelta(hours=1)).drop('datetime',1).groupby(pd.TimeGrouper('D', key='t')).sum()
Out[79]:
            hour   rev
t
2016-05-01   300 -2.49
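For readers on a recent pandas release, the same idea can be sketched as follows. This is not the answerer's exact code: it assumes pd.Grouper in place of pd.TimeGrouper and drop(columns=...) in place of the positional axis argument, and it rebuilds the sample frame with datetime as a regular column so the snippet runs on its own.

```python
import pandas as pd

# Sample data with 'datetime' kept as a regular column (end-of-hour stamps).
hours = list(range(1, 25))
rev = [-0.02, -0.01, -0.02, -0.02, -0.01, -0.03, -0.10, -0.09,
       -0.08, -0.10, -0.12, -0.14, -0.17, -0.16, -0.15, -0.15,
       -0.17, -0.16, -0.18, -0.17, -0.14, -0.16, -0.08, -0.06]
df = pd.DataFrame({
    "datetime": pd.Timestamp("2016-05-01") + pd.to_timedelta(hours, unit="h"),
    "hour": hours,
    "rev": rev,
})

# Shift each stamp back one hour so it marks the START of the measured hour,
# then group the shifted column 't' by calendar day.
shifted = df.assign(t=df["datetime"] - pd.Timedelta(hours=1)).drop(columns="datetime")
daily = shifted.groupby(pd.Grouper(key="t", freq="D")).sum()
print(daily)
# A single row for 2016-05-01 that includes all 24 hourly readings.
```

The one-hour shift matches the hourly spacing of this data. For readings at a different interval, the offset would presumably need to match that interval instead.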
