How to group a pandas dataframe by a defined time interval?


Problem description

I have a DataFrame like this. I would like to group it into 60-minute intervals, with the grouping starting at 06:30.

                           data
index
2017-02-14 06:29:57    11198648
2017-02-14 06:30:01    11198650
2017-02-14 06:37:22    11198706
2017-02-14 23:11:13    11207728
2017-02-14 23:21:43    11207774
2017-02-14 23:22:36    11207776

I am using:

df.groupby(pd.TimeGrouper(freq='60Min'))

I get this grouping:

                      data
index       
2017-02-14 06:00:00     x1
2017-02-14 07:00:00     x2
2017-02-14 08:00:00     x3
2017-02-14 09:00:00     x4
2017-02-14 10:00:00     x5

but I am looking for this result:

                      data
index       
2017-02-14 06:30:00     x1
2017-02-14 07:30:00     x2
2017-02-14 08:30:00     x3
2017-02-14 09:30:00     x4
2017-02-14 10:30:00     x5

How can I tell the function to start grouping at 6:30 at one-hour intervals?

If it cannot be done with .groupby(pd.TimeGrouper(freq='60Min')), what is the best way to do it?

Regards, and thanks very much in advance.

Solution

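Use base=30 together with label='right' in pd.Grouper.

By default, base is 0, so hourly bins are aligned to the top of the hour (06:00, 07:00, and so on). Setting base=30 shifts the bin edges by 30 minutes, giving bins such as 05:30 to 06:30 and 06:30 to 07:30. Specifying label='right' labels each bin with its right edge, so the first bin is reported as 06:30 rather than 05:30. Note that each label then marks the end of its interval: the row labeled 06:30 covers 05:30 up to, but not including, 06:30.

Suppose you want to aggregate the first element of every sub-group; then: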

df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample - df.resample('60Min', base=30, label='right').first()

yields:

                           data
index                          
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
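
For readers on newer pandas releases, here is a minimal sketch of the same grouping, assuming pandas 1.1 or later. pd.TimeGrouper was removed in pandas 1.0, and the base argument was deprecated in 1.1 (and removed in 2.0) in favor of offset and origin. The sketch rebuilds the sample frame from the question and uses offset='30min' in place of base=30; this substitution is not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question.
idx = pd.to_datetime([
    "2017-02-14 06:29:57",
    "2017-02-14 06:30:01",
    "2017-02-14 06:37:22",
    "2017-02-14 23:11:13",
    "2017-02-14 23:21:43",
    "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)
df.index.name = "index"

# offset="30min" shifts the hourly bin edges to hh:30 (the role base=30 played).
# label="right" names each bin by its right edge, as in the answer above.
result = df.resample("60min", offset="30min", label="right").first()

# The equivalent groupby form, using pd.Grouper instead of pd.TimeGrouper.
grouped = df.groupby(pd.Grouper(freq="60min", offset="30min", label="right")).first()

print(result)
```

Hours with no observations still appear as NaN rows; chaining .dropna() after .first() is one way to keep only the bins that contain data.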
