Pandas: resample timeseries with groupby


Problem description

Given the below pandas DataFrame:

In [115]: import pandas as pd
          times = pd.to_datetime(pd.Series(['2014-08-25 21:00:00','2014-08-25 21:04:00',
                                            '2014-08-25 22:07:00','2014-08-25 22:09:00']))
          locations = ['HK', 'LDN', 'LDN', 'LDN']
          event = ['foo', 'bar', 'baz', 'qux']
          df = pd.DataFrame({'Location': locations,
                             'Event': event}, index=times)
          df
Out[115]:
                               Event Location
          2014-08-25 21:00:00  foo   HK
          2014-08-25 21:04:00  bar   LDN
          2014-08-25 22:07:00  baz   LDN
          2014-08-25 22:09:00  qux   LDN

I would like to resample the data to aggregate it hourly by count while grouping by location, producing a data frame that looks like this:

Out[115]:
                               HK    LDN
          2014-08-25 21:00:00  1     1
          2014-08-25 22:00:00  0     2

I've tried various combinations of resample() and groupby() but with no luck. How would I go about this?

Solution

In my original post, I suggested using pd.TimeGrouper. Nowadays, use pd.Grouper instead of pd.TimeGrouper. The syntax is largely the same, but TimeGrouper is now deprecated in favor of pd.Grouper.

Moreover, while pd.TimeGrouper could only group by a DatetimeIndex, pd.Grouper can also group by a datetime column, which you specify through the key parameter.
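Here is a minimal sketch of the key parameter, assuming the timestamps live in an ordinary column rather than in the index. The column name Time is a hypothetical choice, not part of the original data; everything else mirrors the question. It groups by hour and Location in one step and uses unstack's fill_value to put zeros where a location has no events.

```python
import pandas as pd

# Same data as the question, but the timestamps are stored in a regular column.
# The column name 'Time' is hypothetical.
df = pd.DataFrame({
    'Time': pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                            '2014-08-25 22:07:00', '2014-08-25 22:09:00']),
    'Location': ['HK', 'LDN', 'LDN', 'LDN'],
    'Event': ['foo', 'bar', 'baz', 'qux'],
})

# key= names the datetime column to bin; freq= sets the bin width (one hour).
counts = (df.groupby([pd.Grouper(key='Time', freq='1h'), 'Location'])['Event']
            .count()
            .unstack('Location', fill_value=0))
print(counts)
```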


You can use a pd.Grouper to group the DataFrame, which has a DatetimeIndex, by hour:

# freq must be passed by keyword: the first positional argument of pd.Grouper is key
grouper = df.groupby([pd.Grouper(freq='1h'), 'Location'])
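To see which hourly bin each row lands in, the sketch below floors the index to the hour with DatetimeIndex.floor. For timestamps like these, the floored values match the labels of pd.Grouper(freq='1h'), whose hourly bins are closed on the left and labelled by their start. Flooring is only an illustration here, not a claim about how pd.Grouper works internally.

```python
import pandas as pd

times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                        '2014-08-25 22:07:00', '2014-08-25 22:09:00'])
df = pd.DataFrame({'Location': ['HK', 'LDN', 'LDN', 'LDN'],
                   'Event': ['foo', 'bar', 'baz', 'qux']}, index=times)

# Map each timestamp to the start of its hour, e.g. 21:04 -> 21:00.
hour_bins = df.index.floor('1h')
print(pd.Series(hour_bins, index=df.index))

# Group sizes keyed by (hour, Location), once via the floored labels
# and once via pd.Grouper(freq='1h'), for comparison.
floor_sizes = df.groupby([hour_bins, 'Location']).size()
grouper_sizes = df.groupby([pd.Grouper(freq='1h'), 'Location']).size()
print(floor_sizes)
```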

Use count to count the number of events in each group:

grouper['Event'].count()
#                      Location
# 2014-08-25 21:00:00  HK          1
#                      LDN         1
# 2014-08-25 22:00:00  LDN         2
# Name: Event, dtype: int64
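Note that count tallies the non-missing values of Event, while size tallies rows. With the question's data the two agree. The sketch below adds one hypothetical extra LDN row whose Event is missing, to show where they diverge:

```python
import pandas as pd

# The question's data plus a hypothetical LDN row at 22:30 with no Event.
times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                        '2014-08-25 22:07:00', '2014-08-25 22:09:00',
                        '2014-08-25 22:30:00'])
df = pd.DataFrame({'Location': ['HK', 'LDN', 'LDN', 'LDN', 'LDN'],
                   'Event': ['foo', 'bar', 'baz', 'qux', None]}, index=times)

grouper = df.groupby([pd.Grouper(freq='1h'), 'Location'])
non_null = grouper['Event'].count()  # skips the missing Event
rows = grouper.size()                # counts every row in the group
print(non_null)
print(rows)
```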

Use unstack to move the Location index level to the columns:

grouper['Event'].count().unstack()
# Location              HK  LDN
# 2014-08-25 21:00:00  1.0  1.0
# 2014-08-25 22:00:00  NaN  2.0

Then use fillna to replace the NaNs with zeros.
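As an alternative to fillna, unstack accepts a fill_value argument. Filling during the reshape means no NaN is ever introduced, so the counts keep their integer dtype instead of being upcast to float. A small sketch comparing the two:

```python
import pandas as pd

times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                        '2014-08-25 22:07:00', '2014-08-25 22:09:00'])
df = pd.DataFrame({'Location': ['HK', 'LDN', 'LDN', 'LDN'],
                   'Event': ['foo', 'bar', 'baz', 'qux']}, index=times)

counts = df.groupby([pd.Grouper(freq='1h'), 'Location'])['Event'].count()

via_fillna = counts.unstack('Location').fillna(0)          # NaN upcasts to float64
via_fill_value = counts.unstack('Location', fill_value=0)  # stays int64
print(via_fillna.dtypes)
print(via_fill_value.dtypes)
```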


Putting it all together,

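grouper = df.groupby([pd.Grouper(freq='1h'), 'Location'])
# unstack leaves NaN for missing (hour, Location) pairs, which makes the counts float;
# astype(int) restores integer counts after fillna
result = grouper['Event'].count().unstack('Location').fillna(0).astype(int)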

yields

Location             HK  LDN
2014-08-25 21:00:00   1    1
2014-08-25 22:00:00   0    2
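Since the question mentions resample(), here is a sketch of a resample-based variant. It is not the approach above: it groups by Location first, resamples each group hourly, then pivots Location into the columns. Each group is resampled only over its own time span, so an (hour, Location) pair with no rows (HK at 22:00 here) is filled in by unstack's fill_value.

```python
import pandas as pd

times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                        '2014-08-25 22:07:00', '2014-08-25 22:09:00'])
df = pd.DataFrame({'Location': ['HK', 'LDN', 'LDN', 'LDN'],
                   'Event': ['foo', 'bar', 'baz', 'qux']}, index=times)

# Resample within each Location, count events per hour, then pivot Location to columns.
result = (df.groupby('Location')
            .resample('1h')['Event']
            .count()
            .unstack('Location', fill_value=0))
print(result)
```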
