Pandas: resample timeseries with groupby

Published: 2018/5/30 13:37:53 · Tags: python pandas group-by time-series


Problem description

Given the pandas DataFrame below:

    In [115]: import pandas as pd
              times = pd.to_datetime(pd.Series(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                                                '2014-08-25 22:07:00', '2014-08-25 22:09:00']))
              locations = ['HK', 'LDN', 'LDN', 'LDN']
              event = ['foo', 'bar', 'baz', 'qux']
              df = pd.DataFrame({'Location': locations, 'Event': event}, index=times)
              df
    Out[115]:
                        Event Location
    2014-08-25 21:00:00   foo       HK
    2014-08-25 21:04:00   bar      LDN
    2014-08-25 22:07:00   baz      LDN
    2014-08-25 22:09:00   qux      LDN

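I would like to resample the data so that it is aggregated hourly by count while being grouped by location, producing a data frame that looks like this: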

    Out[115]:
                         HK  LDN
    2014-08-25 21:00:00   1    1
    2014-08-25 22:00:00   0    2

I've tried various combinations of resample() and groupby() but with no luck. How would I go about this?

Solution

In my original post, I suggested using pd.TimeGrouper. Nowadays, use pd.Grouper instead: TimeGrouper was deprecated in favor of pd.Grouper and has since been removed from pandas. The syntax is largely the same. Moreover, while pd.TimeGrouper could only group by a DatetimeIndex, pd.Grouper can also group by a datetime column, which you specify through the key parameter.
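A minimal sketch of the key parameter mentioned above, assuming the timestamps are stored in an ordinary column instead of the index. The column name Time is a hypothetical choice, not part of the original question; the counting and reshaping mirror the approach of this answer, and only the Grouper gains a key argument.

```python
import pandas as pd

# Same data as in the question, but the timestamps live in a regular
# column; the column name 'Time' is a hypothetical choice.
df = pd.DataFrame({
    'Time': pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                            '2014-08-25 22:07:00', '2014-08-25 22:09:00']),
    'Location': ['HK', 'LDN', 'LDN', 'LDN'],
    'Event': ['foo', 'bar', 'baz', 'qux'],
})

# key= names the datetime column to bin; freq= sets the bin width (1 hour).
grouper = df.groupby([pd.Grouper(key='Time', freq='1h'), 'Location'])

# Count events per (hour, location), pivot locations into columns,
# and replace the missing combinations with 0.
result = grouper['Event'].count().unstack('Location').fillna(0)
print(result)
```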
You could use a pd.Grouper to group the DatetimeIndex'ed DataFrame by hour. Pass the frequency as the freq keyword, because the first positional parameter of pd.Grouper is key:

    grouper = df.groupby([pd.Grouper(freq='1h'), 'Location'])

Use count to count the number of events in each group:

    grouper['Event'].count()
    # Location
    # 2014-08-25 21:00:00  HK          1
    #                      LDN         1
    # 2014-08-25 22:00:00  LDN         2
    # Name: Event, dtype: int64

Use unstack to move the Location index level to a column level:

    grouper['Event'].count().unstack()
    # Location             HK  LDN
    # 2014-08-25 21:00:00   1    1
    # 2014-08-25 22:00:00 NaN    2

and then use fillna to change the NaNs into zeros.

Putting it all together,

    grouper = df.groupby([pd.Grouper(freq='1h'), 'Location'])
    result = grouper['Event'].count().unstack('Location').fillna(0)

yields

    Location             HK  LDN
    2014-08-25 21:00:00   1    1
    2014-08-25 22:00:00   0    2
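A small variation, not part of the original answer: Series.unstack accepts a fill_value argument, so the missing (hour, location) pairs can be filled with zeros during the reshape itself. Because fillna only runs after unstack has already upcast the counts to float64 (to hold the NaN), filling inside unstack keeps the counts as integers, which matches the integer table the question asked for. A minimal sketch that rebuilds the question's DataFrame:

```python
import pandas as pd

# Rebuild the DataFrame from the question (timestamps in the index).
times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                        '2014-08-25 22:07:00', '2014-08-25 22:09:00'])
df = pd.DataFrame({'Location': ['HK', 'LDN', 'LDN', 'LDN'],
                   'Event': ['foo', 'bar', 'baz', 'qux']}, index=times)

# freq is passed by keyword: Grouper's first positional argument is key.
grouper = df.groupby([pd.Grouper(freq='1h'), 'Location'])

# fill_value=0 fills the missing (hour, location) pairs while unstacking,
# so the counts stay integers instead of being upcast to float64.
result = grouper['Event'].count().unstack('Location', fill_value=0)
print(result)
```

If you prefer to keep the fillna(0) step, appending .astype(int) after it restores integer counts in the same way.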
