Pandas: resample timeseries with groupby
Problem description
Given the below pandas DataFrame:
In [115]: times = pd.to_datetime(pd.Series(['2014-08-25 21:00:00','2014-08-25 21:04:00',
'2014-08-25 22:07:00','2014-08-25 22:09:00']))
locations = ['HK', 'LDN', 'LDN', 'LDN']
event = ['foo', 'bar', 'baz', 'qux']
df = pd.DataFrame({'Location': locations,
'Event': event}, index=times)
df
Out[115]:
Event Location
2014-08-25 21:00:00 foo HK
2014-08-25 21:04:00 bar LDN
2014-08-25 22:07:00 baz LDN
2014-08-25 22:09:00 qux LDN
I would like to resample the data, aggregating it hourly by count while grouping by location, to produce a data frame that looks like this:
Out[115]:
HK LDN
2014-08-25 21:00:00 1 1
2014-08-25 22:00:00 0 2
I've tried various combinations of resample() and groupby() but with no luck. How would I go about this?
In my original post, I suggested using pd.TimeGrouper.
Nowadays, use pd.Grouper instead of pd.TimeGrouper. The syntax is largely the same, but TimeGrouper is now deprecated in favor of pd.Grouper.
Moreover, while pd.TimeGrouper could only group by DatetimeIndex, pd.Grouper can also group by datetime columns, which you can specify through the key parameter.
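As a minimal sketch of the key parameter, assume the same events are stored with their timestamps in an ordinary column rather than in the index. The column name 'Time' and the variable names are hypothetical, chosen only for illustration. The hourly alias is written 'h' here, which recent pandas versions prefer over the older 'H' spelling.

```python
import pandas as pd

# Same events as in the question, but the timestamps live in a regular
# column (hypothetically named 'Time') instead of the index.
events = pd.DataFrame({
    'Time': pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                            '2014-08-25 22:07:00', '2014-08-25 22:09:00']),
    'Location': ['HK', 'LDN', 'LDN', 'LDN'],
    'Event': ['foo', 'bar', 'baz', 'qux'],
})

# key= names the datetime column to bin; freq= sets the bin width (one hour).
hourly = events.groupby([pd.Grouper(key='Time', freq='h'), 'Location'])
counts_by_column = hourly['Event'].count()
print(counts_by_column)
```

With the timestamps in the index instead, the key argument is simply omitted and pd.Grouper bins the index.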
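You could use a pd.Grouper to group the DatetimeIndex'ed DataFrame by hour. Note that the frequency must be passed as the freq keyword, because the first positional argument of pd.Grouper is key:
grouper = df.groupby([pd.Grouper(freq='1H'), 'Location'])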
Use count to count the number of events in each group:
grouper['Event'].count()
# Location
# 2014-08-25 21:00:00 HK 1
# LDN 1
# 2014-08-25 22:00:00 LDN 2
# Name: Event, dtype: int64
Use unstack to move the Location index level to a column level:
grouper['Event'].count().unstack()
# Out[49]:
# Location HK LDN
# 2014-08-25 21:00:00 1 1
# 2014-08-25 22:00:00 NaN 2
Then use fillna to change the NaNs into zeros.
Putting it all together,
grouper = df.groupby([pd.Grouper(freq='1H'), 'Location'])
result = grouper['Event'].count().unstack('Location').fillna(0)
yields
Location HK LDN
2014-08-25 21:00:00 1 1
2014-08-25 22:00:00 0 2
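For reference, here is a self-contained sketch of the whole pipeline that can be run as a script. It differs from the answer in two small ways, both of which are my assumptions rather than part of the original: it passes fill_value=0 to unstack instead of calling fillna afterwards, which should leave the counts as integers rather than floats, and it spells the hourly alias as 'h'. The name result_int is illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question with a DatetimeIndex.
times = pd.to_datetime(['2014-08-25 21:00:00', '2014-08-25 21:04:00',
                        '2014-08-25 22:07:00', '2014-08-25 22:09:00'])
df = pd.DataFrame({'Location': ['HK', 'LDN', 'LDN', 'LDN'],
                   'Event': ['foo', 'bar', 'baz', 'qux']}, index=times)

# Bin the index by hour and group by location at the same time.
grouper = df.groupby([pd.Grouper(freq='h'), 'Location'])

# Count events per (hour, location) pair, then pivot Location into columns.
# fill_value=0 fills the missing (22:00, HK) cell during the unstack itself.
result_int = grouper['Event'].count().unstack('Location', fill_value=0)
print(result_int)
```

Filling during the unstack avoids the intermediate NaN, so no separate fillna step is needed.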