Pandas groupby month and year
Question
I have the following dataframe:
Date       abc  xyz
01-Jun-13  100  200
03-Jun-13  -20   50
15-Aug-13   40   -5
20-Jan-14   25   15
21-Feb-14   60   80
I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc. I will be using the newly grouped data to create a plot showing abc vs xyz per year/month.
I've tried various combinations of groupby and sum but just can't seem to get anything to work.
Thank you for your help.
Recommended answer
You can use either resample or Grouper (which resamples under the hood).
First make sure that the datetime column actually holds datetimes (convert it with pd.to_datetime). It's easier if it's a DatetimeIndex:
In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80
In [14]: df1.resample("M").sum()  # the same; older pandas spelled this resample("M", how='sum')
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80
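For readers on a recent pandas, here is a minimal self-contained sketch of the two approaches above, built from the sample data in the question. It rests on a few assumptions that are not part of the original answer: the frequency is passed as a `pd.offsets.MonthEnd()` object rather than the string "M" (newer releases rename the month-end alias to "ME"), and `min_count=1` is passed to `sum` because since pandas 0.22 the sum of an empty group is 0 rather than NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Make sure the column really holds datetimes, then use it as the index.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
df1 = df.set_index("Date")

# Assumption: an offset object is used instead of the "M"/"ME" string alias
# so the same code runs on old and new pandas versions.
month_end = pd.offsets.MonthEnd()

# min_count=1 keeps months without any rows as NaN instead of 0.
by_grouper = df1.groupby(pd.Grouper(freq=month_end)).sum(min_count=1)
by_resample = df1.resample(month_end).sum(min_count=1)

print(by_resample)
```

Both results should contain one row per month end from June 2013 to February 2014, with the two June rows added together.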
Note: pd.Grouper(freq="M") was previously written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.
I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure). I'm including this for interest's sake.
If it's a column (it has to be a datetime64 column! As I say, convert it with to_datetime), you can use a PeriodIndex:
In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80
To get the desired result we have to reindex...
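The answer does not spell out the reindexing step. One way it might look, as a sketch rather than the author's own code, is to build the full monthly range with `pd.period_range` and reindex the grouped sums against it. The names `sums`, `full_range` and `filled` are illustrative, and the explicit `[["abc", "xyz"]]` selection is an assumption added so that recent pandas versions do not try to sum the datetime column.

```python
import pandas as pd

# Sample data copied from the question, with Date kept as a regular column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
        format="%d-%b-%y",
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Group by monthly period; only months that actually occur appear here.
per = df["Date"].dt.to_period("M")
sums = df.groupby(per)[["abc", "xyz"]].sum()

# Build every month between the first and last period, then reindex so
# that the missing months show up as NaN rows.
full_range = pd.period_range(sums.index.min(), sums.index.max(), freq="M")
filled = sums.reindex(full_range)

print(filled)
```

The reindexed frame has one row per month from 2013-06 to 2014-02, which mirrors the resample output but with a PeriodIndex instead of month-end timestamps.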