Pandas groupby month and year


Question

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc. I will use the newly grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum, but can't seem to get anything to work.

Thanks for your help.

Recommended answer

You can use either resample or Grouper (which resamples under the hood).

First make sure that the date column actually holds datetimes (convert it with pd.to_datetime). It's easier if it's a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80
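
A frame like df1 can be built from the question's raw strings roughly as follows. This is a minimal sketch: the format string "%d-%b-%y" is an assumption read off the sample dates (e.g. 01-Jun-13), and the name raw is only illustrative.

```python
import pandas as pd

# Raw data as in the question, with the dates still stored as strings.
raw = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the strings into datetime64 values (format assumed from the sample).
raw["Date"] = pd.to_datetime(raw["Date"], format="%d-%b-%y")

# Move the parsed dates into the index, which yields a DatetimeIndex.
df1 = raw.set_index("Date")
```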

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M").sum()  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80
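
On a recent pandas the same monthly sums can be sketched as below. The "MS" (month-start) alias is a substitution made here, not the answer's choice: it labels each bin by the first day of the month instead of the last, but it is spelled the same across pandas versions, whereas the month-end alias "M" is spelled "ME" from pandas 2.2 on.

```python
import pandas as pd

# The answer's df1: numeric columns indexed by a DatetimeIndex named "Date".
df1 = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)
df1.index.name = "Date"

# Group by calendar month via Grouper, then sum within each month.
by_grouper = df1.groupby(pd.Grouper(freq="MS")).sum()

# The resample spelling of the same operation.
by_resample = df1.resample("MS").sum()
```

Both results cover every month from June 2013 through February 2014, including the months with no rows.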

Note: pd.Grouper(freq="M") was previously written as pd.TimeGrouper("M"); the latter has been deprecated since 0.21. The outputs above come from an older pandas: since 0.22, the sum of an empty bin is 0 rather than NaN.

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure). I'm including it for interest's sake.

If it's a column (it has to be a datetime64 column; as I said, convert it with to_datetime), you can use a PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex...
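
One way the reindexing step might look, as a sketch rather than the answer's own code (which is not shown here): build the full run of monthly periods between the first and last observed month with pd.period_range, then reindex onto it. The names monthly, full_range and filled are illustrative.

```python
import pandas as pd

# The answer's df: dates held in an ordinary datetime64 column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Group by monthly period; only months that contain rows appear in the result.
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Every monthly period from the first to the last observed month.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")

# Reindexing inserts the missing months as NaN rows.
filled = monthly.reindex(full_range)
```

Passing fill_value=0 to reindex would give zeros instead of NaN for the empty months.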
