Python - Group Dates by Month

Question

Here's a quick problem that I, at first, dismissed as easy. An hour in, and I'm not so sure!
So, I have a list of Python datetime objects, and I want to graph them. The x-values are the year and month, and the y-values are the number of datetime objects in the list that fall in that month.
Perhaps an example will demonstrate this better (dd/mm/yyyy):

[28/02/2018, 01/03/2018, 16/03/2018, 17/05/2018] 
-> ([02/2018, 03/2018, 04/2018, 05/2018], [1, 2, 0, 1])

My first attempt simply grouped by month and year, along the lines of:

import itertools
# groupby only merges adjacent items, so dates must already be sorted
group = itertools.groupby(dates, lambda date: date.strftime("%b/%Y"))
graph = zip(*[(k, len(list(v))) for k, v in group])  # format the data for graphing

As you've probably noticed though, this will group only by dates that are already present in the list. In my example above, the fact that none of the dates occurred in April would have been overlooked.

Next, I tried finding the starting and ending dates, and looping over the months between them:

import datetime
data = [[], [],]
for year in range(min_date.year, max_date.year):
    for month in range(min_date.month, max_date.month):
        k = datetime.datetime(year=year, month=month, day=1).strftime("%b/%Y")
        v = sum([1 for date in dates if date.strftime("%b/%Y") == k])
        data[0].append(k)
        data[1].append(v)

Of course, this only works if min_date.month is smaller than max_date.month, which is not necessarily the case if the dates span multiple years. Also, it's pretty ugly.

Is there an elegant way of doing this?
Thanks in advance

EDIT: To be clear, the dates are datetime objects, not strings. They look like strings here for the sake of being readable.

Recommended answer

I suggest using pandas:

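import pandas as pd

dates = ['28/02/2018', '01/03/2018', '16/03/2018', '17/05/2018']

# parse the day-first strings into datetimes
s = pd.to_datetime(pd.Series(dates), format='%d/%m/%Y')
# index by monthly period, then count the rows in each period
s.index = s.dt.to_period('M')
s = s.groupby(level=0).size()

# add the months that have no dates, with a count of 0
s = s.reindex(pd.period_range(s.index.min(), s.index.max(), freq='M'), fill_value=0)
print(s)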
2018-02    1
2018-03    2
2018-04    0
2018-05    1
Freq: M, dtype: int64

s.plot.bar()

Explanation:

  1. First, create a Series from the list of dates and convert it with to_datetime.
  2. Create a PeriodIndex with Series.dt.to_period.
  3. Group by the index (level=0) and get the counts with GroupBy.size.
  4. Add the missing periods with Series.reindex, using a PeriodIndex built from the minimum and maximum values of the index.
  5. Finally, plot the result, e.g. as bars with Series.plot.bar.
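
If pandas is not available, the same steps can be approximated with the standard library alone. The following is only a sketch, not part of the original answer: the helper name count_by_month and the "mm/yyyy" label format are assumptions chosen to match the example in the question. It counts dates per (year, month) pair and then walks every month between the earliest and latest date, so empty months show up with a count of 0 even when the range crosses a year boundary.

```python
from collections import Counter
from datetime import datetime


def count_by_month(dates):
    """Return (labels, counts) for every month from the earliest to the latest date.

    Hypothetical helper for illustration; expects datetime (or date) objects.
    """
    if not dates:
        return [], []
    # Count dates per (year, month) pair; this plays the role of groupby + size.
    counts = Counter((d.year, d.month) for d in dates)
    # Map each month to one integer so the range can cross year boundaries.
    first = min(d.year * 12 + d.month - 1 for d in dates)
    last = max(d.year * 12 + d.month - 1 for d in dates)
    labels, values = [], []
    for index in range(first, last + 1):
        year, month0 = divmod(index, 12)
        labels.append(f"{month0 + 1:02d}/{year}")
        # A Counter returns 0 for missing keys; this plays the role of reindex + fill_value.
        values.append(counts[(year, month0 + 1)])
    return labels, values


dates = [datetime(2018, 2, 28), datetime(2018, 3, 1),
         datetime(2018, 3, 16), datetime(2018, 5, 17)]
labels, values = count_by_month(dates)
print(labels)  # ['02/2018', '03/2018', '04/2018', '05/2018']
print(values)  # [1, 2, 0, 1]
```

The two returned lists can be passed directly to a plotting call as x and y values. Converting each date to a single month index avoids the nested year/month loops from the question, which break when the start month is later in the year than the end month.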
