Python: split a list of datetimes by year + month


Problem description

I have the following CSV file:

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())

which I would like to group by year+month plus category (i.e. A, B, C).

I would like the final data to be grouped by month and then by category, as a view of the original data:

2012-04, A

>>  array[0,] => 2012-04-01,00:10, A, 10

>>  array[3,] => 2012-04-02,00:10, A, 18

2012-04, B

>>  array[1,] => 2012-04-01,00:20, B, 11

>>  array[2,] => 2012-04-01,00:30, B, 12

2012-05, A

>>  array[4,] => 2012-05-02,00:20, A, 14

...

Then, for each group, I would like to iterate and plot it using the same function.

I have seen a similar question on splitting dates by day, "Split list of datetimes into days", and I am able to do so in my case a). However, I am having trouble turning that into a year+month split in case b).

Here is the snippet that I have so far with the issue that I am running into:

#! /usr/bin/python

import numpy as np
import csv
import os
from  datetime import datetime

def strToDate(string):
    d = datetime.strptime(string, '%Y-%m-%d')
    return d

def strToMonthDate(string):
    d = datetime.strptime(string, '%Y-%m-%d')
    d_by_month = datetime(d.year, d.month, 1)
    return d_by_month

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())

arr = np.genfromtxt(data, delimiter=',', dtype=object)


# a) If we were to just group by dates
# Get unique dates
#keys = np.unique(arr[:,0])
#keys1 = np.unique(arr[:,2])
# Group by unique dates
#for key in keys:
#   print key   
#   for key1 in keys1:      
#       group = arr[ (arr[:,0]==key) & (arr[:,2]==key1) ]                       
#       if group.size:
#           print "\t" + key1
#           print group
#   print "\n"      

# b) But if we want to group by year+month in the dates 
dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
keys2 = np.unique(dates_by_month)
print dates_by_month
# >> [datetime.datetime(2012, 4, 1, 0, 0), datetime.datetime(2012, 4, 1, 0, 0), ...
print "\n"  
print keys2
# >> [2012-04-01 00:00:00 2012-05-01 00:00:00 2012-06-01 00:00:00]

for key in keys2:
    print key
    print type(key)
    group = arr[dates_by_month==key]
    print group
    print "\n"

Question: I get the monthly key, but for the group all I get is [2012-04-01 00:10 A 10] for every key. Each key in keys2 is of type datetime.datetime. Any idea what could be wrong? Suggestions for alternative implementations are welcome. I would prefer not to use an itertools.groupby solution, as it returns an iterator rather than an array, which is less suitable for plotting.

Edit1: Problem solved. The issue was that dates_by_month, which I use for advanced indexing in case b), should be an np.array rather than the list that map returns: dates_by_month = np.array(map(strToMonthDate, arr[:,0])). I have fixed it in the snippet above, and the example now works.
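
The difference described in Edit1 can be reproduced in isolation. The following is a minimal sketch rather than the original code: it is written for Python 3, and the names months_list, months_arr and rows, as well as the three sample rows, are made up for illustration. In Python 3, map returns an iterator, so np.array(map(...)) would not build the element array either; a plain list is assumed here instead.

```python
from datetime import datetime

import numpy as np

# Hypothetical month keys for three rows (two in April, one in May).
months_list = [datetime(2012, 4, 1), datetime(2012, 4, 1), datetime(2012, 5, 1)]
months_arr = np.array(months_list)  # object array holding the same datetimes
key = datetime(2012, 4, 1)

# A list compared with a single datetime gives one plain bool, not a mask.
list_result = months_list == key

# An object array compares element-wise and yields a boolean mask.
mask = months_arr == key

# Illustrative rows standing in for the parsed CSV data.
rows = np.array([["2012-04-01", "A"], ["2012-04-01", "B"], ["2012-05-02", "A"]])
april_rows = rows[mask]  # keeps only the rows whose month equals the key

print(list_result)
print(mask)
print(april_rows)
```

Here list_result is a single False, while mask has one entry per row, which is what boolean indexing needs.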

Solution

I found where the issue was in my original solution.

In case b), the line

dates_by_month = map(strToMonthDate, arr[:,0]) 

returns a list instead of a NumPy array. The advanced indexing:

group = arr[dates_by_month==key]

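therefore does not work: comparing a list with a single datetime yields one False rather than an element-wise boolean mask, so the same row came back for every key. If I instead use: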

dates_by_month = np.array(map(strToMonthDate, arr[:,0]))

then the grouping works as expected.
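
As one possible alternative, the grouping by year+month and category could be sketched in Python 3 with NumPy's datetime64 type. This is an assumption-laden sketch, not the original poster's code: it swaps the strToMonthDate helper for a cast to month precision, and the names csv_text, months, categories and groups are introduced only for illustration.

```python
from io import StringIO

import numpy as np

csv_text = """
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip()

# dtype=str keeps every field as text; autostrip removes the padding spaces.
arr = np.genfromtxt(StringIO(csv_text), delimiter=",", dtype=str, autostrip=True)

# Parse the date column, then truncate to month precision (2012-04-01 -> 2012-04).
months = arr[:, 0].astype("datetime64[D]").astype("datetime64[M]")
categories = arr[:, 2]

# Map (month, category) to the array of matching rows; empty groups are skipped.
groups = {}
for month in np.unique(months):
    for cat in np.unique(categories):
        mask = (months == month) & (categories == cat)
        if mask.any():
            groups[(str(month), str(cat))] = arr[mask]

for (month, cat), rows in groups.items():
    print(month, cat)
    print(rows)
```

Each value in groups is an ordinary array, so the same plotting function can be called on every group in a loop. Note that boolean-mask indexing returns a copy rather than a view of arr, so changes made to a group do not propagate back to the original data.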
