Python array of datetime objects from numpy ndarray

Problem description

I have a numpy ndarray which contains two columns: one is a date, e.g. 2011-08-04, and the other is a time, e.g. 19:00:00:081.

How can I combine them into one array of datetime objects? Currently, they are strings in the numpy array.

Solution

If the date and time string in the example.txt data file were given as one column with no separating whitespace, then genfromtxt could convert it into a datetime object like this:

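import numpy as np
import datetime as dt

def mkdate(text):
    # genfromtxt may hand the converter bytes rather than str,
    # depending on the NumPy version and the encoding argument
    if isinstance(text, bytes):
        text = text.decode('ascii')
    # '%f' reads the trailing digits (e.g. 081) as a fraction of a second
    return dt.datetime.strptime(text, '%Y-%m-%dT%H:%M:%S:%f')

data = np.genfromtxt(
    'example.txt',
    names=('data','num','date')+tuple('col{i}'.format(i=i) for i in range(19)),
    converters={'date':mkdate},
    dtype=None)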
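
To check the format string in isolation, here is a minimal sketch that calls the converter directly. The input string is an assumption: it is the date and time from the question joined with a 'T', which is what the format above expects.

```python
import datetime as dt

def mkdate(text):
    # Accept bytes as well as str, since a converter may receive either
    if isinstance(text, bytes):
        text = text.decode('ascii')
    return dt.datetime.strptime(text, '%Y-%m-%dT%H:%M:%S:%f')

# Hypothetical sample value built from the question's date and time
stamp = mkdate('2011-08-04T19:00:00:081')
print(stamp.isoformat())  # 2011-08-04T19:00:00.081000
```

Note that '%f' pads the digits on the right, so 081 is read as 81 milliseconds (81000 microseconds).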


Given example.txt as it is, you could form the desired numpy array with

import numpy as np
import datetime as dt
import csv

def mkdate(text):
    # The date and time columns are joined with no separator before parsing
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')

def using_csv(fname):
    # 'O' (object) holds the datetime; it replaces the platform-specific 'O4'
    desc=([('data', '|S4'), ('num', '<i4'), ('date', 'O')]+
          [('col{i}'.format(i=i), '<f8') for i in range(19)])
    with open(fname, 'r', newline='') as f:
        reader=csv.reader(f,delimiter='\t')
        # Columns 2 and 3 (date, time) collapse into a single datetime field
        data=np.array([tuple(row[:2]+[mkdate(''.join(row[2:4]))]+row[4:])
                       for row in reader],
                      dtype=desc)
    # print(mc.report_memory())
    return data

Merging two columns in a numpy array can be a slow operation especially if the array is large. That's because merging, like resizing, requires allocating memory for a new array, and copying data from the original array to the new one. So I think it is worth trying to form the correct numpy array directly, instead of in stages (by forming a partially correct array and merging two columns).
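
If the strings are already in memory as a two-column array, as in the question, the combination itself can be sketched as below. This is only an illustration under assumptions: the sample rows and the helper name `combine` are hypothetical, and the result is a new object array rather than an in-place merge.

```python
import datetime as dt
import numpy as np

# Hypothetical sample data: column 0 is the date, column 1 is the time
arr = np.array([['2011-08-04', '19:00:00:081'],
                ['2011-08-04', '19:00:01:250']])

def combine(date_str, time_str):
    # Join the two strings with a space and parse them in one pass
    text = str(date_str) + ' ' + str(time_str)
    return dt.datetime.strptime(text, '%Y-%m-%d %H:%M:%S:%f')

# dtype=object keeps real datetime.datetime instances in the array
stamps = np.array([combine(d, t) for d, t in arr], dtype=object)
print(stamps[0])
```

The list comprehension runs at Python speed, so for large inputs it is still worth building the datetimes while reading the file, as described above.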


By the way, I tested the above csv code versus merging two columns (below). Forming a single array from csv (above) was faster (and the memory usage was about the same):

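import matplotlib.cbook as mc   # used only by the commented-out memory report
import numpy as np
import datetime as dt

# mkdate is the function defined with using_csv above (same module)

def using_genfromtxt(fname):
    data = np.genfromtxt(fname, dtype=None)

    orig_desc=data.dtype.descr
    # Assumes the date and time columns were read as byte strings of
    # widths 10 and 12, so together they can be viewed as one S22 field
    view_desc=orig_desc[:2]+[('date','|S22')]+orig_desc[4:]
    # 'O' (object) replaces the platform-specific 'O4'
    new_desc=orig_desc[:2]+[('date','O')]+orig_desc[4:]

    newdata = np.empty(data.shape, dtype=new_desc)
    fields=data.dtype.names
    fields=fields[:2]+fields[4:]
    # Copy every field except the two being merged
    for field in fields:
        newdata[field] = data[field]

    newdata['date']=np.vectorize(mkdate)(data.view(view_desc)['date'])
    # print(mc.report_memory())

    return newdata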

# using_csv('example4096.txt')
# using_genfromtxt('example4096.txt')
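
The view step in using_genfromtxt relies on the two string fields being adjacent and fixed-width. A minimal sketch of just that step, assuming byte-string fields of widths 10 and 12 (the field names and sample rows here are hypothetical):

```python
import datetime as dt
import numpy as np

# Two adjacent fixed-width byte-string fields: 10 + 12 = 22 bytes per record
rec = np.array([(b'2011-08-04', b'19:00:00:081'),
                (b'2011-08-05', b'07:30:15:500')],
               dtype=[('d', 'S10'), ('t', 'S12')])

# Reinterpret the same memory as a single 22-byte field; no data is copied
merged = rec.view([('date', 'S22')])['date']

# Decode each merged value and parse it with the separator-free format
stamps = np.array(
    [dt.datetime.strptime(b.decode('ascii'), '%Y-%m-%d%H:%M:%S:%f')
     for b in merged],
    dtype=object)
print(merged[0], stamps[0])
```

This only works because both values fill their fields completely; a shorter string would be NUL-padded inside the merged field and the format would no longer match.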

example4096.txt is the same as example.txt, duplicated 4096 times. It's about 12K lines long.

% python -mtimeit -s'import test' 'test.using_genfromtxt("example4096.txt")'
10 loops, best of 3: 1.92 sec per loop

% python -mtimeit -s'import test' 'test.using_csv("example4096.txt")'
10 loops, best of 3: 982 msec per loop
