Python array of datetime objects from numpy ndarray
Question
I have a numpy ndarray which contains two columns: one is a date, e.g. 2011-08-04, and the other is a time, e.g. 19:00:00:081.
How can I combine them into one array of datetime objects? Currently, they're strings in a numpy array.
Answer
If the date and time strings in the example.txt data file were given as one column with no separating whitespace (e.g. 2011-08-04T19:00:00:081), then genfromtxt could convert them into datetime objects like this:
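import numpy as np
import datetime as dt

def mkdate(text):
    # genfromtxt may hand converters bytes rather than str (depending on the
    # numpy version and its encoding argument), so decode first if needed
    if isinstance(text, bytes):
        text = text.decode('ascii')
    return dt.datetime.strptime(text, '%Y-%m-%dT%H:%M:%S:%f')

data = np.genfromtxt(
    'example.txt',
    names=('data','num','date')+tuple('col{i}'.format(i=i) for i in range(19)),
    converters={'date':mkdate},
    dtype=None)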
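Once the date and time sit in one token, each value needs only a single strptime call. The sketch below checks the same format string mkdate uses, on a hypothetical in-memory array instead of example.txt. Note that %f right-pads its digits, so the trailing 081 is read as 81 milliseconds (81000 microseconds):

```python
import datetime as dt

import numpy as np

FMT = '%Y-%m-%dT%H:%M:%S:%f'  # same format string as mkdate above

# Hypothetical single-column values standing in for the 'date' column.
tokens = np.array(['2011-08-04T19:00:00:081', '2011-08-05T07:30:15:500'])

# Parse each token; dtype=object keeps real datetime.datetime instances.
parsed = np.array([dt.datetime.strptime(str(t), FMT) for t in tokens],
                  dtype=object)

print(parsed[0])  # 2011-08-04 19:00:00.081000
```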
Given example.txt as it is, with the date and time in separate columns, you could form the desired numpy array with:
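import numpy as np
import datetime as dt
import csv

def mkdate(text):
    # accept bytes as well as str, e.g. values taken from an S-dtype numpy field
    if isinstance(text, bytes):
        text = text.decode('ascii')
    # date and time are joined with no separator: '2011-08-0419:00:00:081'
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')

def using_csv(fname):
    # 'O' (object) holds the datetime instances; the original '|O4' spelled out
    # a 4-byte (32-bit) pointer size, while plain 'O' is portable
    desc=([('data', '|S4'), ('num', '<i4'), ('date', 'O')]+
          [('col{i}'.format(i=i), '<f8') for i in range(19)])
    with open(fname,'r') as f:
        reader=csv.reader(f,delimiter='\t')
        # join the date and time columns (2 and 3) while each row is read,
        # so the final array is built in a single step
        data=np.array([tuple(row[:2]+[mkdate(''.join(row[2:4]))]+row[4:])
                       for row in reader],
                      dtype=desc)
    # print(mc.report_memory())
    return data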
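To try the same row-by-row merge without example.txt, here is a scaled-down sketch that reads from an in-memory io.StringIO. The sample rows, the ticker labels and the use of two float columns (instead of 19) are made up for illustration, and read_rows is a hypothetical name. Unlike using_csv, it converts the numeric fields explicitly with int and float rather than relying on numpy to parse the strings:

```python
import csv
import datetime as dt
import io

import numpy as np

# Made-up tab-separated rows: label, integer, date, time, two float columns.
SAMPLE = (
    "AAPL\t1\t2011-08-04\t19:00:00:081\t1.5\t2.5\n"
    "MSFT\t2\t2011-08-05\t07:30:15:500\t3.0\t4.0\n"
)

def mkdate(text):
    # Date and time joined with no separator, e.g. '2011-08-0419:00:00:081'.
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')

def read_rows(f, ncols=2):
    """Hypothetical helper: build the structured array in a single step."""
    desc = ([('data', 'U4'), ('num', '<i4'), ('date', 'O')] +
            [('col{i}'.format(i=i), '<f8') for i in range(ncols)])
    rows = []
    for row in csv.reader(f, delimiter='\t'):
        # Merge the date and time columns while the row is being read.
        rows.append((row[0], int(row[1]), mkdate(row[2] + row[3]))
                    + tuple(float(x) for x in row[4:]))
    return np.array(rows, dtype=desc)

data = read_rows(io.StringIO(SAMPLE))
print(data['date'][1])  # 2011-08-05 07:30:15.500000
```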
Merging two columns in a numpy array can be a slow operation, especially if the array is large. That's because merging, like resizing, requires allocating memory for a new array and copying data from the original array to the new one. So I think it is worth trying to form the correct numpy array directly, instead of in stages (forming a partially correct array and then merging two columns).
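For the situation in the question, where the date and time strings already sit in two fields of an existing array, the staged approach can be as short as the sketch below. The array raw, its field names and the helper combine are hypothetical. The result is a separate object array, which is exactly the extra allocation and copy described above:

```python
import datetime as dt

import numpy as np

# Hypothetical existing array with the date and time as separate string fields.
raw = np.array([('2011-08-04', '19:00:00:081'),
                ('2011-08-05', '07:30:15:500')],
               dtype=[('date', 'U10'), ('time', 'U12')])

def combine(dates, times):
    """Merge two string columns into a new object array of datetimes."""
    return np.array([dt.datetime.strptime(d + ' ' + t, '%Y-%m-%d %H:%M:%S:%f')
                     for d, t in zip(dates.tolist(), times.tolist())],
                    dtype=object)

stamps = combine(raw['date'], raw['time'])
```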
By the way, I tested the above csv code against merging two columns (below). Forming a single array with csv (above) was faster, and the memory usage was about the same:
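import matplotlib.cbook as mc
import numpy as np
import datetime as dt

# mkdate is the function defined with using_csv above;
# the timing runs below import both from test.py
def using_genfromtxt(fname):
    data = np.genfromtxt(fname, dtype=None)
    orig_desc=data.dtype.descr
    # fields 2 and 3 are the date (S10) and time (S12) strings; viewing those
    # 22 adjacent bytes as one S22 field joins them: '2011-08-0419:00:00:081'
    view_desc=orig_desc[:2]+[('date','|S22')]+orig_desc[4:]
    new_desc=orig_desc[:2]+[('date','O')]+orig_desc[4:]
    newdata = np.empty(data.shape, dtype=new_desc)
    fields=data.dtype.names
    fields=fields[:2]+fields[4:]
    # copy every column except the date and time into the new array
    for field in fields:
        newdata[field] = data[field]
    newdata['date']=np.vectorize(mkdate)(data.view(view_desc)['date'])
    # print(mc.report_memory())
    return newdata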
# using_csv('example4096.txt')
# using_genfromtxt('example4096.txt')
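The key step in using_genfromtxt is reinterpreting the adjacent date and time fields as one 22-byte string. Here is a tiny, self-contained illustration with a hypothetical two-field array of bytes strings (the kind genfromtxt produces with its default settings); parse is a hypothetical helper that decodes before calling strptime:

```python
import datetime as dt

import numpy as np

# Hypothetical array: date (10 bytes) and time (12 bytes) stored side by side.
arr = np.array([(b'2011-08-04', b'19:00:00:081'),
                (b'2011-08-05', b'07:30:15:500')],
               dtype=[('date', 'S10'), ('time', 'S12')])

# Same 22-byte records reinterpreted as a single field: a view, not a copy.
merged = arr.view([('date', 'S22')])['date']

def parse(raw):
    # S fields come back as bytes, so decode before calling strptime.
    return dt.datetime.strptime(raw.decode('ascii'), '%Y-%m-%d%H:%M:%S:%f')

# otypes=[object] keeps datetime.datetime instances in the result.
stamps = np.vectorize(parse, otypes=[object])(merged)
```

The view itself costs nothing; the time goes into the per-element strptime calls and into copying the remaining columns into the new array.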
example4096.txt is the same as example.txt, duplicated 4096 times. It's about 12K lines long.
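To build a similar test file yourself, a minimal sketch; duplicate_file is a hypothetical helper, and it assumes example.txt is small enough to read into memory:

```python
def duplicate_file(src, dst, times=4096):
    """Write the contents of src to dst, repeated `times` times."""
    with open(src) as f:
        text = f.read()
    if not text.endswith('\n'):
        text += '\n'  # keep each copy's last line separate from the next copy
    with open(dst, 'w') as f:
        f.write(text * times)

# duplicate_file('example.txt', 'example4096.txt')
```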
% python -mtimeit -s'import test' 'test.using_genfromtxt("example4096.txt")'
10 loops, best of 3: 1.92 sec per loop
% python -mtimeit -s'import test' 'test.using_csv("example4096.txt")'
10 loops, best of 3: 982 msec per loop