Converting text to datetime64 in numpy
Question
I have a numpy array of strings (P.S. why are strings represented as object?!):
t = array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
'07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
'07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
'05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
'28/02/2014 01:25:00 PM'], dtype=object)
I would like to convert it to numpy.datetime64 with day resolution, but the only solution I have found is:
t = [datetime.strptime(tt,"%d/%m/%Y %H:%M:%S %p") for tt in t]
t = np.array(t,dtype='datetime64[us]').astype('datetime64[D]')
Can it get any uglier than that? Why do I need to go through a native Python list? There must be another way...
By the way, I cannot find a way to plot a histogram of dates in numpy/pandas.
Recommended answer
The date format is the problem: a date written as 01/01/2015 is ambiguous (day first or month first?). If it were in ISO 8601, you could parse it directly with numpy. In your case, since you only want the date, splitting and rearranging the data will be significantly faster:
t = np.array([datetime.strptime(d.split(None)[0], "%d/%m/%Y")
for d in t],dtype='datetime64[us]').astype('datetime64[D]')
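For comparison, here is a minimal sketch of what direct parsing looks like when the strings are already ISO 8601 (YYYY-MM-DD). The sample values are illustrative: they are the first three dates from the question, rewritten by hand.

```python
import numpy as np

# Illustrative ISO 8601 strings (the question's first three dates, rewritten by hand).
iso = np.array(['2014-02-21', '2014-02-11', '2014-04-07'])

# numpy parses ISO 8601 date strings itself; no strptime or Python list is needed.
days = iso.astype('datetime64[D]')

print(days.dtype)  # datetime64[D]
print(days[0])     # 2014-02-21
```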
Some timings. First, rearranging each string into ISO order instead of parsing it with strptime:
In [36]: %%timeit
from datetime import datetime
t = np.array(['21/02/2014 08:40:00', '11/02/2014 10:50:00 PM',
'07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
'07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
'05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
'28/02/2014 01:25:00 PM']*10000)
t1 = np.array([np.datetime64("{}-{}-{}".format(c[:4], b, a)) for a, b, c in (s.split("/", 2) for s in t)])
....:
10 loops, best of 3: 125 ms per loop
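To make the one-liner in the timed cell easier to follow, here is the same rearrangement unpacked for a single string. The variable names are illustrative only, and the sketch assumes day and month are always zero-padded, as in the question's data.

```python
import numpy as np

s = '21/02/2014 08:40:00 AM'

# Split on the first two slashes only: day, month, and "year plus the rest".
day, month, rest = s.split("/", 2)   # '21', '02', '2014 08:40:00 AM'

# The year is the first four characters of the remainder; the time is discarded.
iso = "{}-{}-{}".format(rest[:4], month, day)   # '2014-02-21'

# An ISO 8601 date string gives a datetime64 with day resolution directly.
d = np.datetime64(iso)
print(d, d.dtype)   # 2014-02-21 datetime64[D]
```

An unpadded input such as 1/2/2014 would produce 2014-2-1, which is not a valid ISO 8601 date, so data like that would need padding first.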
Your code:
In [37]: %%timeit
from datetime import datetime
t = np.array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
'07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
'07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
'05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
'28/02/2014 01:25:00 PM']*10000)
t = [datetime.strptime(tt,"%d/%m/%Y %H:%M:%S %p") for tt in t]
t = np.array(t,dtype='datetime64[us]').astype('datetime64[D]')
....:
1 loops, best of 3: 1.56 s per loop
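A side note on the format string in the snippet timed above: according to the Python documentation, %p only affects the parsed hour when the hour is read with %I, so with %H the AM/PM marker is accepted but ignored. That makes no difference here because the result is truncated to days, but it would matter if the time of day were kept. A minimal sketch, assuming an English or C locale for the AM/PM text:

```python
from datetime import datetime

s = '11/02/2014 10:50:00 PM'

# %H reads the hour as a 24-hour value; the trailing PM is matched but not applied.
with_H = datetime.strptime(s, "%d/%m/%Y %H:%M:%S %p")

# %I reads a 12-hour value, so %p shifts it into the afternoon.
with_I = datetime.strptime(s, "%d/%m/%Y %I:%M:%S %p")

print(with_H.hour)  # 10
print(with_I.hour)  # 22
```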
Both give the same result:
In [48]: t = np.array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
'07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
'07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
'05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
'28/02/2014 01:25:00 PM'] * 10000)
In [49]: t1 = [datetime.strptime(tt,"%d/%m/%Y %H:%M:%S %p") for tt in t]
t1 = np.array(t1,dtype='datetime64[us]').astype('datetime64[D]')
....:
In [50]: t2 = np.array([np.datetime64("{}-{}-{}".format(c[:4], b, a)) for a, b, c in (s.split("/", 2) for s in t)])
In [51]: (t1 == t2).all()
Out[51]: True
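On the side question about a histogram of dates, which the answer above does not cover: one possible approach, offered only as a sketch, is to count occurrences per day with np.unique once the array has day resolution. The sample array here is made up for illustration.

```python
import numpy as np

# Made-up sample of day-resolution dates, some of them repeated.
days = np.array(['2014-02-21', '2014-02-11', '2014-02-21', '2014-04-07',
                 '2014-02-11', '2014-02-21'], dtype='datetime64[D]')

# np.unique returns the distinct days in sorted order, plus how often each occurs.
unique_days, counts = np.unique(days, return_counts=True)

for day, n in zip(unique_days, counts):
    print(day, n)
```

The resulting pairs can then be passed to whatever plotting tool is in use, for example as the positions and heights of a bar chart.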