Converting text to datetime64 in numpy

Question

I have a numpy array of strings (P.S. why are strings represented as object?!)

t = array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
           '07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
           '07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
           '05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
           '28/02/2014 01:25:00 PM'], dtype=object)

I would like to convert it to numpy.datetime64 with day resolution. However, the only solution I have found is:

t = [datetime.strptime(tt,"%d/%m/%Y %H:%M:%S %p") for tt in t]
t = np.array(t,dtype='datetime64[us]').astype('datetime64[D]')

Can it get any uglier than that? Why do I need to go through a native Python list? There must be another way...

By the way, I cannot find a way to plot a histogram of dates in numpy/pandas.

Recommended answer

The date format is the problem: 01/01/2015 is ambiguous. If it were in ISO 8601, you could parse it directly using numpy. In your case, since you only want the date, splitting and rearranging the data will be significantly faster:

t = np.array([datetime.strptime(d.split(None)[0], "%d/%m/%Y")
              for d in t], dtype='datetime64[us]').astype('datetime64[D]')
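To illustrate the point about ISO 8601, here is a minimal sketch showing that numpy accepts `YYYY-MM-DD` strings directly when a `datetime64` dtype is requested. The helper name `to_iso` and the sample list `raw` are hypothetical, introduced only for this example; the rearrangement assumes every input starts with a `DD/MM/YYYY` date followed by whitespace.

```python
import numpy as np

# Hypothetical sample in the question's DD/MM/YYYY layout.
raw = ['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
       '07/04/2014 05:50:00 PM']

def to_iso(s):
    """Rearrange 'DD/MM/YYYY hh:mm:ss AM' into the ISO 8601 date 'YYYY-MM-DD'."""
    date_part = s.split(None, 1)[0]       # drop the time-of-day part
    day, month, year = date_part.split('/')
    return "{}-{}-{}".format(year, month, day)

iso_strings = [to_iso(s) for s in raw]

# numpy parses ISO 8601 date strings without any help from datetime.strptime.
days = np.array(iso_strings, dtype='datetime64[D]')
print(days)
```

A Python-level loop is still needed to rearrange the strings, but the per-element work is plain string slicing rather than format parsing, which is likely where the speed-up comes from.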

Some timings. First, splitting the strings and rearranging them into ISO 8601 order:

In [36]: %%timeit
from datetime import datetime
t = np.array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
           '07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
           '07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
           '05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
           '28/02/2014 01:25:00 PM']*10000)
t1 = np.array([np.datetime64("{}-{}-{}".format(c[:4], b, a)) for a, b, c in (s.split("/", 2) for s in t)])
....: 
10 loops, best of 3: 125 ms per loop

Your code:

In [37]: %%timeit
from datetime import datetime
t = np.array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
           '07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
           '07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
           '05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
           '28/02/2014 01:25:00 PM']*10000)
t = [datetime.strptime(tt,"%d/%m/%Y %H:%M:%S %p") for tt in t]
t = np.array(t,dtype='datetime64[us]').astype('datetime64[D]')
....: 
1 loops, best of 3: 1.56 s per loop

Both give the same result:

In [48]: t = np.array(['21/02/2014 08:40:00 AM', '11/02/2014 10:50:00 PM',
              '07/04/2014 05:50:00 PM', '17/02/2014 10:20:00 PM',
              '07/03/2014 06:10:00 AM', '02/03/2014 12:25:00 PM',
              '05/02/2014 03:20:00 AM', '31/01/2014 12:30:00 AM',
              '28/02/2014 01:25:00 PM'] * 10000)

In [49]: t1 = [datetime.strptime(tt,"%d/%m/%Y %H:%M:%S %p") for tt in t]
t1 = np.array(t1,dtype='datetime64[us]').astype('datetime64[D]')
   ....: 

In [50]: t2 = np.array([np.datetime64("{}-{}-{}".format(c[:4], b, a)) for a, b, c in (s.split("/", 2) for s in t)])

In [51]: (t1 == t2).all()
Out[51]: True
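One caveat about the `strptime` format used in the question: in Python's `datetime.strptime`, `%p` only affects the parsed hour when the hour is read with `%I` (12-hour clock), not with `%H`. The comparison above is unaffected because only the date is kept, but the difference would matter if the time of day were needed. A small sketch, assuming an English/C locale for the AM/PM markers:

```python
from datetime import datetime

s = '11/02/2014 10:50:00 PM'

# %H reads the hour as a 24-hour value; %p is consumed but does not shift it.
with_H = datetime.strptime(s, "%d/%m/%Y %H:%M:%S %p")

# %I reads a 12-hour clock value, so %p applies the AM/PM shift.
with_I = datetime.strptime(s, "%d/%m/%Y %I:%M:%S %p")

print(with_H.hour)  # 10
print(with_I.hour)  # 22

# The calendar date is the same either way.
print(with_H.date() == with_I.date())  # True
```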
