Why is pandas.to_datetime slow for non-standard time formats such as '2014/12/31'?
Question
I have a .csv file in the following format:
timestmp, p
2014/12/31 00:31:01:9200, 0.7
2014/12/31 00:31:12:1700, 1.9
...
When I read it via pd.read_csv and convert the time strings to datetime using pd.to_datetime, the performance drops dramatically. Here is a minimal example.
import re
import pandas as pd
d = '2014-12-12 01:02:03.0030'
c = re.sub('-', '/', d)
%timeit pd.to_datetime(d)
%timeit pd.to_datetime(c)
%timeit pd.to_datetime(c, format="%Y/%m/%d %H:%M:%S.%f")
The results are:
10000 loops, best of 3: 62.4 µs per loop
10000 loops, best of 3: 181 µs per loop
10000 loops, best of 3: 82.9 µs per loop
So, how can I improve the performance of pd.to_datetime when reading dates from a csv file?
Recommended answer
This is because pandas falls back to dateutil.parser.parse for parsing the strings when they are in a non-default format or when no format string is supplied (this is much more flexible, but also slower).
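A minimal sketch of that difference outside IPython, using the standard timeit module in place of %timeit. The variable names are illustrative only, and the absolute timings (and even which call is faster) will depend on the machine and the pandas version, so the sketch only checks that both calls produce the same value.

```python
import timeit

import pandas as pd

s = "2014/12/12 01:02:03.0030"
fmt = "%Y/%m/%d %H:%M:%S.%f"

# Both calls should yield the same Timestamp; only the parsing path differs.
generic = pd.to_datetime(s)
explicit = pd.to_datetime(s, format=fmt)
same = generic == explicit

# Rough timings in seconds for 1000 calls each; treat them as indicative only.
t_generic = timeit.timeit(lambda: pd.to_datetime(s), number=1000)
t_explicit = timeit.timeit(lambda: pd.to_datetime(s, format=fmt), number=1000)
print(same, t_generic, t_explicit)
```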
As you have shown above, you can improve the performance by supplying a format string to to_datetime. Another option is to use infer_datetime_format=True.
Apparently, infer_datetime_format cannot infer the format when there are microseconds. With an example without those, you can see a large speed-up:
In [28]: d = '2014-12-24 01:02:03'
In [29]: c = re.sub('-', '/', d)
In [30]: s_c = pd.Series([c]*10000)
In [31]: %timeit pd.to_datetime(s_c)
1 loops, best of 3: 1.14 s per loop
In [32]: %timeit pd.to_datetime(s_c, infer_datetime_format=True)
10 loops, best of 3: 105 ms per loop
In [33]: %timeit pd.to_datetime(s_c, format="%Y/%m/%d %H:%M:%S")
10 loops, best of 3: 99.5 ms per loop
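Applied to the file from the question, the explicit-format approach might look like the sketch below. It makes two assumptions that the question does not confirm: the trailing four digits after the last colon are fractional seconds (which is how %f reads them), and the blank after each comma is just padding. The in-memory csv_text stands in for the real file. Note also that in pandas 2.0 and later infer_datetime_format is deprecated, since format inference is the default behaviour there, so an explicit format string is the more portable choice.

```python
import io

import pandas as pd

# Stand-in for the .csv file from the question (same layout, two rows).
csv_text = """timestmp, p
2014/12/31 00:31:01:9200, 0.7
2014/12/31 00:31:12:1700, 1.9
"""

# skipinitialspace drops the blank after each comma, so the second column is named "p".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# The fraction is separated by ':' in this file, so the format uses ':%f' rather than '.%f'.
# Assumption: '9200' means 0.9200 seconds.
df["timestmp"] = pd.to_datetime(df["timestmp"], format="%Y/%m/%d %H:%M:%S:%f")
print(df.dtypes)
```

Converting after read_csv, rather than through its date-parsing options, keeps the format string explicit and in one place.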