Speeding up datetime.strptime

Question

I am using the following piece of code to extract a date from a string:

from datetime import datetime

try:
    my_date = datetime.strptime(input_date, "%Y-%m-%d").date()
except ValueError:
    my_date = None

If I run this 750,000 times, it takes 19.144 seconds (determined with cProfile). Now I replace this with the following (ugly) code:

from datetime import date

a= 1000 * int(input_date[0])
b=  100 * int(input_date[1])
c=   10 * int(input_date[2])
d=    1 * int(input_date[3])
year = a+b+c+d

c=   10 * int(input_date[5])
d=    1 * int(input_date[6])
month = c+d

c=   10 * int(input_date[8])
d=    1 * int(input_date[9])
day = c+d

try:
    my_date = date(year, month, day)
except ValueError:
    my_date = None

If I run this 750,000 times, it only takes 5.946 seconds. However, I find the code really ugly. Is there another fast way to extract a date from a string, without using strptime?

Answer

Yes, there are faster methods to parse a date than datetime.strptime(), if you forgo a lot of flexibility and validation. strptime() allows both numbers with and without zero-padding, and it only matches strings that use the right separators, whilst your 'ugly' version doesn't.
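To make the difference concrete, here is a small sketch that feeds the same strings to strptime() and to a condensed version of the fixed-position parser from the question. The function names strptime_parse and ugly_parse are illustrative, not from the question.

```python
from datetime import date, datetime


def strptime_parse(input_date):
    """Parse with strptime(); return None when the format does not match."""
    try:
        return datetime.strptime(input_date, "%Y-%m-%d").date()
    except ValueError:
        return None


def ugly_parse(input_date):
    """Condensed version of the question's digit-by-digit parser.

    It reads fixed positions only, so the separator characters at
    index 4 and 7 are never checked.
    """
    year = (1000 * int(input_date[0]) + 100 * int(input_date[1])
            + 10 * int(input_date[2]) + int(input_date[3]))
    month = 10 * int(input_date[5]) + int(input_date[6])
    day = 10 * int(input_date[8]) + int(input_date[9])
    try:
        return date(year, month, day)
    except ValueError:
        return None


# strptime() accepts numbers without zero-padding...
print(strptime_parse("2014-7-8"))    # 2014-07-08
# ...but rejects the wrong separator.
print(strptime_parse("2014/07/08"))  # None
# The fixed-position parser accepts the wrong separator without complaint.
print(ugly_parse("2014/07/08"))      # 2014-07-08
```

The fixed-position parser also fails in the other direction: for "2014-7-8" it ends up calling int('-'), which raises a ValueError outside its try block instead of returning None.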

You should always use the timeit module for time trials; it is far more accurate than cProfile here, because cProfile adds bookkeeping overhead to every function call it records.
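A script-friendly way to run such a trial is to repeat the measurement a few times and keep the fastest run, which is the one least disturbed by other activity on the machine. This is a sketch assuming Python 3.6 or later (the globals argument of timeit.repeat() needs 3.5, the underscore in the number literal needs 3.6); the helper name strptime_parse is illustrative.

```python
import timeit
from datetime import datetime


def strptime_parse(input_date):
    try:
        return datetime.strptime(input_date, "%Y-%m-%d").date()
    except ValueError:
        return None


CALLS = 10_000  # calls per timing run; timeit.timeit() defaults to 1,000,000

# Five independent runs of CALLS calls each; globals= makes strptime_parse
# visible to the timed statement without a 'from __main__ import' setup string.
timings = timeit.repeat(
    'f("2014-07-08")',
    globals={"f": strptime_parse},
    repeat=5,
    number=CALLS,
)
best = min(timings)
print(f"best of 5: {best / CALLS * 1e6:.2f} microseconds per call")
```

The absolute numbers depend on your machine and Python version, so compare parsers within a single run rather than against figures from elsewhere.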

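Indeed, your 'ugly' approach is more than twice as fast as strptime() (timeit.timeit() runs the statement 1,000,000 times by default, so the figures below are totals for a million calls):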

>>> from datetime import date, datetime
>>> import timeit
>>> def ugly(input_date):
...     a= 1000 * int(input_date[0])
...     b=  100 * int(input_date[1])
...     c=   10 * int(input_date[2])
...     d=    1 * int(input_date[3])
...     year = a+b+c+d
...     c=   10 * int(input_date[5])
...     d=    1 * int(input_date[6])
...     month = c+d
...     c=   10 * int(input_date[8])
...     d=    1 * int(input_date[9])
...     day = c+d
...     try:
...         my_date = date(year, month, day)
...     except ValueError:
...         my_date = None
... 
>>> def strptime(input_date):
...     try:
...         my_date = datetime.strptime(input_date, "%Y-%m-%d").date()
...     except ValueError:
...         my_date = None
... 
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import ugly as f')
4.21576189994812
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import strptime as f')
9.873773097991943

Your approach can be improved upon though; you could use slicing:

>>> def slicing(input_date):
...     try:
...         year = int(input_date[:4])
...         month = int(input_date[5:7])
...         day = int(input_date[8:])
...         my_date = date(year, month, day)
...     except ValueError:
...         my_date = None
... 
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import slicing as f')
1.7224829196929932

Now it is almost 6 times faster than strptime(). I also moved the int() calls into the try/except block, so that invalid input is also handled when converting the strings to integers.
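If you want to keep most of the speed of slicing but win back some of the validation strptime() gives you, one option is to check the length and the separator positions before converting. This is a minimal sketch; parse_iso_date is a hypothetical name.

```python
from datetime import date


def parse_iso_date(input_date):
    """Slice-based parser that also checks length and separators.

    Returns a date, or None for anything that is not shaped like
    'YYYY-MM-DD' or that is not a valid calendar date.
    """
    # Cheap string comparisons reject wrong lengths and wrong separators
    # before any integer conversion happens.
    if len(input_date) != 10 or input_date[4] != "-" or input_date[7] != "-":
        return None
    try:
        return date(int(input_date[:4]), int(input_date[5:7]), int(input_date[8:]))
    except ValueError:
        # int() failed on a non-digit, or date() rejected e.g. month 13
        return None


print(parse_iso_date("2014-07-08"))  # 2014-07-08
print(parse_iso_date("2014/07/08"))  # None
print(parse_iso_date("2014-13-01"))  # None
```

Unlike strptime(), this rejects unpadded dates such as "2014-7-8". It is also not strict in every respect: int() tolerates a leading + inside a field, so "2014-+7-08" is accepted as 2014-07-08. Add str.isdigit() checks on the slices if that matters for your data.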

You could also use str.split() to get the parts, but that makes it slightly slower again:

>>> def split(input_date):
...     try:
...         my_date = date(*map(int, input_date.split('-')))
...     except (TypeError, ValueError):
...         # TypeError: wrong number of parts, e.g. '2014-07' gives date(2014, 7)
...         my_date = None
... 
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import split as f')
2.294667959213257
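On Python 3.7 and later there is also a built-in alternative that this answer does not cover: date.fromisoformat(), which parses 'YYYY-MM-DD' strings directly and is implemented in C in CPython. A minimal wrapper in the same style as the functions above might look like this (fromisoformat_parse is an illustrative name). Time it with timeit on your own Python version before assuming it is faster than slicing.

```python
import timeit
from datetime import date


def fromisoformat_parse(input_date):
    """Parse an ISO date string with date.fromisoformat() (Python 3.7+)."""
    try:
        return date.fromisoformat(input_date)
    except ValueError:
        return None


print(fromisoformat_parse("2014-07-08"))  # 2014-07-08
print(fromisoformat_parse("2014/07/08"))  # None

# Total seconds for 100,000 calls; machine dependent.
print(timeit.timeit('f("2014-07-08")', globals={"f": fromisoformat_parse}, number=100_000))
```

Note that from Python 3.11 on, date.fromisoformat() also accepts other ISO 8601 forms such as '20140708', so it is more lenient there than a strict 'YYYY-MM-DD' check.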
