How can I parse multiple (unknown) date formats in Python?

Views: 253 · Posted: 2017/4/6 19:42:04 · Tags: python, parsing, date


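Problem description

I have a bunch of Excel documents that I am extracting dates from. I am trying to convert these to a standard format so I can put them in a database. Is there a function I can throw these strings at and get a standard format back? Here is a small sample of my data:

The good thing is that I know it is always month/day.

10/02/09
07/22/09
09-08-2008
9/9/2008
11/4/2010
03-07-2009
09/01/2010

I'd like to get them all into MM/DD/YYYY format. Is there a way I can do this without trying each pattern against the string?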


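Solution

import re

ss = '''10/02/09
07/22/09
09-08-2008
9/9/2008
11/4/2010
03-07-2009
09/01/2010'''


# Split each date on '-' or '/', zero-pad month and day,
# and expand a 2-digit year to 20YY (4-digit years are kept as they are).
regx = re.compile('[-/]')
for xd in ss.splitlines():
    m, d, y = regx.split(xd)
    print(xd, '   ', '/'.join((m.zfill(2), d.zfill(2), ('20' + y.zfill(2) if len(y) == 2 else y))))

Result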

10/02/09     10/02/2009
07/22/09     07/22/2009
09-08-2008     09/08/2008
9/9/2008     09/09/2008
11/4/2010     11/04/2010
03-07-2009     03/07/2009
09/01/2010     09/01/2010
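
For use on a whole column of spreadsheet values, the split-based approach above can be wrapped in a function and checked against the calendar. What follows is only a minimal Python 3 sketch: `normalize_date` and `SEPARATOR` are hypothetical names, the tolerance for stray spaces around the separator is an added assumption, and two-digit years are assumed to belong to the 2000s, as in the answer.

```python
import re
from datetime import datetime

# Assumption: a separator is '-' or '/', possibly surrounded by stray spaces.
SEPARATOR = re.compile(r"\s*[-/]\s*")


def normalize_date(raw):
    """Return a month-first date string as MM/DD/YYYY (hypothetical helper)."""
    month, day, year = SEPARATOR.split(raw.strip())
    if len(year) == 2:
        year = "20" + year  # assumption: two-digit years mean 20YY
    # strptime raises ValueError for impossible dates such as 13/45/2009.
    parsed = datetime.strptime("/".join((month, day, year)), "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y")


for sample in ("10/02/09", "9/9/2008", " 03-07-2009"):
    print(repr(sample), "->", normalize_date(sample))
```

Going through `datetime` costs some speed compared with pure string handling, but it rejects values that only look like dates.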

Edit 1

And Edit 2: taking into account JBernardo's information about '{0:0>2}'.format(day), I added a fourth solution, which appears to be the fastest.

import re
from time import perf_counter  # time.clock() was removed in Python 3.8
from datetime import datetime

iterat = 100

dates = ['10/02/09', '07/22/09', '09-08-2008', '9/9/2008', '11/4/2010',
         ' 03-07-2009', '09/01/2010']

reobj = re.compile(
r"""\s*  # optional whitespace
(\d+)    # Month
[-/]     # separator
(\d+)    # Day
[-/]     # separator
(?:20)?  # century (optional)
(\d+)    # years (YY)
\s*      # optional whitespace""",
re.VERBOSE)

# 1) Tim's method: rewrite with the regex, then round-trip through datetime
te = perf_counter()
for i in range(iterat):
    ndates = (reobj.sub(r"\1/\2/20\3", date) for date in dates)
    fdates1 = [datetime.strftime(datetime.strptime(date, "%m/%d/%Y"), "%m/%d/%Y")
               for date in ndates]
print("Tim's method   ", perf_counter() - te, 'seconds')


regx = re.compile('[-/]')

# 2) Mixing solution: Tim's regex to extract the parts, zfill to pad them
te = perf_counter()
for i in range(iterat):
    ndates = (reobj.match(date).groups() for date in dates)
    fdates2 = ['%s/%s/20%s' % tuple(x.zfill(2) for x in tu) for tu in ndates]
print("mixing solution", perf_counter() - te, 'seconds')

# 3) eyquem's method: split on the separator, pad, expand 2-digit years
te = perf_counter()
for i in range(iterat):
    ndates = (regx.split(date.strip()) for date in dates)
    fdates3 = ['/'.join((m.zfill(2), d.zfill(2), ('20' + y.zfill(2) if len(y) == 2 else y)))
               for m, d, y in ndates]
print("eyquem's method", perf_counter() - te, 'seconds')

# 4) Tim + format: Tim's regex, padding done by the format specification
te = perf_counter()
for i in range(iterat):
    fdates4 = ['{:0>2}/{:0>2}/20{}'.format(*reobj.match(date).groups()) for date in dates]
print("Tim + format   ", perf_counter() - te, 'seconds')

# All four methods must produce identical output
print(fdates1 == fdates2 == fdates3 == fdates4)

Result

number of iterations: 100
Tim's method    0.295053700959 seconds
mixing solution 0.0459111423379 seconds
eyquem's method 0.0192239516475 seconds
Tim + format    0.0153756971906 seconds 
True
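
The hand-written timing loops can also be expressed with the standard-library `timeit` module, which takes care of the repetition. The sketch below is an illustration rather than the author's benchmark: it compares only the split-based method and the regex-plus-format method, the function names are made up for the example, and the absolute numbers will differ from the ones reported above.

```python
import re
import timeit

dates = ['10/02/09', '07/22/09', '09-08-2008', '9/9/2008', '11/4/2010',
         ' 03-07-2009', '09/01/2010']

regx = re.compile('[-/]')
reobj = re.compile(r'\s*(\d+)[-/](\d+)[-/](?:20)?(\d+)\s*')


def split_method():
    # Split on the separator, pad with zfill, expand 2-digit years.
    parts = (regx.split(date.strip()) for date in dates)
    return ['/'.join((m.zfill(2), d.zfill(2), ('20' + y if len(y) == 2 else y)))
            for m, d, y in parts]


def regex_format_method():
    # Extract the parts with the regex and pad through the format spec.
    return ['{:0>2}/{:0>2}/20{}'.format(*reobj.match(date).groups())
            for date in dates]


for func in (split_method, regex_format_method):
    elapsed = timeit.timeit(func, number=100)  # 100 runs, as in the answer
    print(func.__name__, elapsed, 'seconds')
```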

The mixing solution is interesting because it combines the speed of my solution with the ability of Tim Pietzcker's regex to detect dates in a string.
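
To illustrate what detecting dates in a string can look like, here is a hedged sketch that pulls month-first dates out of free text. The pattern is a tightened adaptation of Tim's regex, not his original: the digit counts and the look-arounds are added assumptions, `DATE_IN_TEXT` and `find_dates` are hypothetical names, and every year is assumed to be 20YY.

```python
import re

# Adapted pattern (assumption): 1-2 digit month and day, optional "20"
# century, exactly two year digits, and no digit touching either end.
DATE_IN_TEXT = re.compile(r"(?<!\d)(\d{1,2})[-/](\d{1,2})[-/](?:20)?(\d{2})(?!\d)")


def find_dates(text):
    """Return every date found in text, normalized to MM/DD/YYYY."""
    return ['{:0>2}/{:0>2}/20{}'.format(*match.groups())
            for match in DATE_IN_TEXT.finditer(text)]


print(find_dates("Invoice dated 9/9/2008, paid 10-02-09."))
```

A split-based approach cannot do this, because it needs the string to contain the date and nothing else.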

The fourth solution, which combines Tim's regex with {:0>2} formatting, also keeps the regex's ability to detect dates and is the fastest of the four. I can't combine {:0>2} with my own method, because regx.split(date.strip()) produces a year with either 2 or 4 digits.
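
One possible way around the 2-or-4-digit year, offered only as a sketch, is to keep the last two characters of the year before formatting. This assumes every date falls in the 2000s, which the regex-based methods already assume by hard-coding the "20" prefix; unlike the original split method, it would turn 1999 into 2099. `split_and_format` is a hypothetical name.

```python
import re

regx = re.compile('[-/]')


def split_and_format(date):
    """Split on '-' or '/', then pad every part through the format spec."""
    m, d, y = regx.split(date.strip())
    # y is 'YY' or 'YYYY'; its last two characters are the same in both cases.
    return '{:0>2}/{:0>2}/20{:0>2}'.format(m, d, y[-2:])


dates = ['10/02/09', '07/22/09', '09-08-2008', '9/9/2008', '11/4/2010',
         ' 03-07-2009', '09/01/2010']
print([split_and_format(date) for date in dates])
```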
