使用Python中的正则表达式匹配日期? [英] Matching dates with regular expressions in Python?

查看:250
本文介绍了使用Python中的正则表达式匹配日期?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道有类似的问题,我已经回答了,但阅读后,我仍然没有解决方案我正在寻找。



使用Python 3.2.2,我需要匹配月,日,年,月份是一个字符串,Day是两位数,不超过30,31或28,二月和二月二月在闰年。 (基本上是一个实际和有效的日期)



这是我到目前为止:

 code> pattern =(1月| 2月| 3月| 4月| 5月| 6月| 7月| 8月| 9月| 10月| 11月| 12月)[,] [](0 [1-9] | [12] 0-9] | 3 [01])[,] []((19 | 20)[0-9] [0-9])
expression = re.compile(pattern)
matches = expression.findall(sampleTextFile)

我仍然不太熟悉正则表达式语法,所以我可能会那些不必要的字符(逗号和空格的[,] []感觉像错误的方式去做),但是当我尝试在我的示例文本文件中匹配1991年1月26日时,打印在比赛中的项目是('1月','26','1991','19')。



为什么额外的'19'出现在最后?



此外,我可以添加什么东西或更改我的正则表达式,这将允许我正确验证日期?我现在的计划是接受几乎所有的日期,然后通过将日期分组与月和年分组进行比较,以后再使用高级别的结构进行杂草,以查看日期是否应为<31,30,29,28



任何帮助将非常感激,包括对我正在设计正则表达式的建设性批评。

解决方案

这是一种正则表达式的一种方式,可以匹配所需格式的任何日期(尽管您可以明显地调整逗号是可选的,添加月份缩写等等):

  years = r'((?:19 | 20)\ d \d)'
pattern = r'(%% s)+(%% s),*%s'%year

thirties = pattern%(
九月|四月|六月|十一月,
r'0?[1-9] | [12] \d | 30')

三十年代=模式%(
1月3月3月5月7月8月10月12月
r'0?[1-9] | [12] \d | 3 [01]')

4s ='(?:%s)'%'|'.join('%02d'%x for range(4,100,4))

feb = r' )+(?:%s |%s)'%(
r'(?:( 0?[1-9] | 1\d | 2 [0-8])),*%s' %年,#1-28任何一年
r'(?:( 29),*((?:(?:19 | 20)%s)| 2000))'%fours)#29闰年

result ='|'.join('(?:%s)'%x for x(30, feb))
r = re.compile(result)
打印结果

然后我们有:

 >>> r.match('2001年1月30日')不是无
True
>>> r.match('2001年1月31日')不是无
True
>>> r.match('2001年1月32日')不是无
False
>>> r.match('2001年2月32日')不是无
False
>>> r.match('2001年2月29日')不是无
False
>>> r.match('2001年2月28日')不是无
True
>>> r.match('2000年2月29日')不是无
True
>>> r.match('1908年4月30日')不是无
True
>>> r.match('1908年4月31日')不是无
False

什么这个光荣的正则表达式,你可能会问?

 >>>打印结果
(?:(九月|四月|六月|十一月)+(0?[1-9] | [12] \d | 30),*((?:19 | 20)\d \\(0?[1-9] | [12] \d | 3 [01]),*(( ??(?:2月+(?:(?:( 0?[1-9] | 1\d | 2 [0-8]),*( (?:19 | 20)\d\d))|(?:( 29),*((?(?:19 | 20)(?:04 | 08 | 12 | 16 | 20 | 24 | 28 | 32 | 36 | 40 | 44 | 48 | 52 | 56 | 60 | 64 | 68 | 72 | 76 | 80 | 84 | 88 | 92 | 96))| 2000))))

(我最初打算在可能的日期做一个舌头在脸颊的枚举,但我基本上完成了整个手写总的来说,除了四倍的倍数,无论如何。)


I know that there are similar questions to mine that have been answered, but after reading through them I still don't have the solution I'm looking for.

Using Python 3.2.2, I need to match "Month, Day, Year" with the Month being a string, Day being two digits not over 30, 31, or 28 for February and 29 for February on a leap year. (Basically a REAL and Valid date)

This is what I have so far:

pattern = "(January|February|March|April|May|June|July|August|September|October|November|December)[,][ ](0[1-9]|[12][0-9]|3[01])[,][ ]((19|20)[0-9][0-9])"
expression = re.compile(pattern)
matches = expression.findall(sampleTextFile)

I'm still not too familiar with regex syntax so I may have characters in there that are unnecessary (the [,][ ] for the comma and spaces feels like the wrong way to go about it), but when I try to match "January, 26, 1991" in my sample text file, the printing out of the items in "matches" is ('January', '26', '1991', '19').

Why does the extra '19' appear at the end?

Also, what things could I add to or change in my regex that would allow me to validate dates properly? My plan right now is to accept nearly all dates and weed them out later using high level constructs by comparing the day grouping with the month and year grouping to see if the day should be <31,30,29,28

Any help would be much appreciated including constructive criticism on how I am going about designing my regex.

解决方案

Here's one way to make a regular expression that will match any date of your desired format (though you could obviously tweak whether commas are optional, add month abbreviations, and so on):

years = r'((?:19|20)\d\d)'
pattern = r'(%%s) +(%%s), *%s' % years

thirties = pattern % (
     "September|April|June|November",
     r'0?[1-9]|[12]\d|30')

thirtyones = pattern % (
     "January|March|May|July|August|October|December",
     r'0?[1-9]|[12]\d|3[01]')

fours = '(?:%s)' % '|'.join('%02d' % x for x in range(4, 100, 4))

feb = r'(February) +(?:%s|%s)' % (
     r'(?:(0?[1-9]|1\d|2[0-8])), *%s' % years, # 1-28 any year
     r'(?:(29), *((?:(?:19|20)%s)|2000))' % fours)  # 29 leap years only

result = '|'.join('(?:%s)' % x for x in (thirties, thirtyones, feb))
r = re.compile(result)
print result

Then we have:

>>> r.match('January 30, 2001') is not None
True
>>> r.match('January 31, 2001') is not None
True
>>> r.match('January 32, 2001') is not None
False
>>> r.match('February 32, 2001') is not None
False
>>> r.match('February 29, 2001') is not None
False
>>> r.match('February 28, 2001') is not None
True
>>> r.match('February 29, 2000') is not None
True
>>> r.match('April 30, 1908') is not None
True
>>> r.match('April 31, 1908') is not None
False

And what is this glorious regexp, you may ask?

>>> print result
(?:(September|April|June|November) +(0?[1-9]|[12]\d|30), *((?:19|20)\d\d))|(?:(January|March|May|July|August|October|December) +(0?[1-9]|[12]\d|3[01]), *((?:19|20)\d\d))|(?:February +(?:(?:(0?[1-9]|1\d|2[0-8]), *((?:19|20)\d\d))|(?:(29), *((?:(?:19|20)(?:04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))|2000))))

(I initially intended to do a tongue-in-cheek enumeration of the possible dates, but I basically ended up hand-writing that whole gross thing except for the multiples of four, anyway.)

这篇关于使用Python中的正则表达式匹配日期?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆