How do I modify this REGEX to pick up all dates in the test string
Problem description
test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''
import re
import calendar
months_full = '|'.join([month for month in calendar.month_name][1:])
pattern_1 = r'\b' + months_full + r'\s+\d{1,2},?\s+\d{4},?'
test_pattern = re.compile(pattern_1)
x = test_pattern.findall(test_string)
print(x)
>>>
['October', 'December 21, 2011']
>>>
I think my regex is asking to:
start at a word boundary
find any month (correctly spelled and capitalized)
next require there to be one or more white spaces
followed by 1 or 2 digits
there might be one or zero commas next
followed by one or more white spaces
then there should be 4 digits
and it might end with a comma immediately adjacent to the last digit
Once I get dates I intend to validate them, so I am not too worried about a case like "January 1, 2999 cases of rum", as I can check to see if the date is in a valid range.
I did discover that when I replace the first month with December, the regex returns both dates. I have played around with \b, . and other variations but can't seem to get past this.
Any observations would be appreciated.
Recommended answer
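Your pattern doesn't work because you forgot to put the alternation of month names in a non-capturing group (?:...). Alternation has the lowest precedence in a regex, so without the group \b applies only to January and \s+\d{1,2},?\s+\d{4},? applies only to December; every other month name is matched on its own. That is why the output contains a bare 'October'.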
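As a minimal sketch of that fix, the question's own pattern can be kept as is and only wrapped in (?:...). The name pattern_fixed is introduced here for illustration, and the code assumes an English locale, since calendar.month_name is locale dependent.

```python
import calendar
import re

test_string = ('dated as of October 17, 2012 when we went caroling, '
               'dated as of December 21, 2011 when we ate bananas')

# calendar.month_name[0] is an empty string, so skip it.
# Assumption: the current locale yields English month names.
months_full = '|'.join(list(calendar.month_name)[1:])

# The (?:...) group makes \b and the day/year part apply to every
# month name instead of only to the first and last alternatives.
pattern_fixed = re.compile(r'\b(?:' + months_full + r')\s+\d{1,2},?\s+\d{4},?')

dates = pattern_fixed.findall(test_string)
print(dates)  # ['October 17, 2012', 'December 21, 2011']
```

A non-capturing group is used rather than a plain (...) group because findall returns only the captured group when the pattern contains exactly one, which would yield just the month names.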
Another remark:
It's a shame to load a module only to get the month names in English, when you can write them out yourself and optimise your pattern! Example (the month names are written in lowercase here, so compile it with re.IGNORECASE to match capitalized names):
pattern_1 = r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?'
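A short sketch of how this hand-written pattern might be used on the question's test string. Because the month names in the pattern are lowercase, it is compiled with re.IGNORECASE here; the names month_date and found are illustrative only.

```python
import re

# The hand-written pattern from the answer, split across lines for readability.
pattern_1 = (r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)'
             r'|(?:octo|(?:sept|nov|dec)em)ber)'
             r'\s+[0-9]{1,2},?\s+[0-9]{4},?')

# IGNORECASE is needed because the pattern spells the months in lowercase.
month_date = re.compile(pattern_1, re.IGNORECASE)

test_string = ('dated as of October 17, 2012 when we went caroling, '
               'dated as of December 21, 2011 when we ate bananas')

found = month_date.findall(test_string)
print(found)  # ['October 17, 2012', 'December 21, 2011']
```

The trade-off is that IGNORECASE also accepts spellings such as "october" or "OCTOBER", which is looser than the "correctly capitalized" requirement stated in the question.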