How do I modify this REGEX to pick up all dates in the test string


Question

test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''


import re
import calendar

months_full = '|'.join([month for month in calendar.month_name][1:])

pattern_1 = r'\b' + months_full + r'\s+\d{1,2},?\s+\d{4},?'
test_pattern = re.compile(pattern_1)
x = test_pattern.findall(test_string)

print(x)

>>> 
['October', 'December 21, 2011']
>>> 

I think my regex is asking:

start at a word boundary

find any month (correctly spelled and capitalized)

next require there to be one or more white spaces

followed by 1 or 2 digits

there might be one or zero commas next

followed by one or more white spaces

then there should be 4 digits

and it might end with a comma immediately adjacent to the last digit

Once I get the dates I intend to validate them, so I am not too worried about a case like "January 1, 2999 cases of rum", since I can check whether the date is in a valid range.

I did discover that when I replace the first month with December the regex returns both dates. I have played around with \b, . and other variations but can't seem to get past this.

Any observations would be appreciated.
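For the validation step mentioned in the question, a minimal sketch might look like the following. The helper name `parse_valid_date` and the default range bounds are hypothetical and chosen only for illustration. It also assumes English month names, since `%B` follows the current locale.

```python
from datetime import date, datetime


def parse_valid_date(text, earliest=date(1900, 1, 1), latest=date(2100, 12, 31)):
    """Return a date for text like 'October 17, 2012,' or None if it is implausible.

    The function name and the default bounds are illustrative assumptions.
    """
    # Drop the optional commas and collapse runs of whitespace.
    cleaned = ' '.join(text.replace(',', ' ').split())
    try:
        # %B is the full month name in the current locale (English assumed here).
        parsed = datetime.strptime(cleaned, '%B %d %Y').date()
    except ValueError:
        # Not a real calendar date, e.g. 'February 30, 2011'.
        return None
    return parsed if earliest <= parsed <= latest else None


for candidate in ['October 17, 2012', 'January 1, 2999', 'February 30, 2011,']:
    print(candidate, '->', parse_valid_date(candidate))
```

Under these assumptions, a string such as "January 1, 2999" parses as a date but is rejected by the range check, while an impossible calendar date is rejected by `strptime` itself.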

Recommended answer

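Your pattern doesn't work because you forgot to put the alternation of month names in a non-capturing group (?:...). Without the group, | has the lowest precedence, so \b applies only to January and the \s+\d{1,2},?\s+\d{4},? tail applies only to December; every other month matches as a bare name, which is why you get 'October' on its own.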
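As a sketch of that fix applied to the code from the question (written for Python 3, and assuming the default English month names from `calendar`), the only change is wrapping the alternation in `(?:...)`:

```python
import calendar
import re

test_string = ('dated as of October 17, 2012 when we went caroling, '
               'dated as of December 21, 2011 when we ate bananas')

# calendar.month_name[0] is an empty string, so skip it.
months_full = '|'.join(list(calendar.month_name)[1:])

# The non-capturing group makes \b and the day/year tail apply to every month.
pattern_1 = r'\b(?:' + months_full + r')\s+\d{1,2},?\s+\d{4},?'
test_pattern = re.compile(pattern_1)

dates = test_pattern.findall(test_string)
print(dates)  # ['October 17, 2012', 'December 21, 2011']
```

The group must be non-capturing: with a capturing group, `findall` would return only the group contents (the month names) instead of the whole match.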

Another note:

It's a shame to load a module only to get the month names in English, when you can write them out yourself and optimise your pattern. Example:

# The month names are lowercase here, so compile with re.IGNORECASE to match "October".
pattern_1 = r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?'
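A short usage sketch for this hand-written pattern, again in Python 3. The variable names `date_re` and `found` are only illustrative, and the `re.IGNORECASE` flag is an addition needed because the pattern spells the months in lowercase while the test string capitalizes them:

```python
import re

pattern_1 = (r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)'
             r'|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?')

test_string = ('dated as of October 17, 2012 when we went caroling, '
               'dated as of December 21, 2011 when we ate bananas')

# Case-insensitive matching lets the lowercase names match "October" and "December".
date_re = re.compile(pattern_1, re.IGNORECASE)
found = date_re.findall(test_string)
print(found)  # ['October 17, 2012', 'December 21, 2011']

# Without the flag, nothing in this test string matches.
print(re.findall(pattern_1, test_string))  # []
```

One trade-off of case-insensitive matching is that it also accepts spellings such as "OCTOBER" or "october", which may or may not be wanted given the question's "correctly capitalized" requirement.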
