Modify regex to match dates with ordinals "st", "nd", "rd", "th"

Problem description

How can the regex below be modified to match dates with ordinals on the day part? This regex matches "Jan 1, 2003 | February 29, 2004 | November 02, 3202" but I need it to match also: "Jan 1st, 2003 | February 29th, 2004 | November 02nd, 3202 | March 3rd, 2010"

^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))))\,\ ((1[6-9]|[2-9]\d)\d{2}))

Thank you.

Solution

This will depend on your use case, but in the interest of pragmatism, you might do well to simply accept any string consisting of:
(1) any month name or abbreviation;
(2) whitespace;
(3) any one or two digits;
(4) optionally, any of st, nd, rd, th;
(5) whitespace OR comma + optional whitespace;
(6) any four digits.

I'm not sure what text you're matching against, but if I had Jan 35nd,3001, I think I'd rather capture it now and invalidate it later than skip over it right from the get-go.
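The "capture now, invalidate later" idea can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `MONTHS`, `LOOSE_DATE`, and `parse_loose_date` are hypothetical, the month is recognised by its first three letters, and the standard library's `datetime.date` is used as the validator.

```python
import re
from datetime import date

# Hypothetical lookup table: first three letters of the month name -> month number.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Loose capture: a word, a 1-2 digit day with an optional ordinal suffix,
# then whitespace or a comma, then a four-digit year.
LOOSE_DATE = re.compile(
    r"\b([a-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?(?:\s+|,\s*)(\d{4})\b",
    re.IGNORECASE,
)


def parse_loose_date(text):
    """Return a date for the first loose match, or None if it is not a real date."""
    m = LOOSE_DATE.search(text)
    if m is None:
        return None
    # Assumption: only the first three letters decide the month.
    month = MONTHS.get(m.group(1)[:3].lower())
    if month is None:
        return None
    try:
        # date() raises ValueError for impossible dates such as Jan 35.
        return date(int(m.group(3)), month, int(m.group(2)))
    except ValueError:
        return None
```

With this split, the regex stays short and the calendar rules (month lengths, leap years) are left to `date`, so `Jan 35nd,3001` is captured by the pattern and then rejected during validation.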

Also, depending on your data set, consider case sensitivity issues and common international English variants, like 1 Jan 2004 or 1st Jan, 2004 or January, 2004 etc.
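If those variants matter for your data, one possible way to cover them is to build the alternatives from shared pieces. The sketch below is an assumption-laden illustration, not part of the original answer: the names `MONTH`, `ORDINAL`, `ANY_DATE`, and `find_dates` are hypothetical, and the month is matched loosely by prefix.

```python
import re

# Shared building blocks (hypothetical names).
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
ORDINAL = r"(?:st|nd|rd|th)"

# Month-first, e.g. "Jan 1st, 2003"
MONTH_FIRST = rf"{MONTH}\s+\d{{1,2}}{ORDINAL}?(?:\s+|,\s*)\d{{4}}"
# Day-first, e.g. "1 Jan 2004" or "1st Jan, 2004"
DAY_FIRST = rf"\d{{1,2}}{ORDINAL}?\s+{MONTH}(?:\s+|,\s*)\d{{4}}"
# Month and year only, e.g. "January, 2004"
MONTH_YEAR = rf"{MONTH}(?:\s+|,\s*)\d{{4}}"

ANY_DATE = re.compile(
    rf"\b(?:{MONTH_FIRST}|{DAY_FIRST}|{MONTH_YEAR})\b",
    re.IGNORECASE,  # handles "JAN", "jan", "Jan"
)


def find_dates(text):
    """Return every substring that looks like one of the three date shapes."""
    return [m.group(0) for m in ANY_DATE.finditer(text)]
```

Keeping the month and ordinal fragments in one place means a new variant only needs one more alternative rather than another copy of the month list.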

Line breaks added for readability; join the lines before use and match case-insensitively:

^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)|feb(?:ruary)?|ma(?:r(?:ch)?|y)
|a(?:pr(?:il)?|ug(?:ust)?)|sep(?:t|tember)?|oct(?:ober)?|(?:nov|dec)(?:ember)?)
\s+\d{1,2}(?:st|nd|rd|th)?(?:\s+|,\s*)\d{4}\b
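In Python, a multi-line pattern like this can be kept split across lines by compiling it in free-spacing mode. The following is a minimal sketch, assuming Python's `re` module with `re.VERBOSE` and `re.IGNORECASE`; the name `DATE_WITH_ORDINAL` is hypothetical, and the j-group is written so that a month prefix is required after the "j".

```python
import re

DATE_WITH_ORDINAL = re.compile(
    r"""
    ^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)   # Jan(uary), Jun(e), Jul(y)
      |feb(?:ruary)?
      |ma(?:r(?:ch)?|y)                      # Mar(ch), May
      |a(?:pr(?:il)?|ug(?:ust)?)             # Apr(il), Aug(ust)
      |sep(?:t|tember)?                      # Sep, Sept, September
      |oct(?:ober)?
      |(?:nov|dec)(?:ember)?)
    \s+\d{1,2}(?:st|nd|rd|th)?               # day with optional ordinal suffix
    (?:\s+|,\s*)                             # whitespace, or comma + optional whitespace
    \d{4}\b                                  # four-digit year
    """,
    re.IGNORECASE | re.VERBOSE,
)

samples = [
    "Jan 1st, 2003",
    "February 29th, 2004",
    "November 02nd, 3202",
    "March 3rd, 2010",
    "Jan 1, 2003",
]
# Every sample from the question should match at the start of the string.
matched = [s for s in samples if DATE_WITH_ORDINAL.match(s)]
```

Because the pattern starts with `^`, `match` (or `search` on a single-line string) only accepts dates at the beginning of the input; drop the anchor if the dates are embedded in longer text.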

Even more pragmatic (and readable), unless you have a very bizarre dataset, is to allow anything after the common prefixes:

(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b

Would this match octagenarianism 99xx, 0000? Yes. Is that likely to be an issue? I doubt it.
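To see the trade-off concretely, the loose pattern can be tried against both the dates from the question and the nonsense string. This is a small sketch assuming Python's `re` module; `LOOSE` and `is_loose_date` are hypothetical names, and the pattern itself is the one given above, compiled case-insensitively.

```python
import re

# The prefix-based pattern, compiled case-insensitively.
LOOSE = re.compile(
    r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?"
    r"\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b",
    re.IGNORECASE,
)


def is_loose_date(text):
    """True if the text contains something shaped like 'Month DD[xx], YYYY'."""
    return LOOSE.search(text) is not None


wanted = ["Jan 1st, 2003", "February 29th, 2004", "November 02nd, 3202", "March 3rd, 2010"]
results = [is_loose_date(s) for s in wanted]
false_positive = is_loose_date("octagenarianism 99xx, 0000")
```

All four wanted strings are accepted, and so is the nonsense string, which is the price of the shorter pattern; a later validation step can reject such matches if they ever show up.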
