Regex For All Strings That Pass .NET DateTime.Parse Culture en-US


Problem description

Before you tell me that this is a lot of regex: I know. I'm not asking anyone to write a regex for me! Does anyone know whether someone has already written this one?

This returns all the patterns: CultureInfo.CurrentCulture.DateTimeFormat.GetAllDateTimePatterns(). But the list is not 100% accurate. Some listed patterns do not parse (yy/MM/dd), and some strings that do parse match no listed pattern. I am referring to the general DateTime.Parse under en-US.
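One way to see which listed patterns fail to parse is to format a sample value with each pattern and feed the result back to the parser. The sketch below is only an illustration: the class name, the helper RoundTrips, and the sample date are assumptions, and the culture is pinned to en-US rather than read from CurrentCulture.

```csharp
using System;
using System.Globalization;

static class PatternRoundTrip
{
    // True if a value formatted with `pattern` is accepted back by DateTime.TryParse.
    // (Hypothetical helper, not part of the original question.)
    public static bool RoundTrips(string pattern, CultureInfo culture, DateTime sample)
    {
        string text = sample.ToString(pattern, culture);
        return DateTime.TryParse(text, culture, DateTimeStyles.None, out _);
    }

    static void Main()
    {
        // Pin the culture explicitly instead of relying on CurrentCulture.
        var culture = CultureInfo.GetCultureInfo("en-US");

        // Arbitrary sample value (an assumption); any valid date would do.
        var sample = new DateTime(2010, 2, 20, 14, 5, 9);

        foreach (string pattern in culture.DateTimeFormat.GetAllDateTimePatterns())
        {
            string verdict = RoundTrips(pattern, culture, sample) ? "ok  " : "FAIL";
            Console.WriteLine($"{verdict} {pattern}");
        }
    }
}
```

This only covers the first half of the problem (listed patterns that do not parse). Strings that parse without matching any listed pattern cannot be enumerated this way.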

What I did was break the patterns down into groups and try to write a regex for each group.

(^|\s)(3[01]|[12]\d|0?[1-9])\s+(January|February|March|April|May|June|July|August|September|October|November|December),\s?(19|20)?\d\d(\s+(0?\d|1\d|2[0-4]):[0-6]\d(:[0-6]\d)?(\s+([AP]M|GMT|[+-]\d\d:?\d\d))?)?
        //dd MMMM, yyyy                dddd, dd MMMM, yyyy
        //dd MMMM, yyyy h:mm tt        dddd, dd MMMM, yyyy h:mm tt
        //dd MMMM, yyyy hh:mm tt       dddd, dd MMMM, yyyy h:mm:ss tt
        //dd MMMM, yyyy h:mm:ss tt     dddd, dd MMMM, yyyy hh:mm tt
        //dd MMMM, yyyy hh:mm:ss tt    dddd, dd MMMM, yyyy hh:mm:ss tt
        //dd MMMM, yyyy H:mm           dddd, dd MMMM, yyyy H:mm
        //dd MMMM, yyyy HH:mm          dddd, dd MMMM, yyyy HH:mm
        //dd MMMM, yyyy H:mm:ss        dddd, dd MMMM, yyyy H:mm:ss
        //dd MMMM, yyyy HH:mm:ss       dddd, dd MMMM, yyyy HH:mm:ss

(^|\s)(3[01]|[12]\d|0?[1-9])(/|-)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(/|-)\d\d(\s+(0?\d|1\d|2[0-4]):[0-6]\d(:[0-6]\d)?(\s+([AP]M|GMT|[+-]\d\d:?\d\d))?)?
        //dd-MMM-yy 
        //dd-MMM-yy h:mm tt 
        //dd-MMM-yy h:mm:ss tt  
        //dd-MMM-yy hh:mm tt    
        //dd-MMM-yy hh:mm:ss tt 
        //dd-MMM-yy H:mm    
        //dd-MMM-yy HH:mm   
        //dd-MMM-yy H:mm:ss 
        //dd-MMM-yy HH:mm:ss

(^|\s)(January|February|March|April|May|June|July|August|September|October|November|December)\s(3[01]|[12]\d|0?[1-9])(,\s?|\s)(19|20)?\d\d(\s+(0?\d|1\d|2[0-4]):[0-6]\d(:[0-6]\d)?(\s+([AP]M|GMT|[+-]\d\d:?\d\d))?)?
        //MMMM dd, yyyy             dddd, MMMM dd, yyyy
        //MMMM dd, yyyy h:mm tt     dddd, MMMM dd, yyyy h:mm tt
        //MMMM dd, yyyy h:mm:ss tt  dddd, MMMM dd, yyyy h:mm:ss tt
        //MMMM dd, yyyy hh:mm tt    dddd, MMMM dd, yyyy hh:mm tt
        //MMMM dd, yyyy hh:mm:ss tt dddd, MMMM dd, yyyy hh:mm:ss tt
        //MMMM dd, yyyy H:mm        dddd, MMMM dd, yyyy HH:mm       
        //MMMM dd, yyyy H:mm:ss     dddd, MMMM dd, yyyy H:mm:ss     
        //MMMM dd, yyyy HH:mm       dddd, MMMM dd, yyyy HH:mm:ss        
        //MMMM dd, yyyy HH:mm:ss

(^|\s)(19|20)\d\d(/|-)(1[0-2]|0?\d)(/|-)(3[01]|[12]\d|0?[1-9])(\s+(0?\d|1\d|2[0-4]):[0-6]\d(:[0-6]\d)?(\s+([AP]M|GMT|[+-]\d\d:?\d\d))?)?
        //yy/MM/dd              yyyy-MM-dd
        //yy/MM/dd h:mm tt      yyyy-MM-dd h:mm tt      
        //yy/MM/dd hh:mm tt     yyyy-MM-dd hh:mm tt     
        //yy/MM/dd h:mm:ss tt   yyyy-MM-dd h:mm:ss tt       
        //yy/MM/dd hh:mm:ss tt  yyyy-MM-dd hh:mm:ss tt      
        //yy/MM/dd H:mm         yyyy-MM-dd H:mm     
        //yy/MM/dd HH:mm        yyyy-MM-dd HH:mm        
        //yy/MM/dd H:mm:ss      yyyy-MM-dd H:mm:ss      
        //yy/MM/dd HH:mm:ss     yyyy-MM-dd HH:mm:ss 

(^|\s)(3[01]|[12]\d|0?[1-9])(/|-|\.)(1[0-2]|0?\d)(/|-|\.)(19|20)?\d\d(\s+(0?\d|1\d|2[0-4]):[0-6]\d(:[0-6]\d)?(\s+([AP]M|GMT|[+-]\d\d:?\d\d))?)?
        //fr-FR         
        //dd.MM.yy              dd/MM/yy            dd-MM-yy            dd/MM/yyyy
        //dd.MM.yy H:mm         dd/MM/yy H:mm       dd-MM-yy H:mm       dd/MM/yyyy H:mm
        //dd.MM.yy H:mm:ss      dd/MM/yy H:mm:ss    dd-MM-yy H:mm:ss    dd/MM/yyyy H:mm:ss
        //dd.MM.yy HH' h 'mm    dd/MM/yy HH' h 'mm  dd-MM-yy HH' h 'mm  dd/MM/yyyy HH' h 'mm
        //dd.MM.yy HH.mm        dd/MM/yy HH.mm      dd-MM-yy HH.mm      dd/MM/yyyy HH.mm
        //dd.MM.yy HH:mm        dd/MM/yy HH:mm      dd-MM-yy HH:mm      dd/MM/yyyy HH:mm
        //dd.MM.yy HH:mm:ss     dd/MM/yy HH:mm:ss   dd-MM-yy HH:mm:ss   dd/MM/yyyy HH:mm:ss
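To check how closely one of these expressions tracks the parser, it can help to run both over the same inputs and compare. The following is a minimal sketch under stated assumptions: it uses the dd-MMM-yy expression from above unchanged, while the class name, the two helper methods, and the sample strings are hypothetical and were picked only to probe edge cases such as an impossible day or an out-of-range time.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class RegexVsParse
{
    // The dd-MMM-yy expression from the question, copied unchanged.
    static readonly Regex DayMonthYear = new Regex(
        @"(^|\s)(3[01]|[12]\d|0?[1-9])(/|-)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(/|-)\d\d(\s+(0?\d|1\d|2[0-4]):[0-6]\d(:[0-6]\d)?(\s+([AP]M|GMT|[+-]\d\d:?\d\d))?)?");

    static readonly CultureInfo EnUs = CultureInfo.GetCultureInfo("en-US");

    // The expression is not anchored at the end, so this only reports
    // whether a date-like substring was found somewhere in the input.
    public static bool RegexAccepts(string s) => DayMonthYear.IsMatch(s);

    // Whether the whole string is accepted by the en-US parser.
    public static bool ParserAccepts(string s) =>
        DateTime.TryParse(s, EnUs, DateTimeStyles.None, out _);

    static void Main()
    {
        // Hypothetical samples (assumptions, not taken from the question).
        string[] samples = { "20-Feb-10", "20-Feb-10 2:05 PM", "31-Feb-10", "20-Feb-10 24:60" };

        foreach (string s in samples)
            Console.WriteLine($"{s,-20} regex={RegexAccepts(s)} parse={ParserAccepts(s)}");
    }
}
```

Rows where the two columns disagree are the cases the expression would need to be tightened or loosened for.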

Solution

I'm going to go out on a limb and assume you'd be fine with not parsing the name of the day, so long as the rest of the date and time is matched. After all, once the date is parsed, the name of the day can be regenerated, and matching it would require additional expression complexity, so I decided to exclude it. That said, I have an expression that seems to do pretty well at finding all the date formats returned by GetAllDateTimePatterns, as well as several others that might show up (not sure if you want these...):

Tuesday 20 February 2010
mon, jun 12, 1999
tue, december 9 1901
Friday, February 03, 1900
January 12, 2012

(Mind you, it does not match the day names, only the dates.)

This is the expression:

(?i)((3[01]|[12]\d|0?[1-9]|\d{4})([\s/.-]))?\b(1[0-2]|0?\d|(jan|febr?)(uary)?|ma(r(ch)?|y)|a(pr(il)?|ug(ust)?)|(sept?|oct|nov|dec)((em|o)ber)?|ju(ne?|ly?))\b(\3|\s)(((?(2)|3[01])|[12]\d|0?[1-9])(?(2)\d\d\b|\b,?\s+(20|19)?\d\d))?\s+(\d+([:.]\d+)+)?

I believe it's fairly good (about as accurate as a human skimming quickly over text), but it is obviously far from perfect, hence the need for true parsing after the soft match is found. The overall search could be made more efficient by excluding parts of the messages from the search where possible: if the dates you want to find are all in the header, then only run the expression against the header!
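The soft-match-then-parse idea can be sketched roughly as follows. This is an illustration, not the answerer's code: DateFinder and FindDates are hypothetical names, the culture is assumed to be en-US, and the pattern used in Main is a deliberately simple stand-in (an assumption) rather than the expression above, which could be passed in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateFinder
{
    // Stage 1: `soft` proposes candidate substrings.
    // Stage 2: DateTime.TryParse decides which candidates are real dates.
    public static IEnumerable<DateTime> FindDates(string text, Regex soft)
    {
        var culture = CultureInfo.GetCultureInfo("en-US");
        foreach (Match m in soft.Matches(text))
        {
            // Trim because a soft pattern may capture surrounding whitespace.
            if (DateTime.TryParse(m.Value.Trim(), culture, DateTimeStyles.None, out DateTime parsed))
                yield return parsed;
        }
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in soft pattern (assumption): digits/digits/four digits.
        var soft = new Regex(@"\b\d{1,2}/\d{1,2}/\d{4}\b");

        // "99/99/9999" looks like a date to the pattern but is rejected by the parser.
        foreach (DateTime d in DateFinder.FindDates("Sent 2/20/2010, ref 99/99/9999.", soft))
            Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

With the stand-in pattern, only the first candidate survives the parse step, so the program prints 2010-02-20. Restricting the search to a header, as suggested above, would just mean passing the header text instead of the whole message.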

Let me know if it works well enough or if there are any edge cases you find, and I'll see if I can modify it.
