Making regex more specific to exclude certain characters


Problem description

import re
s = '01.11.11 12/12/1981 1*51*12 . 22|1|13 03-02-1919 1-22-12 or 01-23-18 or 03-23-1984 01.11.18 or 2.2.17 or 02.02.18 or 12.1.16 12.23.1943 01-23-11 not 12.23.192 not 02.02.1'

I have the following string s and I want to extract all the dates whose parts are separated by one of three characters: 1) a period, e.g. 01.11.11; 2) a dash, e.g. 1-22-12; or 3) a forward slash, e.g. 12/12/1981.

To do so, I have tried the following:

reg = r'\d{1,2}.\d{1,2}.(?:\d{4}|\d{2})' 
r1 = re.findall(reg,s)

It works, but it also returns some unwanted matches, such as '1*51*12' and '22|1|13':

['01.11.11',
 '12/12/1981',
 '1*51*12',
 '22|1|13',
 '03-02-1919',
 '1-22-12',
 '01-23-18',
 '03-23-1984',
 '01.11.18',
 '2.2.17',
 '02.02.18',
 '12.1.16',
 '12.23.1943',
 '01-23-11',
 '12.23.19']

I would like my output to be:

['01.11.11',
 '12/12/1981',
 '03-02-1919',
 '1-22-12',
 '01-23-18',
 '03-23-1984',
 '01.11.18',
 '2.2.17',
 '02.02.18',
 '12.1.16',
 '12.23.1943',
 '01-23-11',
 '12.23.19']

How do I tweak reg to be more specific and get my desired output?

Recommended answer

\b((?:\d{1,2}(?:\.|\/|-)){2}(?:\d{4}|\d{2}))\b

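This regex will match all of your test cases and will filter out improper years, such as 12.23.192. The original pattern used an unescaped ., which matches any character, so * and | were accepted as separators; here the separator is restricted to a literal period, slash, or dash, and the \b word boundaries reject a year that runs on into further digits.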
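As a quick check, the sketch below applies the answer's pattern to the string from the question. The variable names `date_pattern` and `matches` are illustrative only and do not come from the original post. Note one difference from the desired output listed in the question: because the trailing `\b` rejects 12.23.192 entirely, the partial match '12.23.19' is not returned.

```python
import re

s = ('01.11.11 12/12/1981 1*51*12 . 22|1|13 03-02-1919 1-22-12 or 01-23-18 '
     'or 03-23-1984 01.11.18 or 2.2.17 or 02.02.18 or 12.1.16 12.23.1943 '
     '01-23-11 not 12.23.192 not 02.02.1')

# The answer's pattern, unchanged: two groups of 1-2 digits, each followed by
# a period, slash, or dash, then a 4-digit or 2-digit year, all between
# word boundaries.
date_pattern = r'\b((?:\d{1,2}(?:\.|\/|-)){2}(?:\d{4}|\d{2}))\b'

# With a single capturing group, re.findall returns that group's text,
# which here spans the whole date.
matches = re.findall(date_pattern, s)
print(matches)
# ['01.11.11', '12/12/1981', '03-02-1919', '1-22-12', '01-23-18',
#  '03-23-1984', '01.11.18', '2.2.17', '02.02.18', '12.1.16',
#  '12.23.1943', '01-23-11']
```

The pattern accepts any of the three separators at each position, so a mixed form such as 1.2-13 would also match. If that matters, a stricter variant would be needed, which the answer does not cover.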
