不在python中的正则表达式中返回整个模式 [英] not returning the whole pattern in regex in python
问题描述
我有以下代码:
haystack = "aaa months(3) bbb"
needle = re.compile(r'(months|days)\([\d]*\)')
instances = list(set(needle.findall(haystack)))
print str(instances)
我希望它打印 months(3)
但我只得到 months
.有什么原因吗?
I'd expect it to print months(3)
but instead I just get months
. Is there any reason for this?
推荐答案
needle = re.compile(r'((?:months|days)\([\d]*\))')
解决您的问题.
您只捕获了月|天部分.
you were capturing only the months|days part.
在这种特定情况下,这个正则表达式要好一些:
in this specific situation, this regex is a bit better:
needle = re.compile(r'((?:months|days)\(\d+\))')
这样你只会得到一个数字的结果,以前像 months()
这样的结果可以工作.如果您想忽略月或日等选项的大小写,还可以添加 re.IGNORECASE
标志.像这样:
this way you will only get results with a number, previously a result like months()
would work. if you want to ignore case for options like Months or Days, then also add the re.IGNORECASE
flag. like this:
re.compile(r'((?:months|days)\(\d+\))', re.IGNORECASE)
对 OP 的一些解释:
一个正则表达式由许多元素组成,其中最主要的是捕获组."()
" 但是有时候我们想不捕获就做分组,所以我们使用 "(?:)
" 还有很多其他形式的分组,但这些是最多的常见的.
a regular expression is comprised of many elements, the chief among them is the capturing group. "()
" but sometimes we want to make groups without capturing, so we use "(?:)
" there are many other forms of groups, but these are the most common.
在这种情况下,我们将整个正则表达式包围在一个捕获组中,因为您试图捕获所有内容,通常 - 任何正则表达式都会自动被捕获组包围,但在这种情况下,您明确指定了一个,所以它没有用自动捕获组包围您的正则表达式.
in this case, we surround the entire regular expression in a capturing group, because you are trying to capture everything, normally - any regular expression is automatically surrounded by a capturing group, but in this case, you specified one explicitly, so it did not surround your regular expression with an automatic capture group.
既然我们已经用捕获组包围了整个正则表达式,我们通过在开头添加 ?:
将我们拥有的组变成非捕获组,如上所示.我们也不能包围整个正则表达式而只将组转换为非捕获组,因为如您所见,它会自动将整个正则表达式转换为存在 non 的捕获组.我个人更喜欢显式编码.
now that we have surrounded the entire regular expression with a capturing group, we turn the group we have into a non-capturing group by adding ?:
to the beginning, as shown above. we could also not have surrounded the entire regular expression and only turned the group into a non-capturing group, since as you saw, it will automatically turn the whole regular expression into a capturing group where non is present. i personally prefer explicit coding.
关于正则表达式的更多信息可以在这里找到:http://docs.python.org/图书馆/re.html
further information about regular expressions can be found here: http://docs.python.org/library/re.html
这篇关于不在python中的正则表达式中返回整个模式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!