Negative lookahead assertion not working in Python
Problem description
Task:
- given: a list of image filenames
- todo: create a new list with filenames not containing the word "thumb" - i.e. only target the non-thumbnail images (with PIL - Python Imaging Library).
I've tried r".*(?!thumb).*" but it failed.
I've found the solution (here on stackoverflow): prepend a ^ to the regex and put the .* inside the negative lookahead, giving r"^(?!.*thumb).*", and this now works.
The thing is, I would like to understand why my first solution did not work, but I don't. Since regexes are complicated enough, I would really like to understand them.
What I do understand is that the ^ tells the parser that the following condition is to match at the beginning of the string. But doesn't the .* in the (not working) first example also start at the beginning of the string?
I thought it would start at the beginning of the string and search through as many characters as it can before reaching "thumb". If so, it would return a non-match.
Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?
Thanks!
Recommended answer
(Darn, Jon beat me. Oh well, you can look at the examples anyway)
Like the other guys have said, regex is not the best tool for this job. If you are working with file paths, take a look at os.path.
As for filtering files you don't want, you can do if 'thumb' not in filename: ... once you have dissected the path (where filename is a str).
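A minimal sketch of that non-regex approach, assuming the paths are plain strings. The sample list and the helper name non_thumbnails are made up for illustration and are not from the question. It tests only the final path component, so a directory that happens to be called "thumbs" does not exclude a full-size image:

```python
import os.path


def non_thumbnails(paths):
    """Return the paths whose file name (last component) lacks 'thumb'."""
    return [p for p in paths if "thumb" not in os.path.basename(p)]


# Hypothetical sample data for illustration only.
paths = [
    "/tmp/somewhere/cat.jpg",
    "/tmp/somewhere/cat_thumb.jpg",
    "/tmp/thumbs/dog.png",
]
result = non_thumbnails(paths)
print(result)
```

If a match anywhere in the path should count, dropping the os.path.basename call and testing the whole string would be the simpler choice.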
And for posterity, here are my thoughts on those regexes. r".*(?!thumb).*" does not work because .* is greedy: it consumes the whole string first, and the lookahead is only tested at the position where .* stops. At the end of the string nothing follows, so (?!thumb) succeeds trivially. Take a look at this:
>>> import re
>>> re.search('(.*)((?!thumb))(.*)', '/tmp/somewhere/thumb').groups()
('/tmp/somewhere/thumb', '', '')
>>> re.search('(.*?)((?!thumb))(.*)', '/tmp/somewhere/thumb').groups()
('', '', '/tmp/somewhere/thumb')
>>> re.search('(.*?)((?!thumb))(.*?)', '/tmp/somewhere/thumb').groups()
('', '', '')
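The last one looks quite strange at first, but it is consistent: nothing forces the two lazy .*? groups to grow, so both match the empty string at position 0, where the lookahead already succeeds.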
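To see that the first pattern cannot reject anything, here is a small sketch; the sample names are invented for illustration. Every name matches in full, including those containing "thumb", because the lookahead is only ever checked after the greedy .* has reached the end of the string:

```python
import re

pattern = re.compile(r".*(?!thumb).*")

# Hypothetical sample names, with and without "thumb".
names = ["cat.jpg", "cat_thumb.jpg", "thumb"]

full_matches = []
for name in names:
    m = pattern.match(name)
    # The greedy first .* swallows the whole name, so (?!thumb) is tested
    # at the end of the string, where no "thumb" can follow.
    full_matches.append(m is not None and m.group(0) == name)
    print(name, full_matches[-1])

always_matches = all(full_matches)
```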
The other regex (r"^(?!.*thumb).*") works because .* is inside the lookahead, so no characters can be consumed before the lookahead is tested. You actually don't even need the ^, depending on whether you are using re.match or re.search: re.match only tries the start of the string, whereas re.search retries at later positions and will find one where "thumb" no longer lies ahead:
>>> re.search('((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
('', 'humb')
>>> re.search('^((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
>>> re.match('((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
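Applied to the original task, the working pattern could be used roughly like this. It is only a sketch: the filenames list and the name NOT_THUMB are hypothetical, and the ^ is kept so the pattern behaves the same with re.match and re.search:

```python
import re

# Anchored negative lookahead: reject the string if "thumb" occurs
# anywhere after the start.
NOT_THUMB = re.compile(r"^(?!.*thumb).*")

# Hypothetical sample data for illustration only.
filenames = ["cat.jpg", "cat_thumb.jpg", "thumb_dog.png", "dog.png"]

kept = [f for f in filenames if NOT_THUMB.match(f)]
print(kept)
```

For a plain substring test like this one, the 'thumb' not in filename check from earlier in the answer remains the more readable option.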