Negative lookahead assertion not working in Python


Question

Task:
- given: a list of image filenames
- todo: create a new list of the filenames that do not contain the word "thumb", i.e. target only the non-thumbnail images (with PIL, the Python Imaging Library).

I've tried r".*(?!thumb).*" but it failed.

I've found the solution (here on stackoverflow) to prepend a ^ to the regex and to put the .* into the negative lookahead: r"^(?!.*thumb).*" and this now works.

The thing is, I would like to understand why my first solution did not work but I don't. Since regexes are complicated enough, I would really like to understand them.

What I do understand is that the ^ tells the parser that the following condition is to match at the beginning of the string. But doesn't the .* in the (not working) first example also start at the beginning of the string? I thought it would start at the beginning of the string and search through as many characters as it can before reaching "thumb". If so it would return a non-match.

Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?

Thanks!

Recommended answer

(Darn, Jon beat me. Oh well, you can look at the examples anyway)

Like the other guys have said, regex is not the best tool for this job. If you are working with filepaths, take a look at os.path.

As for filtering files you don't want, you can do if 'thumb' not in filename: ... once you have dissected the path (where filename is a str).
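A minimal sketch of that suggestion, assuming the filenames arrive as a plain list of path strings. The sample paths and the name non_thumbnails are made up for illustration:

```python
import os.path

# Hypothetical sample input; any list of path strings would do.
filenames = [
    "/tmp/somewhere/photo.jpg",
    "/tmp/somewhere/photo_thumb.jpg",
    "/tmp/thumbs/holiday.png",
]

# Test only the final path component, so a directory named "thumbs"
# does not exclude a full-size image stored inside it.
non_thumbnails = [
    path for path in filenames
    if "thumb" not in os.path.basename(path)
]

print(non_thumbnails)
```

Whether to test the whole path or only the basename depends on how the thumbnails are named; the basename check here is just one reasonable choice.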

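And for posterity, here are my thoughts on those regexes. r".*(?!thumb).*" does not work because .* is greedy and the lookahead is only checked at the single position where the first .* stops, so the engine just needs to find one position that is not immediately followed by "thumb". Take a look at this: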

>>> import re
>>> re.search('(.*)((?!thumb))(.*)', '/tmp/somewhere/thumb').groups()
('/tmp/somewhere/thumb', '', '')
>>> re.search('(.*?)((?!thumb))(.*)', '/tmp/somewhere/thumb').groups()
('', '', '/tmp/somewhere/thumb')
>>> re.search('(.*?)((?!thumb))(.*?)', '/tmp/somewhere/thumb').groups()
('', '', '')

The last one is quite strange...
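The last result is arguably less strange than it looks: both lazy quantifiers match as little as possible, and the lookahead already succeeds at position 0 because the string starts with "/tmp", not "thumb". The following sketch (the sample names are invented) shows the practical consequence for the first pattern: it accepts every string, so it cannot be used as a filter.

```python
import re

pattern = re.compile(r".*(?!thumb).*")

# Hypothetical sample names, including one that is exactly "thumb".
names = ["photo.jpg", "photo_thumb.jpg", "thumb"]

# Every name matches: the greedy first .* runs to the end of the string,
# where nothing (and therefore not "thumb") follows, so the lookahead passes.
matched = [name for name in names if pattern.match(name)]

# Position where the first .* stopped for the name "thumb".
end_of_first_group = re.match(r"(.*)(?!thumb).*", "thumb").end(1)

print(matched)
print(end_of_first_group)
```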

The other regex (r"^(?!.*thumb).*") works because .* is inside the lookahead, so you don't have any issues with characters being stolen. You don't even need the ^ if you use re.match, which only matches at the start of the string; with re.search you do need it:

>>> re.search('((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
('', 'humb')
>>> re.search('^((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
>>> re.match('((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
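Applied to the original task, the working pattern could be used roughly like this. This is a sketch with invented filenames; the names not_thumb and kept are illustrative only:

```python
import re

# The anchored pattern from the question: reject any string that
# contains "thumb" anywhere after the start.
not_thumb = re.compile(r"^(?!.*thumb).*")

# Hypothetical sample input.
filenames = ["photo.jpg", "photo_thumb.jpg", "thumbnail.png", "holiday.png"]

kept = [name for name in filenames if not_thumb.match(name)]
print(kept)

# Without the anchor, re.search slides forward until it finds a position
# with no "thumb" ahead of it, so it still reports a (partial) match.
unanchored = re.search(r"(?!.*thumb).*", "photo_thumb.jpg")
print(unanchored.group())
```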
