Negative lookahead assertion not working in Python

Problem description

Task:
- given: a list of image filenames
- todo: create a new list with filenames not containing the word "thumb" - i.e. only target the non-thumbnail images (with PIL - Python Imaging Library).

I've tried r".*(?!thumb).*" but it failed.

I've found the solution (here on stackoverflow) to prepend a ^ to the regex and to put the .* into the negative lookahead: r"^(?!.*thumb).*" and this now works.

The thing is, I would like to understand why my first solution did not work but I don't. Since regexes are complicated enough, I would really like to understand them.

What I do understand is that the ^ tells the parser that the following condition is to match at the beginning of the string. But doesn't the .* in the (not working) first example also start at the beginning of the string? I thought it would start at the beginning of the string and search through as many characters as it can before reaching "thumb". If so it would return a non-match.

Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?

Thanks!

Solution

(Darn, Jon beat me. Oh well, you can look at the examples anyway)

Like the other guys have said, regex is not the best tool for this job. If you are working with filepaths, take a look at os.path.

As for filtering files you don't want, you can do if 'thumb' not in filename: ... once you have dissected the path (where filename is a str).
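
A minimal sketch of that regex-free approach, assuming the list holds full paths and that only the file name part (not the directory) should be checked for "thumb". The function name `non_thumbnails` and the sample paths are made up for illustration:

```python
import os.path


def non_thumbnails(paths):
    """Return the paths whose file name does not contain 'thumb'."""
    kept = []
    for path in paths:
        filename = os.path.basename(path)  # drop the directory part
        if 'thumb' not in filename:
            kept.append(path)
    return kept


# Hypothetical sample data, not taken from the question
paths = [
    '/tmp/somewhere/cat.jpg',
    '/tmp/somewhere/cat_thumb.jpg',
    '/tmp/thumbs/dog.png',
]
print(non_thumbnails(paths))
```

Note that '/tmp/thumbs/dog.png' is kept here because only the base name is tested; if a "thumb" anywhere in the path should disqualify a file, test `path` directly instead.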

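And for posterity, here are my thoughts on those regexes. r".*(?!thumb).*" does not work because .* is greedy: it consumes the whole string first, and the lookahead is then only checked at the position that is left over, the end of the string, where it succeeds trivially (nothing follows, so "thumb" cannot follow). Take a look at this: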

>>> re.search('(.*)((?!thumb))(.*)', '/tmp/somewhere/thumb').groups()
('/tmp/somewhere/thumb', '', '')
>>> re.search('(.*?)((?!thumb))(.*)', '/tmp/somewhere/thumb').groups()
('', '', '/tmp/somewhere/thumb')
>>> re.search('(.*?)((?!thumb))(.*?)', '/tmp/somewhere/thumb').groups()
('', '', '')

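The last one looks strange, but it is consistent: both lazy quantifiers match as few characters as possible (none), and the lookahead succeeds at position 0 because the string does not start with "thumb".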
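
To make the consequence concrete, here is a small sketch showing that the first pattern accepts every name, including ones that contain "thumb". The sample names are made up for illustration:

```python
import re

# The pattern from the question that "failed"
pattern = re.compile(r".*(?!thumb).*")

# Hypothetical sample names, some of which contain 'thumb'
names = ['cat.jpg', 'cat_thumb.jpg', 'thumb']

results = []
for name in names:
    m = pattern.match(name)
    # The greedy first .* swallows the whole name, so the lookahead is
    # only tested at the very end, where nothing follows at all.
    results.append((name, m is not None, m.end() == len(name)))
    print(results[-1])
```

Every tuple reports a match that spans the whole name, so the pattern filters nothing out.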

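The other regex (r"^(?!.*thumb).*") works because .* is inside the lookahead, so you don't have any issues with characters being stolen. Whether you need the ^ depends on whether you use re.match or re.search: re.match only matches at the start of the string, so the ^ is redundant there, while re.search without the ^ retries at later positions until the lookahead passes: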

>>> re.search('((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
('', 'humb')
>>> re.search('^((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
>>> re.match('((?!.*thumb))(.*)', '/tmp/somewhere/thumb').groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'
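
Putting the anchored pattern to work on the original task might look like the sketch below. The names `NOT_THUMB` and `filter_with_regex` and the sample filenames are made up for illustration, and the names are assumed to contain no newlines (which . would not match):

```python
import re

# Anchored negative lookahead: succeed only if 'thumb' appears
# nowhere after the start of the string.
NOT_THUMB = re.compile(r"^(?!.*thumb).*")


def filter_with_regex(filenames):
    """Keep the filenames that do not contain 'thumb'."""
    # match() returns None for names containing 'thumb', which is falsy
    return [name for name in filenames if NOT_THUMB.match(name)]


# Hypothetical sample data
filenames = ['cat.jpg', 'cat_thumb.jpg', 'thumb_dog.png', 'dog.png']
print(filter_with_regex(filenames))
```

Testing the match object for truthiness instead of calling .groups() on it avoids the AttributeError seen in the transcripts above when there is no match.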
