为什么这需要这么长时间才能匹配?这是一个错误吗? [英] Why does this take so long to match? Is it a bug?

查看:31
本文介绍了为什么这需要这么长时间才能匹配?这是一个错误吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要匹配 Web 应用程序中的某些 URL,即 /123,456,789,并编写此正则表达式以匹配模式:

I need to match certain URLs in web application, i.e. /123,456,789, and wrote this regex to match the pattern:

r'(\d+(,)?)+/$'

我注意到它似乎没有评估,即使在测试模式几分钟后:

I noticed that it does not seem to evaluate, even after several minutes when testing the pattern:

re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523')

预期的结果是没有匹配项.

The expected result would be that there were no matches.

然而,这个表达式几乎立即执行(注意尾部斜杠):

This expression, however, executes almost immediately (note the trailing slash):

re.findall(r'(\d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523/')

这是一个错误吗?

推荐答案

有一些灾难性的回溯正在进行中这将导致处理量呈指数级增长,具体取决于非匹配字符串的长度.这与您的嵌套重复和可选的逗号有关(即使某些正则表达式引擎可以确定这与尝试所有无关的重复不匹配).这是通过优化表达式来解决的.

There is some catastrophic backtracking going on that will cause an exponential amount of processing depending on how long the non-match string is. This has to do with your nested repetitions and optional comma (even though some regex engines can determine that this wouldn't be a match with attempting all of the extraneous repetition). This is solved by optimizing the expression.

最简单的方法是查找 1+ 位数字或逗号,后跟斜杠和字符串结尾:[\d,]+/$.然而,这并不完美,因为它允许诸如 ,123,,4,5/ 之类的东西.

The easiest way to accomplish this is to just look for 1+ digits or commas followed by a slash and the end of the string: [\d,]+/$. However, that is not perfect since it would allow for something like ,123,,4,5/.

为此,您可以使用初始尝试的稍微优化的版本:(?:\d,?)+/$.首先,我让你的重复组非捕获 ((?:...)) 这不是必需的,但它提供了更干净的匹配".接下来,也是唯一关键的一步,我不再重复组内的 \d,因为组已经在重复了. 最后,我删除了可选的 , 周围的不必要组,因为 ? 只影响最后一个字符.这几乎会寻找一个数字,也许是一个逗号,然后重复,最后跟着一个尾随的 /.

For this you can use a slightly optimized version of your initial try: (?:\d,?)+/$. First, I made your repeating group non-capturing ((?:...)) which isn't necessary but it provides for a "cleaner match". Next, and the only crucial step, I stopped repeating the \d inside of the group since the group is already repeating. Finally, I removed the unnecessary group around the optional , since ? only affects the last character. Pretty much this will look for one digit, maybe a comma, then repeat, and finally followed by a trailing /.

这仍然可以匹配一个奇怪的字符串 1,2,3,/,所以对于它,我用 负面回顾:(?:\d,?)+(?.这将断言在尾随 / 之前没有逗号.

This can still match an odd string 1,2,3,/, so for the heck of it I improved your original regex with a negative lookbehind: (?:\d,?)+(?<!,)/$. This will assert that there is no comma directly before the trailing /.

这篇关于为什么这需要这么长时间才能匹配?这是一个错误吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆