Python re infinite execution
Question
I'm trying to execute this code:
import re
pattern = r"(\w+)\*([\w\s]+)*/$"
re_compiled = re.compile(pattern)
results = re_compiled.search('COPRO*HORIZON 2000 HOR')
print(results.groups())
But Python does not respond. The process takes 100% of the CPU and does not stop. I've tried this on both Python 2.7.1 and Python 3.2, with identical results.
Recommended answer
Your regex runs into catastrophic backtracking because you have nested quantifiers: ([...]+)* puts a + inside a *. Since your regex requires the string to end in / (which your example does not), the match can never succeed, but the regex engine still tries every possible way of dividing the string among the repetitions in the vain hope of finding a combination that matches. That's where it gets stuck.
To illustrate, let's assume "A*BCD" as the input to your regex and see what happens:
- (\w+) matches A. Good.
- \* matches *. Yay.
- [\w\s]+ matches BCD. OK.
- / fails to match (no characters left to match). OK, let's back up one character.
- / fails to match D. Hum. Let's back up some more.
- [\w\s]+ matches BC, and the repeated [\w\s]+ matches D.
- / fails to match. Back up.
- / fails to match D. Back up some more.
- [\w\s]+ matches B, and the repeated [\w\s]+ matches CD.
- / fails to match. Back up again.
- / fails to match D. Back up some more, again.
- How about [\w\s]+ matches B, repeated [\w\s]+ matches C, repeated [\w\s]+ matches D? No? Let's try something else.
- [\w\s]+ matches BC. Let's stop here and see what happens.
- Darn, / still doesn't match D.
- [\w\s]+ matches B.
- Still no luck. / doesn't match C.
- Hey, the whole group is optional (...)*.
- Nope, / still doesn't match B.
- OK, I give up.
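The walk-through above can be mimicked with a small sketch that lists every way of cutting "BCD" into consecutive non-empty chunks, longest first chunk first. This is a simplified model, not what the re module does internally: the helper name splits is made up for illustration, and the real engine also tries shorter prefixes such as BC and B on their own.

```python
def splits(text):
    """Return every way to cut `text` into consecutive non-empty chunks."""
    if not text:
        return [[]]
    result = []
    # Try the longest first chunk first, mirroring the greedy [\w\s]+.
    for cut in range(len(text), 0, -1):
        for rest in splits(text[cut:]):
            result.append([text[:cut]] + rest)
    return result


for chunks in splits("BCD"):
    print(chunks)
```

The four results correspond to the BCD, BC + D, B + CD and B + C + D attempts in the list. In this model, each extra character doubles the number of splits.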
Now that was a string of just three letters. Yours had about 30, and trying all the combinations for that would keep your computer busy until the end of days.
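To get a feel for the growth without hanging the interpreter, one could time the original pattern on short inputs that are deliberately missing the trailing /. This is only a rough sketch: the helper name time_failed_search and the chosen lengths are assumptions, and the timings depend on the machine and Python version. The lengths are kept small on purpose.

```python
import re
import time

# The pattern from the question, with the nested quantifier ([\w\s]+)*.
pattern = re.compile(r"(\w+)\*([\w\s]+)*/$")


def time_failed_search(n):
    """Search a string with n characters after the '*' and no trailing '/'."""
    text = "A*" + "B" * n
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


for n in (10, 12, 14, 16, 18):
    match, seconds = time_failed_search(n)
    print(n, match, f"{seconds:.4f}s")
```

Every search returns None. On a backtracking engine like the one in re, you would expect the time to grow roughly geometrically with n, which is why an input of about 30 characters appears to never finish.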
I suppose what you're trying to do is to get the strings before and after the *, in which case, use:
pattern = r"(\w+)\*([\w\s]+)$"
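As a quick check, the suggested pattern can be run against the string from the question. This is a minimal sketch: the variable names are illustrative, and the second input with a trailing / is an added example to show that a failing match now returns promptly.

```python
import re

# Suggested pattern: the outer * and the trailing / are gone, so there is
# only one quantifier over [\w\s] and no nested repetition.
fixed = re.compile(r"(\w+)\*([\w\s]+)$")

match = fixed.search("COPRO*HORIZON 2000 HOR")
print(match.groups())

# A string the pattern cannot match is rejected quickly instead of hanging.
no_match = fixed.search("COPRO*HORIZON 2000 HOR/")
print(no_match)
```

The first call yields the text before and after the *, and the second yields None.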