RegEx slow when not match

Question

This regex runs fine when no character outside [,.] appears before the word 'here':

RegEx.Replace("My products or something else here ", "My ((?:[a-z']* ??)*?)\s*([,.]|$| here)", "")

But it becomes very slow (it freezes for about 3-5 seconds or more) if I insert a character that is not in [,.] before the word 'here'. For example, inserting the character '/' before the word 'here':

RegEx.Replace("My products or something / else here ", "My ((?:[a-z']* ??)*?)\s*([,.]|$| here)", "")

The problem goes away when I add / to the [,.] class in my pattern:

RegEx.Replace("My products or something / else here ", "My ((?:[a-z']* ??)*?)\s*([/,.]|$| here)", "")

But I want my regex to ignore the / character rather than match it as the end of my sentence. Why does this problem occur, and how can I resolve it?

Recommended answer

You are experiencing catastrophic backtracking. This part of the pattern:

(?:[a-z']* ??)*?

can match the words in an exponential number of possible combinations. Since the space is optional, the word else alone can be matched in all of these variations (where the parentheses indicate what is matched by one "instance" of the inner group):

(else)
(els)(e)
(el)(se)
(el)(s)(e)
(e)(lse)
(e)(l)(se)
(e)(ls)(e)
(e)(l)(s)(e)
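
The eight variations above are simply every way of cutting the word into consecutive non-empty pieces. As a rough illustration (written in Python rather than the .NET code of the question, with `splits` being a hypothetical helper that is not part of any regex library), they can be enumerated like this:

```python
def splits(word):
    """Return every way to cut `word` into consecutive non-empty pieces."""
    if not word:
        return [[]]  # exactly one way to split the empty string: no pieces
    result = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        # Combine this first piece with every split of the remainder.
        for rest in splits(word[i:]):
            result.append([head] + rest)
    return result


for pieces in splits("else"):
    print("".join("(" + p + ")" for p in pieces))

print(len(splits("else")))      # 8, i.e. 2 ** (4 - 1)
print(len(splits("products")))  # 128, i.e. 2 ** (8 - 1)
```

A word of n letters has 2^(n-1) such splits, because each of the n-1 gaps between letters is either a cut or not. The enumeration order here differs from the list above, but the set of splits is the same.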

This explodes for longer words, and especially for an entire sentence. Generally the problem occurs whenever you have nested repetition and it is ambiguous where one repetition ends and the next begins. Then, if there is no match, the engine has to backtrack through all of these cases before it can declare failure. If there is a match, the backtracking is usually unnecessary, and the problem goes unnoticed. The best fix is to use an "unrolling-the-loop" pattern (http://stackoverflow.com/questions/17043454/using-regexes-how-to-efficiently-match-strings-between-double-quotes-with-embed), which makes the space mandatory in the repetition:

"My ([a-z']*(?: [a-z']*)*?)\s*([,.]|$| here)"

Now that the space is mandatory, each "instance" of the repeated group has to match an entire word, which should resolve the problem.
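
As a minimal sketch, the unrolled pattern can be tried against both inputs from the question. This uses Python's `re` module as a stand-in for the .NET `RegEx.Replace` call (an assumption: both are backtracking engines, but they are not identical), and `strip_sentence` is a hypothetical wrapper name:

```python
import re

# Unrolled pattern from the answer: one word, then zero or more " word" groups.
# Every repetition must start with a space, so there is only one way to
# divide the text between repetitions.
UNROLLED = re.compile(r"My ([a-z']*(?: [a-z']*)*?)\s*([,.]|$| here)")


def strip_sentence(text):
    """Remove the matched 'My ... here' sentence; return text unchanged if no match."""
    return UNROLLED.sub("", text)


# Matching case: everything up to and including " here" is removed.
print(repr(strip_sentence("My products or something else here ")))

# Non-matching case: the '/' blocks the match, and the failure is reported
# without trying an exponential number of word splits.
print(repr(strip_sentence("My products or something / else here ")))
```

With the '/' present the pattern still does not match, so the input comes back unchanged. The unrolling only removes the slowdown; it does not make the regex skip over the '/'.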
