Why does checking this string with Regex.IsMatch cause CPU to reach 100%?
Problem description
When using Regex.IsMatch (C#, .NET 4.5) on a specific string, the CPU reaches 100%.
String:
https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792/?type=1&permPage=1
Pattern:
^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$
Full code:
Regex.IsMatch("https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792/?type=1&permPage=1",
@"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$");
I found that shortening the URL prevents this problem. Shortened URL:
https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792
But I am still very interested in understanding what causes this.
Recommended answer
As nu11p01n73R pointed out, your regular expression causes a lot of backtracking. That is because several parts of the expression can match the same text, which gives the engine many alternatives it has to try before it can report a result.
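One way to observe the backtracking without pinning a CPU core is to give the match a time limit. The following is a minimal sketch, assuming the Regex.IsMatch overload that accepts a TimeSpan (available since .NET 4.5); the class name BacktrackingDemo, the helper TimesOut, and the 500 ms limit are illustrative choices, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Pattern and input copied verbatim from the question.
    public const string Pattern = @"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$";
    public const string Url =
        "https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792/?type=1&permPage=1";

    // Hypothetical helper: returns true when the engine gives up because the
    // time limit was exceeded, false when it reaches a result in time.
    public static bool TimesOut(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
            return false;
        }
        catch (RegexMatchTimeoutException)
        {
            return true;
        }
    }

    public static void Main()
    {
        bool timedOut = TimesOut(Url, Pattern, TimeSpan.FromMilliseconds(500));
        Console.WriteLine(timedOut
            ? "Match abandoned after the timeout (excessive backtracking)."
            : "Match finished within the time limit.");
    }
}
```

A timeout only limits the damage; the backtracking itself is still caused by the overlapping parts of the pattern.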
You can avoid this by changing the regular expression so that its individual sections are more specific. In your case, the cause is that you wanted to match a literal dot but used the match-all character (.) instead. You should escape it as \. (a backslash followed by a dot).
This change alone should greatly reduce the need for backtracking and make the match fast:
^http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=])?$
And if you actually want the pattern to match the original string, you need to add a quantifier to the character class at the end:
^http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]+)?$
                                           ↑
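The two corrected patterns can be checked against the URL from the question with a short program. This is only a sketch: the class name PatternComparison, the helper Matches, and the one-second safety timeout are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Input copied from the question.
    public const string OriginalUrl =
        "https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792/?type=1&permPage=1";

    // Dot escaped, trailing character class still matches a single character.
    public const string EscapedDot =
        @"^http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=])?$";

    // Dot escaped and a + quantifier added to the trailing character class.
    public const string EscapedDotWithQuantifier =
        @"^http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]+)?$";

    // Hypothetical helper: runs the match with a timeout as a safety net.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.None, TimeSpan.FromSeconds(1));
    }

    public static void Main()
    {
        // Fails quickly: only one character is allowed after the first slash of the path.
        Console.WriteLine(Matches(OriginalUrl, EscapedDot));
        // Succeeds: the whole path and query string fit the quantified character class.
        Console.WriteLine(Matches(OriginalUrl, EscapedDotWithQuantifier));
    }
}
```

The first call returns False almost immediately, and the second returns True, since every character after the host name belongs to the final character class.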