为什么regex_match抛出“复杂性异常”? [英] Why does regex_match throw "complexity exception"?

查看:558
本文介绍了为什么regex_match抛出“复杂性异常”?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试测试(使用 boost :: regex )文件中的行是否仅包含由空格分隔的数字条目。我遇到了一个我不理解的异常(见下文)。如果有人可以解释为什么会抛出它,那就太好了。也许我在定义模式时在做一些愚蠢的事情?下面是代码:

I am trying to test (using boost::regex) whether a line in a file contains only numeric entries seperated by spaces. I encountered an exception which I do not understand (see below). It would be great if someone could explain why it is thrown. Maybe I am doing something stupid here in my way of defining the patterns? Here is the code:

// regex_test.cpp
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;

int main(){
  // My basic pattern to test for a single numeric expression
  const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+\\.?[[:d:]]*";
  // pattern for the full line
  const string numeric_sequence_pattern = "([[:s:]]*"+numeric_value_pattern+"[[:s:]]*)+";

  regex r(numeric_sequence_pattern);
  string line= "1 2 3 4.444444444444";
  bool match = regex_match(line, r);
  cout<<match<<endl;

  //...
}

我编译成功

g++ -std=c++11 -L/usr/lib64/ -lboost_regex regex_test.cpp  

到目前为止,结果程序运行良好,并且 match == true 如我所愿。但是然后我测试了一条输入线,例如

The resulting program worked fine so far and match == true as I wanted. But then I test an input line like

string line= "1 2 3 4.44444444e-16"; 

当然,我的模式无法识别 4.44444444e格式-16 ,我希望 match == false 。但是,相反,我收到以下运行时错误:

Of course, my pattern isn't built to recognise the format 4.44444444e-16 and I would expect that match == false. However, instead I get the following runtime error:

terminate called after throwing an instance of  
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'  
  what():  The complexity of matching the regular expression exceeded predefined bounds.  
Try refactoring the regular expression to make each choice made by the state machine unambiguous.  
This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.  

为什么?

注意:我给出的示例在感觉在点后少加一位数字就可以了。这意味着

Why is that?
Note: the example I gave is extremal in the sense that putting one digit less after the dot works ok. That means

string line= "1 2 3 4.4444444e-16";

仅导致 match == false 为预期。所以,我感到困惑。这是怎么回事

just results in match == false as expected. So, I'm baffled. What is happening here?

已经感谢!

更新:

问题似乎已解决。鉴于 alejrb 的提示,我将模式重构为

Update:
Problem seems to be solved. Given the hint of alejrb I refactored the pattern to

const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+(?:\\.[[:d:]]*)?";  

这似乎可以正常工作。不知何故,原始模式 [[:d:]] + \\。?[]中孤立的可选 \\。。 d:]] * 留下了许多以不同方式匹配长数字序列的可能性。

我希望该模式现在是安全的。但是,如果有人找到一种以新形式炸毁它的方法,请告诉我!对我来说,这是否仍然可能不是很明显……

That seems to work as it should. Somehow, the isolated optional \\. inside the original pattern [[:d:]]+\\.?[[:d:]]* left to many possibilities to match a long sequence of digits in different ways.
I hope the pattern is safe now. However, if someone finds a way to use it for a blow up in the new form, let me know! It's not so obvious for me whether that might still be possible...

推荐答案

我会说您的正则表达式可能是指数级的回溯。为了保护您免受循环的影响(如果输入不再可用),正则表达式引擎会中止尝试。

I'd say that your regex is probably exponentially backtracking. To protect you from a loop that would become entirely unworkable if the input were any longer, the regex engine just aborts the attempt.

经常导致此问题的一种模式是(x + x +)+ 的形式-当您将第一个图案放入第二个图案时,便会在此处建立。

One of the patterns that often causes this problem is anything of the form (x+x+)+ - which you build up here when you place the first pattern inside the second.

http://www.regular-expressions.info/catastrophic中进行了很好的讨论.html

这篇关于为什么regex_match抛出“复杂性异常”?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆