This regular expression fails to parse all valid floating-point numbers

Problem description

I am trying to find all floating-point numbers (possibly in exponential form, with or without a -/+ prefix). For example, the following are valid formats: -1.2 +1.2 .2 -3 3E4 -3e5 e-5

The source text contains several numbers separated by spaces or commas. I need to use a regular expression to:

  1. Determine whether there are any invalid numbers (for example, in "1.2 3.2 s3", s3 is not a valid number)
  2. List each valid number

I have no idea how to get (1) done, but for (2) I am using boost::regex and the following code:

// Wide-character pattern and input (the L prefix is required for wstring)
wstring strre(L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b");
wstring src(L"1.2 -3.4 3.2 3 2 1e-3 3e3");
boost::wregex regexp(strre);
boost::match_results<std::wstring::const_iterator> what;
regex_search(src, what, regexp, boost::match_continuous);
wcout << "RE: " << strre << endl << endl;
wcout << "SOURCE: [" << src << "]" << endl;

// Print every entry of the match_results object
for (std::size_t i = 0; i < what.size(); i++)
  wcout << "OUTPUT: [" << wstring(what[i].first, what[i].second) << "]" << endl;

But this code only shows me the first number (1.2). I also tried boost::match_all and boost::match_default, with the same result.

ADDITIONAL INFO: Hi all, let's not worry about the double-backslash issue; it is expressed correctly in my code (in my test code, I read the pattern from text rather than from an explicit string literal). Anyway, I modified the code as follows:

wstring strre(L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b");
boost::wregex regexp(strre);
boost::match_results<std::wstring::const_iterator> what;
wcout << "RE: " << strre << endl << endl;
while (src.length() > 0)
{
  wcout << "SOURCE: [" << src << "]" << endl;
  // Stop when nothing more matches; what[0] is not meaningful after a failed search
  if (!regex_search(src, what, regexp, boost::match_default))
    break;
  wcout << "OUTPUT: [" << wstring(what[0].first, what[0].second) << "]" << endl;
  // Continue with the text that follows the current match
  src = wstring(what[0].second, src.cend());
}

Now it correctly shows every single number, but I have to run regex_search several times because it only gives one number at a time. I just don't understand why regex_search won't give me all the results at once. Is there any way to run the search once and get all the results back?

Recommended answer

You normally have to double-escape backslashes in a C++ string literal. So your "\." turns into just ".". You would need it to be "\\.", and so on. Similarly, your "\b" becomes not a word boundary but a literal backspace! Fix it the same way: "\\b".
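To make the escaping point concrete, here is a minimal sketch of what the compiler does with each spelling. The function names `single_escaped` and `double_escaped` are made up for this illustration and are not part of the question's code.

```cpp
#include <string>

// In an ordinary string literal the compiler consumes the backslash:
// "\b" is ONE character, the backspace control character (0x08).
std::string single_escaped() { return "\b"; }

// Doubling the backslash keeps it: "\\b" is TWO characters, a backslash
// followed by 'b', which is what a regex engine reads as a word boundary.
std::string double_escaped() { return "\\b"; }
```

Calling `single_escaped().size()` gives 1 and `double_escaped().size()` gives 2, so only the second form hands the regex engine an actual `\b`.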

Also, where’s the doc for that strre class? Are you sure it understands the language you are using?

Apparently the new C++ standard (C++11) has raw string literals. These work like `backticked` strings in Go, or like 'single-quoted' strings or /patterns/ in Perl. See this answer for details.
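As a rough illustration of raw string literals, the sketch below writes the question's pattern twice, once with doubled backslashes and once as a C++11 raw literal. The names `kEscaped`, `kRaw`, and `same_pattern` are invented for the example.

```cpp
#include <string>

// Escaped form: every backslash meant for the regex engine is doubled.
const std::string kEscaped =
    "[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b";

// Raw form (C++11): everything between R"( and )" is taken verbatim,
// so the pattern reads exactly as the regex engine will see it.
const std::string kRaw =
    R"([-+]?\b[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?\b)";

// Both spellings produce the same sequence of characters.
bool same_pattern() { return kEscaped == kRaw; }
```

`same_pattern()` returns true, since the two literals differ only in how the source code spells the backslashes.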

Here’s a somewhat fancier pattern for detecting floating-point literals, which uses no backslashes:

 [+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?

Note that it does require lookaheads, which POSIX extended regular expressions (EREs) don’t support. You should probably use the PCRE library, which does. Broken down, that’s

[+-]?                   # optional leading sign
(?=[.]?[0-9])           # lookahead for a digit, maybe with an intervening dot
[0-9]*                  # maybe some digits
(?:[.][0-9]*)?          # maybe a (dot plus maybe some digits)
(?:[Ee][+-]?[0-9]+)?    # maybe an exponent, which may have a sign and must have digits
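The breakdown above can be tried against individual tokens, which is one possible way to approach task (1) from the question. This sketch swaps in `std::regex` (C++11, default ECMAScript grammar, which supports lookahead) instead of Boost or PCRE so that it is self-contained. The helper name `is_float_literal` is hypothetical.

```cpp
#include <regex>
#include <string>

// The pattern from the answer, unchanged. It contains no backslashes,
// so it needs no extra escaping inside a C++ string literal.
static const std::regex kFloatPattern(
    "[+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?");

// Returns true only when the WHOLE token is a floating-point literal.
// regex_match (unlike regex_search) requires the full string to match.
bool is_float_literal(const std::string& token) {
    return std::regex_match(token, kFloatPattern);
}
```

With this helper, "-1.2", "+1.2", ".2", "-3", "3E4", and "-3e5" are accepted and "s3" is rejected. Note that "e-5", which the question lists as valid, is also rejected, because the lookahead insists on a digit before any exponent.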

Pattern courtesy of Perl’s Regexp::Common library.
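On the question of getting every number in one pass: a single regex_search call reports one match, and the entries of its match_results are that match's sub-expressions, not successive matches. A regex iterator repeats the search for you. The sketch below uses `std::sregex_iterator` as a stand-in so it compiles without Boost. Boost.Regex offers `boost::sregex_iterator` and `boost::wsregex_iterator` with the same interface, though treat that mapping as something to confirm against the Boost documentation. The function name `find_all_floats` is invented for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every substring of `text` that matches the answer's pattern.
std::vector<std::string> find_all_floats(const std::string& text) {
    static const std::regex pattern(
        "[+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?");

    std::vector<std::string> found;
    // A default-constructed iterator acts as the end-of-sequence marker.
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern);
         it != end; ++it) {
        found.push_back(it->str());  // the full text of the current match
    }
    return found;
}
```

For the question's input "1.2 -3.4 3.2 3 2 1e-3 3e3" this returns the seven numbers in order. Because it searches for substrings, it would also pull "3" out of "s3", so it lists numbers but does not flag invalid tokens.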
