On libc++, why does regex_match("tournament", regex("tour|to|tournament")) fail?

Question

In http://llvm.org/svn/llvm-project/libcxx/trunk/test/re/re.alg/re.alg.match/ecma.pass.cpp, the following test exists:

    std::cmatch m;
    const char s[] = "tournament";
    assert(!std::regex_match(s, m, std::regex("tour|to|tournament")));
    assert(m.size() == 0);

Why does this match fail?

With VC++ 2012 and Boost, the match succeeds.
In JavaScript on Chrome and Firefox, "tournament".match(/^(?:tour|to|tournament)$/) succeeds.

Only on libc++, the match fails.

Recommended answer

I believe the test is correct. It is instructive to search for "tournament" in all of the libc++ tests under re.alg, and compare how the different engines treat the regex("tour|to|tournament"), and how regex_search differs from regex_match.

Let's start with regex_search:

awk, egrep, extended:

regex_search("tournament", m, regex("tour|to|tournament"))




matches the entire input string: "tournament".

ECMAScript:

regex_search("tournament", m, regex("tour|to|tournament"))




matches only part of the input string: "tour".

grep, basic:

regex_search("tournament", m, regex("tour|to|tournament"))




Doesn't match at all. The '|' character is not special.

awk, egrep and extended will match as much as they can with alternation. However the ECMAScript alternation is "ordered". This is specified in ECMA-262. Once ECMAScript matches a branch in the alternation, it quits searching. The standard includes this example:

/a|ab/.exec("abc")




returns the result "a" and not "ab".
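
The difference between the grammars can be reproduced with a small C++ sketch. The helper name `first_match` is made up for illustration and is not part of `<regex>`; it simply reports what `regex_search` finds first under a given grammar flag.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of <regex>): returns the text of the first
// match that regex_search finds, or an empty string when nothing matches.
std::string first_match(const std::string& text,
                        const std::string& pattern,
                        std::regex::flag_type grammar) {
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

Called as `first_match("tournament", "tour|to|tournament", std::regex::ECMAScript)`, this should yield "tour", while passing `std::regex::extended` should yield "tournament", as described above. With `std::regex::basic` the '|' is an ordinary character, so nothing matches and the helper returns an empty string.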

<plug>

This is also discussed in depth in Mastering Regular Expressions by Jeffrey E.F. Friedl. I couldn't have implemented <regex> without this book. And I will freely admit that there is still much more that I don't know about regular expressions, than what I know.

At the end of the chapter on alternation the author states:


If you understood everything in this chapter the first time you read it, you probably didn't read it in the first place.

Believe it!

</plug>

Anyway, ECMAScript matches only "tour". The regex_match algorithm returns success only if the entire input string is matched. Since only the first 4 characters of the input string are matched, then unlike awk, egrep and extended, ECMAScript returns false with a zero-sized cmatch.
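
The reasoning in that last paragraph can be sketched in C++ without calling `regex_match` at all. The helper below is hypothetical and is not how libc++ implements `regex_match`; it only asks whether the match that an anchored `regex_search` finds at the start of the text happens to span the whole input. As the question notes, other implementations accept the full string, so treat this as an illustration of the argument rather than a portable prediction of what `regex_match` returns.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: runs an anchored search (match_continuous) and reports
// whether the match found at the start of the text spans the whole input.
bool anchored_match_covers_input(const std::string& text,
                                 const std::string& pattern,
                                 std::regex::flag_type grammar) {
    const std::regex re(pattern, grammar);
    std::smatch m;
    if (!std::regex_search(text, m, re,
                           std::regex_constants::match_continuous)) {
        return false;
    }
    return static_cast<std::string::size_type>(m.length(0)) == text.size();
}
```

Under the ECMAScript grammar the first branch to succeed is "tour", so the helper should report false for "tour|to|tournament", whereas the extended grammar picks the longest alternative and should report true. Reordering the pattern to "tournament|tour|to" puts the longest branch first, so the ordered ECMAScript alternation then covers the whole input as well.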
