std::regex_match and lazy quantifier with strange behavior


Question

I know that:
A lazy quantifier matches as few characters as possible (shortest match).

I also know the constructor:

basic_regex( ...,
            flag_type f = std::regex_constants::ECMAScript );

And:
ECMAScript supports non-greedy matches,
and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ... en.cppreference

And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... en.cppreference

And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences ... en.cppreference, std::regex_match

Here is my code (+ live demo):

#include <iostream>
#include <string>
#include <regex>

int main(){

        std::string string( "s/one/two/three/four/five/six/g" );
        std::match_results< std::string::const_iterator > match;
        std::basic_regex< char > regex ( "s?/.+?/g?" );  // non-greedy
        bool test = false;

        using namespace std::regex_constants;

        // okay recognize the lazy operator .+?
        test = std::regex_search( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
        // does not recognize the lazy operator .+?
        test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
} 

And the output:

1
s/one/
1
s/one/two/three/four/five/six/g

Process returned 0 (0x0)   execution time : 0.008 s
Press ENTER to continue.


I expected that std::regex_match should not match anything and should return 0 with the non-greedy quantifier .+?

In fact, here the non-greedy quantifier .+? behaves the same as the greedy one: both /.+?/ and /.+/ match the same string, even though they are different patterns. So the question is: why is the question mark ignored?

regex101

A quick test:

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
$ s/one/
$
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
$ s/one/two/three/four/five/six/g


NOTE
This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); is non-greedy
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); is greedy.
Both give the same output with std::regex_match: each still matches the entire string!
With std::regex_search, however, they give different output.
Also, s? and g? do not matter: with /.*?/ the entire string is still matched!

More details

g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901

Recommended answer

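I don't see any inconsistency. regex_match tries to match the whole string, so in s?/.+?/g? the lazy .+? keeps expanding until the whole string is covered. Laziness only changes the order in which candidate lengths are tried (shortest first); it does not stop the engine from trying longer ones when the shorter ones cannot produce the required overall match.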

These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:

Non-greedy:

a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba  # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba  # let's try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".

Greedy:

a.*a: ababa
a|.*a: a|baba
a.*|a: ababa|  # try .* == "baba" first
# backtrack
a.*|a: abab|a  # try .* == "bab" now
a.*a|: ababa|

In this respect, regex_match( abc ) behaves like regex_search( ^abc$ ).
