std::regex and ignoring flags

Views: 86 | Published: 2020/9/27 20:02:25 | Tags: c++, regex, c++11


Question

After learning the basic rules of C++, I focused on std::regex and created two console apps: 1. renrem and 2. bfind.
I also decided to write some convenience functions to make dealing with regex in C++ as easy as possible, using only std. I named the collection RFC (= regex function collection).

Several strange things keep surprising me, but this one ruined all my attempts and those two console apps.

One of the important functions is count_match, which counts the number of matches inside a string. Here is the full code:

unsigned int count_match( const std::string& user_string, const std::string& user_pattern, const std::string& flags = "o" ){

    const bool flags_has_i = flags.find( "i" ) < flags.size();
    const bool flags_has_g = flags.find( "g" ) < flags.size();

    std::regex::flag_type regex_flag                  = flags_has_i ? std::regex_constants::icase         : std::regex_constants::ECMAScript;
//    std::regex_constants::match_flag_type search_flag = flags_has_g ? std::regex_constants::match_default : std::regex_constants::format_first_only;
    std::regex rx( user_pattern, regex_flag );
    std::match_results< std::string::const_iterator > mr;

    unsigned int counter = 0;
    std::string temp = user_string;
    while( std::regex_search( temp, mr, rx ) ){
        temp = mr.suffix().str();
        ++counter;
    }

    if( flags_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else               return 0;
    }

}

First of all, as you can see, the line for search_flag is commented out because std::regex_search ignores it, and I do not know why, since std::regex_replace accepts exactly the same flag. So std::regex_search ignores format_first_only but std::regex_replace accepts it. Let that go.

The main problem is that the icase flag is also ignored when the pattern is a character class, i.e. []. Specifically, this happens when the class contains only uppercase or only lowercase letters: [A-Z] or [a-z].

Suppose this string: s = "ONE TWO THREE four five six seven"

C++ with std:

std::cout << count_match( s, "[A-Z]+" ) << '\n';       // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n';  // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 3 => Global match plus insensitive

whereas the same test in Perl, in the D language, and in C++ with Boost outputs:

std::cout << count_match( s, "[A-Z]+" ) << '\n';       // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n';  // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 7 => Global match plus insensitive

I know about regex flavors such as PCRE, and about ECMAScript 262, which C++ uses, but I have no idea why a simple flag is ignored by the only search function that C++ has, especially since std::regex_iterator and std::regex_token_iterator also use this function internally.

In short, I cannot use those two apps or RFC with the std library because of this!

So if someone knows which rule makes this valid behavior in ECMAScript 262, or if I am wrong anywhere, please tell me. Thanks.

Tested with:

gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04)
clang version 3.8.0-2ubuntu4

Perl code:

perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/g; print $c ;' "ONE TWO THREE four five six seven"  # 3
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/gi; print $c ;' "ONE TWO THREE four five six seven" # 7

D code:

uint count_match( ref const (char[]) user_string, const (char[]) user_pattern, const (char[]) flags ){

    const bool flag_has_g = flags.indexOf( "g" ) != -1;

    Regex!( char ) rx = regex( user_pattern, flags );
    uint counter = 0;
    foreach( mr; matchAll( user_string, rx ) ){
        ++counter;
    }

    if( flag_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else               return 0;
    }
}

Output:

writeln( count_match( s, "[A-Z]+", "g" ) );  // 3
writeln( count_match( s, "[A-Z]+", "gi" ) ); // 7

JavaScript code:

var s = "ONE TWO THREE four five six seven";

var rx1 = new RegExp( "[A-Z]+" , "g" );
var rx2 = new RegExp( "[A-Z]+" , "gi" );

var counter = 0;
while( rx1.exec( s ) ){
    ++counter;
}
document.write( counter + "<br>" ); // 3

counter = 0;
while( rx2.exec( s ) ){
    ++counter;
}
document.write( counter ); // 7

Okay. After testing with gcc 7.1.0, it turned out that with gcc 6.3.0 and earlier the output is 1 3 3, but with 7.1.0 the output is 1 3 7 (link in the original post).

Also, with this version of clang the output is correct (link in the original post). Thanks to user igor-tandetnik.

Recommended answer

First of all, as you can see, the line for search_flag is commented out because std::regex_search ignores it, and I do not know why, since std::regex_replace accepts exactly the same flag.

The flag in question is format_first_only. This flag makes sense only for a "replace" operation. In regex_replace, the default is "replace all", but if you pass this flag it becomes "replace first only".

In regex_match and regex_search, no replacement takes place at all; both functions just find the first match (and in the case of regex_match, that match must consume the entire string). Since the flag is meaningless in that case, I would expect the implementation to ignore it, but I wouldn't fault an implementation for throwing an exception either, if it chose to be noisy about it.
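To make the difference concrete, here is a minimal sketch contrasting the two behaviors of regex_replace with a single regex_search call. The helper names replace_all, replace_first, and first_match are hypothetical and exist only for illustration; only std::regex_replace, std::regex_search, and std::regex_constants::format_first_only come from the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of `pattern` in `text`.
// This is the default behavior of std::regex_replace.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& replacement) {
    return std::regex_replace(text, std::regex(pattern), replacement);
}

// Hypothetical helper: replaces only the first match, which is what
// format_first_only asks std::regex_replace to do.
std::string replace_first(const std::string& text, const std::string& pattern,
                          const std::string& replacement) {
    return std::regex_replace(text, std::regex(pattern), replacement,
                              std::regex_constants::format_first_only);
}

// Hypothetical helper: a single std::regex_search call reports one match
// (the first one), so there is nothing for a "first only" flag to restrict.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str()
                                                           : std::string();
}
```

With the input "ONE TWO THREE" and the pattern "[A-Z]+", replace_all with "x" yields "x x x", replace_first yields "x TWO THREE", and first_match yields "ONE".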

The main problem is that the icase flag is also ignored when the pattern is a character class, i.e. []. Specifically, this happens when the class contains only uppercase or only lowercase letters: [A-Z] or [a-z].

icase working incorrectly for character classes is definitely a bug in your vendor's library.
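One way to check a given standard library for this bug is to count matches with std::sregex_iterator, which avoids copying the suffix on every step. The following is only a sketch: count_all_matches is a hypothetical name, not part of the question's RFC collection, and it combines icase with ECMAScript explicitly instead of passing icase alone.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts all non-overlapping matches of `pattern`
// in `text`, optionally ignoring case.
std::size_t count_all_matches(const std::string& text,
                              const std::string& pattern,
                              bool ignore_case) {
    // Name the grammar explicitly and add icase on top of it when requested.
    std::regex::flag_type syntax = std::regex_constants::ECMAScript;
    if (ignore_case) {
        syntax |= std::regex_constants::icase;
    }
    const std::regex rx(pattern, syntax);

    // A default-constructed sregex_iterator is the end-of-sequence marker.
    const std::sregex_iterator first(text.begin(), text.end(), rx);
    const std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

On a conforming library, count_all_matches(s, "[A-Z]+", false) gives 3 and count_all_matches(s, "[A-Z]+", true) gives 7 for the question's string s. A library with the character-class bug would be expected to report 3 in both cases, as the question describes for GCC 6.3.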

It looks like the libstdc++ bug was fixed between GCC 6.3 (Dec 2016) and GCC 7.1 (May 2017).

It looks like the libc++ bug was fixed between Clang 3.2 (Dec 2012) and Clang 3.3 (Jun 2013).
