std::regex and ignoring flags


Question

After learning the basic C++ rules, I focused on std::regex and created two console apps: 1. renrem and 2. bfind.
I also decided to write some convenience functions that make working with regex in C++ as easy as possible, using only std. I named the collection RFC (= regex function collection).

There are several strange things that always surprise me, but this one ruined all my attempts and those two console apps.

One of the important functions is count_match, which counts the number of matches inside a string. Here is the full code:

unsigned int count_match( const std::string& user_string, const std::string& user_pattern, const std::string& flags = "o" ){

    const bool flags_has_i = flags.find( "i" ) < flags.size();
    const bool flags_has_g = flags.find( "g" ) < flags.size();

    std::regex::flag_type regex_flag                  = flags_has_i ? std::regex_constants::icase         : std::regex_constants::ECMAScript;
//    std::regex_constants::match_flag_type search_flag = flags_has_g ? std::regex_constants::match_default : std::regex_constants::format_first_only;
    std::regex rx( user_pattern, regex_flag );
    std::match_results< std::string::const_iterator > mr;

    unsigned int counter = 0;
    std::string temp = user_string;
    while( std::regex_search( temp, mr, rx ) ){
        temp = mr.suffix().str();
        ++counter;
    }

    if( flags_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else               return 0;
    }

}  

First of all, as you can see, the line for search_flag is commented out because std::regex_search ignores it, and I do not know why, since std::regex_replace accepts that exact flag. So std::regex_search ignores format_first_only but std::regex_replace accepts it. Let that go.

The main problem is that the icase flag is also ignored when the pattern is a character class -> []. Specifically, when the pattern contains only uppercase or only lowercase letters: [A-Z] or [a-z].

Suppose this string: s = "ONE TWO THREE four five six seven"

With std, the output is:

std::cout << count_match( s, "[A-Z]+" ) << '\n';          // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n';     // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n';    // 3 => Global match plus insensitive  

whereas for the same test in Perl, in D, and in C++ with Boost, the output is:

std::cout << count_match( s, "[A-Z]+" ) << '\n';          // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n';     // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n';    // 7 => Global match plus insensitive  






I know about regex flavors such as PCRE, and that C++ uses ECMAScript 262, but I have no idea why a simple flag is ignored by the only search function that C++ has, given that std::regex_iterator and std::regex_token_iterator also use this function internally.

In short, I cannot use those two apps or RFC with the std library because of this!

So if someone knows which rule in ECMAScript 262 might make this valid, or if I am wrong anywhere, please tell me. Thanks.

Tested with:

gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04)
clang version 3.8.0-2ubuntu4  

Perl code:

perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/g; print $c ;' "ONE TWO THREE four five six seven" # 3
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/gi; print $c ;' "ONE TWO THREE four five six seven" # 7

D code:

uint count_match( ref const (char[]) user_string, const (char[]) user_pattern, const (char[]) flags ){

    const bool flag_has_g = flags.indexOf( "g" ) != -1;

    Regex!( char ) rx = regex( user_pattern, flags );
    uint counter = 0;
    foreach( mr; matchAll( user_string, rx ) ){
        ++counter;
    }

    if( flag_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else               return 0;
    }
} 

Output:

writeln( count_match( s, "[A-Z]+", "g" ) );  // 3
writeln( count_match( s, "[A-Z]+", "gi" ) ); // 7  

JavaScript code:

var s = "ONE TWO THREE four five six seven";

var rx1 = new RegExp( "[A-Z]+" , "g" );
var rx2 = new RegExp( "[A-Z]+" , "gi" );

var counter = 0;
while( rx1.exec( s ) ){
   ++counter;
}
document.write( counter + "<br>" ); // 3

counter = 0;
while( rx2.exec( s ) ){
   ++counter;
}
document.write( counter ); // 7

Okay. After testing with gcc 7.1.0, it turned out that with version 6.3.0 and below the output is 1 3 3, but with 7.1.0 the output is 1 3 7. Here is the link.

Also, with this version of clang the output is correct. Here is the link. Thanks to user igor-tandetnik.

Answer


> First of all, as you can see, the line for search_flag is commented out because std::regex_search ignores it, and I do not know why, since std::regex_replace accepts that exact flag.

The flag in question is format_first_only. This flag makes sense only for a "replace" operation. In regex_replace, the default is "replace all", but if you pass this flag it becomes "replace first only".
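To make the difference concrete, here is a minimal sketch of the two replacement modes. The helper names replace_all and replace_first are made up for this illustration and are not part of the standard library; only std::regex_replace and std::regex_constants::format_first_only are.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every match, which is what
// std::regex_replace does when no format flag is passed.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& replacement) {
    const std::regex rx(pattern);
    return std::regex_replace(text, rx, replacement);
}

// Hypothetical helper: replaces only the first match by passing
// format_first_only as the match flag.
std::string replace_first(const std::string& text, const std::string& pattern,
                          const std::string& replacement) {
    const std::regex rx(pattern);
    return std::regex_replace(text, rx, replacement,
                              std::regex_constants::format_first_only);
}
```

Called on "ONE TWO THREE" with the pattern "[A-Z]+" and the replacement "x", replace_all should give "x x x" and replace_first should give "x TWO THREE".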

In regex_match and regex_search, there is no replacement going on at all; both of those functions just find the first match (and in the case of regex_match, that match must consume the entire string). Since the flag is meaningless in that case, I would expect the implementation to ignore it; but I wouldn't fault the implementation for throwing an exception, either, if it chose to be noisy about it.
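A short sketch of that distinction, with two helper names (matches_whole and first_match) invented for this example: regex_match succeeds only when the pattern consumes the whole string, while regex_search stops at the first match it finds anywhere in the string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the pattern matches the entire string.
bool matches_whole(const std::string& text, const std::string& pattern) {
    const std::regex rx(pattern);
    return std::regex_match(text, rx);
}

// Hypothetical helper: returns the first match found anywhere in the
// string, or an empty string if there is no match.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex rx(pattern);
    std::smatch m;
    return std::regex_search(text, m, rx) ? m.str() : std::string();
}
```

With the pattern "[A-Z]+", first_match on "ONE TWO THREE four" should return "ONE", while matches_whole is false for "ONE TWO" (the space is not matched) and true for "ONE". Neither call has anything to replace, which is why a format flag has no effect on them.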


> The main problem is that the icase flag is also ignored when the pattern is a character class -> []. Specifically, when the pattern contains only uppercase or only lowercase letters: [A-Z] or [a-z].

icase working wrong for character classes is definitely a bug in your vendor's library.


  • Looks like libstdc++'s bug was fixed between GCC 6.3 (Dec 2016) and GCC 7.1 (May 2017).
  • Looks like libc++'s bug was fixed between Clang 3.2 (Dec 2012) and Clang 3.3 (Jun 2013).
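One way to check whether a given standard library has the fix is a small counting function like the sketch below. The name count_matches and its bool parameter are assumptions for this illustration, not the question's count_match; it uses std::sregex_iterator instead of a manual regex_search loop, and it sets ECMAScript explicitly alongside icase.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts all non-overlapping matches of `pattern`
// in `text`, optionally ignoring case.
std::size_t count_matches(const std::string& text, const std::string& pattern,
                          bool ignore_case) {
    std::regex::flag_type flags = std::regex_constants::ECMAScript;
    if (ignore_case) {
        flags = flags | std::regex_constants::icase;
    }
    const std::regex rx(pattern, flags);

    // A default-constructed sregex_iterator is the end-of-sequence marker.
    const std::sregex_iterator first(text.begin(), text.end(), rx);
    const std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For the string "ONE TWO THREE four five six seven" and the pattern "[A-Z]+", a library with the fix should report 3 without ignore_case and 7 with it, matching the Perl, D, and JavaScript results in the question; an affected library would report 3 in both cases.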
