std::regex and ignoring flags
Question
After learning the basic rules of C++, I focused on std::regex and created two console apps: 1. renrem and 2. bfind.
I also decided to write some convenience functions that make working with regex in C++ as easy as possible, using only std. I named the collection RFC (= regex function collection).
Several strange things have always surprised me, but this one ruined all my attempts, including those two console apps.
One of the important functions is count_match, which counts the number of matches in a string. Here is the full code:
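#include <regex>
#include <string>

// Counts matches of user_pattern in user_string.
// flags: "g" = count every match, "i" = case-insensitive; otherwise at most 1.
unsigned int count_match( const std::string& user_string, const std::string& user_pattern, const std::string& flags = "o" ){
    const bool flags_has_i = flags.find( 'i' ) != std::string::npos;
    const bool flags_has_g = flags.find( 'g' ) != std::string::npos;
    // Always use the ECMAScript grammar; add icase only when requested.
    std::regex::flag_type regex_flag = flags_has_i ? std::regex_constants::ECMAScript | std::regex_constants::icase : std::regex_constants::ECMAScript;
    // std::regex_constants::match_flag_type search_flag = flags_has_g ? std::regex_constants::match_default : std::regex_constants::format_first_only;
    std::regex rx( user_pattern, regex_flag );
    std::match_results< std::string::const_iterator > mr;
    unsigned int counter = 0;
    std::string temp = user_string;
    // Search repeatedly, continuing from the text after the previous match.
    // Note: a pattern that can match the empty string never shrinks temp,
    // so this loop would not terminate for such patterns.
    while( std::regex_search( temp, mr, rx ) ){
        temp = mr.suffix().str();
        ++counter;
    }
    if( flags_has_g ){
        return counter;
    } else {
        return counter >= 1 ? 1 : 0;
    }
}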
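For comparison, here is a sketch of the same function built on std::sregex_iterator, which walks all non-overlapping matches and, unlike the hand-written loop above, steps past empty matches by itself, so it also terminates for patterns like "a*". This is my own variant, not the original author's code; it keeps the question's "g"/"i" flag letters and combines std::regex_constants::ECMAScript with icase explicitly. On a conforming library it should give 1, 3 and 7 for the three calls used in the question; on the affected library versions discussed here the "gi" call would presumably still give 3.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Sketch (not the original author's code): count matches with
// std::sregex_iterator instead of a hand-written regex_search loop.
// The iterator steps past empty matches by itself, so patterns such
// as "a*" cannot make it loop forever.
// Flag letters follow the question: "g" = count every match,
// "i" = case-insensitive; otherwise at most one match is reported.
unsigned int count_match(const std::string& user_string,
                         const std::string& user_pattern,
                         const std::string& flags = "o") {
    const bool has_i = flags.find('i') != std::string::npos;
    const bool has_g = flags.find('g') != std::string::npos;

    // Name the grammar explicitly and add icase only when requested.
    std::regex::flag_type syntax = std::regex_constants::ECMAScript;
    if (has_i) {
        syntax |= std::regex_constants::icase;
    }
    const std::regex rx(user_pattern, syntax);

    // Distance from the first match to the end iterator = number of matches.
    const auto total = static_cast<unsigned int>(std::distance(
        std::sregex_iterator(user_string.begin(), user_string.end(), rx),
        std::sregex_iterator()));

    return has_g ? total : (total >= 1 ? 1u : 0u);
}
```

Using std::distance over the iterator range keeps the counting logic in one expression and leaves the handling of empty matches to the standard library.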
First of all, as you can see, the search_flag line is commented out because std::regex_search ignores it, and I do not know why, since std::regex_replace accepts the exact same flag. So std::regex_search ignores format_first_only, but std::regex_replace accepts it. Let's leave that aside.
The main problem is that the icase flag is also ignored when the pattern is a character class ([...]), specifically when the class contains only capital letters or only small letters: [A-Z] or [a-z].
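A minimal way to reproduce the problem, as a sketch (the function name icase_applies_to_class is hypothetical): with icase set, [A-Z]+ should match a lowercase word such as "four" in full. A conforming library returns true; on the affected library versions reported in this thread I would expect it to return false.

```cpp
#include <cassert>
#include <regex>

// Hypothetical check for the behaviour described in the question:
// with icase set, the class [A-Z] should also match lowercase letters,
// so "[A-Z]+" should match the whole word "four".
bool icase_applies_to_class() {
    const std::regex rx("[A-Z]+",
                        std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_match("four", rx);
}
```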
Suppose the string is s = "ONE TWO THREE four five six seven".
Output with C++ std::regex:
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 3 => Global match plus insensitive
whereas with the same code in Perl, D, and C++ with Boost, the output is:
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 7 => Global match plus insensitive
I know about the regex flavors (PCRE, and ECMAScript 262, which C++ uses), but I have no idea why a simple flag is ignored by the only search function C++ has, especially since std::regex_iterator and std::regex_token_iterator also use this function internally.
In short, because of this I cannot use my two apps and RFC with the std library!
So if someone knows which rule of ECMAScript 262 makes this valid behavior, or if I am wrong somewhere, please tell me. Thanks.
Tested with:
gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04)
clang version 3.8.0-2ubuntu4
Perl code:
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/g; print $c ;' "ONE TWO THREE four five six seven" # 3
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/gi; print $c ;' "ONE TWO THREE four five six seven" # 7
D code:
import std.regex;
import std.stdio : writeln;
import std.string : indexOf;

uint count_match( ref const (char[]) user_string, const (char[]) user_pattern, const (char[]) flags ){
const bool flag_has_g = flags.indexOf( "g" ) != -1;
Regex!( char ) rx = regex( user_pattern, flags );
uint counter = 0;
foreach( mr; matchAll( user_string, rx ) ){
++counter;
}
if( flag_has_g ){
return counter;
} else {
if( counter >= 1 ) return 1;
else return 0;
}
}
Output:
writeln( count_match( s, "[A-Z]+", "g" ) ); // 3
writeln( count_match( s, "[A-Z]+", "gi" ) ); // 7
JS code:
var s = "ONE TWO THREE four five six seven";
var rx1 = new RegExp( "[A-Z]+" , "g" );
var rx2 = new RegExp( "[A-Z]+" , "gi" );
var counter = 0;
while( rx1.exec( s ) ){
++counter;
}
document.write( counter + "<br>" ); // 3
counter = 0;
while( rx2.exec( s ) ){
++counter;
}
document.write( counter ); // 7
Okay. After testing with gcc 7.1.0, it turned out that with version 6.3.0 and earlier the output is 1 3 3, but with 7.1.0 the output is 1 3 7.
Here is the link.
With this version of clang the output is also correct. Here is the link. Thanks to user igor-tandetnik.
Answer
First of all, as you can see, the search_flag line is commented out because std::regex_search ignores it, and I do not know why, since std::regex_replace accepts the exact same flag.
The flag in question is format_first_only. This flag makes sense only for a "replace" operation. In regex_replace, the default is "replace all", but if you pass this flag it becomes "replace first only".
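A small sketch of the difference described above; the helper names replace_all and replace_first are made up for illustration. Both call std::regex_replace with the same pattern and format string, and only the second passes std::regex_constants::format_first_only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: default behaviour, every match of "a" is replaced.
std::string replace_all(const std::string& s) {
    return std::regex_replace(s, std::regex("a"), "b");
}

// Hypothetical helper: format_first_only limits the replacement to the first match.
std::string replace_first(const std::string& s) {
    return std::regex_replace(s, std::regex("a"), "b",
                              std::regex_constants::format_first_only);
}
```

For the input "a a a", replace_all returns "b b b" and replace_first returns "b a a".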
In regex_match and regex_search, there is no replacement going on at all; both of those functions just find the first match (and in the case of regex_match, that match must consume the entire string). Since the flag is meaningless in that case, I would expect the implementation to ignore it, though I wouldn't fault an implementation for throwing an exception instead, if it chose to be noisy about it.
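To illustrate the two functions (the helper names first_match and whole_match are hypothetical): regex_search reports only the first match found anywhere in the input, which is why counting every match needs a loop or std::sregex_iterator, while regex_match succeeds only when the pattern matches the whole input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// regex_search: returns the FIRST match found anywhere in the input,
// or an empty string when there is no match.
std::string first_match(const std::string& s, const std::regex& rx) {
    std::smatch m;
    return std::regex_search(s, m, rx) ? m.str(0) : std::string();
}

// regex_match: succeeds only if the pattern consumes the WHOLE input.
bool whole_match(const std::string& s, const std::regex& rx) {
    return std::regex_match(s, rx);
}
```

With the pattern [A-Z]+, first_match("ONE TWO THREE four", ...) yields "ONE", and whole_match("ONE TWO", ...) is false because the space is not in [A-Z].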
The main problem is that the icase flag is also ignored when the pattern is a character class ([...]), specifically when the class contains only capital letters or only small letters: [A-Z] or [a-z].
icase working incorrectly for character classes is definitely a bug in your vendor's library.
- It looks like libstdc++'s bug was fixed between GCC 6.3 (Dec 2016) and GCC 7.1 (May 2017).
- It looks like libc++'s bug was fixed between Clang 3.2 (Dec 2012) and Clang 3.3 (Jun 2013).
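If the code must still build with the affected versions (libstdc++ before GCC 7.1, libc++ before Clang 3.3), one possible workaround, my suggestion rather than part of the answer, is to avoid depending on icase inside the class and list both letter ranges explicitly. count_letter_runs is a hypothetical helper for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Workaround sketch for libraries where icase is not applied to [...]:
// spell out both letter ranges instead of relying on icase.
std::size_t count_letter_runs(const std::string& text) {
    static const std::regex rx("[A-Za-z]+");  // no icase needed
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), rx),
        std::sregex_iterator()));
}
```

For "ONE TWO THREE four five six seven" this should return 7 regardless of the icase bug, since no case folding is involved.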