Issue with regular expressions in C++
Problem description
I tried to use the following regular expression, which already works in C#, in C++ as well, but it does not work in C++.
std::regex r = std::regex("([^%]*(%[.[0-9]*]?[a-z])*)*", std::regex::extended);
It manages to match several strings and reject others correctly, but gets stuck (really stuck, no error) on the string "%d smaller than available pbn % f %d", which it should reject (since there is a % that's not immediately followed by a legal suffix).
Using std::regex r = std::regex("(([^%]*)(%(\\.([0-9]*))?[a-z])*)*");
exhibits exactly the same behavior I described before. (I assume those two regexes are equivalent: one is in canonical form like C# uses, and the second is ECMAScript, the C++ default.)
I am not sure what the problem is.
Also, I want to match the entire string against that pattern, so that it matches only if the entire string matches as a whole. So I want to use regex_match for that purpose. I use the following code in C++:
if (std::regex_match(str, r))
Also, in C# I use the following code to perform that check (that the entire string matches as a whole):
Regex^ r = gcnew Regex("([^%]*(%[.[0-9]*]?[a-z])*)*", RegexOptions::IgnoreCase);
Match^ m = r->Match(str);
if (m->Success && m->Groups[0]->Length== str->Length)
Just to give an example, the regular expression should match:
Got event %s (%d) in state %s (%d), moving to state %s (%d) ...
or
Some %.34x event
And the regular expression is supposed not to match the following:
Some % thing.
To explain in words what the regex should do: it should accept only strings in which every occurrence (if any) of % is immediately followed by a letter or by something like .46456x (that is, a dot, some digits, and a letter), and reject all others.
UPDATE:
The regex that works is ^([^%]|%((\\.)?[0-9]+)?[a-zA-Z])*$. The problem is that, unlike the C# regex, this one is really slow and slows down the application a lot. So I was thinking it may be better to use std::regex_search to find whether there is an occurrence of % that is not immediately followed by a letter, by .NUMBERS and then a letter, or by NUMBERS and then a letter. I would appreciate help with a regex that does that.
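For reference, a minimal sketch of how the pattern from this update could be used with std::regex_match. The function name matches_whole is hypothetical, and building the regex once as a static object with the optimize flag is an assumption about what may help; it does not by itself address the slowness described above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true only if the whole string fits the
// pattern from the update (every % is followed by an optional ".digits"
// or "digits" part and then a letter).
bool matches_whole(const std::string& s) {
    // Constructed once; std::regex construction is comparatively expensive.
    static const std::regex re(
        R"re(^([^%]|%((\.)?[0-9]+)?[a-zA-Z])*$)re",
        std::regex::ECMAScript | std::regex::optimize);
    // regex_match already requires the entire string to match,
    // so ^ and $ are redundant here but harmless.
    return std::regex_match(s, re);
}
```

Calling matches_whole("Some %.34x event") should accept the string, while matches_whole("Some % thing.") should reject it.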
UPDATE 2:
I am using the regex ^.*%(?!([.]?[0-9]+)?[a-zA-Z]).*$, which works, and I use it with std::regex_search. It's much faster than the previous solution but still much slower than the C# version (43 seconds vs. less than 6 seconds in C#). Is there a way to optimize it even further?
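One possible sketch of the search-based check follows. It assumes that, since std::regex_search already scans for a match anywhere in the string, the leading ^.* and trailing .*$ can be dropped so that only the negative lookahead remains. The function name has_bad_percent is hypothetical, and whether this is actually faster on a given standard library would need to be measured.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the string contains a % that is NOT
// immediately followed by a letter, ".digits letter", or "digits letter".
bool has_bad_percent(const std::string& s) {
    // Same lookahead as in the update, without the surrounding ^.* and .*$
    // (assumption: regex_search makes them unnecessary).
    static const std::regex bad(
        R"re(%(?!([.]?[0-9]+)?[a-zA-Z]))re",
        std::regex::ECMAScript | std::regex::optimize);
    return std::regex_search(s, bad);
}
```

A string is then valid when has_bad_percent returns false, for example for "Some %.34x event", and invalid when it returns true, for example for "Some % thing.".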
Recommended answer
Here you go: every % in the string must be compliant. If so, the entire string matches; if not, the string does not match.
I suggest you do this with, e.g., if ( regex_search( sTarget, sRx, sMatch, flags ) ), but regex_match() would do the same thing.
^(?:[^%]*%(?:\.[0-9]*)?[a-z])+[^%]*$
Expanded
^ # BOS
(?: # Cluster begin
[^%]* # Not % characters
% # % found
(?: \. [0-9]* )? # optional .###
[a-z] # single a-z required
)+ # Cluster end, 1 to many times
[^%]* # Not % characters
$ # EOS
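A minimal sketch of the regex above used from C++, assuming the default ECMAScript grammar and std::regex_match. The function name all_percents_compliant is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper wrapping the answer's pattern.
bool all_percents_compliant(const std::string& s) {
    static const std::regex re(
        R"re(^(?:[^%]*%(?:\.[0-9]*)?[a-z])+[^%]*$)re",
        std::regex::ECMAScript | std::regex::optimize);
    // The whole string must match: one or more "text, %, optional .digits,
    // lowercase letter" groups, followed by trailing text without any %.
    return std::regex_match(s, re);
}
```

Note that, as written, the + requires at least one %, so a string containing no % at all does not match; replacing + with * would accept such strings as well. The class [a-z] is also lowercase only, so std::regex::icase could be added to the flags if uppercase suffixes should be allowed.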