Issue with regular expressions in C++


Question

I tried to use the following regular expression, which already works in C#, in C++ as well, but it does not work in C++.

std::regex r = std::regex("([^%]*(%[.[0-9]*]?[a-z])*)*", std::regex::extended);

It manages to match several strings and correctly reject others, but it gets stuck (really stuck, with no error) on the string "%d smaller than available pbn % f %d", which it should reject (since it contains a % that is not immediately followed by a legal suffix).

Using std::regex r = std::regex("(([^%]*)(%(\\.([0-9]*))?[a-z])*)*"); exhibits exactly the same behavior I described above. (I assume those two regexes are equivalent: the first is in canonical form, as C# uses, and the second is ECMAScript, the C++ default.)

I am not sure what the problem is. I also want the pattern to match only when the entire string matches as a whole, so I want to use regex_match for that purpose. I use the following code in C++:

if (std::regex_match(str, r))
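
For readers unfamiliar with the distinction relied on here, the following is a minimal sketch of how std::regex_match (whole string must match) differs from std::regex_search (any substring may match). The pattern "%[a-z]" and both helper names are illustrative assumptions, not taken from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only when the ENTIRE string is one specifier
// such as "%d". std::regex_match requires the whole input to match.
bool whole_string_is_specifier(const std::string& s) {
    static const std::regex spec("%[a-z]");
    return std::regex_match(s, spec);
}

// Hypothetical helper: true when a specifier appears ANYWHERE in the string.
// std::regex_search accepts a match on any substring.
bool contains_specifier(const std::string& s) {
    static const std::regex spec("%[a-z]");
    return std::regex_search(s, spec);
}
```

With these helpers, "%d" satisfies both checks, while "value %d" satisfies only contains_specifier.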

In C#, I use the following code to perform that check (that the entire string matches as a whole):

        Regex^ r = gcnew Regex("([^%]*(%[.[0-9]*]?[a-z])*)*", RegexOptions::IgnoreCase);
        Match^ m = r->Match(str);
        if (m->Success && m->Groups[0]->Length== str->Length)

Here are examples of strings that I want the regular expression to match:

Got event %s (%d) in state %s (%d), moving to state %s (%d) ...

Some %.34x event

The regular expression should not match the following:

Some thing.

To explain in words what the regex should do: it should accept only strings in which every occurrence of % (if any) is immediately followed by a letter, or by something like .46456x (that is, a dot, some digits, and a letter), and reject all others.
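
The rule stated above can also be checked without any regex, which sidesteps backtracking entirely. The following is a rough sketch of such a single-pass scanner, not the approach used in the question or the answer; the function name all_percents_valid is hypothetical, and the sketch assumes the rule exactly as worded above (a letter, or a dot followed by digits and a letter).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when every '%' in `s` is immediately
// followed by a letter, or by '.', zero or more digits, and then a letter
// (for example "%d" or "%.34x"). Strings without any '%' are accepted.
bool all_percents_valid(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '%') {
            ++i;  // ordinary character, nothing to check
            continue;
        }
        ++i;  // step past the '%'
        if (i < s.size() && s[i] == '.') {
            ++i;  // step past the '.'
            while (i < s.size() &&
                   std::isdigit(static_cast<unsigned char>(s[i]))) {
                ++i;  // consume the digits after the dot
            }
        }
        // A letter must follow; end of string or anything else is invalid.
        if (i >= s.size() ||
            !std::isalpha(static_cast<unsigned char>(s[i]))) {
            return false;
        }
        ++i;  // step past the letter
    }
    return true;
}
```

Each character is visited once, so the running time is linear in the length of the string.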

UPDATE: The regex that works is ^([^%]|%((\\.)?[0-9]+)?[a-zA-Z])*$ . The problem is that, unlike the C# regex, this one is really slow and slows down the application a lot. So I was thinking it may be better to use std::regex_search to find whether there is an occurrence of % that is not immediately followed by a letter, by a dot and digits and then a letter, or by digits and then a letter. I would appreciate help with a regex that does that.

UPDATE 2:

I am using the regex ^.*%(?!([.]?[0-9]+)?[a-zA-Z]).*$ , which works, and I use it with std::regex_search. It is much faster than the previous solution but still much slower than the C# version (43 seconds versus less than 6 seconds in C#). Is there a way to optimize it even further?
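
As a sketch of the search-based check described in this update: the version below keeps only the core %(?!...) part of the pattern and compiles the regex once. Both choices are assumptions on my part rather than something the question states. For single-line input, dropping the leading ^.* and trailing .*$ should not change whether std::regex_search finds a match, and it may reduce the work done per call. The helper name has_bad_percent is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when `s` contains a '%' that is NOT
// followed by an optional ".digits" or "digits" part and then a letter.
bool has_bad_percent(const std::string& s) {
    // Constructed once and reused; building a std::regex on every call
    // is comparatively expensive.
    static const std::regex bad(R"(%(?!([.]?[0-9]+)?[a-zA-Z]))");
    return std::regex_search(s, bad);
}
```

A string is then considered acceptable when has_bad_percent returns false.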

Recommended answer

Here you go. Every % in the string must be compliant.
If so, the entire string matches; if not, the string
does not match.

I suggest you do this with, for example, if ( regex_search( sTarget, sRx, sMatch, flags ) ),
but regex_match() would do the same thing.

^(?:[^%]*%(?:\.[0-9]*)?[a-z])+[^%]*$

Expanded

 ^                             # BOS
 (?:                           # Cluster begin
      [^%]*                         # Not % characters
      %                             # % found
      (?: \. [0-9]* )?              # optional .###
      [a-z]                         # single a-z required
 )+                            # Cluster end, 1 to many times
 [^%]*                         # Not % characters
 $                             # EOS
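
A minimal sketch of how the pattern above might be used from C++ follows. The function name matches_answer_pattern is hypothetical, and the default ECMAScript grammar is assumed. Two properties follow from the pattern as written: the cluster is quantified with +, so a string with no % at all does not match, and [a-z] accepts only lowercase letters unless a case-insensitive flag is added.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the answer's pattern to the whole string.
bool matches_answer_pattern(const std::string& s) {
    // Compiled once and reused across calls.
    static const std::regex rx(R"(^(?:[^%]*%(?:\.[0-9]*)?[a-z])+[^%]*$)");
    return std::regex_match(s, rx);
}
```

Because the pattern is anchored with ^ and $, std::regex_search would give the same result here for single-line input.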
