C++11 regex matching capturing group multiple times

Question

Could someone please help me extract the text between the : and ^ symbols using a JavaScript (ECMAScript) regular expression in C++11? I do not need to capture hw-descriptor itself, but it must be present in the line for the rest of the line to be considered a match. Also, the :p....^, :m....^ and :u....^ chunks can arrive in any order, and at least one of them must be present.

I tried using the following regular expression:

static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);

against the following line of text:

"hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^"

Here is the code, which I posted live on Coliru. It shows how I attempted to solve this problem, but I only get 1 match. I need to see how to extract each of the up to 3 matches corresponding to the p, m and u characters described earlier.

#include <algorithm>  // std::for_each
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^";
    // I seem to only get 1 match here, I was expecting 
    // to loop through each of the matches, looks like I need something like 
    // a pcre global option but I don't know how.
    std::for_each(std::sregex_iterator(foo.cbegin(), foo.cend(), gRegex), std::sregex_iterator(), 
        [&](const auto& rMatch) {
            for (int i=0; i< static_cast<int>(rMatch.size()); ++i) {
                std::cout << rMatch[i] << std::endl;
            }
        });
}

The above program gives the following output:

g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
:uTEXT3^
TEXT3

Recommended answer

With std::regex, you cannot keep multiple repeated captures when a capturing group is matched several times in a row: after the match, the group holds only the text of its last iteration. That is why the program above prints only :uTEXT3^ and TEXT3 for the two groups.

What you can do instead is match the overall text containing the prefix and the repeated chunks, capture the run of chunks into a separate group, and then use a second, smaller regex to grab each occurrence of the substrings you want.

The first regex here may be

hw-descriptor((?::[pmu][^^]*\^)+)

See the online demo. It matches hw-descriptor, and ((?::[pmu][^^]*\^)+) captures into Group 1 one or more repetitions of the :[pmu][^^]*\^ pattern: a :, one of p/m/u, zero or more characters other than ^, and then a ^. (In a C++ string literal each backslash is doubled, so \^ is written as \\^.) Once a match is found, run the smaller :[pmu][^^]*\^ regex over Group 1 to return all the real "matches".

C++ demo:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Outer regex: the prefix plus the whole run of chunks, captured in Group 1.
    static const std::regex gRegex("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
    // Inner regex: a single :p...^, :m...^ or :u...^ chunk.
    static const std::regex lRegex(":[pmu][^^]*\\^", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^ hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^";
    for(std::sregex_iterator i = std::sregex_iterator(foo.begin(), foo.end(), gRegex);
                             i != std::sregex_iterator();
                             ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << std::endl;
        // Copy Group 1 into its own string so the inner iterators stay valid.
        std::string x = m.str(1);
        for(std::sregex_iterator j = std::sregex_iterator(x.begin(), x.end(), lRegex);
                                 j != std::sregex_iterator();
                                 ++j)
        {
            std::cout << "Element value: " << (*j).str() << std::endl;
        }
    }
}

Output:

Match value: hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
Element value: :pTEXT1^
Element value: :mTEXT2^
Element value: :uTEXT3^
Match value: hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^
Element value: :pTEXT8^
Element value: :mTEXT8^
Element value: :uTEXT83^
