How to get only a given captured group with <regex> in C++


Question

I want to extract a tag's inner content. From the following string:

<tag1 val=123>Hello</tag1>

I want to get only:

Hello

What I do:

string s = "<tag1 val=123>Hello</tag1>";
regex re("<tag1.*>(.*)</tag1>");
smatch matches;
bool b = regex_match(s, matches, re);

But it returns two matches:

<tag1 val=123>Hello</tag1>
Hello

And when I try to get only the 1st captured group like this:

"<tag1.*>(.*)</tag1>\1"

I get zero matches.

Please advise.

Recommended answer

regex_match returns only a single match, which holds all the capturing group submatches (how many depends on the number of groups in the pattern).

Here, you get only 1 match that contains two submatches: 1) the whole match, 2) the capture group 1 value.

To obtain the contents of the capturing group, you need to access the second element of the smatch object: matches[1].str() or matches.str(1).
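As a minimal sketch of the point above, the following wraps the question's pattern in a helper and returns only the group 1 submatch. The function name inner_text and the choice to return an empty string on failure are assumptions made for illustration, not part of the original answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by group 1, or an empty
// string when the whole input does not match the pattern.
// (An empty result is therefore ambiguous with an empty tag body.)
std::string inner_text(const std::string& s) {
    static const std::regex re("<tag1.*>(.*)</tag1>");
    std::smatch matches;
    if (!std::regex_match(s, matches, re)) {
        return "";
    }
    // matches[0] is the whole match; matches[1] is capture group 1.
    return matches[1].str();  // same value as matches.str(1)
}
```

Called with the string from the question, inner_text returns Hello. The smatch still contains two submatches; the helper simply ignores matches[0].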

Note that when you write "<tag1.*>(.*)</tag1>\1" in C++ source, the \1 is not parsed as a backreference but as an octal escape, that is, a char with code 1. Even if you did define a backreference (as "<tag1.*>(.*)</tag1>\\1"), the pattern would require the whole text captured by group 1 to be repeated after </tag1>, which is definitely not what you want. Actually, I doubt this regex is any good: at the least, you need to replace ".*" with "[\\s\\S]*?", and even then parsing HTML with regex remains a fragile approach.
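The three observations above can be checked with a short sketch. All function names here are hypothetical, and the use of regex_search in the last helper is an assumption chosen so that the lazy pattern can pick the first tag out of a longer string.

```cpp
#include <regex>
#include <string>

// "\1" in a C++ string literal is an octal escape: a single char with value 1,
// so the regex engine never sees a backslash followed by the digit 1.
std::string literal_backslash_one() {
    return std::string("\1");
}

// Hypothetical helper: "\\1" reaches the regex engine as \1, a real
// backreference, which demands that group 1's text repeat after </tag1>.
bool matches_with_backref(const std::string& s) {
    static const std::regex re("<tag1.*>(.*)</tag1>\\1");
    return std::regex_match(s, re);
}

// Hypothetical helper: the suggested lazy [\s\S]*? form, which also crosses
// newlines and stops at the first closing tag. Returns "" when nothing is found.
std::string first_inner_lazy(const std::string& s) {
    static const std::regex re("<tag1[\\s\\S]*?>([\\s\\S]*?)</tag1>");
    std::smatch m;
    return std::regex_search(s, m, re) ? m[1].str() : std::string();
}
```

With the backreference, the question's string no longer matches, while "<tag1 val=123>Hello</tag1>Hello" does. The lazy pattern is still only a sketch and does not handle nested or malformed markup.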
