What is a non-capturing group in regular expressions?


Problem description

How are non-capturing groups, i.e. (?:), used in regular expressions, and what are they good for?

Recommended answer

Let me try to explain this with an example.

Consider the following text:

http://stackoverflow.com/
https://stackoverflow.com/questions/tagged/regex

Now, if I apply the regex below to it...

(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?

... I would get the following result:

Match "http://stackoverflow.com/"
     Group 1: "http"
     Group 2: "stackoverflow.com"
     Group 3: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "https"
     Group 2: "stackoverflow.com"
     Group 3: "/questions/tagged/regex"

But I don't care about the protocol; I just want the host and path of the URL. So, I change the regex to use the non-capturing group (?:) for the protocol.

(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?

Now, my result looks like this:

Match "http://stackoverflow.com/"
     Group 1: "stackoverflow.com"
     Group 2: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "stackoverflow.com"
     Group 2: "/questions/tagged/regex"

See? The first group has not been captured. The regex engine still uses it to match the text, but leaves it out of the final result, so the remaining groups are renumbered starting from 1.
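
As a rough illustration, the comparison above can be reproduced with Python's built-in re module. This is only a sketch: the variable names are made up for the example, and other regex engines report groups in their own way.

```python
import re

text = """http://stackoverflow.com/
https://stackoverflow.com/questions/tagged/regex"""

# Same pattern twice; only the first group differs: (...) versus (?:...)
capturing = re.compile(r"(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?")
non_capturing = re.compile(r"(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?")

# groups() returns only the captured groups of each match
with_protocol = [m.groups() for m in capturing.finditer(text)]
without_protocol = [m.groups() for m in non_capturing.finditer(text)]

print(with_protocol)     # three groups per match: protocol, host, path
print(without_protocol)  # two groups per match: host, path
```

Both patterns match exactly the same text; the only difference is how many groups each match reports.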

As requested, let me try to explain groups too.

Well, groups serve many purposes. They can help you extract exact information from a bigger match (and a group can also be given a name), they let you rematch a previously matched group, and they can be used for substitutions. Let's try some examples, shall we?

Imagine you have some kind of XML or HTML (be aware that regex may not be the best tool for the job, but it makes a nice example). You want to parse the tags, so you could do something like this (I have added spaces to make it easier to read):

   <(?<TAG>.+?)> [^<]*? </\k<TAG>>
or
   <(.+?)> [^<]*? </\1>

The first regex has a named group (TAG), while the second one uses an ordinary, unnamed group. Both regexes do the same thing: they use the value from the first group (the name of the tag) to match the closing tag. The difference is that the first one refers back to the value by name (\k<TAG>), and the second one refers back to it by group index (\1; indexes start at 1).
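
Here is a small sketch of the same idea in Python. Note that Python's re module spells these constructs differently from the patterns above: a named group is (?P<TAG>...) and a backreference to it is (?P=TAG). The sample strings are invented for the illustration, and the explanatory spaces are left out of the patterns.

```python
import re

# Named group and named backreference (Python syntax)
named = re.compile(r"<(?P<TAG>.+?)>[^<]*?</(?P=TAG)>")
# Ordinary group and numbered backreference
numbered = re.compile(r"<(.+?)>[^<]*?</\1>")

html = "<b>bold</b> <i>italic</i>"

named_tags = [m.group("TAG") for m in named.finditer(html)]
numbered_tags = [m.group(1) for m in numbered.finditer(html)]
print(named_tags, numbered_tags)

# A closing tag that does not repeat the opening tag's name is rejected
mismatch = named.search("<b>oops</i>")
print(mismatch)
```

Either form works; the named one tends to stay readable when a pattern grows and the group numbers shift.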

Let's try some substitutions now. Consider the following text:

Lorem ipsum dolor sit amet consectetuer feugiat fames malesuada pretium egestas.

Now, let's use this dumb regex on it:

(\S)(\S)(\S)(\S*)

This regex matches words (runs of non-whitespace characters) with at least 3 characters, and uses groups to separate the first three characters from the rest. The result is this:

Match "Lorem"
     Group 1: "L"
     Group 2: "o"
     Group 3: "r"
     Group 4: "em"
Match "ipsum"
     Group 1: "i"
     Group 2: "p"
     Group 3: "s"
     Group 4: "um"
...

Match "consectetuer"
     Group 1: "c"
     Group 2: "o"
     Group 3: "n"
     Group 4: "sectetuer"
...
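
For what it's worth, a listing like the one above can be produced in Python with re.findall, which returns one tuple of groups per match. The sample string here is shortened and padded with two short words (an assumption made for the example) to show that runs of fewer than 3 characters are skipped.

```python
import re

pattern = re.compile(r"(\S)(\S)(\S)(\S*)")

# "is" and "at" have fewer than 3 non-whitespace characters, so they do not match
matches = pattern.findall("Lorem ipsum is at")
for groups in matches:
    print(groups)
```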

So, if we apply the substitution string:

$1_$3$2_$4

... to it, each match is replaced by the first group, an underscore, the third group, the second group, another underscore, and then the fourth group. The resulting string looks like the one below.

L_ro_em i_sp_um d_lo_or s_ti_ a_em_t c_no_sectetuer f_ue_giat f_ma_es m_la_esuada p_er_tium e_eg_stas.
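
The substitution can be sketched in Python as well. One assumption to flag: Python's re.sub writes group references in the replacement as \1, \2, ... rather than $1, $2, ..., so the substitution string is adapted accordingly.

```python
import re

text = ("Lorem ipsum dolor sit amet consectetuer feugiat fames "
        "malesuada pretium egestas.")

# \1_\3\2_\4 is the Python spelling of $1_$3$2_$4
result = re.sub(r"(\S)(\S)(\S)(\S*)", r"\1_\3\2_\4", text)
print(result)
```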

You can use named groups in substitutions too, using ${name} (the exact syntax depends on the regex engine).
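
A minimal sketch of a named substitution in Python, where the reference is written \g<name> instead of ${name}. The group names first, second, third and rest are hypothetical, chosen only for this example.

```python
import re

pattern = re.compile(
    r"(?P<first>\S)(?P<second>\S)(?P<third>\S)(?P<rest>\S*)"
)

# Same reordering as before, but the groups are referenced by name
named_result = pattern.sub(
    r"\g<first>_\g<third>\g<second>_\g<rest>", "Lorem ipsum"
)
print(named_result)
```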

To play around with regexes, I recommend http://regex101.com/, which offers a good amount of detail on how a regex works; it also offers a few regex engines to choose from.
