Java regex capturing groups indexes


Question

I have the following line,

typeName="ABC:xxxxx;";

I need to fetch the word ABC.

I wrote the following code snippet,

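import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture everything before the last ':' in group 1.
Pattern pattern4 = Pattern.compile("(.*):");
Matcher matcher = pattern4.matcher(typeName);

String nameStr = "";
if (matcher.find()) {
    nameStr = matcher.group(1);
}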

So if I put group(0) I get ABC:, but if I put group(1) I get ABC, so I want to know:

  1. What do the 0 and 1 mean? It would help if someone could explain with good examples.

  2. The regex pattern contains a : in it, so why does the group(1) result omit it? Does group 1 capture everything inside the parentheses?

  3. So, if I add two more parentheses, such as \s*(\d*)(.*): then will there be two groups? Will group(1) return the (\d*) part and group(2) return the (.*) part?

The code snippet was given to clear up my confusion. It is not the code I am dealing with. The code given above can be done with String.split() in a much easier way.

Recommended answer

Capturing and grouping

Capturing group (pattern) creates a group that has the capturing property.

A related one that you might often see (and use) is (?:pattern), which creates a group without the capturing property, hence named non-capturing group.

A group is usually used when you need to repeat a sequence of patterns, e.g. (\.\w+)+, or to specify where alternation should take effect, e.g. ^(0*1|1*0)$ (^, then 0*1 or 1*0, then $) versus ^0*1|1*0$ (^0*1 or 1*0$).
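To make the alternation point concrete, here is a small sketch that runs both patterns against a few inputs. The class name, helper method, and sample inputs are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

class AlternationScopeDemo {
    // Alternation is confined to the group, so ^ and $ apply to both branches.
    static final Pattern GROUPED = Pattern.compile("^(0*1|1*0)$");
    // Without the group, the pattern reads as "^0*1" OR "1*0$".
    static final Pattern UNGROUPED = Pattern.compile("^0*1|1*0$");

    // True if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"001", "01abc", "abc10"}) {
            System.out.println(s + " grouped=" + found(GROUPED, s)
                    + " ungrouped=" + found(UNGROUPED, s));
        }
    }
}
```

With these inputs, only "001" satisfies the grouped pattern, while the ungrouped pattern also accepts "01abc" (through ^0*1) and "abc10" (through 1*0$).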

A capturing group, apart from grouping, will also record the text matched by the pattern inside the capturing group (pattern). Using your example, (.*):, .* matches ABC and : matches :, and since .* is inside capturing group (.*), the text ABC is recorded for capturing group 1.

The whole pattern is defined to be group number 0.
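As a quick check of group 0 versus group 1, the sketch below applies the question's pattern (.*): to the question's input. The class and helper names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Returns group 0 (the whole match) followed by each capturing group,
    // or an empty array when the regex does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("(.*):", "ABC:xxxxx;");
        System.out.println("group(0) = " + g[0]); // whole match, colon included
        System.out.println("group(1) = " + g[1]); // only the text inside the parentheses
    }
}
```

Note that groupCount() does not count group 0, which is why the array is sized groupCount() + 1.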

Any capturing group in the pattern starts indexing from 1. The indices are defined by the order of the opening parentheses of the capturing groups. As an example, here are all 5 capturing groups in the pattern below:

(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)
|     |                       |          | |      |      ||       |     |
1-----1                       |          | 4------4      |5-------5     |
                              |          3---------------3              |
                              2-----------------------------------------2
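The numbering in the diagram can be checked by running the same pattern in Java. The input string here is an assumption constructed to satisfy the pattern (note that g(?:ro|u)p matches "grop" or "gup", not "group"), and the class name is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    static final String REGEX =
        "(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)";
    // Hypothetical input built to match the pattern above.
    static final String INPUT =
        "groupnon-capturing-groupgrop nestedinsideanothergroupassertion";

    // Returns a matcher positioned on the first match.
    static Matcher match() {
        Matcher m = Pattern.compile(REGEX).matcher(INPUT);
        if (!m.find()) {
            throw new IllegalStateException("pattern did not match");
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = match();
        // Non-capturing groups and the lookahead do not get a number.
        System.out.println("groupCount = " + m.groupCount());
        for (int i = 0; i <= m.groupCount(); i++) {
            System.out.println(i + ": [" + m.group(i) + "]");
        }
    }
}
```

find() is used rather than matches() because the lookahead (?=assertion) does not consume the trailing text, so the match cannot span the whole input.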

The group numbers are used in the back-reference \n in a pattern (for example \1) and in $n in a replacement string.

In other regex flavors (PCRE, Perl), they can also be used in sub-routine calls.
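A minimal sketch of both uses of group numbers in Java follows: a back-reference (\1) inside the pattern and $n in the replacement string. The method names and sample strings are illustrative assumptions.

```java
class BackReferenceDemo {
    // \1 inside the pattern must match the same text that group 1 captured,
    // so this collapses an immediately repeated word into a single one.
    static String collapseRepeatedWords(String s) {
        return s.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // $1 and $2 in the replacement refer to the text captured by groups 1 and 2.
    static String swapKeyValue(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeatedWords("hello hello world"));
        System.out.println(swapKeyValue("key=value"));
    }
}
```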

You can access the text matched by a certain group with Matcher.group(int group). The group numbers can be identified with the rule stated above.
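Applied to the two-group pattern from the question, the rule gives group 1 for the digits and group 2 for the rest. This sketch assumes the question's (d*) was meant as (\d*); the class name and inputs are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupsDemo {
    // Assumed reading of the question's pattern: optional whitespace,
    // then digits (group 1), then anything up to the last ':' (group 2).
    static final Pattern P = Pattern.compile("\\s*(\\d*)(.*):");

    // Returns {group(1), group(2)}, or null when there is no match.
    static String[] parts(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] p = parts("  123ABC:xxxxx;");
        System.out.println("group(1) = [" + p[0] + "], group(2) = [" + p[1] + "]");
        // With no leading digits, group 1 still participates but captures "".
        String[] q = parts("ABC:xxxxx;");
        System.out.println("group(1) = [" + q[0] + "], group(2) = [" + q[1] + "]");
    }
}
```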

In some regex flavors (PCRE, Perl), there is a branch reset feature which allows you to use the same number for capturing groups in different branches of alternation.

From Java 7, you can define a named capturing group (?<name>pattern), and you can access the content matched with Matcher.group(String name). The regex is longer, but the code is more meaningful, since it indicates what you are trying to match or extract with the regex.

The group names are used in the back-reference \k<name> in a pattern and in ${name} in a replacement string.

Named capturing groups are still numbered with the same numbering scheme, so they can also be accessed via Matcher.group(int group).
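The named-group features can be sketched together as follows. The group names (type, rest, ch), the pattern, and the method names are assumptions chosen for illustration around the question's input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // "type" is group 1 and "rest" is group 2 under the usual numbering.
    static final Pattern TYPE = Pattern.compile("(?<type>[^:]+):(?<rest>.*)");

    // Access by name.
    static String typeByName(String input) {
        Matcher m = TYPE.matcher(input);
        return m.matches() ? m.group("type") : null;
    }

    // The same group is still reachable by its number.
    static String typeByNumber(String input) {
        Matcher m = TYPE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // ${name} in the replacement string refers to a named group.
    static String reorder(String input) {
        return TYPE.matcher(input).replaceAll("${rest}|${type}");
    }

    // \k<name> in the pattern is a back-reference to a named group.
    static String firstDoubledChar(String input) {
        Matcher m = Pattern.compile("(?<ch>\\w)\\k<ch>").matcher(input);
        return m.find() ? m.group("ch") : null;
    }

    public static void main(String[] args) {
        System.out.println(typeByName("ABC:xxxxx;"));
        System.out.println(typeByNumber("ABC:xxxxx;"));
        System.out.println(reorder("ABC:xxxxx;"));
        System.out.println(firstDoubledChar("letter"));
    }
}
```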

Internally, Java's implementation just maps from the name to the group number. Therefore, you cannot use the same name for 2 different capturing groups.
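The restriction on duplicate names shows up at compile time of the pattern. In this sketch (class and method names are illustrative), Pattern.compile throws PatternSyntaxException when a name is reused.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateNameDemo {
    // Returns true if the regex compiles, false if Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two groups with distinct names are fine.
        System.out.println(compiles("(?<a>x)(?<b>y)"));
        // Reusing the name "a" for a second group is rejected.
        System.out.println(compiles("(?<a>x)(?<a>y)"));
    }
}
```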
