Java regex capturing groups indexes

Question

I have the following line:

String typeName = "ABC:xxxxx;";

I need to extract the word ABC.

I wrote the following code snippet:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern4 = Pattern.compile("(.*):");
Matcher matcher = pattern4.matcher(typeName);

String nameStr = "";
if (matcher.find())
{
    nameStr = matcher.group(1);
}

So if I use group(0) I get ABC:, but if I use group(1) I get ABC. So I want to know:

  1. What do the 0 and 1 mean? It would be better if someone could explain this with good examples.

  2. The regex pattern contains a :, so why does the group(1) result omit it? Does group 1 capture everything inside the parentheses?

  3. So, if I add two more pairs of parentheses, as in \s*(\d*)(.*):, will there be two groups? Will group(1) return the (\d*) part and group(2) the (.*) part?

The code snippet is only meant to clear up my confusion; it is not the code I am actually working on. The task above could be done much more easily with String.split().

Recommended answer

Capturing and grouping

A capturing group (pattern) creates a group that has the capturing property.

A related construct that you will often see (and use) is (?:pattern), which creates a group without the capturing property, hence the name non-capturing group.
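
A minimal Java sketch of the difference, using Matcher.groupCount() to count capturing groups. The class and method names are illustrative and not part of the original answer. The non-capturing group (?:ab) gets no number, so (c) becomes group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name), not from the original answer.
class NonCapturingDemo {
    // Number of capturing groups in a regex; group 0 is never counted.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (?:ab)+ only groups "ab" for repetition; (c) is the only capturing group.
        Matcher m = Pattern.compile("(?:ab)+(c)").matcher("ababc");
        if (m.find()) {
            System.out.println(m.group(1)); // c
        }
        System.out.println(countGroups("(ab)+(c)"));   // 2
        System.out.println(countGroups("(?:ab)+(c)")); // 1
    }
}
```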

A group is usually used when you need to repeat a sequence of patterns, e.g. (\.\w+)+, or to specify where alternation should take effect, e.g. ^(0*1|1*0)$ (^, then 0*1 or 1*0, then $) versus ^0*1|1*0$ (^0*1 or 1*0$).
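
Both uses of grouping can be checked in Java with a short sketch (class and helper names are illustrative). The input "0012" separates the two alternation patterns: only the ungrouped one accepts it, because its ^0*1 branch just needs the prefix "001". The last lines show that a repeated capturing group such as (\.\w+)+ keeps only the text of its final repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name), not from the original answer.
class GroupingDemo {
    // True if the regex matches somewhere in the input (uses find(), not matches()).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Group 1 of the first match, or null if there is no match.
    static String firstGroup1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Grouped: the whole string must be 0*1 or 1*0.
        System.out.println(found("^(0*1|1*0)$", "0012")); // false
        // Ungrouped: "^0*1" OR "1*0$", so the prefix "001" is enough.
        System.out.println(found("^0*1|1*0$", "0012"));   // true
        // The group repeats as ".co" then ".uk"; group 1 holds the last repetition.
        System.out.println(firstGroup1("(\\.\\w+)+", "example.co.uk")); // .uk
    }
}
```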

Apart from grouping, a capturing group also records the text matched by the pattern inside it. In your example, (.*):, the .* matches ABC and the : matches :. Since .* is inside the capturing group (.*), the text ABC is recorded as capturing group 1.
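
Applied to the question's input, a small sketch (the class and helper names are illustrative) returns both values side by side: group(0) includes the colon because the colon is part of the whole match, while group(1) holds only what .* matched inside the parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name) reproducing the question's example.
class QuestionExample {
    // Returns {group(0), group(1)} for the pattern (.*): or null if there is no match.
    static String[] groups(String input) {
        Matcher matcher = Pattern.compile("(.*):").matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(0), matcher.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("ABC:xxxxx;");
        System.out.println(g[0]); // ABC:  (whole match, colon included)
        System.out.println(g[1]); // ABC   (only the text matched by .*)
    }
}
```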

The whole pattern is defined as group number 0.
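
In Java, group 0 is what Matcher.group() returns when called without an argument, and it is not included in Matcher.groupCount(). A minimal sketch (class and helper names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name), not from the original answer.
class GroupZeroDemo {
    // groupCount() counts capturing groups only, so group 0 is excluded.
    static int capturingGroupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // {start, end} of group 0 (the whole match) for the first match, or null.
    static int[] wholeMatchSpan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] { m.start(0), m.end(0) } : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(.*):").matcher("ABC:xxxxx;");
        System.out.println(m.groupCount()); // 1
        if (m.find()) {
            // group() with no argument is shorthand for group(0).
            System.out.println(m.group().equals(m.group(0))); // true
        }
        // The whole match "ABC:" covers indices [0, 4).
        int[] span = wholeMatchSpan("(.*):", "ABC:xxxxx;");
        System.out.println(span[0] + ".." + span[1]); // 0..4
    }
}
```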

Capturing groups in the pattern are indexed starting from 1. The indices follow the order of the capturing groups' opening parentheses. As an example, here are all 5 capturing groups in the pattern below:

(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)
|     |                       |          | |      |      ||       |     |
1-----1                       |          | 4------4      |5-------5     |
                              |          3---------------3              |
                              2-----------------------------------------2
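
The numbering in the diagram can be checked in Java. This sketch uses an illustrative input built so that the pattern matches: "grop" satisfies g(?:ro|u)p, and the trailing "assertion" is required by the lookahead but not consumed. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name) for the 5-group example pattern.
class GroupNumberingDemo {
    static final String REGEX =
        "(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)";

    // Returns groups 0..groupCount() of the first match, or null if there is none.
    static String[] allGroups(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "groupnon-capturing-groupgrop nestedinsideanothergroupassertion";
        String[] groups = allGroups(input);
        for (int i = 0; i < groups.length; i++) {
            // Brackets make the leading space of group 3 visible.
            System.out.println(i + ": [" + groups[i] + "]");
        }
    }
}
```

Group 1 is "group", group 2 is "grop nestedinsideanothergroup", group 3 is " nestedinside", group 4 is "nested" and group 5 is "another", exactly as the opening-parenthesis rule predicts.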

Group numbers are used in the back-reference \n in a pattern and in $n in a replacement string.
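
A short sketch of both uses in Java, with illustrative helper names: \1 inside the pattern must match the same text group 1 captured, and $1, $2, $3 in String.replaceAll insert the captured text into the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name), not from the original answer.
class NumberedReferenceDemo {
    // (\w)\1 finds the first word character that is immediately repeated.
    static String firstDoubledChar(String input) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(input);
        return m.find() ? m.group() : null;
    }

    // $3/$2/$1 rebuilds the string from groups 3, 2 and 1 in that order.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d+)-(\\d+)-(\\d+)", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledChar("hello")); // ll
        System.out.println(reorderDate("2024-01-15")); // 15/01/2024
    }
}
```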

In other regex flavors (PCRE, Perl), they can also be used in subroutine calls.

You can access the text matched by a given group with Matcher.group(int group). The group numbers are determined by the rule stated above.
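
Applied to the question's third case, \s*(\d*)(.*):, the rule gives group 1 to (\d*) and group 2 to (.*). A minimal sketch with an illustrative input and hypothetical helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name), not from the original answer.
class TwoGroupsDemo {
    // Returns {group(1), group(2)} for \s*(\d*)(.*): or null if there is no match.
    static String[] digitsAndName(String input) {
        Matcher m = Pattern.compile("\\s*(\\d*)(.*):").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = digitsAndName("  42ABC:xxxxx;");
        System.out.println(parts[0]); // 42   matched by (\d*), group 1
        System.out.println(parts[1]); // ABC  matched by (.*), group 2
    }
}
```

With the question's original input "ABC:xxxxx;", (\d*) matches zero digits, so group(1) is the empty string (not null) and group(2) is still ABC.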

In some regex flavors (PCRE, Perl), there is a branch reset feature which allows you to use the same number for capturing groups in different branches of alternation.

Since Java 7, you can define a named capturing group (?<name>pattern) and access the text it matched with Matcher.group(String name). The regex is longer, but the code is more meaningful, since the name indicates what you are trying to match or extract with the regex.
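
The question's extraction rewritten with a named group, as a minimal sketch. The group name "type" and the helper name are illustrative choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name); "type" is an arbitrary group name.
class NamedGroupDemo {
    // Returns the text before the colon via the named group, or null if no match.
    static String typePrefix(String input) {
        Matcher m = Pattern.compile("(?<type>.*):").matcher(input);
        return m.find() ? m.group("type") : null;
    }

    public static void main(String[] args) {
        System.out.println(typePrefix("ABC:xxxxx;")); // ABC
    }
}
```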

Group names are used in the back-reference \k<name> in a pattern and in ${name} in a replacement string.
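
The named equivalents of the numbered back-reference and replacement, sketched in Java. The group names "ch", "year" and "month" and the helper names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name); group names are arbitrary choices.
class NamedReferenceDemo {
    // \k<ch> must repeat exactly the text captured by the group named "ch".
    static String firstDoubledChar(String input) {
        Matcher m = Pattern.compile("(?<ch>\\w)\\k<ch>").matcher(input);
        return m.find() ? m.group() : null;
    }

    // ${month} and ${year} in the replacement insert the named groups' text.
    static String reorderYearMonth(String input) {
        return input.replaceAll("(?<year>\\d{4})-(?<month>\\d{2})", "${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledChar("hello"));   // ll
        System.out.println(reorderYearMonth("2024-05")); // 05/2024
    }
}
```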

Named capturing groups are still numbered with the same numbering scheme, so they can also be accessed via Matcher.group(int group).
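
A sketch mixing an unnamed and a named group (the names are illustrative): the named group is opened second, so it is also group 2, and both accessors return the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class (hypothetical name), not from the original answer.
class NamedNumberingDemo {
    // Returns {group(2), group("word")} for the first match, or null.
    static String[] byNumberAndName(String input) {
        Matcher m = Pattern.compile("(\\d+)-(?<word>[a-z]+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group("word") };
    }

    public static void main(String[] args) {
        String[] r = byNumberAndName("12-abc");
        System.out.println(r[0]); // abc
        System.out.println(r[1]); // abc
    }
}
```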

Internally, Java's implementation just maps from the name to the group number. Therefore, you cannot use the same name for 2 different capturing groups.
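
The restriction shows up when the pattern is compiled: Pattern.compile throws a PatternSyntaxException for a repeated group name, even across alternation branches. A minimal sketch with a hypothetical helper:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative class (hypothetical name), not from the original answer.
class DuplicateNameDemo {
    // True if the regex compiles, false if Pattern rejects it as invalid syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same name in two branches: rejected, since a name maps to one number.
        System.out.println(compiles("(?<x>a)|(?<x>b)")); // false
        // Distinct names compile fine.
        System.out.println(compiles("(?<x>a)|(?<y>b)")); // true
    }
}
```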
