Confused about Matcher group in Java regex


Question

I have the following line,

String typeName = "ABC:xxxxx;";

I need to fetch the word ABC,

I wrote the following code snippet,

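// Requires java.util.regex.Matcher and java.util.regex.Pattern.
Pattern pattern4 = Pattern.compile("(.*):");
Matcher matcher = pattern4.matcher(typeName);

String nameStr = "";
if (matcher.find()) {
    // group(1) is the text captured by the first pair of parentheses
    nameStr = matcher.group(1);
}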

So if I use group(0) I get ABC:, but if I use group(1) I get ABC, so I want to know:

  1. What do this 0 and 1 mean? It would be better if anyone could explain this to me with good examples.

  2. The regex pattern contains a : in it, so why does the group(1) result omit it? Does group 1 capture everything inside the parentheses?

  3. So, if I add two more parentheses, such as \s*(\d*)(.*):, will there be two groups? Will group(1) return the (\d*) part and group(2) return the (.*) part?

The code snippet is only meant to clear up my confusion. It is not the code I am dealing with. The code given above can be done with String.split() in a much easier way.

Answer

Capturing and grouping

A capturing group, (pattern), creates a group that has the capturing property.

A related construct that you might often see (and use) is (?:pattern), which creates a group without the capturing property, hence the name non-capturing group.

A group is usually used when you need to repeat a sequence of patterns, e.g. (\.\w+)+, or to specify where alternation should take effect, e.g. ^(0*1|1*0)$ (^, then 0*1 or 1*0, then $) versus ^0*1|1*0$ (^0*1 or 1*0$).
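
The difference between the grouped and ungrouped alternation can be checked with a small sketch. The class name, the test input "0011", and the \w+ prefix added in front of (\.\w+)+ are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

class GroupingDemo {
    // Grouped: ^ and $ apply to both alternatives.
    static final Pattern GROUPED = Pattern.compile("^(0*1|1*0)$");
    // Ungrouped: read as (^0*1) or (1*0$).
    static final Pattern UNGROUPED = Pattern.compile("^0*1|1*0$");
    // Repeating a whole sequence: a word followed by one or more ".word" parts.
    // The leading \w+ is an assumption added to make a complete pattern.
    static final Pattern DOTTED = Pattern.compile("\\w+(\\.\\w+)+");

    // True if the pattern matches anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // "0011" is neither 0*1 nor 1*0 as a whole string.
        System.out.println(found(GROUPED, "0011"));   // false
        // The first alternative ^0*1 matches the prefix "001".
        System.out.println(found(UNGROUPED, "0011")); // true
        System.out.println(DOTTED.matcher("java.util.regex").matches()); // true
    }
}
```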

A capturing group, apart from grouping, also records the text matched by the pattern inside the parentheses. Using your example, (.*):, .* matches ABC and : matches :, and since .* is inside the capturing group (.*), the text ABC is recorded for capturing group 1. The : sits outside the parentheses, so it is part of the whole match but not of group 1.
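
As a minimal sketch of this, using the pattern and input from the question (the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureDemo {
    // Returns { whole match, text captured by group 1 }, or an empty array
    // if the pattern does not match.
    static String[] wholeAndGroup1(String input) {
        Matcher m = Pattern.compile("(.*):").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = wholeAndGroup1("ABC:xxxxx;");
        System.out.println(r[0]); // ABC:  (everything the pattern matched)
        System.out.println(r[1]); // ABC   (only what the parentheses matched)
    }
}
```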

Group number

The whole pattern is defined to be group number 0.

Capturing groups in the pattern are indexed starting from 1. The indices are defined by the order of the opening parentheses of the capturing groups. As an example, here are all 5 capturing groups in the pattern below:

(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)
|     |                       |          | |      |      ||       |     |
1-----1                       |          | 4------4      |5-------5     |
                              |          3---------------3              |
                              2-----------------------------------------2
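
The numbering in the diagram can be checked by running the same pattern against an input built to match it. The input string and the class name below are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // The pattern from the diagram, written as a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
        "(group)(?:non-capturing-group)(g(?:ro|u)p( (nested)inside)(another)group)(?=assertion)");

    // Hypothetical input constructed so that the pattern matches it.
    static final String INPUT =
        "groupnon-capturing-groupgrop nestedinsideanothergroupassertion";

    static Matcher match() {
        Matcher m = PATTERN.matcher(INPUT);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = match();
        // groupCount() does not include group 0, so this loop prints 6 lines.
        for (int i = 0; i <= m.groupCount(); i++) {
            System.out.println(i + ": [" + m.group(i) + "]");
        }
    }
}
```

Here groupCount() is 5: the non-capturing groups and the lookahead get no number, and the lookahead text is not part of group 0 because it is zero-width.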

The group numbers are used in the back-reference \n in a pattern and in $n in a replacement string.
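
A short sketch of both uses follows. The sample strings and method names are invented for illustration; String.replaceAll is used here as one convenient way to apply a replacement string.

```java
class BackReferenceDemo {
    // \1 in the pattern must match the same text that group 1 captured,
    // so this finds a word immediately repeated and keeps one copy.
    static String collapseRepeatedWord(String input) {
        return input.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // $1 and $2 in the replacement string refer to the captured texts.
    static String swapTwoWords(String input) {
        return input.replaceAll("(\\w+) (\\w+)", "$2, $1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeatedWord("it is is fine")); // it is fine
        System.out.println(swapTwoWords("John Smith"));            // Smith, John
    }
}
```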

In other regex flavors (PCRE, Perl), they can also be used in sub-routine calls.

You can access the text matched by a given group with Matcher.group(int group). The group numbers can be identified with the rule stated above.
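
Applied to the third point of the question, the pattern \s*(\d*)(.*): does have two capturing groups. The sketch below assumes a made-up input and made-up class and method names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupsDemo {
    // Returns { group 1, group 2 }, or an empty array if there is no match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("\\s*(\\d*)(.*):").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] r = groups("  123abc:rest");
        System.out.println(r[0]); // 123  (matched by (\d*))
        System.out.println(r[1]); // abc  (matched by (.*))
    }
}
```

Since \d* can match nothing, group(1) is an empty string for an input such as "abc:", not null.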

In some regex flavors (PCRE, Perl), there is a branch reset feature which allows you to use the same number for capturing groups in different branches of alternation.

Group name

Since Java 7, you can define a named capturing group (?<name>pattern) and access the matched content with Matcher.group(String name). The regex is longer, but the code is more meaningful, since it indicates what you are trying to match or extract with the regex.

The group names are used in the back-reference \k<name> in a pattern and in ${name} in a replacement string.
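
The named forms work the same way as their numbered counterparts. In this sketch the group names word, first and last, the sample strings, and the class name are all invented for illustration.

```java
class NamedBackReferenceDemo {
    // \k<word> must match the same text that the group named "word" captured.
    static String collapseRepeatedWord(String input) {
        return input.replaceAll("\\b(?<word>\\w+) \\k<word>\\b", "${word}");
    }

    // ${last} and ${first} in the replacement refer to the named groups.
    static String swapNames(String input) {
        return input.replaceAll("(?<first>\\w+) (?<last>\\w+)", "${last}, ${first}");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeatedWord("hello hello world")); // hello world
        System.out.println(swapNames("John Smith"));                   // Smith, John
    }
}
```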

Named capturing groups are still numbered with the same numbering scheme, so they can also be accessed via Matcher.group(int group).
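
A minimal sketch showing that a named group can be read both by name and by number. The key=value pattern, the group names, and the input are assumptions chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    static final Pattern KEY_VALUE = Pattern.compile("(?<key>\\w+)=(?<value>\\w+)");

    static Matcher match(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no key=value pair in: " + input);
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = match("typeName=ABC");
        // "key" is the first opening parenthesis, so it is also group 1.
        System.out.println(m.group("key") + " / " + m.group(1));   // typeName / typeName
        // "value" is the second opening parenthesis, so it is also group 2.
        System.out.println(m.group("value") + " / " + m.group(2)); // ABC / ABC
    }
}
```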

Internally, Java's implementation just maps from the name to the group number. Therefore, you cannot use the same name for 2 different capturing groups.
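
The restriction on duplicate names shows up when the pattern is compiled. The helper below is a hypothetical convenience method that only reports whether Pattern.compile accepted the regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateNameDemo {
    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same name on two groups is rejected, even in separate branches.
        System.out.println(compiles("(?<n>a)|(?<n>b)")); // false
        // Distinct names are fine.
        System.out.println(compiles("(?<n>a)|(?<m>b)")); // true
    }
}
```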
