Java Regex - capture string with single dollar, but not when it has two successive ones


Question

I posted this question earlier.

But that wasn't quite the end of it. All the rules that applied there still apply.

So the strings:

  • "%ABC%" would yield ABC as a result (capture stuff between percent signs)
  • as would "$ABC." (capture stuff after $, giving up when another dollar or dot appears)
  • "$ABC$XYZ" would too, and also give XYZ as a result.

To add a bit more to this:

  • "${ABC}" should yield ABC too. (ignore curly braces if present - non capture chars perhaps?).
  • if you have two successive dollar signs, such as "$$EFG", or "$${EFG}",
    that should not appear in a regex result. (This is where either numbered or named back-references come into play, and the reason I contemplated them as non-capture groups.) As I understand it, a group becomes a non-capture group with this syntax: (?:).

1) Can I say the % or $ is a non-capture group and reference that by number? Or do only capture groups get allocated numbers?

2) What is the order of the numbering if you have ((A) (B) (C))? Is the outer group 1, A 2, B 3, C 4?

I have been looking at named groups and saw this syntax:

(?<name>capturing text) to define a named group "name"

\k<name> to backreference a named group "name"

3) Not sure if a non-capture group can be named in Java? Can someone elucidate?

  • More info here on non capture groups.
  • More info here on lookbehinds
  • Similar answer to a question here, but didn't quite get me what I wanted. Not sure if there is a back-reference issue in Java.
  • Similar question here. But could not get my head around the working version to apply to this.

I have used the exact same Java I had in my original question, except for:

String search = "/bla/$V_N.$$XYZ.bla";
String pattern = "(?:(?<oc>[%$]))(?!(\\k<oc>))([^%.$]*)+";

This should only result in V_N.

I am really struggling with this one, and wondered if someone can help me work out how to solve this. Thanks.

Solution

You can write a slightly more verbose regex with multiple capturing groups and take only the group that is not null, or simply concatenate the group values, since exactly one of them is set for any given match:

%([^%.]+)%|(?<!\$)\$(?:\{([^{}]+)\}|([^$.]+))

See the regex demo.

Details

  • %([^%.]+)% - %, Group 1: one or more chars other than % and ., then a % is consumed
  • | - or
  • (?<!\$) - a negative lookbehind that matches a position in the string that is not immediately preceded by $
  • \$ - a $
  • (?: - start of the non-capturing container group matching either of:
    • \{([^{}]+)\} - {, Group 2: any one or more chars other than { and }, then } is consumed
    • | - or
    • ([^$.]+) - Group 3: 1 or more chars other than $ and .
  • ) - end of the non-capturing container group.
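The group numbers in the breakdown above can be checked directly. In java.util.regex, capturing groups are numbered by the position of their opening parenthesis, counted from left to right; a non-capturing (?:...) group receives no number; and a named group (?<name>...) is always a capturing group that is numbered as well, so a non-capturing group cannot be named. The sketch below is only an illustration: the class GroupNumbering and the helper describe are made-up names, and the sample patterns drop the spaces from ((A) (B) (C)) for simplicity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // The regex from the answer, written as a Java string literal.
    static final String ANSWER_REGEX =
        "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^{}]+)\\}|([^$.]+))";

    // Hypothetical helper: lists "number=value" for every numbered group
    // of the first match, or "no match" if the regex does not match.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (i > 1) {
                sb.append(", ");
            }
            sb.append(i).append('=').append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The outer group is 1, then A, B, C in order of their opening parentheses.
        System.out.println(describe("((A)(B)(C))", "ABC"));       // 1=ABC, 2=A, 3=B, 4=C
        // A non-capturing wrapper is skipped when numbering.
        System.out.println(describe("(?:(A)(B))(C)", "ABC"));     // 1=A, 2=B, 3=C
        // A named group is a capturing group and also gets a number.
        System.out.println(describe("(?<oc>[%$])(\\w+)", "$AB")); // 1=$, 2=AB
        // The answer's regex has three numbered groups; (?:...) adds none.
        System.out.println(Pattern.compile(ANSWER_REGEX).matcher("").groupCount()); // 3
    }
}
```

So for question 2 the guess in the question is right: the outer group is 1, followed by A, B and C as 2, 3 and 4.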

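Java usage (the pattern below also excludes whitespace, \s, from Group 3, so that a match stops at a space or line break in the multi-line test string):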

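import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^\\{}]+)\\}|([^$.\\s]+))";
        String string = "%ABC%\n$ABC.\n$ABC$XYZ  ${ABC}\n\n$$EFG $${EFG}.";
        Pattern pattern = Pattern.compile(regex); // no ^ or $ anchors, so MULTILINE is not needed
        Matcher m = pattern.matcher(string);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            // Only one of the three groups is non-null per match; null becomes "".
            results.add(Objects.toString(m.group(1), "")
                + Objects.toString(m.group(2), "")
                + Objects.toString(m.group(3), ""));
        }
        System.out.println(results); // => [ABC, ABC, ABC, XYZ, ABC]
    }
}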
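The other option mentioned in the answer, taking only the group that is not null instead of concatenating, might look like the following sketch. The class name DollarTokens and the helper extract are hypothetical; the pattern is the one from the Java usage snippet, and the input is the search string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarTokens {
    // Same pattern as in the Java usage snippet (Group 3 also stops at whitespace).
    private static final Pattern TOKEN = Pattern.compile(
        "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^\\{}]+)\\}|([^$.\\s]+))");

    // Hypothetical helper: for each match, keeps the single group that took part in it.
    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    out.add(m.group(g));
                    break; // only one alternative can match at a time
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "$V_N." is captured; "$$XYZ" is skipped because of the doubled dollar.
        System.out.println(extract("/bla/$V_N.$$XYZ.bla")); // [V_N]
    }
}
```

Both approaches give the same list; the loop just avoids building a string out of two empty placeholders per match.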

Note that in regular Java string literals, each \ must be escaped (written as \\) to produce the single literal backslash that a regex escape needs.
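To see what the regex engine actually receives, it can help to print the Java string literal. A small sketch, with EscapeDemo and LOOKBEHIND_DOLLAR as illustrative names only:

```java
class EscapeDemo {
    // Java source needs "\\" to put one backslash into the string.
    static final String LOOKBEHIND_DOLLAR = "(?<!\\$)\\$";

    public static void main(String[] args) {
        // The printed text is the regex as the engine sees it.
        System.out.println(LOOKBEHIND_DOLLAR);          // (?<!\$)\$
        // Nine characters: each "\\" in the source counts as one.
        System.out.println(LOOKBEHIND_DOLLAR.length()); // 9
    }
}
```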
