Java Regex - capture string with single dollar, but not when it has two successive ones
I posted this question earlier.
But that wasn't quite the end of it. All the rules that applied there still apply.
So the strings:
- "%ABC%" would yield ABC as a result (capture the text between percent signs),
- as would "$ABC." (capture the text after $, giving up when another dollar sign or a dot appears),
- "$ABC$XYZ" would too, and would also give XYZ as a result.
To add a bit more to this:
- "${ABC}" should yield ABC too (ignore curly braces if present; non-capture chars, perhaps?).
- If you have two successive dollar signs, such as "$$EFG" or "$${EFG}", that should not appear in a regex result. (This is where either numbered or named back-references come into play, and the reason I contemplated them as non-capture groups.)
As I understand it, a group becomes a non-capture group with this syntax: (?:)
1) Can I say the % or $ is a non-capture group and reference that by number? Or do only capture groups get allocated numbers?
2) What is the order of the numbering if you have ((A) (B) (C))? Is the outer group 1, A 2, B 3 and C 4?
I have been looking at named groups. I saw the syntax mentioned here:
(?<name>capturing text) to define a named group "name"
\k<name> to backreference a named group "name"
3) I am not sure whether a non-capture group can be named in Java. Can someone elucidate?
- More info here on non capture groups.
- More info here on lookbehinds
- Similar answer to a question here, but didn't quite get me what I wanted. Not sure if there is a back-reference issue in Java.
- Similar question here. But could not get my head around the working version to apply to this.
I have used the exact same Java I had in my original question, except for:
String search = "/bla/$V_N.$$XYZ.bla";
String pattern = "(?:(?<oc>[%$]))(?!(\\k<oc>))([^%.$]*)+";
This should only result in V_N.
I am really struggling with this one, and wondered if someone can help me work out how to solve this. Thanks.
You may write a slightly more verbose regex with multiple capturing groups and only grab those that are not null, or simply concatenate the found group values, since only one of them will be initialized on each match:
%([^%.]+)%|(?<!\$)\$(?:\{([^{}]+)\}|([^$.]+))
See the regex demo.
Details
- %([^%.]+)% - a %, then Group 1: one or more chars other than % and ., then a % is consumed
- | - or
- (?<!\$) - a negative lookbehind that matches a location in the string that is not immediately preceded by $
- \$ - a $
- (?: - start of the non-capturing container group matching either of:
  - \{([^{}]+)\} - a {, then Group 2: one or more chars other than { and }, then a } is consumed
  - | - or
  - ([^$.]+) - Group 3: one or more chars other than $ and .
- ) - end of the non-capturing container group.
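Why "$$EFG" yields nothing may not be obvious from the breakdown above: the lookbehind alone does not reject the first $, it only rejects the second one. The first $ is rejected because neither alternative after it can start with another $. A minimal sketch of that reasoning follows, using a simplified pattern (only the Group 3 branch); the class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindSketch {
    // Returns the start index of the first match, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "$$EFG";
        // The lookbehind plus "$" alone still matches the first "$" (index 0).
        System.out.println(firstMatchStart("(?<!\\$)\\$", input));          // 0
        // Requiring at least one non-"$" char afterwards rejects index 0,
        // and the lookbehind rejects index 1, so nothing matches.
        System.out.println(firstMatchStart("(?<!\\$)\\$([^$.]+)", input));  // -1
        // A single "$" is still accepted.
        System.out.println(firstMatchStart("(?<!\\$)\\$([^$.]+)", "a$EFG")); // 1
    }
}
```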
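    import java.util.ArrayList;
    import java.util.List;
    import java.util.Objects;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Note: Group 3 here also excludes whitespace (\s), unlike the pattern shown above,
    // so that a match stops at the space before the next placeholder.
    String regex = "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^\\{}]+)\\}|([^$.\\s]+))";
    String string = "%ABC%\n$ABC.\n$ABC$XYZ ${ABC}\n\n$$EFG $${EFG}.";
    Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
    Matcher m = pattern.matcher(string);
    List<String> results = new ArrayList<>();
    while (m.find()) {
        // Only one of the three groups is non-null per match; null becomes "".
        results.add(Objects.toString(m.group(1), "") +
                    Objects.toString(m.group(2), "") +
                    Objects.toString(m.group(3), ""));
    }
    System.out.println(results); // => [ABC, ABC, ABC, XYZ, ABC]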
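The other option mentioned above, grabbing only the group that is not null, could look roughly like this. It is a sketch rather than a definitive implementation: the class name, the constant and the extract helper are made up for illustration, and the input is the sample string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {
    // Same pattern as in the snippet above (Group 3 also excludes whitespace).
    private static final Pattern PLACEHOLDER = Pattern.compile(
            "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^\\{}]+)\\}|([^$.\\s]+))");

    // Collects, for each match, the single capturing group that participated.
    static List<String> extract(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(input);
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    names.add(m.group(g));
                    break; // only one group is set per match
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extract("/bla/$V_N.$$XYZ.bla")); // [V_N]
    }
}
```

Here $V_N is matched through Group 3, while $$XYZ is skipped: its first $ is followed by another $, and its second $ fails the lookbehind.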
Mind that in regular Java string literals, \ should be escaped (i.e. \\) to introduce a single literal backslash that is used as part of regex escapes.
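On the numbered questions in the post: in java.util.regex, capturing groups are numbered by the position of their opening parenthesis, counting from left to right; non-capturing groups (?:...) get no number; and a named group (?<name>...) is itself a capturing group, so it also has a number and there is no such thing as a named non-capturing group. The sketch below illustrates this. The class and helper names are invented, and the nested example is written without the spaces shown in the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumberingSketch {
    // Returns group 0 (the whole match) followed by every numbered group.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match");
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Outer group is 1, then A is 2, B is 3, C is 4.
        System.out.println(Arrays.toString(groups("((A)(B)(C))", "ABC")));   // [ABC, ABC, A, B, C]
        // The (?:...) wrapper takes no number, so A is 1, B is 2, C is 3.
        System.out.println(Arrays.toString(groups("(?:(A)(B))(C)", "ABC"))); // [ABC, A, B, C]

        // A named group is capturing: reachable by name and by number,
        // and \k<oc> refers back to whatever text it captured.
        Matcher m = Pattern.compile("(?<oc>[%$])(\\w+)\\k<oc>").matcher("%ABC%");
        if (m.matches()) {
            System.out.println(m.group("oc") + " " + m.group(1) + " " + m.group(2)); // % % ABC
        }
    }
}
```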