How does \G work in .split?


Question

I like to do code-golfing in Java (even though Java is way too verbose to be competitive), which means completing a certain challenge in as few bytes as possible. In one of my answers I had the following piece of code:

for(var p:"A4;B8;CU;EM;EW;E3;G6;G9;I1;L7;NZ;O0;R2;S5".split(";"))

This basically loops over the 2-char Strings after converting the input into a String array with .split. Someone suggested I could golf it to the following instead, to save 4 bytes:

for(var p:"A4B8CUEMEWE3G6G9I1L7NZO0R2S5".split("(?<=\\G..)"))

The functionality is still the same. It loops over the 2-char Strings.
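One way to confirm that the two loops see the same items is to compare the arrays directly. This is only a small sketch; the class and method names are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

class GolfSplitCompare {
    // Original version: explicit ';' delimiters between the 2-char items.
    static String[] withSemicolons() {
        return "A4;B8;CU;EM;EW;E3;G6;G9;I1;L7;NZ;O0;R2;S5".split(";");
    }

    // Golfed version: no delimiters, split after every two characters.
    static String[] withLookbehind() {
        return "A4B8CUEMEWE3G6G9I1L7NZO0R2S5".split("(?<=\\G..)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withLookbehind()));
        System.out.println(Arrays.equals(withSemicolons(), withLookbehind())); // true
    }
}
```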

However, neither of us was 100% sure how this works, hence this question.

What I know:

I know .split("(?<= ... )") is used to split while keeping the delimiter at the end of each item.
There is also a way to keep the delimiter at the start of each item, or as a separate item:

"a;b;c;d".split("(?<=;)")            // Results in ["a;", "b;", "c;", "d"]
"a;b;c;d".split("(?=;)")             // Results in ["a", ";b", ";c", ";d"]
"a;b;c;d".split("((?<=;)|(?=;))")    // Results in ["a", ";", "b", ";", "c", ";", "d"]

I initially thought \G was used to stop after a non-match is encountered.
\G is actually used to indicate the position where the last match ended (or the start of the string for the first run). Corrected definition thanks to @SebastianProske.

int count = 0;
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("match,");
java.util.regex.Matcher matcher = pattern.matcher("match,match,match,blabla,match,match,");
while(matcher.find())
  count++;
System.out.println(count); // Results in 5

count = 0;
pattern = java.util.regex.Pattern.compile("\\Gmatch,");
matcher = pattern.matcher("match,match,match,blabla,match,match,");
while(matcher.find())
  count++;
System.out.println(count); // Results in 3


But how does .split("(?<=\\G..)") work exactly when using \G inside the split?
And why does .split("(?=\\G..)") not work?

Here is a Try it online link with all of the code snippets above, to see them in action.

Answer

How .split("(?<=\\G..)") works

(?<=X) is a zero-width positive lookbehind for X. \G is the end of the previous match (not some kind of stop instruction) or beginning of input, and of course .. is two individual characters. So (?<=\G..) is a zero-width lookbehind for the end of the previous match plus two characters. Since this is split and we're describing a delimiter, making the entire thing a zero-width assertion means we only use it to identify where to break the string, not to actually consume any characters.

So let's walk through ABCDEF:

  1. \G matches the beginning of input, and .. matches AB, so (?<=\G..) finds the zero-width space between AB and CD because this is a lookbehind: that is, the first point at which there is \G.. prior to the regex cursor is the point between AB and CD. So split between AB and CD.
  2. \G marks the location just after AB, so (?<=\G..) finds the zero-width space between CD and EF, because as the regex cursor goes forward, that's the first place where \G.. matches: \G matching the location between AB and CD and .. matching CD. So split between CD and EF.
  3. Same again: \G marks the location just after CD, so (?<=\G..) finds the zero-width space between EF and end-of-input. So split between EF and end-of-input.
  4. Create an array with all of the substrings between the matches except the empty one at the end (because this is split with an implicit limit = 0, which discards empty strings at the end).

Result: { "AB", "CD", "EF" }.
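The walk-through above can be traced with a Matcher, which is roughly what split does internally. This is a sketch for illustration only: the class and method names are hypothetical, and the limit of -1 in the last call is added here just to make the trailing empty string from step 4 visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindSplitTrace {
    // Collects the position of every zero-width match of (?<=\G..) in the input.
    static List<Integer> matchPositions(String input) {
        Matcher m = Pattern.compile("(?<=\\G..)").matcher(input);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start()); // start == end for a zero-width match
        }
        return positions;
    }

    public static void main(String[] args) {
        // Split points after AB, after CD, and after EF (end of input).
        System.out.println(matchPositions("ABCDEF")); // [2, 4, 6]

        // Implicit limit = 0: the trailing empty string is discarded.
        System.out.println(Arrays.toString("ABCDEF".split("(?<=\\G..)"))); // [AB, CD, EF]

        // Negative limit: the trailing empty string is kept.
        System.out.println(Arrays.toString("ABCDEF".split("(?<=\\G..)", -1))); // [AB, CD, EF, ]
    }
}
```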

Why doesn't .split("(?=\\G..)") work?

Because (?=X) is a positive lookahead. The end of the previous match will never be ahead of the regex cursor. It can only be behind it.
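This can be checked with a small sketch, assuming Java 8 or later (where a zero-width match at the very start of the input does not produce a leading empty string). The class and method names are made up for illustration. The lookahead appears to succeed only once, at the start of the input where \G and the cursor coincide; after that the cursor moves on while \G stays behind it, so no split point is ever found.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSplitDemo {
    // Counts how many times (?=\G..) matches in the input.
    static int countMatches(String input) {
        Matcher m = Pattern.compile("(?=\\G..)").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Splits with the lookahead variant from the question.
    static String[] splitWithLookahead(String input) {
        return input.split("(?=\\G..)");
    }

    public static void main(String[] args) {
        // Only the match at position 0 is found.
        System.out.println(countMatches("ABCDEF")); // 1

        // A zero-width match at position 0 does not split anything off.
        System.out.println(Arrays.toString(splitWithLookahead("ABCDEF"))); // [ABCDEF]
    }
}
```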
