Extracting pairs of words using String.split()


Question

Given:

String input = "one two three four five six seven";

Is there a regex that works with String.split() to grab (up to) two words at a time, such that:

String[] pairs = input.split("some regex");
System.out.println(Arrays.toString(pairs));

produces this output:

[one two, three four, five six, seven]






This question is about the split regex. It is not about "finding a work-around" or other "making it work in another way" solutions.

Recommended answer

Currently (up to and including Java 8) it is possible to do this with split(), but don't use this approach in real-world code, since it appears to rely on a bug: look-behind in Java is supposed to have an obvious maximum length, but this solution uses \w+, which doesn't respect that limitation. Instead, use the Pattern and Matcher classes to avoid overcomplicating things and creating a maintenance nightmare, since this behaviour may change in later versions of Java or in Java-like environments such as Android.
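As a rough illustration of the Pattern and Matcher route recommended above, here is a minimal sketch. The class name WordPairs, the method name pairs and the exact regex are assumptions made for this example and are not part of the original answer. The regex matches one word, optionally followed by whitespace and a second word, and every match is collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
class WordPairs {
    // One run of non-space characters, optionally followed by
    // whitespace and a second run of non-space characters.
    private static final Pattern PAIR = Pattern.compile("\\S+(?:\\s+\\S+)?");

    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        // find() resumes after the end of the previous match,
        // so each call yields the next pair (or a final single word).
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [one two, three four, five six, seven]
        System.out.println(pairs("one two three four five six seven"));
    }
}
```

This does not answer the split()-only question, but it avoids look-behind entirely, so it does not depend on how a given Java version handles look-behind length limits.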

Is this what you are looking for?
(You can replace \\w with \\S to include all non-space characters, but for this example I will keep \\w, since the regex is easier to read with \\w\\s than with \\S\\s.)

String input = "one two three four five six seven";
String[] pairs = input.split("(?<!\\G\\w+)\\s");
System.out.println(Arrays.toString(pairs));

Output:

[one two, three four, five six, seven]






\G matches at the end of the previous match, and (?<!regex) is a negative look-behind.

For the split we are trying to:

  1. find spaces -> \\s
  2. that are not preceded -> (?<!negativeLookBehind)
  3. by some word -> \\w+
  4. with the previously matched (space) -> \\G
  5. before it -> \\G\\w+

The only thing that confused me at first was how this would work for the first space, since we want that space to be ignored. The important point is that at the start \\G matches the start of the String, like ^.

So before the first iteration, the regex in the negative look-behind effectively looks like (?<!^\\w+), and since the first space does have ^\\w+ before it, it can't be a match for the split. The next space does not have this problem, so it is matched, and information about it (such as its position in the input String) is stored in \\G and used later in the next negative look-behind.

So for the 3rd space, the regex checks whether the previously matched space \\G and a word \\w+ come right before it. Since this test succeeds, the negative look-behind rejects it, so this space is not matched. The 4th space does not have this problem, because the space before it is not the one stored in \\G (it has a different position in the input String).
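The walkthrough above can be mimicked with a plain loop. The following is only a sketch of the reasoning, not how the regex engine is implemented: the class SplitSimulation, the method splitPairs and the variable lastMatchEnd are hypothetical names, and the code assumes words are separated by single space characters. lastMatchEnd plays the role of \\G (it starts at the beginning of the String and then points just past the last space used for splitting), and a space is skipped whenever everything between lastMatchEnd and that space is a single \\w+ word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the look-behind logic, for illustration only.
class SplitSimulation {
    static List<String> splitPairs(String input) {
        List<String> parts = new ArrayList<>();
        int lastMatchEnd = 0; // stands in for \G: start of input, then end of the last matched space
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != ' ') {
                continue; // only spaces are split candidates (the \\s part)
            }
            // Negative look-behind (?<!\\G\\w+): reject this space if the text
            // between the \G position and the space is exactly one word.
            boolean precededByGAndWord = input.substring(lastMatchEnd, i).matches("\\w+");
            if (!precededByGAndWord) {
                parts.add(input.substring(lastMatchEnd, i));
                lastMatchEnd = i + 1; // \G now points just after this space
            }
        }
        parts.add(input.substring(lastMatchEnd)); // whatever remains after the last split
        return parts;
    }

    public static void main(String[] args) {
        // prints [one two, three four, five six, seven]
        System.out.println(splitPairs("one two three four five six seven"));
    }
}
```

Tracing the loop by hand gives the same sequence as the explanation: the 1st, 3rd and 5th spaces are rejected, and the 2nd, 4th and 6th are used as split points.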

Also, if someone would like to split on, say, every 3rd space, this form can be used (based on @maybeWeCouldStealAVan's answer, which was deleted when I posted this fragment of the answer):

input.split("(?<=\\G\\w{1,100}\\s\\w{1,100}\\s\\w{1,100})\\s")

Instead of 100 you can use a bigger value, as long as it is at least the length of the longest word in the String.
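To make the bounded form concrete, here is a small self-contained sketch that wraps the same regex in a method. The class EveryThirdSpace and the method splitEveryThirdSpace are hypothetical names for this example, and the bound of 100 is simply the value used above, so it assumes no word is longer than 100 characters.

```java
import java.util.Arrays;

// Hypothetical wrapper around the regex from the answer.
class EveryThirdSpace {
    static String[] splitEveryThirdSpace(String input) {
        // Positive look-behind with bounded quantifiers: the space must be
        // preceded by exactly three words counted from the \G position.
        return input.split("(?<=\\G\\w{1,100}\\s\\w{1,100}\\s\\w{1,100})\\s");
    }

    public static void main(String[] args) {
        String input = "one two three four five six seven";
        // expected to print [one two three, four five six, seven]
        System.out.println(Arrays.toString(splitEveryThirdSpace(input)));
    }
}
```

Because every quantifier inside the look-behind has an explicit upper bound, this variant stays within the documented rule that look-behind needs an obvious maximum length.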

I just noticed that we can also use + instead of {1,maxWordLength} if we want to split on every odd-numbered occurrence, such as every 3rd, 5th or 7th, for example:

String data = "0,0,1,2,4,5,3,4,6,1,3,3,4,5,1,1";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+,\\d+,\\d+),");//every 5th comma 
