How to split a string with conditions
Question
When splitting a string, how can I make sure that a delimiter located between two particular characters (here [ and ]) is not treated as a split point?
// Input
String string = "a,b,[c,d],e";
String[] split = string.split(",");
// Output
split[0] // "a"
split[1] // "b"
split[2] // "[c"
split[3] // "d]"
split[4] // "e"
// Required
split[0] // "a"
split[1] // "b"
split[2] // "[c,d]"
split[3] // "e"
Recommended answer
(The preferred approach is at the end of this answer.)
It seems you are looking for the look-around mechanism.
For instance, if you want to split on whitespace that has no foo before it and no bar after it, your code can look like
split("(?<!foo)\\s(?!bar)")
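To make the look-around example concrete, here is a minimal sketch that applies the same pattern to a sample string. The class name, the method name, and the input are made up for illustration.

```java
import java.util.Arrays;

class LookAroundSplitDemo {
    // Splits on a single whitespace character that is not preceded by "foo"
    // and not followed by "bar". Names and input are illustrative only.
    static String[] splitOnPlainWhitespace(String input) {
        return input.split("(?<!foo)\\s(?!bar)");
    }

    public static void main(String[] args) {
        String[] parts = splitOnPlainWhitespace("a foo b bar c");
        // The space after "foo" and the space before "bar" are not split points.
        System.out.println(Arrays.toString(parts)); // [a, foo b bar, c]
    }
}
```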
Update (assuming that the [...] areas cannot be nested and that they are well formed, i.e. every [ is closed with a ]):
Your case is a little more complex. What you can do is accept a , (comma) if
- there is no [ or ] anywhere after it, or
- the first opening bracket [ after this comma has no closing bracket ] between the comma and itself. Otherwise the comma would be inside an area like
[ , ] [
  ^ ^ ^ - first `[` after tested comma
  | +---- one `]` between tested comma and first `[` after it
  +------ tested comma
So your code can look like this
(this is the original version; a slightly simplified one is given further below)
split(",(?=[^\\]]*(\\[|$))")
This regex is based on the idea that the commas you don't want to accept are the ones inside [foo,bar]. But how do we determine whether we are inside (or outside) such a block?
- If a character is inside a [...] area, then no [ can appear after it until we find ]. The next [ can appear only after that ]: in [a,b],[c,d] the comma between a and b has no [ after it until the ] is found, but a new [..] area, which of course starts with [, can follow.
- If a character is outside a [...] area, then only non-] characters can appear after it until we find the start of a [...] area or reach the end of the string.
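The inside/outside distinction can also be spelled out without any regex. The sketch below is not the method used in this answer; it is a plain character scan, shown only to illustrate the two cases, and it relies on the same assumptions (no nested brackets, every [ closed by a ]). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BracketAwareScanner {
    // Splits on commas that are outside [...] areas.
    // Assumes brackets are not nested and every '[' has a matching ']'.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideBrackets = false;
        for (char c : input.toCharArray()) {
            if (c == '[') {
                insideBrackets = true;   // entering a [...] area
            } else if (c == ']') {
                insideBrackets = false;  // leaving a [...] area
            }
            if (c == ',' && !insideBrackets) {
                // Comma outside brackets: finish the current piece.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,[c,d],e")); // [a, b, [c,d], e]
        System.out.println(split("[a,b],[c,d]")); // [[a,b], [c,d]]
    }
}
```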
The second case (the comma is outside any [...] area) is the one you are interested in. So we need a regex that accepts a , which is followed only by non-] characters (so it is not inside [...]) until a [ is found or the end of the string (represented by $) is reached.
Such a regex can be written as follows (the doubled backslashes come from writing the pattern as a Java string literal):
- , matches a comma
- (?=...) requires that the comma is followed by [^\\]]*(\\[|$)
- [^\\]]* matches zero or more non-] characters (] needs to be escaped as a metacharacter)
- (\\[|$) matches a [ (which also needs to be escaped in a regex) or the end of the string
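As a quick check of the breakdown above, the following sketch runs the look-ahead pattern against the input from the question and against two adjacent bracketed groups. The class, constant, and method names are illustrative only.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // A comma followed by zero or more non-']' characters
    // and then either '[' or the end of the input.
    static final String OUTSIDE_COMMA = ",(?=[^\\]]*(\\[|$))";

    static String[] split(String input) {
        return input.split(OUTSIDE_COMMA);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,[c,d],e"))); // [a, b, [c,d], e]
        System.out.println(Arrays.toString(split("[a,b],[c,d]"))); // [[a,b], [c,d]]
    }
}
```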
Slightly simplified split version
string.split(",(?![^\\[]*\\])");
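The simplified pattern can be tried out with the sketch below (class and method names are hypothetical). The last call uses nested brackets, which this answer explicitly assumes do not occur, to show that the pattern then splits inside the outer pair.

```java
import java.util.Arrays;

class SimplifiedSplitDemo {
    // A comma that is NOT followed by non-'[' characters and then ']'.
    static final String OUTSIDE_COMMA = ",(?![^\\[]*\\])";

    static String[] split(String input) {
        return input.split(OUTSIDE_COMMA);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,[c,d],e"))); // [a, b, [c,d], e]
        System.out.println(Arrays.toString(split("[a,b],[c,d]"))); // [[a,b], [c,d]]
        // Nested brackets violate the stated assumption: the comma after "a"
        // is treated as a split point even though it is inside the outer pair.
        System.out.println(Arrays.toString(split("[a,[b,c],d]"))); // [[a, [b,c],d]]
    }
}
```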
This pattern means: split on a comma , that is not followed (negative look-ahead, written (?!...)) by an unclosed ]. An unclosed ] is one that has no [ between the tested comma and itself, which can be written as [^\\[]*\\].
Preferred approach
To avoid such complex regexes, don't use split. Use the Pattern and Matcher classes instead and search for [...] areas or for runs of non-comma characters.
// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String string = "a,b,[c,d],e";
// Either a whole [...] area or one or more non-comma characters.
Pattern p = Pattern.compile("\\[.*?\\]|[^,]+");
Matcher m = p.matcher(string);
while (m.find())
    System.out.println(m.group());
Output:
a
b
[c,d]
e
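If the tokens are needed as a collection rather than printed, the same Pattern and Matcher loop can be wrapped in a small helper. This is a sketch under the same no-nesting assumption; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokenizer {
    // Either a whole [...] area (reluctant, so it stops at the first ']')
    // or a run of one or more non-comma characters.
    private static final Pattern TOKEN = Pattern.compile("\\[.*?\\]|[^,]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,b,[c,d],e")); // [a, b, [c,d], e]
        System.out.println(tokens("a,,b"));        // [a, b]
    }
}
```

One difference from split is worth keeping in mind: because [^,]+ needs at least one character, empty fields are skipped, so "a,,b" yields two tokens here, whereas "a,,b".split(",") returns three elements including an empty string.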