How to split a string with conditions


Question

When splitting a string, how can I make sure that a delimiter located between two given characters (here [ and ]) is not treated as a split point?

// Input
String string = "a,b,[c,d],e";
String[] split = string.split(",");
// Output
split[0] // "a"
split[1] // "b"
split[2] // "[c"
split[3] // "d]"
split[4] // "e"
// Required
split[0] // "a"
split[1] // "b"
split[2] // "[c,d]"
split[3] // "e"


Recommended answer

(The preferred approach is at the end of this answer.)

It seems you are looking for the look-around mechanism.

For instance, if you want to split on whitespace that has no foo before it and no bar after it, your code can look like

split("(?<!foo)\\s(?!bar)")
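As a minimal sketch of how this look-around split behaves, the pattern can be wrapped in a small class. The class name LookAroundSplitDemo and the sample input are assumptions made for illustration only.

```java
import java.util.Arrays;

final class LookAroundSplitDemo {
    // Splits on a whitespace character that is neither preceded by "foo"
    // nor followed by "bar".
    static String[] split(String input) {
        return input.split("(?<!foo)\\s(?!bar)");
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration
        System.out.println(Arrays.toString(split("a b foo c d bar e")));
        // prints [a, b, foo c, d bar, e]
    }
}
```

The space after foo and the space before bar are kept inside their tokens, because the look-behind and look-ahead reject those positions without consuming any characters.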






Update (assuming that [...] areas cannot be nested and are well formed, i.e. every [ is closed by a ]):

Your case seems a little more complex. What you can do is accept a comma if

  • it doesn't have any [ or ] after it,
  • or the first opening bracket [ after this comma has no closing bracket ] between this comma and itself; otherwise it would mean that the comma is inside an area like

[ , ] [
  ^ ^ ^ - first `[` after tested comma
  | +---- one `]` between tested comma and first `[` after it
  +------ tested comma


So your code can look like
(this is the original version; a slightly simplified one is given further down)

split(",(?=[^\\]]*(\\[|$))")

This regex is based on the idea that the commas you don't want to accept are inside [foo,bar]. But how do we determine whether we are inside (or outside) such a block?


  1. If a character is inside a [...] area, then no [ appears after it until we find ]. (The next [ can appear only after that ]: in [a,b],[c,d] the comma between a and b has no [ until its ], but a new [...] area, which of course starts with [, can follow.)
  2. If a character is outside a [...] area, then only non-] characters can follow it until we reach the start of a [...] area or the end of the string.
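The inside/outside distinction described in the two cases above can also be made explicit with a hand-written scan instead of a regex. The following is only a sketch under the same assumptions (no nested [...] areas, every [ closed by a ]); the class name BracketAwareSplitter is hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

final class BracketAwareSplitter {
    // Splits on commas that lie outside [...] areas.
    // Assumes brackets are not nested and every [ is closed by a ].
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inside = false; // true while scanning between [ and ]
        for (char c : input.toCharArray()) {
            if (c == '[') {
                inside = true;
            } else if (c == ']') {
                inside = false;
            }
            if (c == ',' && !inside) {
                // Comma outside any [...] area: close the current token
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString()); // last token
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a,b,[c,d],e")); // prints [a, b, [c,d], e]
    }
}
```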

The second case is the one you are interested in. So we need a regex that accepts a , which has only non-] characters after it (so it is not inside [...]) until it finds [ or reaches the end of the string (represented by $).

Such a regex can be written as

  • , comma
  • (?=...) which has after it
  • [^\\]]*(\\[|$)
    • [^\\]]* zero or more non-] characters (] needs to be escaped as a metacharacter)
    • (\\[|$) followed by [ (which also needs to be escaped in the regex) or by the end of the string
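To check the breakdown above against the input from the question, the pattern can be run as follows. This is a small sketch; the class name LookaheadBracketSplit is an assumption made for illustration.

```java
import java.util.Arrays;

final class LookaheadBracketSplit {
    // Accepts a comma only if it is followed by non-] characters
    // and then either [ or the end of the string.
    static String[] split(String input) {
        return input.split(",(?=[^\\]]*(\\[|$))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,[c,d],e")));
        // prints [a, b, [c,d], e]
    }
}
```

The comma between c and d is rejected because the text after it reaches ] before any [ or the end of the string.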

Slightly simplified split version

string.split(",(?![^\\[]*\\])");

Which means: split on a comma that is not followed (the negation is expressed by (?!...)) by an unclosed ]. A ] is unclosed when there is no [ between the tested comma and that ], which can be written as [^\\[]*\\].
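A short sketch of the simplified pattern in use, under the same assumptions as before (no nesting, well-formed brackets). The class name NegativeLookaheadBracketSplit and the second sample input are illustrative choices.

```java
import java.util.Arrays;

final class NegativeLookaheadBracketSplit {
    // Rejects a comma when a ] follows it with no [ in between,
    // i.e. when the comma sits inside a [...] area.
    static String[] split(String input) {
        return input.split(",(?![^\\[]*\\])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,[c,d],e")));
        // prints [a, b, [c,d], e]
        System.out.println(Arrays.toString(split("[a,b],[c,d]")));
        // prints [[a,b], [c,d]]
    }
}
```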

Preferred approach

To avoid such a complex regex, don't use split; use the Pattern and Matcher classes instead, searching for areas like [...] or for runs of non-comma characters.

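import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "a,b,[c,d],e";
// Match either a whole [...] area or a run of non-comma characters
Pattern p = Pattern.compile("\\[.*?\\]|[^,]+");
Matcher m = p.matcher(string);
while (m.find())
    System.out.println(m.group());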
      

Output:

a
b
[c,d]
e
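If the tokens are needed as a collection rather than printed, the same Pattern/Matcher loop can be wrapped in a helper. This is a sketch only; the class name BracketTokenizer and the method name tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketTokenizer {
    // Either a whole [...] area (non-nested) or a run of non-comma characters
    private static final Pattern TOKEN = Pattern.compile("\\[.*?\\]|[^,]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,b,[c,d],e")); // prints [a, b, [c,d], e]
    }
}
```

One difference from split is worth keeping in mind: because [^,]+ requires at least one character, empty fields are skipped, so an input such as a,,b yields only a and b.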
      
