Tokenising a String containing empty tokens

Question

I have a seemingly simple problem: splitting a comma-separated String into tokens, where the output should include empty tokens in cases where:

  • The first character in the String is a comma.
  • The last character in the String is a comma.
  • Two consecutive commas occur.

For example, the String ",abd,def,,ghi," should yield the output: {"", "abd", "def", "", "ghi", ""}.

I have tried using String.split, Scanner and StringTokenizer for this, but each gives a different undesired output (examples below). Can anyone suggest an elegant solution, preferably using JDK classes? Obviously I could code something myself, but I feel like I'm missing something in one of the three approaches mentioned. Note that the delimiter is a fixed String, although not necessarily a comma, nor a single character.

Example Code

import java.util.*;

public class Main12 {
  public static void main(String[] args) {
    String s = ",abd,def,,ghi,";
    String[] tokens = s.split(",");

    System.err.println("--- String.split Output ---");
    System.err.println(String.format("%s -> %s", s, Arrays.asList(tokens)));

    for (int i=0; i<tokens.length; ++i) {
      System.err.println(String.format("tokens[%d] = %s", i, tokens[i]));
    }

    System.err.println("--- Scanner Output ---");

    Scanner sc = new Scanner(s);
    sc.useDelimiter(",");
    while (sc.hasNext()) {
      System.err.println(sc.next());
    }

    System.err.println("--- StringTokenizer Output ---");

    StringTokenizer tok = new StringTokenizer(s, ",");
    while (tok.hasMoreTokens()) {
      System.err.println(tok.nextToken());
    }
  }
}

Output

$ java Main12
--- String.split Output ---
,abd,def,,ghi, -> [, abd, def, , ghi]
tokens[0] =
tokens[1] = abd
tokens[2] = def
tokens[3] =
tokens[4] = ghi
--- Scanner Output ---
abd
def

ghi
--- StringTokenizer Output ---
abd
def
ghi

Recommended answer

Pass -1 to split as the limit argument:

String s = ",abd,def,,ghi,";
String[] tokens = s.split(",", -1);

Then your result array will include any trailing empty strings.
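
As a quick check, here is a minimal sketch that runs the question's input through split with a limit of -1. The class name SplitKeepEmptyTokens and the helper tokenize are made up for illustration; only String.split(String, int) comes from the JDK.

```java
import java.util.Arrays;

// Illustrative class name, not from the original post.
class SplitKeepEmptyTokens {

  // Hypothetical helper: splits on a comma and keeps trailing empty
  // strings by passing a negative limit.
  static String[] tokenize(String s) {
    return s.split(",", -1);
  }

  public static void main(String[] args) {
    String s = ",abd,def,,ghi,";
    String[] tokens = tokenize(s);

    System.out.println(tokens.length);          // prints 6
    System.out.println(Arrays.asList(tokens));  // prints [, abd, def, , ghi, ]
  }
}
```

The leading empty token and the one between the two consecutive commas already appear with the one-argument split; the negative limit only adds the trailing one.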

From the javadocs:

If [the limit] is non-positive then the pattern will be applied as many times as possible and the array can have any length. If [the limit] is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Calling split(regex) acts as if the limit argument is 0, so trailing empty strings are discarded.
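
The question also notes that the delimiter is a fixed String that is not necessarily a comma or a single character. Because split treats its first argument as a regular expression, one possible approach (an assumption on my part, not something the answer above spells out) is to escape the delimiter with Pattern.quote before splitting. The class and method names below are hypothetical, and the sketch assumes a non-empty delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative class name, not from the original post.
class LiteralDelimiterSplit {

  // Hypothetical helper: treats the delimiter as literal text rather than
  // a regex, and keeps all empty tokens via the negative limit.
  // Assumes the delimiter is non-empty.
  static String[] splitKeepingEmpty(String input, String delimiter) {
    return input.split(Pattern.quote(delimiter), -1);
  }

  public static void main(String[] args) {
    // "||" would mean something quite different as an unescaped regex,
    // so it is quoted before being handed to split.
    String[] tokens = splitKeepingEmpty("||abd||def||||ghi||", "||");

    System.out.println(tokens.length);          // prints 6
    System.out.println(Arrays.asList(tokens));  // prints [, abd, def, , ghi, ]
  }
}
```

For a plain comma the quoting makes no difference, so splitKeepingEmpty(",abd,def,,ghi,", ",") gives the same six tokens as s.split(",", -1).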
