Tokenize problem in Java with separator ". "

Problem description

I need to split a text using the separator ". ". For example, I want this string:

Washington is the U.S Capital. Barack is living there.

to be split into two parts:

Washington is the U.S Capital. 
Barack is living there.

Here is my code:

// Initialize the tokenizer (requires: import java.util.StringTokenizer;)
StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". ");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

Unfortunately, the output is:

Washington
is
the
U
S
Capital
Barack
is
living
there

Can someone explain what's going on?

Recommended answer

Don't use StringTokenizer; it's a legacy class. Use java.util.Scanner or simply String.split instead.

    String text = "Washington is the U.S Capital. Barack is living there.";
    String[] tokens = text.split("\\. ");
    for (String token : tokens) {
        System.out.println("[" + token + "]");
    }

This prints:

[Washington is the U.S Capital]
[Barack is living there.]
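
Note that split discards the matched delimiter, so the first token above has lost its trailing period, whereas the desired output in the question keeps it. One possible way to keep the period, sketched here as an assumption rather than as part of the original answer, is to split on a space that is preceded by a period using a regex lookbehind. The class and method names below are hypothetical.

```java
public class KeepPeriodSplit {
    // Splits on a single space that is preceded by a period.
    // The lookbehind (?<=\.) checks for the period without consuming it,
    // so the period stays attached to the preceding sentence.
    static String[] splitSentences(String text) {
        return text.split("(?<=\\.) ");
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String sentence : splitSentences(text)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

The period inside "U.S" is followed by a letter rather than a space, so it does not trigger a split.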

Note that split and Scanner are "regex"-based (regular expressions), and since . is a special regex "meta-character", it needs to be escaped with \. In turn, since \ is itself an escape character in Java string literals, you need to write "\\. " as the delimiter.
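
To make the escaping point concrete, here is a minimal sketch comparing an unescaped delimiter with a quoted one. It uses java.util.regex.Pattern.quote, which turns an arbitrary string into a literal regex, as an alternative to escaping by hand; the class and method names are hypothetical and only for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Splits on the delimiter taken literally: Pattern.quote escapes
    // any regex meta-characters the delimiter may contain.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";

        // Unescaped: "." matches any character, so ". " matches
        // "any character followed by a space" and splits far too often.
        for (String token : text.split(". ")) {
            System.out.println("unescaped: [" + token + "]");
        }

        // Quoted: only the literal two-character sequence ". " is a delimiter.
        for (String token : splitLiteral(text, ". ")) {
            System.out.println("quoted:    [" + token + "]");
        }
    }
}
```

Pattern.quote is convenient when the delimiter comes from user input or a variable, since the caller does not need to know which characters are special.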

This may sound complicated, but it really isn't. split and Scanner are much superior to StringTokenizer, and regex isn't that hard to pick up.
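
For completeness, the Scanner alternative mentioned above could look roughly like the following sketch. It assumes the same "\\. " delimiter as the split example; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSplit {
    // Reads tokens from the text, using the regex "\\. " (a literal
    // period followed by a space) as the delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\. ");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String token : tokens(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Scanner reads tokens lazily, which can be preferable to split when the input is a large stream or file rather than a short string.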

  • Java Lessons/Regular expressions
  • regular-expressions.info - Very good tutorial, not Java specific
  • java.util.StringTokenizer
    • StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
  • String.split
    • Splits this string around matches of the given regular expression.

The problem is that StringTokenizer takes each character in the delimiter string as an individual delimiter, i.e. NOT the entire String itself.

From the API:

StringTokenizer(String str, String delim): Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
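
The per-character behaviour described in the API text can be checked with a small sketch: because delim is a set of characters rather than a sequence, ". " and " ." produce the same tokens. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelims {
    // Collects all tokens produced by StringTokenizer for the given
    // text and set of delimiter characters.
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "U.S Capital. Barack";
        // The order of characters in the delimiter string does not matter:
        // both calls split on every '.' and every ' '.
        System.out.println(tokenize(text, ". "));
        System.out.println(tokenize(text, " ."));
    }
}
```

This is why the question's code splits at every space and every period instead of only at the two-character sequence ". ".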
