Tokenize problem in Java with separator ". "
Question
I need to split a text using the separator ". ". For example, I want this string:
Washington is the U.S Capital. Barack is living there.
to be split into two parts:
Washington is the U.S Capital.
Barack is living there.
Here is my code:
import java.util.StringTokenizer;

// Initialize the tokenizer with ". " as the delimiter argument
StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". ");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
Unfortunately, the output is:
Washington
is
the
U
S
Capital
Barack
is
living
there
Can someone explain what's going on?
Recommended answer
Don't use StringTokenizer; it's a legacy class. Use java.util.Scanner or simply String.split instead.
String text = "Washington is the U.S Capital. Barack is living there.";
String[] tokens = text.split("\\. ");
for (String token : tokens) {
    System.out.println("[" + token + "]");
}
This prints:
[Washington is the U.S Capital]
[Barack is living there.]
Note that split and Scanner are regex-based (regular expressions), and since . is a special regex metacharacter, it needs to be escaped with \. In turn, since \ is itself an escape character in Java string literals, you need to write "\\. " as the delimiter.
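The split above drops the period from the first sentence, whereas the question asks for it to be kept. The following is a minimal sketch, not part of the original answer, of two variations: Pattern.quote to avoid escaping the delimiter by hand, and a lookbehind that splits on the space only, so the period stays with its sentence. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDelimiterSketch {
    // Splits on the literal two-character sequence ". ";
    // Pattern.quote turns the string into a regex that matches it literally.
    static String[] splitOnLiteral(String text) {
        return text.split(Pattern.quote(". "));
    }

    // Splits on a space that is preceded by a period. The lookbehind
    // consumes nothing, so the period remains part of the token.
    static String[] splitKeepingPeriod(String text) {
        return text.split("(?<=\\.) ");
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        // [Washington is the U.S Capital, Barack is living there.]
        System.out.println(Arrays.toString(splitOnLiteral(text)));
        // [Washington is the U.S Capital., Barack is living there.]
        System.out.println(Arrays.toString(splitKeepingPeriod(text)));
    }
}
```

Neither variation understands abbreviations: a text containing "U.S. Capital" (with a space after the second period) would be split there as well.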
This may sound complicated, but it really isn't. split and Scanner are much superior to StringTokenizer, and regex isn't that hard to pick up.
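For completeness, here is a rough sketch of the Scanner alternative mentioned in this answer, using the same escaped delimiter. The class name and the sentences helper are illustrative assumptions, not code from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDelimiterSketch {
    // Collects the tokens a Scanner yields when the regex "\\. "
    // (a literal period followed by a space) is its delimiter.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\. ");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Washington is the U.S Capital. Barack is living there.";
        for (String sentence : sentences(text)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

For a string that is already in memory, split is usually the shorter option; Scanner tends to be more useful when the text comes from a file or stream and should be read token by token.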
- Java Lessons/Regular expressions
- regular-expressions.info - Very good tutorial, not Java specific
java.util.StringTokenizer
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
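- java.util.Scanner: A simple text scanner which can parse primitive types and strings using regular expressions.
- Java Tutorials - Basic I/O - Scanning and Formatting
- String.split(String regex): Splits this string around matches of the given regular expression.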
The problem is that StringTokenizer takes each character in the delimiter string as an individual delimiter, i.e. NOT the entire String itself. From the API:
StringTokenizer(String str, String delim): Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
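The per-character behaviour is easy to check. In this small sketch (the class and the tokenize helper are hypothetical names), the delimiter arguments ". " and " ." produce the same tokens, because StringTokenizer treats the argument as a set of delimiter characters rather than as a sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelimiterSketch {
    // Returns every token StringTokenizer produces for the given delimiter characters.
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "U.S Capital. Barack";
        // Both calls print [U, S, Capital, Barack]: the order of the
        // delimiter characters does not matter, only which ones are present.
        System.out.println(tokenize(text, ". "));
        System.out.println(tokenize(text, " ."));
    }
}
```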