How do you keep scanner.next() from including newline?

Problem description

I am trying to read words from a text file using scanner.next() with the delimiter set to " ", but the scanner includes the newline/carriage return in the token.

I have scoured the internet for a good example of this problem and have not found one, so I am posting it here. I can't find a similar problem posted here on SO. I also looked over the documentation on Scanner and Pattern (http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html), but I still cannot find a way to solve this.

Text file:

This is a test
to see if1 this, is working
ok!

Code:

int i = 0;
String string;
try (Scanner scanner = new Scanner(new File(filename))) {
    scanner.useDelimiter(" ");
    while (scanner.hasNext()) {
        string = scanner.next();
        System.out.println(i++ + ": " + string);
    }
} catch (IOException io_error) {
    System.out.println(io_error);
}

Output:

0: This
1: is
2: a
3: test
to
4: see
5: if1
6: this,
7: is
8: working
ok!

As you can see, #3 and #8 have two words separated by a newline. (I know I can separate these into two separate strings.)

Recommended answer

The Scanner documentation says:

The default whitespace delimiter used by a scanner is as recognized by Character.isWhitespace.

And the linked documentation of Character.isWhitespace says:


Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:


  • It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
  • It is '\t', U+0009 HORIZONTAL TABULATION.
  • It is '\n', U+000A LINE FEED.
  • It is '\u000B', U+000B VERTICAL TABULATION.
  • It is '\f', U+000C FORM FEED.
  • It is '\r', U+000D CARRIAGE RETURN.
  • It is '\u001C', U+001C FILE SEPARATOR.
  • It is '\u001D', U+001D GROUP SEPARATOR.
  • It is '\u001E', U+001E RECORD SEPARATOR.
  • It is '\u001F', U+001F UNIT SEPARATOR.
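The criteria above can be checked directly. The following is a minimal sketch; the class name WhitespaceCheck and the helper describe are made up for illustration, and only Character.isWhitespace comes from the JDK. It shows that the newline and carriage return count as Java whitespace, while a non-breaking space does not.

```java
class WhitespaceCheck {

    // Hypothetical helper: formats a code point together with the
    // result of Character.isWhitespace for that character.
    static String describe(char c) {
        return String.format("U+%04X -> %b", (int) c, Character.isWhitespace(c));
    }

    public static void main(String[] args) {
        // Space, tab, line feed, carriage return, non-breaking space, letter.
        char[] samples = {' ', '\t', '\n', '\r', '\u00A0', 'a'};
        for (char c : samples) {
            System.out.println(describe(c));
        }
    }
}
```

With these samples, the first four characters report true and the last two report false.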

So, just don't set any specific delimiter. Keep the default, and newlines will be considered as a delimiter just like spaces, which means the token won't include newline characters.
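As a rough illustration of this advice, here is a sketch that leaves the delimiter at its default. To stay self-contained it scans an in-memory String rather than the file from the question; the class name DefaultDelimiterDemo and the helper tokenize are assumptions for the example, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DefaultDelimiterDemo {

    // Hypothetical helper: collects tokens using the Scanner's default
    // whitespace delimiter (no useDelimiter call is made).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same content as the text file in the question, with CRLF line endings.
        String text = "This is a test\r\nto see if1 this, is working\r\nok!";
        int i = 0;
        for (String token : tokenize(text)) {
            System.out.println(i++ + ": " + token);
        }
    }
}
```

Run on this input, the sketch prints eleven tokens (0 through 10), and "test", "to", "working", and "ok!" each appear as separate tokens. If an explicit delimiter is still wanted, scanner.useDelimiter("\\s+") should behave similarly for ordinary spaces, tabs, and line breaks.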
