Trying to read 2 words from a file in Java

Problem description

I'm trying to write a simple program that reads a text file and stores pairs of adjacent words in a Set. Here is the code I wrote for that:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.TreeSet;

public class Main {

    public static void main(String[] args) {

        TreeSet<String> phraseSet = new TreeSet<String>();

        try {
            Scanner readfile = new Scanner(new File("data.txt"));
            while(readfile.hasNext("\\w{2}")) {
                String phrase = readfile.next("\\w{2}");
                phraseSet.add(phrase);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        for(String p : phraseSet) {
            System.out.println(p);
        }       
    }
}

The code compiles but prints out a blank line (the while loop is never entered). The contents of data.txt are:

There are seven words in this line.
And then there are few more words in this line.

I'm expecting the following Strings in my TreeSet (of course, in sorted order):

There are
are seven
seven words
words in
in this
this line
line And
And then
then there
there are
....
this line

Recommended answer

Your main problem is that Scanner by default splits its input into tokens on whitespace. According to the API:

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.

If you take a look at next(String pattern), you'll see that it

Returns the next token if it matches the pattern constructed from the specified string. If the match is successful, the scanner advances past the input that matched the pattern.
(emphasis mine)

hasNext(String pattern) applies the same test to the next token, without advancing past it.

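That is, by the time you ask the Scanner to check for your token, it has already split the input on whitespace, so asking for a token with a space in the middle will always fail. Note also that the pattern "\\w{2}" matches a token made of exactly two word characters, not two words; the first token in your file, "There", does not match it, so the loop is never entered.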
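To make the tokenization behaviour concrete, here is a minimal sketch that feeds a String (rather than data.txt) to a Scanner. The class name ScannerTokenDemo and the helper methods tokens and firstTokenMatches are made up for this illustration; only Scanner, hasNext, hasNext(String) and next come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokenDemo {

    // Hypothetical helper: collects the tokens a Scanner produces
    // with its default (whitespace) delimiter.
    public static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    // Hypothetical helper: reports whether the FIRST token of the input
    // matches the pattern. The whole token has to match, not a part of it.
    public static boolean firstTokenMatches(String input, String pattern) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.hasNext(pattern);
        }
    }

    public static void main(String[] args) {
        String line = "There are seven words in this line.";
        // Each token is a single word; no token ever contains a space.
        System.out.println(tokens(line));
        // "There" is not exactly two word characters, so this is false.
        System.out.println(firstTokenMatches(line, "\\w{2}"));
        // "in" is exactly two word characters, so this is true.
        System.out.println(firstTokenMatches("in this line.", "\\w{2}"));
    }
}
```

The first call prints [There, are, seven, words, in, this, line.], which shows that punctuation stays attached to its word and that a two-word phrase never appears as one token.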

A better way to do this is to have the Scanner read one line at a time, then split() the line and build the pairs yourself:

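Scanner readfile = new Scanner(new File("data.txt"));
while (readfile.hasNextLine()) {
    // split on runs of whitespace so repeated spaces do not yield empty words
    String[] words = readfile.nextLine().trim().split("\\s+");
    // pair each word with the one that follows it on the same line
    for (int i = 0; i < words.length - 1; i++) {
        phraseSet.add(words[i] + " " + words[i + 1]);
    }
}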

Your question doesn't say so explicitly, but your example output suggests that you want pairs to span line breaks. Reading line by line makes that slightly more complicated, but you can remember the last word of each line and pair it with the first word of the next, like so:

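String lastWord = null;
while (readfile.hasNextLine()) {
    String line = readfile.nextLine().trim();
    if (line.isEmpty()) {
        continue; // skip blank lines so they do not produce empty "words"
    }
    String[] words = line.split("\\s+");
    if (lastWord != null) {
        // pair the last word of the previous line with the first word of this one
        phraseSet.add(lastWord + " " + words[0]);
    }
    for (int i = 0; i < words.length - 1; i++) {
        phraseSet.add(words[i] + " " + words[i + 1]);
    }
    lastWord = words[words.length - 1];
}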

If this is actually what you're looking for, you're probably better off just using next() to pull the words out one at a time, as other answers have shown.
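The other answers are not reproduced here, so the following is only a sketch of what a next()-based version might look like. The class name WordPairs and the method adjacentPairs are invented for the example, and the input is an in-memory String standing in for data.txt.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class WordPairs {

    // Hypothetical helper: builds the sorted set of adjacent word pairs.
    // Line breaks are just whitespace to next(), so pairs span lines.
    public static Set<String> adjacentPairs(Scanner scanner) {
        Set<String> pairs = new TreeSet<>();
        String previous = null;
        while (scanner.hasNext()) {
            String current = scanner.next();
            if (previous != null) {
                pairs.add(previous + " " + current);
            }
            previous = current;
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumption: same text as data.txt in the question.
        String text = "There are seven words in this line.\n"
                + "And then there are few more words in this line.";
        try (Scanner scanner = new Scanner(text)) {
            for (String pair : adjacentPairs(scanner)) {
                System.out.println(pair);
            }
        }
    }
}
```

To read the real file, pass new Scanner(new File("data.txt")) instead and handle FileNotFoundException as in the question. Note that tokens keep their punctuation, so "line." and "line" count as different words; if that matters, one option is to call useDelimiter on the Scanner with a pattern that also treats punctuation as a separator, at the cost of splitting words such as "don't".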

You cannot use Scanner to directly look for multi-word tokens; you'll have to do the parsing yourself.
