How to count words in a text file, Java 8-style

Question

I'm working on an assignment that first counts the number of files in a directory and then gives a word count for each file. I got the file count working, but I'm having a hard time converting some code my instructor gave me from a class that does a frequency count into the simpler word count. Moreover, I can't seem to find the proper code to go through each file and count its words (I'm looking for something "generic" rather than tied to one file, although I'm currently testing the program with a specific text file). This is the intended output:

Count 11 files:
word length: 1 ==> 80
word length: 2 ==> 321
word length: 3 ==> 643

However, this is what is printed instead:

primes.txt
but
are
sometimes
sense
refrigerator
make
haiku
dont
they
funny
word length: 1 ==> {but=1, are=1, sometimes=1, sense=1, refrigerator=1, make=1, haiku=1, dont=1, they=1, funny=1}

.....

Count 11 files:

I'm using two classes: WordCount and FileCatch8.

WordCount:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

    /**
     *
     * @author 
     */
    public class WordCount {

        /**
         *
         * @param filename
         * @return
         * @throws java.io.IOException
         */
        public Map<String, Long> count(String filename) throws IOException {
            //Stream<String> lines = Files.lines(Paths.get(filename));
            Path path = Paths.get("haiku.txt");
            Map<String, Long> wordMap = Files.lines(path)
                    .parallel()
                    .flatMap(line -> Arrays.stream(line.trim().split(" ")))
                    .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
                    .filter(word -> word.length() > 0)
                    .map(word -> new SimpleEntry<>(word, 1))
                    //.collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
                    .collect(groupingBy(SimpleEntry::getKey, counting()));

            wordMap.forEach((k, v) -> System.out.println(String.format(k,v)));
            return wordMap;
        }
    }

And FileCatch8:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 *
 * @author 
 */
public class FileCatch8 {
    public static void main(String args[]) {
        List<String> fileNames = new ArrayList<>();
        try {
            DirectoryStream<Path> directoryStream =
                    Files.newDirectoryStream(Paths.get("files"));
            int fileCounter = 0;
            WordCount wordCnt = new WordCount();
            for (Path path : directoryStream) {
                System.out.println(path.getFileName());
                fileCounter++;
                fileNames.add(path.getFileName().toString());
                System.out.println("word length: " + fileCounter + " ==> " +
                        wordCnt.count(path.getFileName().toString()));
            }
        } catch (IOException ex) {
        }
        System.out.println("Count: " + fileNames.size() + " files");
    }
}

The program uses Java 8 streams with lambda syntax.

Recommended answer

Word count example:

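// Builds a word -> occurrence count map; "file" is the path of the text file as a String.
Map<String, Long> wordCounts = Files.lines(Paths.get(file))
    .flatMap(line -> Arrays.stream(line.trim().split(" ")))
    // Strip everything except ASCII letters and normalize to lower case.
    .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
    .filter(word -> !word.isEmpty())
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));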
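
The pipeline above groups by the word itself, which yields a frequency map. The intended output in the question is keyed by word length instead, so here is a minimal sketch of that variant. It assumes that only ASCII letters count as part of a word and that words are separated by whitespace. The class name WordLengthCount and the method countByLength are hypothetical names chosen for illustration; they do not appear in the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original question or answer.
class WordLengthCount {

    // Returns a map from word length to the number of words of that length.
    static Map<Integer, Long> countByLength(Path file) throws IOException {
        // try-with-resources closes the underlying file once the stream is consumed.
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                    // Assumption: anything that is not an ASCII letter is dropped.
                    .map(word -> word.replaceAll("[^a-zA-Z]", ""))
                    .filter(word -> !word.isEmpty())
                    // TreeMap keeps the lengths in ascending order for printing.
                    .collect(Collectors.groupingBy(
                            String::length, TreeMap::new, Collectors.counting()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo input written to a temporary file so the example is self-contained.
        Path file = Files.createTempFile("haiku", ".txt");
        Files.write(file, Arrays.asList("a bb cc", "ddd, a!"));
        countByLength(file).forEach((length, count) ->
                System.out.println("word length: " + length + " ==> " + count));
        Files.delete(file);
    }
}
```

Printing the pieces by concatenation, rather than calling String.format(k, v) as in the question, avoids using the map key as the format string. To aggregate over a whole directory, the per-file maps could be merged with Map.merge and Long::sum.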

File count:

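// The two calls are equivalent: Files.walk(path) already descends to a depth of Integer.MAX_VALUE.
// The stream contains directories (including the start directory) as well as files.
Files.walk(Paths.get(file), Integer.MAX_VALUE).count();
Files.walk(Paths.get(file)).count();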
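
Because Files.walk also returns directories, the raw count can be higher than the number of files. A minimal sketch that counts only regular files might look like the following. The class name FileCounter and the method countRegularFiles are hypothetical, and the demo directory layout is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class for illustration only.
class FileCounter {

    // Counts regular files under dir, descending into subdirectories.
    static long countRegularFiles(Path dir) throws IOException {
        // try-with-resources releases the directory handles held by the stream.
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up layout: two files at the top level and one in a subdirectory.
        Path dir = Files.createTempDirectory("files");
        Files.createFile(dir.resolve("haiku.txt"));
        Files.createFile(dir.resolve("primes.txt"));
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.createFile(sub.resolve("notes.txt"));

        System.out.println("Count " + countRegularFiles(dir) + " files");
        try (Stream<Path> all = Files.walk(dir)) {
            // Includes the start directory and the subdirectory themselves.
            System.out.println("All entries: " + all.count());
        }
    }
}
```

With this layout the filtered count is 3, while the unfiltered walk reports 5 entries because it includes the start directory and the subdirectory. If subdirectories should not be visited at all, Files.list(dir) with the same filter is an alternative.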
