文档中的字数统计频率 [英] word count frequency in document

查看：135 发布时间：2020/7/14 6:16:18 java word-frequency

本文介绍了文档中的字数统计频率的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个目录，其中有1000个txt.files.我想知道每个单词在1000个文档中出现了多少次.因此，即使X上出现了牛"一词，也要算作一个.如果它出现在其他文档中，则将其加一.因此，如果每个文档中都出现牛"，则最大值为1000.如何在不使用任何其他外部库的情况下以简便的方式执行此操作.这是我到目前为止的内容

I have a directory in which I have 1000 txt.files in it. I want to know for every word how many times it occurs in the 1000 document. So say even the word "cow" occured 100 times in X it will still be counted as one. If it occured in a different document it is incremented by one. So the maximum is 1000 if "cow" appears in every single document. How do I do this the easy way without the use of any other external library. Here's what I have so far

     private Hashtable<String, Integer> getAllWordCount()
     private Hashtable<String, Integer> getAllWordCount()
    {
        Hashtable<String, Integer> result = new Hashtable<String, Integer>();
        HashSet<String> words = new HashSet<String>();
        try {   
            for (int j = 0; j < fileDirectory.length; j++){
                File theDirectory = new File(fileDirectory[j]);
                File[] children = theDirectory.listFiles();

                for (int i = 0; i < children.length; i++){
                    Scanner scanner = new Scanner(new FileReader(children[i]));

                    while (scanner.hasNext()){
String text = scanner.next().replaceAll("[^A-Za-z0-9]", "");
                        if (words.contains(text) == false){
                            if (result.get(text) == null)
                                result.put(text, 1);
                            else
                                result.put(text, result.get(text) + 1);
                            words.add(text);
                        }
                    }
                }
                words.clear();
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println(result.size());
        return result;
    }

文档中的字数统计频率 [英] word count frequency in document

问题描述

推荐答案

相关文章

Java开发最新文章

热门教程

热门工具

登录关闭

文档中的字数统计频率 [英] word count frequency in document

问题描述

推荐答案

相关文章

Java开发最新文章

热门教程

热门工具

登录 关闭

登录关闭