Counting the number of word occurrences in a file


Problem description

Suppose we have a txt file and want to know how many times each word appears in it. I used the following code, but it does not work: every count comes out as 1. First I read the txt file and write each word on a separate line, adding each word to an ArrayList at the same time. Then I read the first line of the file, fetch the first element of the ArrayList, and compare it against the whole file. On each match, I increment the entry of an array that holds the number of occurrences. Then I fetch the second ArrayList item, and so on, until the end of the ArrayList is reached.

private static void count(String text) throws FileNotFoundException, IOException {
    // Check (not shown here) names the file the words are written to.
    FileOutputStream thewords = new FileOutputStream(Check);
    ArrayList<String> keyArrayList = new ArrayList<String>();
    int countWord = 0;

    // Write every word on its own line and remember it in the list.
    StringTokenizer tokenizer = new StringTokenizer(text);
    while (tokenizer.hasMoreTokens()) {
        String nextWord = tokenizer.nextToken();
        keyArrayList.add(nextWord);
        thewords.write(nextWord.getBytes());
        thewords.write(System.getProperty("line.separator").getBytes());
        countWord++;
    }

    int[] numbOfOccurance = new int[countWord];
    BufferedReader br = new BufferedReader(new FileReader(Check));
    String readline;
    for (int loopIndex = 0; loopIndex < countWord; loopIndex++) {
        // Word i is compared only with line i of the file, which is the very
        // word that was written there, so every counter ends up as 1.
        readline = br.readLine();
        String test = keyArrayList.get(loopIndex);
        if (test.equals(readline)) {
            numbOfOccurance[loopIndex]++;
        }
    }
}

Solution

Your method is incredibly slow: to find out whether a word appears more than once, you have to search through the entire ArrayList.
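
To illustrate the point about avoiding repeated list searches, here is a minimal sketch that counts words in a single pass with a HashMap. The class name SinglePassCount and the whitespace-only splitting are assumptions made for this example; they are not part of the original question or answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, used only for illustration.
class SinglePassCount {

    // Counts each whitespace-separated word in one pass over the text.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // merge stores 1 for a new word, otherwise adds 1 to the stored count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
    }
}
```

Each word costs one hash lookup on average, so the total work grows roughly linearly with the number of words instead of quadratically.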

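Further, StringTokenizer is a legacy class. It is not formally deprecated, but its Javadoc discourages its use in new code and recommends String.split or the java.util.regex package instead.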
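
As a rough sketch of what replacing StringTokenizer might look like, the following splits on runs of whitespace with String.split. The class name SplitSketch is hypothetical, and the regex \s+ is only approximately equivalent to StringTokenizer's default delimiter set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, used only for illustration.
class SplitSketch {

    // Returns the whitespace-separated tokens of the text, in order.
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // split would return one empty string here, so handle it explicitly.
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokens("  one two\tthree\n"));
    }
}
```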

May I suggest the following approach:

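import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public static void main(String[] args) throws Exception {
    final Path path = Paths.get("path", "to", "file");
    final Map<String, Integer> counts = countOccurrences(path);
}

private static Map<String, Integer> countOccurrences(Path path) throws IOException {
    // Split on runs of characters that are neither ASCII letters nor apostrophes.
    final Pattern pattern = Pattern.compile("[^A-Za-z']+");
    try (final Stream<String> lines = Files.lines(path)) {
        return lines
                .flatMap(pattern::splitAsStream)
                // Blank lines and leading separators yield empty strings; skip them.
                .filter(w -> !w.isEmpty())
                .collect(toMap(identity(), w -> 1, Integer::sum));
    }
}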

This uses the Java 8 Stream API to read lines from a file. It then splits each line on [^A-Za-z']+, i.e. runs of characters that are neither ASCII letters nor apostrophes, and uses flatMap to create a single Stream of individual words.
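
To make the splitting step concrete, here is a small sketch that applies the same pattern to a single line. The class name SplitAsStreamSketch and the sample sentence are made up for this example, and the empty-string filter is an addition of the sketch: splitAsStream can emit an empty string for a blank line or for a line that starts with a separator.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for illustration.
class SplitAsStreamSketch {

    // Same separator pattern as in the answer: anything but ASCII letters and apostrophes.
    static final Pattern SEPARATORS = Pattern.compile("[^A-Za-z']+");

    // Splits one line into words, dropping the empty strings the split can produce.
    static List<String> words(String line) {
        return SEPARATORS.splitAsStream(line)
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("Don't stop, don't stop!"));
    }
}
```

Note that the split is case-sensitive, so "Don't" and "don't" would be counted as different words unless the input is lower-cased first.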

We then collect the words into a Map: each word is inserted with the value 1, and when a word is already present, the merge function Integer::sum adds the new 1 to the count stored in the Map.
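
The merge behaviour can be seen in isolation with a tiny in-memory stream. This is only a sketch; the class name ToMapMergeSketch and the sample words are assumptions made for the example.

```java
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper class, used only for illustration.
class ToMapMergeSketch {

    // Key: the word itself. Value: 1 per occurrence, summed whenever keys collide.
    static Map<String, Integer> count(Stream<String> words) {
        return words.collect(toMap(identity(), w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(count(Stream.of("a", "b", "a")));
    }
}
```

Without the third argument, toMap throws an IllegalStateException on the first duplicate key, which is why the merge function is needed here.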

You can then list the contents of the Map, sorted by number of occurrences in ascending order, using the following:

counts.entrySet().stream()
        .sorted(Map.Entry.comparingByValue())
        .map(e -> String.format("%s -> %s", e.getKey(), e.getValue()))
        .forEach(System.out::println);
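
If the most frequent words should come first instead, one possible variant reverses the comparator. This is a sketch, not part of the original answer: the class name SortByCountSketch is hypothetical, and breaking ties alphabetically is an extra choice made here so that the output order is deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for illustration.
class SortByCountSketch {

    // Formats entries as "word -> count", highest count first, ties broken by word.
    static List<String> mostFrequentFirst(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(e -> String.format("%s -> %d", e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 3);
        counts.put("c", 3);
        mostFrequentFirst(counts).forEach(System.out::println);
    }
}
```

The explicit type arguments on comparingByValue are there because the compiler cannot infer them once reversed() is chained onto the call.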
