Sorting words in order of frequency? (least to greatest)


Question

Does anyone have any idea how to sort a list of words in order of their frequency (least to greatest) using the built-in Collections.sort and the Comparator<String> interface?

I already have a method that gets the count of a certain word in the text file. Now I just need to create a method that compares the counts of the words and then puts them in a list sorted from the least frequent to the most frequent.

Any ideas and tips would be very much appreciated. I'm having trouble getting started on this particular method.

public class Parser implements Comparator<String> {

    public Map<String, Integer> wordCount;

    void parse(String filename) throws IOException {
        File file = new File(filename);
        Scanner scanner = new Scanner(file);

        //mapping of string -> integer (word -> frequency)
        Map<String, Integer> wordCount = new HashMap<String, Integer>();

        //iterates through each word in the text file
        while(scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.next()==null) {
                wordCount.put(word, 1);
            }
            else {
                wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

    public int getCount(String word) {
        return wordCount.get(word);
    }

    public int compare(String w1, String w2) {
        return getCount(w1) - getCount(w2);
    } 

        //this method should return a list of words in order of frequency from least to   greatest
    public List<String> getWordsInOrderOfFrequency() {
        List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
        //this part is unfinished.. the part i'm having trouble sorting the word frequencies
        List<String> result = new ArrayList<String>();


    }
}

Recommended answer

First of all, your usage of scanner.next() seems incorrect. next() returns the next word and advances past it every time you call it, so the following code:

if(scanner.next() == null){ ... }

and

scanner.next().replaceAll("[^A-Za-z0-9]"," ");
scanner.next().toLowerCase();

will consume words and then just throw them away. What you probably want to do is:

String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the beginning of your while loop, so that the changes to your word are saved in the word variable and not just thrown away. (Java strings are immutable: replaceAll() and toLowerCase() return new strings, so their results have to be assigned to take effect.)
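To make the point about next() concrete, here is a minimal sketch of a loop that calls next() exactly once per iteration and keeps the normalized result. The class and method names (TokenReader, readNormalizedWords) are hypothetical, and the sketch reads from a String instead of a File to stay self-contained. It also deviates from the line above in one labeled way: it replaces non-alphanumeric characters with an empty string rather than a space, on the assumption that embedded or trailing spaces in words are unwanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the original Parser.
final class TokenReader {

    // Reads whitespace-separated tokens from the text and normalizes each one.
    static List<String> readNormalizedWords(String text) {
        List<String> words = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // Call next() once per iteration and keep the result;
            // every additional call would consume another token.
            String word = scanner.next()
                    .replaceAll("[^A-Za-z0-9]", "") // assumption: drop punctuation instead of inserting a space
                    .toLowerCase();
            if (!word.isEmpty()) { // skip tokens that consisted only of punctuation
                words.add(word);
            }
        }
        scanner.close();
        return words;
    }
}
```

With the original replacement string " ", a token such as "Hello," would become "hello " (with a trailing space), so "hello" and "hello " would be counted as different words.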

Secondly, the usage of the wordCount map is slightly broken. What you want to do is check whether the word is already in the map to decide what count to store. To do this, instead of checking for scanner.next() == null, you should look in the map, for example:

if(!wordCount.containsKey(word)){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, wordCount.get(word) + 1);
}

Alternatively, you can do this:

Integer count = wordCount.get(word);
if(count == null){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, count+1);
}

I would prefer this approach because it's a bit cleaner and does one fewer map look-up for words that are already present: the first approach calls both containsKey() and get() in that case, whereas this one only calls get().
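Put together, the counting loop might look roughly like the following sketch. WordCounter and countWords are hypothetical names, the input is a String rather than a File so the example is self-contained, and punctuation stripping is left out for brevity (only lower-casing is applied).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the original Parser.
final class WordCounter {

    // Builds a word -> frequency map from whitespace-separated text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // One call to next() per iteration; the result is kept in word.
            String word = scanner.next().toLowerCase();
            Integer count = wordCount.get(word);
            if (count == null) {
                // no count registered for the word yet
                wordCount.put(word, 1);
            } else {
                wordCount.put(word, count + 1);
            }
        }
        scanner.close();
        return wordCount;
    }
}
```

For example, countWords("the cat and THE hat") yields a map with four keys in which "the" maps to 2 and every other word maps to 1.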

Now, to get a list of words in descending order of frequency, you can first convert your map to a list of entries, then apply Collections.sort() as was suggested in this post (question 109383, on sorting a Map<Key, Value> by its values). Below is a simplified version suited to your needs. It sorts from most to least frequent; for the least-to-greatest order asked about in the question, compare o1 to o2 instead of o2 to o1:

static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

    // Convert map to list of <String,Integer> entries
    List<Map.Entry<String, Integer>> list = 
        new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

    // Sort list by integer values
    Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            // compare o2 to o1, instead of o1 to o2, to get descending freq. order
            return (o2.getValue()).compareTo(o1.getValue());
        }
    });

    // Populate the result into a list
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : list) {
        result.add(entry.getKey());
    }
    return result;
}
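Since the question asked specifically for least-to-greatest order with a Comparator<String>, here is a sketch of that variant. It sorts the map's keys directly and looks up each word's count inside the comparator. FrequencySorter and wordsInAscendingFreqOrder are hypothetical names, and the alphabetical tie-break is an added assumption to make the order of equally frequent words deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the original Parser.
final class FrequencySorter {

    // Returns the words of the map ordered from least to most frequent.
    static List<String> wordsInAscendingFreqOrder(final Map<String, Integer> wordCount) {
        List<String> words = new ArrayList<String>(wordCount.keySet());
        Collections.sort(words, new Comparator<String>() {
            public int compare(String w1, String w2) {
                // compare w1 to w2 (not w2 to w1) to get ascending order
                int byCount = Integer.compare(wordCount.get(w1), wordCount.get(w2));
                // assumption: break ties alphabetically for a stable, predictable result
                return byCount != 0 ? byCount : w1.compareTo(w2);
            }
        });
        return words;
    }
}
```

One further detail worth checking in the question's Parser: parse() declares a local variable named wordCount, which shadows the field of the same name, so the field that getCount() and compare() read is never populated. Assigning to the field instead of declaring a new local would be needed before the Parser itself can be passed to Collections.sort as the comparator.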

Hope this helps.

Edit: Changed the comparison function as suggested by @dragon66. Thanks.
