How to get the N most frequent words in a given text, sorted from max to min?


Problem description

I have been given a large text as input. I have built a HashMap that stores each distinct word as a key and the number of times it occurs as the value (Integer).

Now I have to write a method called mostOften(int k):List that returns a List of the first k words, ordered from the maximum number of occurrences to the minimum (descending order), using the HashMap I built before. The catch is that whenever 2 words have the same number of occurrences, they should be sorted alphabetically.

My first idea was to swap the keys and values of the given HashMap and put them into a TreeMap. The TreeMap would sort the words by key (Integer, the number of occurrences of the word), and I would then just pop the last/first k entries from the TreeMap.

But I will have collisions for sure when 2 or 3 words have the same count. I can compare the words alphabetically, but what Integer should I use as the key for the second word that comes in?

Any ideas on how to implement this, or other options?
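
To make the collision concrete, here is a minimal sketch of the swap idea using the sample counts from this thread. The class name SwapDemo and both helper methods are hypothetical, introduced only for illustration. The naive swap keeps a single word per count, so words are silently lost; mapping each count to a sorted set of words is one possible way around that, assuming a TreeMap is still wanted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class SwapDemo {
    // Naive swap (count -> word): a later word with the same count
    // overwrites the earlier one, so entries are lost.
    static TreeMap<Integer, String> naiveSwap(Map<String, Integer> counts) {
        TreeMap<Integer, String> swapped = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            swapped.put(e.getValue(), e.getKey());
        }
        return swapped;
    }

    // Grouped swap (count -> alphabetically sorted words): nothing is lost,
    // and ties come out in alphabetical order because TreeSet sorts Strings.
    static TreeMap<Integer, TreeSet<String>> groupedSwap(Map<String, Integer> counts) {
        TreeMap<Integer, TreeSet<String>> grouped = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            grouped.computeIfAbsent(e.getValue(), c -> new TreeSet<>()).add(e.getKey());
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("hello", 5);
        m.put("halo", 5);
        m.put("this", 2);
        m.put("that", 2);
        m.put("good", 1);

        // Five words go in, but only three distinct counts survive the naive swap.
        System.out.println(naiveSwap(m).size());   // prints 3
        System.out.println(groupedSwap(m));        // prints {5=[halo, hello], 2=[that, this], 1=[good]}
    }
}
```

Which of the tied words survives the naive swap depends on the HashMap's iteration order, so it should not be relied on.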

Solution

Here's the solution I came up with.

  1. First, create a class MyWord that stores the String value of the word and its number of occurrences.
  2. Implement the Comparable interface for this class so that it sorts by number of occurrences first (descending), and then alphabetically when the counts are equal.
  3. Then, in the mostOften method, create a new List of MyWord and add one MyWord per entry of your original map.
  4. Sort this list.
  5. Take the first k items of this list using subList.
  6. Add those Strings to a List<String> and return it.


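import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Test {
    public static void main(String [] args){
        Map<String, Integer> m = new HashMap<>();
        m.put("hello",5);
        m.put("halo",5);
        m.put("this",2);
        m.put("that",2);
        m.put("good",1);
        System.out.println(mostOften(m, 3));
    }

    public static List<String> mostOften(Map<String, Integer> m, int k){
        // Wrap every map entry in a MyWord so the list can be sorted
        List<MyWord> l = new ArrayList<>();
        for(Map.Entry<String, Integer> entry : m.entrySet())
            l.add(new MyWord(entry.getKey(), entry.getValue()));

        // Uses MyWord.compareTo: count descending, then alphabetical
        Collections.sort(l);
        List<String> list = new ArrayList<>();
        // Cap k at the list size so subList cannot throw when k > number of words
        for(MyWord w : l.subList(0, Math.min(k, l.size())))
            list.add(w.word);
        return list;
    }
}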

class MyWord implements Comparable<MyWord>{
    public String word;
    public int occurence;

    public MyWord(String word, int occurence) {
        super();
        this.word = word;
        this.occurence = occurence;
    }

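    @Override
    public int compareTo(MyWord arg0) {
        // Arguments are swapped so that higher counts sort first (descending)
        int cmp = Integer.compare(arg0.occurence,this.occurence);
        // Equal counts: fall back to alphabetical order of the words
        return cmp != 0 ? cmp : word.compareTo(arg0.word);
    }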

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + occurence;
        result = prime * result + ((word == null) ? 0 : word.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyWord other = (MyWord) obj;
        if (occurence != other.occurence)
            return false;
        if (word == null) {
            if (other.word != null)
                return false;
        } else if (!word.equals(other.word))
            return false;
        return true;
    }   

}

Output: [halo, hello, that]
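
As a possible alternative to the dedicated MyWord class, the same ordering can be sketched with a Comparator over the map entries and the Java 8 stream API. This is not part of the answer above; the class name TopWords is hypothetical, and the sketch assumes Java 8 or later.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopWords {
    public static List<String> mostOften(Map<String, Integer> counts, int k) {
        // Higher counts first: compare b against a to get descending order
        Comparator<Map.Entry<String, Integer>> byCountDesc =
                (a, b) -> Integer.compare(b.getValue(), a.getValue());
        // Ties on the count are broken alphabetically by the word itself
        Comparator<Map.Entry<String, Integer>> order =
                byCountDesc.thenComparing(Map.Entry::getKey);

        return counts.entrySet().stream()
                .sorted(order)
                .limit(k)                 // yields fewer than k items if the map is smaller
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("hello", 5);
        m.put("halo", 5);
        m.put("this", 2);
        m.put("that", 2);
        m.put("good", 1);
        System.out.println(mostOften(m, 3)); // prints [halo, hello, that]
    }
}
```

Both versions sort every entry, so they cost O(n log n) in the number of distinct words. The stream version simply avoids the extra class and the manual subList bounds handling.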
