查找前K个常用词 [英] Finding top K frequent words

查看:64
本文介绍了查找前K个常用词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试解决Leetcode网站上的一个问题-找到前K个经常出现的单词.

I am trying to solve a question from Leetcode website - Finding top K frequent words.

给出一个非空的单词列表,返回k个最频繁的元素.
您的答案应按频率从高到低的顺序排序.如果两个单词具有相同的频率,则字母顺序较低的单词排在第一位.例如:如果输入为:["the","day","is","sunny","the","the","the","sunny","is","is"]],k= 4,则输出应为:["the","is","sunny","day"].

Given a non-empty list of words, return the k most frequent elements.
Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. E.g.: if the input is: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4, then the output should be: ["the", "is", "sunny", "day"].

一种受好评的解决方案如下:

One of the upvoted solutions is like below:

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        unordered_map<string,int> dict;
        for(const string& s:words) dict[s]++;

        priority_queue<pair<string,int>, vector<pair<string,int>>, Comp> pq;
        for(const auto& pa:dict) {
            pq.push(pa);
            if(pq.size()>k) pq.pop();
        }    

        vector<string> result;
        while(!pq.empty()) {
            result.push_back(pq.top().first);
            pq.pop();
        }
        reverse(result.begin(),result.end());    
        return result;    
    }
private:
    struct Comp {
        Comp() {}
        ~Comp() {}
        bool operator()(const pair<string,int>& a, const pair<string,int>& b) {
            return a.second>b.second || (a.second==b.second && a.first<b.first);
        }
    };

};

我试图更好地理解它,并提出一些问题:

I am trying to understand it better and have a few questions:

  1. pq.size()> k 时,我们 pop()-这不是错误的,因为在这种情况下,我们丢失了频率最高的元素?我认为是因为根据比较器,具有较高频率的元素(或在频率相等的情况下按字母顺序较小的元素)被插入优先级队列的顶部.
  2. 在优先级队列的情况下,当我们实现自己的比较器时,我们必须传递第二个参数(表示要使用的Container),但是在使用默认比较器时不是必需的-为什么呢?我的意思是,不能根据我要存储的值的类型自动推断出 default 容器类型(第一个参数,在这种情况下为 pair< string,int> )?
  3. 如果是 pq.push(pa); pa 的类型到底是什么?我想知道,因为 pq 包含 vector< pair< string,int>> ,但是 dict 仅包含 string (在 int 中映射到其频率).如何使用 auto 自动将 string 键映射到其 int 频率以插入到优先级队列中?
  1. When pq.size()>k, we pop() - isn't this incorrect because in that case we are losing the highest frequency elements? I think so because as per the comparator, the elements with the higher frequency (or alphabetically smaller in case of equal frequency) are inserted at the top in the priority queue.
  2. In case of a priority queue, when we implement our own comparator, we have to pass a second parameter (denoting the Container to be used), but not required when we use the default comparator - why so? I mean, can't a default Container type be deduced automatically depending upon the type of values that I would be storing (first parameter, in this case pair<string, int>)?
  3. In case of pq.push(pa);, what exactly is the type of pa? I am wondering because pq contains vector<pair<string, int>>, but the dict only contains string (mapped to their frequencies in int). How does using an auto automatically map the string key to its int frequency for insertion into the priority queue?

道歉这么长的问题.谢谢您的帮助.

Apologies for such long questions. And thank you for your help.

推荐答案

  1. 您并没有真正取出频率最高的那些,因为已将顺序倒置到优先级队列中.实际上,最后有一个反向调用,以 right 顺序输出元素.请注意,在文档中已对此进行了明确说明.
  1. You are not really taking out the ones with the highest frequency because the ordered is reversed into the priority queue. Infact there is a reverse call at the end to output the elements in the right order. Note that is it made clear in the documentation.

来自文档

请注意,定义Compare参数以使其返回true如果它的第一个论点在弱点中先于第二个论点订购.但是因为优先级队列输出最大的元素首先,出现在"之前的元素是先于".实际上是最后输出.那是,队列的最前面包含最后一个"消息.元素根据比较所施加的弱排序.

Note that the Compare parameter is defined such that it returns true if its first argument comes before its second argument in a weak ordering. But because the priority queue outputs largest elements first, the elements that "come before" are actually output last. That is, the front of the queue contains the "last" element according to the weak ordering imposed by Compare.

  1. pa 的类型是在 unordered_map< string,int> :: value_type 中指定的类型,它是 std :: pair< const string,int>.实际上, unordered_map< K,V> :: value_type 的每个元素都是 std :: pair< const K,V> 的typedef,并且由于 priority_queue存储 std :: pair< string,int> 不会出现奇怪的事情.
  1. The type of pa is the type specified in unordered_map<string,int>::value_type which is std::pair<const string,int>. Infact each element of a unordered_map<K, V>::value_type, is a typedef for std::pair<const K, V> And since the priority_queue stores std::pair<string,int> there is not strange things appening.

希望这会有所帮助.

这篇关于查找前K个常用词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆