查找前K个常用词 [英] Finding top K frequent words

查看：64 发布时间：2021/4/2 20:42:24 c++ algorithm

本文介绍了查找前K个常用词的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试解决Leetcode网站上的一个问题-找到前K个经常出现的单词.

I am trying to solve a question from Leetcode website - Finding top K frequent words.

给出一个非空的单词列表，返回k个最频繁的元素.
您的答案应按频率从高到低的顺序排序.如果两个单词具有相同的频率，则字母顺序较低的单词排在第一位.例如:如果输入为:["the"，"day"，"is"，"sunny"，"the"，"the"，"the"，"sunny"，"is"，"is"]]，k= 4，则输出应为:["the"，"is"，"sunny"，"day"].

Given a non-empty list of words, return the k most frequent elements.
Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first. E.g.: if the input is: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4, then the output should be: ["the", "is", "sunny", "day"].

一种受好评的解决方案如下:

One of the upvoted solutions is like below:

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        unordered_map<string,int> dict;
        for(const string& s:words) dict[s]++;

        priority_queue<pair<string,int>, vector<pair<string,int>>, Comp> pq;
        for(const auto& pa:dict) {
            pq.push(pa);
            if(pq.size()>k) pq.pop();
        }    

        vector<string> result;
        while(!pq.empty()) {
            result.push_back(pq.top().first);
            pq.pop();
        }
        reverse(result.begin(),result.end());    
        return result;    
    }
private:
    struct Comp {
        Comp() {}
        ~Comp() {}
        bool operator()(const pair<string,int>& a, const pair<string,int>& b) {
            return a.second>b.second || (a.second==b.second && a.first<b.first);
        }
    };

};

我试图更好地理解它，并提出一些问题:

I am trying to understand it better and have a few questions:

当 pq.size()> k 时，我们 pop()-这不是错误的，因为在这种情况下，我们丢失了频率最高的元素?我认为是因为根据比较器，具有较高频率的元素(或在频率相等的情况下按字母顺序较小的元素)被插入优先级队列的顶部.
在优先级队列的情况下，当我们实现自己的比较器时，我们必须传递第二个参数(表示要使用的Container)，但是在使用默认比较器时不是必需的-为什么呢?我的意思是，不能根据我要存储的值的类型自动推断出 default 容器类型(第一个参数，在这种情况下为 pair< string，int> )?
如果是 pq.push(pa); ， pa 的类型到底是什么?我想知道，因为 pq 包含 vector< pair< string，int>> ，但是 dict 仅包含 string (在 int 中映射到其频率).如何使用 auto 自动将 string 键映射到其 int 频率以插入到优先级队列中?

When pq.size()>k, we pop() - isn't this incorrect because in that case we are losing the highest frequency elements? I think so because as per the comparator, the elements with the higher frequency (or alphabetically smaller in case of equal frequency) are inserted at the top in the priority queue.
In case of a priority queue, when we implement our own comparator, we have to pass a second parameter (denoting the Container to be used), but not required when we use the default comparator - why so? I mean, can't a default Container type be deduced automatically depending upon the type of values that I would be storing (first parameter, in this case pair<string, int>)?
In case of pq.push(pa);, what exactly is the type of pa? I am wondering because pq contains vector<pair<string, int>>, but the dict only contains string (mapped to their frequencies in int). How does using an auto automatically map the string key to its int frequency for insertion into the priority queue?

道歉这么长的问题.谢谢您的帮助.

Apologies for such long questions. And thank you for your help.

查找前K个常用词 [英] Finding top K frequent words

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

查找前K个常用词 [英] Finding top K frequent words

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭