Algorithm: A Better Way to Calculate Frequencies of a List of Words

Question

This question is actually quite simple, yet I would like to hear some ideas before jumping into coding. Given a file with one word per line, find the n most frequent words.

The first, and unfortunately only, thing that comes to mind is to use a std::map. I know fellow C++ programmers will say that unordered_map would be a more reasonable choice.

I would like to know whether anything can be added on the algorithm side, or whether this is basically a "whoever picks the best data structure wins" type of question. I've searched the internet and read that a hash table and a priority queue might provide an algorithm with O(n) running time, but I assume it would be too complex to implement.
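To make the hash table plus priority queue idea concrete, here is a minimal sketch. It assumes the words have already been read into a vector; top_n_words and WordCount are hypothetical names chosen for illustration. It counts with an unordered_map and keeps a min-heap of at most n entries, so the cost is roughly expected O(m) for counting m words plus O(k log n) for k distinct words, which is close to linear when n is small rather than strictly O(n).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using WordCount = std::pair<std::size_t, std::string>;  // (count, word)

// Hypothetical helper: returns up to n (count, word) pairs, most frequent first.
std::vector<WordCount> top_n_words(const std::vector<std::string>& words,
                                   std::size_t n) {
    // Step 1: count occurrences with a hash table.
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& w : words) {
        ++counts[w];
    }

    // Step 2: min-heap that never holds more than n entries;
    // the entry with the smallest count sits on top and is evicted first.
    std::priority_queue<WordCount, std::vector<WordCount>,
                        std::greater<WordCount>> heap;
    for (const auto& entry : counts) {
        heap.emplace(entry.second, entry.first);
        if (heap.size() > n) {
            heap.pop();
        }
    }

    // Step 3: the heap pops in ascending order, so reverse at the end.
    std::vector<WordCount> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

If n is close to the number of distinct words, copying the counts into a vector and calling std::partial_sort or std::nth_element would be a reasonable alternative.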

Any ideas?

Recommended answer

The best data structure to use for this task is a trie:

http://en.wikipedia.org/wiki/Trie

It will outperform a hash table for counting strings.
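A minimal sketch of counting words with a trie follows. The names TrieNode and WordTrie are hypothetical, and the node layout (a std::map of children per node) is an assumption chosen for brevity, not a tuned implementation. Each node stores how many inserted words end at it, so words sharing a prefix share nodes.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical node type: one node per character along a word's path.
struct TrieNode {
    std::size_t count = 0;  // number of inserted words ending at this node
    std::map<char, std::unique_ptr<TrieNode>> children;
};

// Hypothetical wrapper that counts word occurrences in a trie.
class WordTrie {
public:
    // Records one occurrence of word and returns its updated count.
    std::size_t add(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) {
                child.reset(new TrieNode());  // create the missing branch
            }
            node = child.get();
        }
        return ++node->count;
    }

    // Returns how many times word was added (0 if never seen).
    std::size_t count(const std::string& word) const {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return 0;
            }
            node = it->second.get();
        }
        return node->count;
    }

private:
    TrieNode root_;
};
```

Each update costs time proportional to the word's length, independent of how many words are stored. Whether this beats an unordered_map in practice likely depends on the input and on how the nodes are laid out in memory, so it is worth measuring on real data.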
