用于有效地返回哈希表的顶部K个条目的数据结构(地图,字典) [英] Data structure for efficiently returning the top-K entries of a hash table (map, dictionary)
问题描述
这是一个描述:
它的运行方式是一个常规的地图, get
,把
和删除
方法,但是有一个 getTopKEntries(int k)
顶部的K元素按键键入:
对于我的具体用例,我在结构中添加,删除和调整很多值,但是在任何一个时刻有大约500-1000个元素;我想要有效地返回前10个键的条目。
- 我调用
put
和删除
方法 许多 次。 - code> getTopKEntries 方法。
- 我调用
put
和$>
- 我调用
getTopKEntries
方法。
- ...
- 我调用
我希望O(1) get
, put
和删除
操作,而对于 getTopKEntries
仅依赖于K,而不是依赖于地图的大小。
那么,有效地返回地图的顶部K元素的数据结构是什么? / p>
我的其他问题是类似的,但是返回地图的 所有 元素的情况,按键排序。
如果它有帮助,键和值都是4字节的整数。
p>二进制搜索树(即C ++中的 std :: map
)听起来像完美的结构:它已经是字典顺序排列的,即一个简单的按顺序遍历将按升序生成元素。因此,迭代第一个 k 元素将直接产生顶级的 k 元素。
此外,由于您预见很多删除操作,哈希表不会很适合:删除操作会破坏散列表的负载因子特征,这导致运行时的快速恶化。
Here's a description:
It operates like a regular map with get
, put
, and remove
methods, but has a getTopKEntries(int k)
method to get the top-K elements, sorted by the key:
For my specific use case, I'm adding, removing, and adjusting a lot of values in the structure, but at any one time there's approximately 500-1000 elements; I want to return the entries for the top 10 keys efficiently.
- I call the
put
andremove
methods many times. - I call the
getTopKEntries
method. - I call the
put
andremove
methods some more times. - I call the
getTopKEntries
method. - ...
I'm hoping for O(1) get
, put
, and remove
operations, and for getTopKEntries
to be dependent only on K, not on the size of the map.
So what's a data structure for efficiently returning the top-K elements of a map?
My other question is similar, but is for the case of returning all elements of a map, sorted by the key.
If it helps, both the keys and values are 4-byte integers.
A binary search tree (i.e. std::map
in C++) sounds like the perfect structure: it’s already lexicographically ordered, i.e. a simple in-order traversal will yield the elements in ascending order. Hence, iterating over the first k elements will yield the top k elements directly.
Additionally, since you foresee a lot of "remove" operations, a hash table won’t be well-suited anyway: remove operations destroy the load factor characteristics of hash tables which leads to a rapid deterioration of the runtime.
这篇关于用于有效地返回哈希表的顶部K个条目的数据结构(地图,字典)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!