始终保持最佳元素的数据结构 [英] Data structure that always keeps n-best elements
问题描述
我需要一个数据结构,它始终保存到目前为止插入的最大项目(没有特定顺序)的 n
。
I need a data structure that always holds the n
largest items inserted so far (in no particular order).
所以,如果 n
是3,我们可以在下面的会话中插入几个数字,内容的容器更改:
So, if n
is 3, we could have the following session where I insert a few numbers and the content of the container changes:
[] // now insert 1
[1] // now insert 0
[1,0] // now insert 4
[1,0,4] // now insert 3
[1,4,3] // now insert 0
[1,4,3] // now insert 3
[4,3,3]
你得到这个想法数据结构的名称是什么?实现这个的最好方法是什么?或者是在某些图书馆中?
You get the idea. What's the name of the data structure? What's the best way to implement this? Or is this in some library?
我正在考虑使用一个容器,它的元素具有 priority_queue
委托),它使用反向比较,所以 pop
将删除最小的元素。因此, insert
函数首先检查要插入的新元素是否大于最小值。如果是这样,那么我们把最小的值抛出来并推新元素。
I am thinking to use a container that has a priority_queue
for its elements (delegation), which uses the reverse comparison, so pop
will remove the smallest element. So the insert
function first checks if the new element to be inserted is greater than the smallest. If so, we throw that smallest out and push the new element.
(我有一个 C ++
,但问题是语言不可知论。)
(I have a C++
implementation in mind, but the question is language-agnostic nevertheless.)
推荐答案
您想要的具体数据结构可能是 / i>。原始数据结构只是一个数组;为了方便起见,说它是N = 2 ^ n个元素的大小,并且你想保持最大的N-1个元素。
The specific datastructure you want is probably the implicit heap. The raw datastructure is just an array; for convenience, say that it is N=2^n elements in size, and that you want to maintain the largest N-1 elements.
这个想法是对待数组(称为A)为深度为n的完整二叉树:
The idea is to treat the array (call it A) as a complete binary tree of depth n:
- 忽略A [0];将A [1]作为每个节点A [k]的根节点
- ,孩子是A [2 * k]和A [2 * k + 1]
- 节点A [N / 2..N-1]是叶子
- ignore A[0]; treat A[1] as the root node
- for each node A[k], the children are A[2*k] and A[2*k+1]
- nodes A[N/2..N-1] are the leaves
为了将树保持为堆,您需要确保每个节点小于(或等于)它的孩子。这被称为堆条件:
To maintain the tree as a "heap", you need to ensure that each node is smaller than (or equal to) its children. This is called the "heap condition":
- A [k]< = A [2 * k]
- A [k] <= A [2 * k + 1]
- A[k] <= A[2*k]
- A[k] <= A[2*k+1]
要使用堆来维护最大的N个元素:
To use the heap to maintain the largest N elements:
- 请注意,根A [1]是堆中最小的元素。
- 将每个新元素(x)与根相比较:如果它较小(x
- 否则,将新元素插入到堆中,如下所示:
- 从堆中删除根(A [1],最小元素) ,并拒绝它
- 将其替换为新元素(A [1]:= x)
- 现在,恢复堆条件:
- 如果x小于或等于其两个子项,则
- 否则,将最小子代交换x
- 在每个新位置重复测试和交换直到满足堆条件
- note that the root A[1] is the smallest element in the heap.
- compare each new element (x) to the root: if it is smaller (x<A[1]), reject it.
- otherwise, insert the new element into the heap, as follows:
- remove the root (A[1], the smallest element) from the heap, and reject it
- replace it with the new element (A[1]:= x)
- now, restore the heap condition:
- if x is less than or equal to both of its children, you're done
- otherwise, swap x with the smallest child
- repeat the test&swap at each new position until the heap condition is met
基本上这将导致任何替换元素过滤树,直到它达到自然的地方。这最多需要n = log2(N)个步骤,这与你可以得到的一样好。此外,树的隐式形式允许非常快速的实现;现有的有界优先级队列库最有可能使用隐式堆栈。
Basically, this will cause any replacement element to "filter up" the tree until it achieves its natural place. This will take at most n=log2(N) steps, which is as good as you can get. Also, the implicit form of the tree allows a very fast implementation; existing bounded-priority-queue libraries will most likely use an implicit heap.
这篇关于始终保持最佳元素的数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
- 如果x小于或等于其两个子项,则
- 从堆中删除根(A [1],最小元素) ,并拒绝它