产生顶的k值 [英] generate top k values

查看:158
本文介绍了产生顶的k值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个问题,我要确保,如果我是最有效地这样做。我有大小N的浮点值的值都为0和1之间的数组。

I have a problem and I want to make sure if I am doing it most efficiently. I have an array A of float values of size N. The values are all between 0 and 1.

我要找到顶尖的k值可以是最多三个数字从A那么,前k列表可以的产物 有个体数从A,两个数字或三个数字商品的商品从A

I have to find top k values which can be a product of a maximum of three numbers from A. So, the top-k list can have individual numbers from A, product of two numbers or product of three numbers from A.

所以,这就是我现在怎么做。我可以在desecding顺序O(Nlogk)时间前k个数字。然后创建一个 最大堆和最大尺寸3即最佳值初始化它,如果我重新present的k值作为B的有序数组(降序) 并通过其在该数组索引号码,我插入,它们在索引(0),(0,1)和(0,1,2)的数字。接下来,我就堆进行提取和 每当我提取大小ž(Z号的产品)的价值,我所设定的下一个可能的大小ž数,即进行更换 如果假设(2,4)提取出来,我可以用(3,4)和(2,5)取代它。做提取k次才能得到结果。

So, this is how I am doing it now. I can get top-k numbers in desecding order in O(Nlogk) time. I then create a max-heap and initialize it with best values of maximum size 3 i.e. if I represent the sorted array(descending) of k values as B and the numbers by its index in that array, I insert numbers which are at index (0), (0,1) and (0,1,2). Next, I perform extract on heap and whenever I extract a size z (product of z numbers) value, I replace it with the set of next possible size z numbers i.e. if suppose (2,4) is extracted, I can replace it with (3,4) and (2,5). And do extract k times to get results.

需要更好的想法,如果你有。 感谢所有。

Need better ideas if you have. Thanks all.

推荐答案

如果我理解正确,你需要得到k可由1,从你的清单2或3个元素,而所有这些数值相乘生产数量最多是浮动介于0和1点数。

if I understand you correctly you need to find k highest numbers that can be produced by multiplying together 1, 2 or 3 elements from your list, and all the values are floating point numbers between 0 and 1.

很明显,你只需要在列表中考虑k个最高的数字。其余的可以马上丢弃。您可以使用您为O(n日志k)的算法,让他们,再按照排序顺序(我假设你的清单并不preordered)。为了简化问题,你现在可以把他们的对数,并尽量让这些数字,而不是最大化产品的原问题的总和。这可能有点加速。

It is clear that you only need to consider the k highest numbers from the list. The rest can be discarded straight away. You can use your O(n log k) algorithm to get them, again in sorted order (I assume your list isn't preordered). To simplify the problem, you can now take their logarithms and try to maximize the sums of the numbers instead of the original problem of maximizing the products. This might speed up little.

现在(考虑对数presentation),所有的数字是负的,因此增加更多的人在一起只会创造更多的负数。

Now (considering the logarithmic presentation), all your numbers are negative, so adding more of them together will just create more and more negative numbers.

让我们把第k人数最多A1 ...阿克。我们可以进一步降低的问题,现在假定存在也号A0,具有在日志重新presentation在原始重新presentation值0和1;那么问题是枚举前k 3元组(X,Y,Z在{A0,...,阿克})与约束是x≥ Y'GEQ; z和的z,其中, A0。让我们表示3元组由[I,J,n]和在该元组被S [I,J,n]的元素的总和。要报告的第一个元素是很明显[0,0,1],即相当于在原来的问题制定的名单上的单#1的值。

Let's call the k highest numbers A1...Ak. We can reduce the problem further now assuming that there exists also number A0, that has the value 0 in the log representation and 1 in the original representation; then the problem is to enumerate the first k 3-tuples (x,y,z in {A0,...,Ak}) with the constraint that x ≥ y ≥ z and that z < A0. Let's denote 3-tuple by [i,j,n] and the sum of the elements in this tuple by S[i,j,n]. The first element to be reported is obviously [0,0,1], i.e. , which corresponds in the original problem formulation to the singleton #1 value on the list.

我们用最大堆在原有的制定;我们推三元堆,用它们的和(S [...])作为排序键。算法开始时,按[0,0,0]堆。然后:

We use a max-heap as in the original formulation; we push the triples to the heap, using their sums (S[...]) as the ordering key. The algorithm starts by pushing [0,0,0] to the heap. Then:

answer = []
for m in 0 .. k:
  top = heap.pop()
  answer.append(sum(top))
  (i,j,n) = top # explode the tuple
  if (n < k - 1):
      heap.push((i,j,n+1))
  if (j == n):
      heap.push((i,j+1,j+1))
      if (i == j):
          heap.push((i+1,i+1,i+1))

目前结束时,答案包含K + 1个元素,它们中的第一个是[0,0,0],它必须被丢弃。

At the end, answer contains k + 1 elements, the first one of them is [0,0,0] which must be discarded.

让被指定为-1,-3,-8,-9。则算法进行这样的:

Let be given as -1, -3, -8, -9. Then the algorithm proceeds like this:

Heap
Top          Rest (shown in order)

[ 0, 0, 0] | 
[ 0, 0,-1] | [ 0,-1,-1] [-1,-1,-1]
[ 0,-1,-1] | [-1,-1,-1] [ 0,-1,-3] [ 0,-3,-3]
[-1,-1,-1] | [-1,-1,-2] [ 0,-1,-3] [-1,-2,-2] [-2,-2,-2] [ 0,-3,-3]
[-1,-1,-2] | [ 0,-1,-3] [-1,-1,-3] [-1,-2,-2] [-2,-2,-2] [ 0,-3,-3]
[ 0,-1,-3] | [-1,-1,-3] [ 0,-1,-4] [-1,-2,-2] [-2,-2,-2] [ 0,-3,-3]
[-1,-1,-3] | [ 0,-1,-4] [-1,-1,-4] [-1,-2,-2] [-2,-2,-2] [ 0,-3,-3]
[ 0,-1,-4] | [-1,-2,-2] [-1,-1,-4] [ 0,-1,-5] [-2,-2,-2] [ 0,-3,-3]
...
etc.

这个算法的好处是,它不枚举重复和堆的大小是O(K);明白为什么,看到该算法在每次迭代中添加元素的最大堆(通常较少),因此k次叠后不能有多于堆2K元素。

The nice thing about this algorithm is that it doesn't enumerate duplicates and the heap size is O(k); to see why, observe that the algorithm adds on every iteration the maximum of elements on the heap (often less), so after k iterations there cannot be more than 2k elements in the heap.

这给了那么运行时间为O(n日志K + K数K)= O((N + K)登录K)。

This gives then running time O(n log k + k log k) = O((n + k) log k).

这篇关于产生顶的k值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆