按总和顺序生成k个元素子集的算法 [英] Algorithm to generate k element subsets in order of their sum

查看:132
本文介绍了按总和顺序生成k个元素子集的算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我有一个未排序的大集合 n 个整数(例如其中 2 ^ 20 个),并且想要生成每个元素都具有 k 个元素的子集(其中 k 很小,例如 5 )按其总和递增的顺序,最有效的方法是什么?

If I have an unsorted large set of n integers (say 2^20 of them) and would like to generate subsets with k elements each (where k is small, say 5) in increasing order of their sums, what is the most efficient way to do so?

为什么要以这种方式生成这些子集,是因为我想找到具有满足特定条件的最小和的k元素子集,并且因此,我将条件应用于生成的每个k元素子集。

此外,算法的复杂度是多少?

Also, what would be the complexity of the algorithm?

这里有一个类似的问题:用于获取列表的每个可能子集的算法(按其产品顺序),而无需构建和排序有关生成子集的整个列表(即生成器)按产品顺序排列,但由于集合 n

There is a similar question here: Algorithm to get every possible subset of a list, in order of their product, without building and sorting the entire list (i.e Generators) about generating subsets in order of their product, but it wouldn't fit my needs due to the extremely large size of the set n

我打算在Mathematica中实现该算法,但也可以在C ++或Python中实现。

I intend to implement the algorithm in Mathematica, but could do it in C++ or Python too.

推荐答案

如果需要的属性小潜艇的ts(将其称为 P )非常普遍,一种概率方法可能效果很好:

If your desired property of the small subsets (call it P) is fairly common, a probabilistic approach may work well:


  1. n 个整数排序(对于数百万个整数,即ram的10s至100s MB,这应该不成问题),然后求和 k-1 最小。称此总计偏移量

  2. 生成随机的 k 子集(例如,通过采样 k 个随机数,mod n )并检查它的 P -ness。

  3. 在比赛中,记下子集的总和。从中减去偏移量可以找到等效总和的任何 k 子集中最大元素的上限。

  4. 将您的 n 个整数限制为小于或等于此界限的整数。

  5. 重复(转到2),直到在固定的迭代次数内未找到匹配项为止。

  1. Sort the n integers (for millions of integers i.e. 10s to 100s of MB of ram, this should not be a problem), and sum the k-1 smallest. Call this total offset.
  2. Generate a random k-subset (say, by sampling k random numbers, mod n) and check it for P-ness.
  3. On a match, note the sum-total of the subset. Subtract offset from this to find an upper bound on the largest element of any k-subset of equivalent sum-total.
  4. Restrict your set of n integers to those less than or equal to this bound.
  5. Repeat (goto 2) until no matches are found within some fixed number of iterations.

请注意,初始排序为 O(n log n)。在第4步中隐含的二进制搜索为 O(log n)

Note the initial sort is O(n log n). The binary search implicit in step 4 is O(log n).

显然,如果 P 如此稀有,以至于随机击球不可能获得匹配,这对你没有好处。

Obviously, if P is so rare that random pot-shots are unlikely to get a match, this does you no good.

这篇关于按总和顺序生成k个元素子集的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆