确定性地将集合分箱为子集 [英] Binning a set into subsets deterministically
问题描述
我有一组无序的 n
个唯一的正整数.我想将它划分为 ceil(n/k)
最多 k
个数字(k <
ceil(n/k)
输出集,而不依赖于输入的固定顺序).
I have an unordered set of n
unique, positive integers. I want to partition it into ceil(n / k)
unordered sets of up to k
numbers (k << n
) in a deterministic way (I want to get a sequence of the same ceil(n / k)
output sets without depending on a fixed order of the input).
有没有一种方法可以在对集合进行排序和对结果序列进行分区方面具有更高的算法复杂度?
Is there a way to do so that has superior algorithmic complexity to sorting the set and partitioning the resulting sequence?
推荐答案
这是一种平均性能为 O(n)
的方法,它击败了 O(n log(n))代码>.
Here is an approach whose average performance is O(n)
, which beats O(n log(n))
.
选择 m
和 r <米<n
,其中r = ceil(n/k)
是桶的数量.对于每个数字,将其散列并将其粘贴到 m
个包中.(我故意使用袋子和桶,袋子更小.)这是一个 O(n)
任务.现在我们可以通过遍历 m
个包来构建我们的 r
桶,当一个桶装满时,我们快速选择 m
中的一个> 袋子.遍历那些包是O(m)
,把它们放入r
是O(n)
,然后拆分r-1
在带有 quickselect 的边界上的包是 O(r * (n/m))
,它肯定包含在 O(n)
中.
Pick m
with r < m < n
, where r = ceil(n / k)
is the number of buckets. For each number, hash it and stick it into one of m
bags. (I'm deliberately using bags vs buckets, with bags smaller.) This is a O(n)
task. Now we can construct our r
buckets by running through the m
bags, and when fill a bucket up up, we quickselect a single one of the m
bags. Running through those bags is O(m)
, putting them into the r
is O(n)
, and splitting the r-1
bags on a boundary with quickselect is O(r * (n/m))
which is definitely contained within O(n)
.
这篇关于确定性地将集合分箱为子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!