不能从数组中的数字之和形成的最小数字 [英] Smallest number that can not be formed from sum of numbers from array

查看:24
本文介绍了不能从数组中的数字之和形成的最小数字的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题是在亚马逊面试时问我的 -

This problem was asked to me in Amazon interview -

给定一个正整数数组,你必须找到不能由数组中数字之和组成的最小正整数.

Given a array of positive integers, you have to find the smallest positive integer that can not be formed from the sum of numbers from array.

示例:

Array:[4 13 2 3 1]
result= 11 { Since 11 was smallest positive number which can not be formed from the given array elements }


我所做的是:


What i did was :

  1. 对数组进行排序
  2. 计算前缀和
  3. 反转 sum 数组并检查下一个元素是否小于 1大于总和,即 A[j]<=(sum+1).如果不是这样,那么答案会成为 sum+1

但这是 nlog(n) 解决方案.

But this was nlog(n) solution.

面试官对此不满意,在不到 O(n log n) 的时间内问了一个解决方案.

Interviewer was not satisfied with this and asked a solution in less than O(n log n) time.

推荐答案

考虑区间 [2i .. 2i+1 - 1] 中的所有整数.并假设所有小于 2i 的整数都可以由给定数组中的数字之和形成.还假设我们已经知道 C,它是 2i 以下所有数字的总和.如果 C >= 2i+1 - 1,则此区间内的每个数都可以表示为给定数的和.否则我们可以检查区间 [2i .. C + 1] 是否包含给定数组中的任何数字.如果没有这样的数字,我们搜索的就是C+1.

Consider all integers in interval [2i .. 2i+1 - 1]. And suppose all integers below 2i can be formed from sum of numbers from given array. Also suppose that we already know C, which is sum of all numbers below 2i. If C >= 2i+1 - 1, every number in this interval may be represented as sum of given numbers. Otherwise we could check if interval [2i .. C + 1] contains any number from given array. And if there is no such number, C + 1 is what we searched for.

这是一个算法的草图:

  1. 对于每个输入的数字,判断它属于哪个区间,并更新对应的和:S[int_log(x)] += x.
  2. 计算数组 S 的前缀和:foreach i: C[i] = C[i-1] + S[i].
  3. 过滤数组 C 以仅保留值低于 2 的下一次幂的条目.
  4. 再次扫描输入数组并注意哪个区间 [2i .. C + 1] 包含至少一个输入数字:i = int_log(x) - 1;B[i] |= (x <= C[i] + 1).
  5. 找到第 3 步未过滤掉的第一个区间,并且 B[] 的相应元素未在第 4 步中设置.
  1. For each input number, determine to which interval it belongs, and update corresponding sum: S[int_log(x)] += x.
  2. Compute prefix sum for array S: foreach i: C[i] = C[i-1] + S[i].
  3. Filter array C to keep only entries with values lower than next power of 2.
  4. Scan input array once more and notice which of the intervals [2i .. C + 1] contain at least one input number: i = int_log(x) - 1; B[i] |= (x <= C[i] + 1).
  5. Find first interval that is not filtered out on step #3 and corresponding element of B[] not set on step #4.

如果不清楚为什么我们可以应用第 3 步,这里是证据.选择 2i 和 C 之间的任意一个数,然后按降序依次减去 2i 以下的所有数.最终我们得到一些比最后减去的数字少的数字或零.如果结果为零,只需将所有减去的数字加在一起,我们就得到了所选数字的表示.如果结果是非零且小于最后减去的数字,则该结果也小于 2i,因此它是可表示的"并且没有任何减去的数字用于其表示.当我们将这些减去的数字加回来时,我们就有了所选数字的表示.这也表明,我们可以通过直接跳转到 C 的 int_log 来一次跳过几个间隔,而不是一个一个地过滤间隔.

If it is not obvious why we can apply step 3, here is the proof. Choose any number between 2i and C, then sequentially subtract from it all the numbers below 2i in decreasing order. Eventually we get either some number less than the last subtracted number or zero. If the result is zero, just add together all the subtracted numbers and we have the representation of chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2i, so it is "representable" and none of the subtracted numbers are used for its representation. When we add these subtracted numbers back, we have the representation of chosen number. This also suggests that instead of filtering intervals one by one we could skip several intervals at once by jumping directly to int_log of C.

时间复杂度由函数int_log()决定,它是整数对数或数字中最高设置位的索引.如果我们的指令集包含整数对数或任何等价物(计算前导零,或浮点数的技巧),则复杂度为 O(n).否则我们可以使用一些小技巧在 O(log log U) 中实现 int_log() 并获得 O(n * log log U) 时间复杂度.(这里 U 是数组中最大的数).

Time complexity is determined by function int_log(), which is integer logarithm or index of the highest set bit in the number. If our instruction set contains integer logarithm or any its equivalent (count leading zeros, or tricks with floating point numbers), then complexity is O(n). Otherwise we could use some bit hacking to implement int_log() in O(log log U) and obtain O(n * log log U) time complexity. (Here U is largest number in the array).

如果第 1 步(除了更新总和)也会更新给定范围内的最小值,则不再需要第 4 步.我们可以将 C[i] 与 Min[i+1] 进行比较.这意味着我们只需要单次遍历输入数组.或者我们可以不将这个算法应用于数组,而是应用于数字流.

If step 1 (in addition to updating the sum) will also update minimum value in given range, step 4 is not needed anymore. We could just compare C[i] to Min[i+1]. This means we need only single pass over input array. Or we could apply this algorithm not to array but to a stream of numbers.

几个例子:

Input:       [ 4 13  2  3  1]    [ 1  2  3  9]    [ 1  1  2  9]
int_log:       2  3  1  1  0       0  1  1  3       0  0  1  3

int_log:     0  1  2  3          0  1  2  3       0  1  2  3
S:           1  5  4 13          1  5  0  9       2  2  0  9
C:           1  6 10 23          1  6  6 15       2  4  4 13
filtered(C): n  n  n  n          n  n  n  n       n  n  n  n
number in
[2^i..C+1]:  2  4  -             2  -  -          2  -  -
C+1:              11                7                5

对于多精度输入数字,此方法需要 O(n * log M) 时间和 O(log M) 空间.其中 M 是数组中最大的数.仅读取所有数字就需要相同的时间(在最坏的情况下,我们需要它们的每一位).

For multi-precision input numbers this approach needs O(n * log M) time and O(log M) space. Where M is largest number in the array. The same time is needed just to read all the numbers (and in the worst case we need every bit of them).

仍然可以将这个结果改进为 O(n * log R),其中 R 是该算法找到的值(实际上,它的输出敏感变体).此优化所需的唯一修改是不是一次处理整数,而是逐位处理它们:第一遍处理每个数字的低位(如位 0..63),第二遍 - 下一位(如64..127)等.我们可以在找到结果后忽略所有高阶位.这也将空间需求降低到 O(K) 数,其中 K 是机器字中的位数.

Still this result may be improved to O(n * log R) where R is the value found by this algorithm (actually, the output-sensitive variant of it). The only modification needed for this optimization is instead of processing whole numbers at once, process them digit-by-digit: first pass processes the low order bits of each number (like bits 0..63), second pass - next bits (like 64..127), etc. We could ignore all higher-order bits after result is found. Also this decreases space requirements to O(K) numbers, where K is number of bits in machine word.

这篇关于不能从数组中的数字之和形成的最小数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆