简单的面试问题变得更难了:给定数字 1..100,找到缺失的数字(s),正好给定 k 缺失 [英] Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

查看:32
本文介绍了简单的面试问题变得更难了:给定数字 1..100,找到缺失的数字(s),正好给定 k 缺失的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

不久前我有一次有趣的工作面试经历.这个问题开始很简单:

<块引用>

Q1:我们有一个包含数字 123、...、 的包100.每个数字只出现一次,所以有 100 个数字.现在从袋子中随机挑选一个数字.找到丢失的号码.

当然,我以前听说过这个面试问题,所以我很快就回答了:

<块引用>

A1:嗯,1 + 2 + 3 + … + N 的数字之和是 (N+1)(N/2)(参见维基百科:算术级数之和).对于 N = 100,总和为 5050.

因此,如果袋子中存在所有数字,则总和将正好是 5050.因为少了一个数,总和会小于这个数,差就是那个数.所以我们可以在 O(N) 时间和 O(1) 空间找到缺失的数字.

此时我以为我做得很好,但突然间问题发生了意想不到的转变:

<块引用>

问题 2:这是正确的,但现在如果缺少 两个 数字,你会怎么做?

我以前从未见过/听说过/考虑过这种变化,所以我很恐慌,无法回答这个问题.面试官坚持要知道我的思维过程,所以我说也许我们可以通过与预期产品的比较来获得更多信息,或者也许从第一遍收集了一些信息之后再做第二遍等等,但我真的只是在拍摄在黑暗中,而不是实际上有一个明确的解决方案.

面试官确实试图鼓励我说第二个方程确实是解决问题的一种方法.在这一点上,我有点不高兴(因为事先不知道答案),并询问这是否是一种通用的(阅读:有用的")编程技术,或者这只是一个技巧/陷阱答案.

面试官的回答出乎我的意料:你可以概括这个技术来找到 3 个缺失的数字.实际上,您可以将其概括为查找 k 个缺失的数字.

<块引用>

Qk:如果包中正好缺少 k 个数字,您将如何有效地找到它?

这是几个月前的事了,我仍然无法弄清楚这是什么技术.显然有一个 Ω(N) 时间下限,因为我们必须至少扫描所有数字一次,但面试官坚持 TIMESPACE 求解技术的复杂性(减去 O(N) 时间输入扫描)定义为 k 而不是 N.

所以这里的问题很简单:

  • 您将如何解决问题 2?
  • 您将如何解决问题 3?
  • 你会如何解决Qk?

说明

  • 通常有 N 个数字来自 1..N,而不仅仅是 1..100.
  • 我不是在寻找明显的基于集合的解决方案,例如使用 位集,通过指定位的值对每个数字的存在/不存在进行编码,因此在额外空间中使用 O(N) 位.我们无法承受任何与 N 成正比的额外空间.
  • 我也不是在寻找明显的排序优先方法.这和基于集合的方法在采访中值得一提(它们很容易实现,并且取决于 N,可以非常实用).我正在寻找圣杯解决方案(实施起来可能实用,也可能不实用,但仍然具有所需的渐进特性).

同样,当然你必须扫描O(N)中的输入,但你只能捕获少量信息(根据k定义,而不是N),然后必须以某种方式找到 k 个缺失的数字.

解决方案

这里总结了 Dimitris Andreou's link.>

记住 i 次幂的总和,其中 i=1,2,..,k.这减少了求解方程组的问题

a1 + a2 + ... + ak = b1

a12 + a22 + ... + ak2 = b2

...

a1k + a2k + ... + akk = bk

使用 牛顿恒等式,知道 bi 允许计算

c1 = a1 + a2 + ... ak

c2 = a1a2 + a1a3+ ... + ak-1ak

...

ck = a1a2 ... ak

如果您展开多项式 (xa1)...(xak),系数将正好是 c1, ..., ck - 参见 Viète 的公式.由于每个多项式因子都是唯一的(多项式环是一个欧几里德域),这意味着一个i 是唯一确定的,直到排列.

这就证明了记忆力足以恢复数字.对于常数 k,这是一个很好的方法.

然而,当 k 变化时,计算 c1,...,ck 的直接方法是非常昂贵的,因为例如ck 是所有缺失数字的乘积,大小为 n!/(n-k)!.为了克服这个问题,在 Zq 字段中执行 计算,其中 q 是一个质数使得 n <= q <2n - 它存在于 Bertrand's postulate.证明不需要改变,因为公式仍然成立,多项式的因式分解仍然是唯一的.您还需要一种对有限域进行因式分解的算法,例如通过 BerlekampCantor-Zassenhaus.

常数 k 的高级伪代码:

  • 计算给定数字的 i 次幂
  • 减去以得到未知数的 i 次幂的总和.将总和称为 bi.
  • 使用牛顿恒等式计算 bi 的系数;称它们为 ci.基本上,c1 = b1;c2 = (c1b1 - b2)/2;有关确切公式,请参阅维基百科
  • 分解多项式 xk-c1xk-1 + ... + ck.
  • 多项式的根是所需的数字 a1, ..., ak.

对于变化的 k,找到一个质数 n <= q <;2n 使用例如Miller-Rabin,并执行所有数字减少模 q 的步骤.

此答案的先前版本指出,可以使用特征 2 (q=2^(log n) 的有限域来代替 Zq,其中 q 是素数).事实并非如此,因为牛顿公式需要除以 k 以内的数字.

I had an interesting job interview experience a while back. The question started really easy:

Q1: We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.

I've heard this interview question before, of course, so I very quickly answered along the lines of:

A1: Well, the sum of the numbers 1 + 2 + 3 + … + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.

Thus, if all numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number in O(N) time and O(1) space.

At this point I thought I had done well, but all of a sudden the question took an unexpected turn:

Q2: That is correct, but now how would you do this if TWO numbers are missing?

I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc, but I really was just shooting in the dark rather than actually having a clear path to the solution.

The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer before hand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.

The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.

Qk: If exactly k numbers are missing from the bag, how would you find it efficiently?

This was a few months ago, and I still couldn't figure out what this technique is. Obviously there's a Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N) time input scan) is defined in k not N.

So the question here is simple:

  • How would you solve Q2?
  • How would you solve Q3?
  • How would you solve Qk?

Clarifications

  • Generally there are N numbers from 1..N, not just 1..100.
  • I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence each number by the value of a designated bit, therefore using O(N) bits in additional space. We can't afford any additional space proportional to N.
  • I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).

So again, of course you must scan the input in O(N), but you can only capture small amount of information (defined in terms of k not N), and must then find the k missing numbers somehow.

解决方案

Here's a summary of Dimitris Andreou's link.

Remember sum of i-th powers, where i=1,2,..,k. This reduces the problem to solving the system of equations

a1 + a2 + ... + ak = b1

a12 + a22 + ... + ak2 = b2

...

a1k + a2k + ... + akk = bk

Using Newton's identities, knowing bi allows to compute

c1 = a1 + a2 + ... ak

c2 = a1a2 + a1a3 + ... + ak-1ak

...

ck = a1a2 ... ak

If you expand the polynomial (x-a1)...(x-ak) the coefficients will be exactly c1, ..., ck - see Viète's formulas. Since every polynomial factors uniquely (ring of polynomials is an Euclidean domain), this means ai are uniquely determined, up to permutation.

This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct approach of computing c1,...,ck is prohibitely expensive, since e.g. ck is the product of all missing numbers, magnitude n!/(n-k)!. To overcome this, perform computations in Zq field, where q is a prime such that n <= q < 2n - it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

  • Compute i-th powers of given numbers
  • Subtract to get sums of i-th powers of unknown numbers. Call the sums bi.
  • Use Newton's identities to compute coefficients from bi; call them ci. Basically, c1 = b1; c2 = (c1b1 - b2)/2; see Wikipedia for exact formulas
  • Factor the polynomial xk-c1xk-1 + ... + ck.
  • The roots of the polynomial are the needed numbers a1, ..., ak.

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.

EDIT: The previous version of this answer stated that instead of Zq, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.

这篇关于简单的面试问题变得更难了:给定数字 1..100,找到缺失的数字(s),正好给定 k 缺失的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆