找到K的非重复元素的列表"小"额外的空间 [英] Find the k non-repeating elements in a list with "little" additional space

查看:161
本文介绍了找到K的非重复元素的列表"小"额外的空间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

原始问题语句是这个:

鉴于32位无符号整数数组中的每个数字除他们三人恰好出现两次(这恰好出现一次),发现这三个数字在O(n)的使用O(1)额外的空间时间。输入数组是只读的。如果有k个例外,而不是3'

Given an array of 32bit unsigned integers in which every number appears exactly twice except three of them (which appear exactly once), find those three numbers in O(n) time using O(1) extra space. The input array is read-only. What if there are k exceptions instead of 3?

这很容易在来解决这个Ο(1)时间和Ο(1)的空间,如果你接受输入限制的非常高的常数因子,因为(该阵列最多只能有2 33 项):

It's easy to solve this in Ο(1) time and Ο(1) space if you accept a very high constant factor because of the input restriction (the array can have at most 233 entries):

for i in lst:
    if sum(1 for j in lst if i == j) == 1:
        print i

所以,为了这个问题,让我们放下限制在位长度,集中力量,其中数字最多可以有 M 的更一般的问题位。

So, for the sake of this question, let's drop the restriction in bit length and concentrate on the more general problem where the numbers can have up to m bits.

泛化的算法对于k = 2 ,我脑子里想的是以下内容:

Generalizing an algorithm for k = 2, what I had in mind is the following:

  1. 在XOR那些 1 的最低显著位数字和那些具有 0 分开。如果这两个分区,所得到的值不为零,我们知道,我们已经进行分配不重复的号码分成两组,其中每一个具有至少一个部件
  2. 对于每一个这些群体,试图通过检查第二至少显著位等进一步划分为
  1. XOR those numbers with a least significant bit of 1 and those with a 0 separately. If for both of the partitions, the resulting value is not zero, we know that we have partitioned the non-repeating numbers into two groups, each of which has at least one member
  2. For each of those groups, try to partition it further by examining the second-least significant bit and so on

有一个特殊的情况需要考虑,虽然。如果划分的一组后,所述组之一的异或值均为零,我们不知道是否将所得的子组中的一个是空的或不是。在这种情况下,我的算法只是离开这个有点出入,并与下一个,这是不正确的继续,例如,它失败的输入 [0,1,2,3,4,5,6]

There is a special case to be considered, though. If after partitioning a group, the XOR values of one of the groups are both zero, we don't know whether one of the resulting sub-groups is empty or not. In this case my algorithm just leaves this bit out and continues with the next one, which is incorrect, for example it fails for the input [0,1,2,3,4,5,6].

现在的想法我不得不为计算元件不仅异或,也施加一定的函数(我选择后的值的异或 F(X)= 3×+ 1 点击这里)。请参阅下面叶夫根的答案一个反例为这个额外的检查。

Now the idea I had was to compute not only the XOR of the element, but also the XOR of the values after applying a certain function (I had chosen f(x) = 3x + 1 here). See Evgeny's answer below for a counter-example for this additional check.

现在虽然下面的算法是不正确的K> = 7 ,我还包括执行在这里给你一个想法:

Now although the below algorithm is not correct for k >= 7, I still include the implementation here to give you an idea:

def xor(seq):
  return reduce(lambda x, y: x ^ y, seq, 0)

def compute_xors(ary, mask, bits):
  a = xor(i for i in ary if i & mask == bits)
  b = xor(i * 3 + 1 for i in ary if i & mask == bits)
  return a if max(a, b) > 0 else None

def solve(ary, high = 0, mask = 0, bits = 0, old_xor = 0):
  for h in xrange(high, 32):
    hibit = 1 << h
    m = mask | hibit
    # partition the array into two groups
    x = compute_xors(ary, m, bits | hibit)
    y = compute_xors(ary, m, bits)
    if x is None or y is None:
      # at this point, we can't be sure if both groups are non-empty,
      # so we check the next bit
      continue
    mask |= hibit
    # we recurse if we are absolutely sure that we can find at least one
    # new value in both branches. This means that the number of recursions
    # is linear in k, rather then exponential.
    solve(ary, h + 1, mask, bits | hibit, x)
    solve(ary, h + 1, mask, bits, y)
    break
  else:
    # we couldn't find a partitioning bit, so we output (but 
    # this might be incorrect, see above!)
    print old_xor

# expects input of the form "10 1 1 2 3 4 2 5 6 7 10"
ary = map(int, raw_input().split())
solve(ary, old_xor=xor(ary))

这是我的分析,这code的 0的最坏情况下的时间复杂度(K *平方米* N),其中 N 输入元素的数量(异或为 O(M)和最多 K 分区操作才能成功)和空间复杂度 O(平方米)(因为 M 是最大递归深度和临时数可以是长度 M )。

From my analysis, this code has a worst-case time complexity of O(k * m² * n) where n is the number of input elements (XORing is O(m) and at most k partitioning operations can be successful) and space complexity O(m²) (because m is the maximum recursion depth and the temporary numbers can be of length m).

现在的问题是,当然,如果有一个的正确的,具有良好的渐近运行时有效的方法(我们假设 K&n种;&LT和 M&LT; n种这里为了完整性),这还需要一些额外的空间(例如,接近那种输入将不被接受,因为我们倒是需要至少 O(N)对于额外的空间,因为我们不能修改输入!)。

The question is of course if there is a correct, efficient approach with good asymptotic runtime (let's assume that k << n and m << n here for the sake of completeness), which also needs little additional space (for example, approaches that sort the input will not be accepted, because we'd need at least O(n) additional space for that, as we can't modify the input!).

编辑:现在,上面的算法被证明是不正确的,这当然会很高兴地看到它如何能作出正确的,可能是使它少了几分effient。空间的复杂性应该是 O(N * M)(即次线性输入比特的总数)。它会好起来的采取 K 作为额外的输入,如果使任务变得更加容易。

Now that the algorithm above is proven to be incorrect, it would of course be nice to see how it could be made correct, possibly by making it a bit less effient. Space complexity should be in o(n*m) (that is, sublinear in the total number of input bits). It would be okay to take k as an additional input if that makes the task easier.

推荐答案

一个概率的方法采取将使用的计数过滤

One probabilistic approach to take would be to use a counting filter.

的算法如下:

  1. 线性扫描阵列和更新计数过滤器。
  2. 线性扫描阵列和创建的所有元素的集合,肯定这是不是算2中的过滤器,这将是&LT; = K 真正的解决方案。 (在这种情况下,假阳性是看起来像他们独特元件没有)。
  3. 选择了哈希函数在新的基础和重复,直到我们都 K 解决方案。
  1. Linearly scan the array and 'update' the counting filter.
  2. Linearly scan the array and create a collection of all elements which aren't certainly of count 2 in the filter, this will be <= k of the real solutions. (The false positives in this case are unique elements which look like they aren't).
  3. Chose a new basis of hash functions and repeat until we have all k solutions.

本使用2米位(的独立ñ)。时间复杂度为更多地参与,但我们知道,任何给定的唯一元素是不是在步骤2中找到的概率约为(1 - E ^( - KN / M))^氏/ code>我们将解析到一个解决方案非常迅速,但不幸的是我们没有完全线性的 N

This uses 2m bits of space (independant of n). The time complexity is more involved, but knowing that the probability that any given unique element is not found in step 2 is approx (1 - e^(-kn/m))^k we will resolve to a solution very quickly, but unfortunatly we are not quite linear in n.

余AP preciate,这不能满足你的约束,因为它是超线性时间,并且是概率性的,但在给定初始条件可能不令人满意的这 方法可能是值得考虑的。

I appreciate that this doesn't satisfy your constraints as it is super-linear in time, and is probabilistic, but given the original conditions may not be satisfiable this approach may be worth considering.

这篇关于找到K的非重复元素的列表&QUOT;小&QUOT;额外的空间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆