找出重复元件以阵列 [英] Finding out the duplicate element in an array

查看:149
本文介绍了找出重复元件以阵列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有大小为n的阵列和包含在阵列中的元件被在1和n-1,使得每个元素出现一次,只是一个元件发生一次以上。我们需要找到这个元素。

There is an array of size n and the elements contained in the array are between 1 and n-1 such that each element occurs once and just one element occurs more than once. We need to find this element.

尽管这是一个很常见问题,我还没有找到一个正确的答案。大多数建议是我应该加起来阵列中的所有元素,然后从中减去所有指标的总和,但是这不会工作,如果元素的数量是非常大的。它会溢出。也有一些有关使用异或门 DUP = DUP ^改编[I] ^ I ,这是我不明白。

Though this is a very FAQ, I still haven't found a proper answer. Most suggestions are that I should add up all the elements in the array and then subtract from it the sum of all the indices, but this won't work if the number of elements is very large. It will overflow. There have also been suggestions regarding the use of XOR gate dup = dup ^ arr[i] ^ i, which are not clear to me.

我已经提出了这种算法是加法算法的增强,并会减少溢出的机会在很大程度上!

I have come up with this algorithm which is an enhancement of the addition algorithm and will reduce the chances of overflow to a great extent!

for i=0 to n-1
  begin :
    diff = A[i] - i;
    sum  = sum + diff;
  end

差异包含重复的元素,但使用这种方法我无法找出重复的元素的索引。对于我需要遍历数组再次这是不可取的。任何人都可以想出一个更好的解决方案,不涉及添加方法或异或方法的工作为O(n)?

diff contains the duplicate element, but using this method I am unable to find out the index of the duplicate element. For that I need to traverse the array once more which is not desirable. Can anyone come up with a better solution that does not involve the addition method or the XOR method works in O(n)?

推荐答案

有很多方法,你可以考虑一下这个问题,这取决于你的问题描述的约束。

There are many ways that you can think about this problem, depending on the constraints of your problem description.

如果你知道一个事实,即只有一个元素被复制,然后有很多方法来解决这个问题。一个特别聪明的解决方案是使用按位异或运算符。异或具有下面的有趣的性质:

If you know for a fact that exactly one element is duplicated, then there are many ways to solve this problem. One particularly clever solution is to use the bitwise XOR operator. XOR has the following interesting properties:

  1. XOR是结合的,所以(X ^ Y)^ Z = X ^(Y ^ Z)
  2. XOR是可交换的:X ^ Y = Y ^ X
  3. XOR是其自身的逆:X ^ Y = 0当且仅当X = Y
  4. XOR具有零作为身份:X ^ 0 = X

属性(1)和(2)在这里的意思是,服用一组值的XOR时,它并不重要什么顺序应用的异或的元素。您认为合适的,你可以重新排序的元素或它们分组。物业(3)意味着如果你在一起XOR相同的值多次,你回来零和财产(4),意味着如果你XOR任何东西,和0,你回去你原来的号码。考虑到所有这些属性在一起,你会得到一个有趣的结论:如果你把一组号码的XOR,结果是该组中出现的次数为奇数所有号码的XOR。这样做的原因是,当你XOR一起出现的偶数次号码,可以打破这些数字的异或向上成一组对的。每对异或为0,所有这些零的(3),和第组合异或还给零自(4)。因此,即使多重所有的数字抵消了。

Properties (1) and (2) here mean that when taking the XOR of a group of values, it doesn't matter what order you apply the XORs to the elements. You can reorder the elements or group them as you see fit. Property (3) means that if you XOR the same value together multiple times, you get back zero, and property (4) means that if you XOR anything and and 0 you get back your original number. Taking all these properties together, you get an interesting result: if you take the XOR of a group of numbers, the result is the XOR of all numbers in the group that appear an odd number of times. The reason for this is that when you XOR together numbers that appear an even number of times, you can break the XOR of those numbers up into a set of pairs. Each pair XORs to 0 by (3), and th combined XOR of all these zeros gives back zero by (4). Consequently, all the numbers of even multiplicity cancel out.

要使用此解决原来的问题,做到以下几点。首先,异或一起列表中的所有的数字。这使显示的奇数倍,从而结束是所有从1到(n-1)的数字以外的重复所有数的异或。现在,异或此值与所有的号码从1到(n-1)的异或运算。这就使得取值范围为1的所有号码(N-1)是不是previously抵消抵消,留下只是重复的值。此外,这种运行在O(n)的时间,只使用O(1)空间,因为所有的值的异或安装到一个整数。

To use this to solve the original problem, do the following. First, XOR together all the numbers in the list. This gives the XOR of all numbers that appear an odd number of times, which ends up being all the numbers from 1 to (n-1) except the duplicate. Now, XOR this value with the XOR of all the numbers from 1 to (n-1). This then makes all numbers in the range 1 to (n-1) that were not previously canceled out cancel out, leaving behind just the duplicated value. Moreover, this runs in O(n) time and only uses O(1) space, since the XOR of all the values fits into a single integer.

在你原来的职位,你认为工作原理是利用一个事实,即整数从1到n-1的总和为n(n-1)/ 2的另一种方法。但是,您担心,这会导致整数溢出而导致的问题。在大多数机器上,你是正确的,这将导致溢出,但(在大多数机器)这不是一个问题,因为使用固定precision整数,通常32位整数进行算术运算。当一个整数溢出时,得到的数字是没有意义的。相反,它只是如果你计算的实际结果,你会得到的值,然后下车一切,但最低32位。从数学上来说,这是被称为模运算,并在计算机中的操作完成模2 32 。更一般地,不过,让我们说,整数存储模K中一些固定ķ。

In your original post you considered an alternative approach that works by using the fact that the sum of the integers from 1 to n-1 is n(n-1)/2. You were concerned, however, that this would lead to integer overflow and cause a problem. On most machines you are right that this would cause an overflow, but (on most machines) this is not a problem because arithmetic is done using fixed-precision integers, commonly 32-bit integers. When an integer overflow occurs, the resulting number is not meaningless. Rather, it's just the value that you would get if you computed the actual result, then dropped off everything but the lowest 32 bits. Mathematically speaking, this is known as modular arithmetic, and the operations in the computer are done modulo 232. More generally, though, let's say that integers are stored modulo k for some fixed k.

幸运的是,很多人都知道的算术规律和爱从正常的算术仍持有模运算。我们只是需要更precise用我们的术语。我们说x是全等至y模K(用x表示&当量; <子> K y)的,如果x和y离开相同余数时除以k分配。工作在物理机上的时候,因为当一个整数溢出发生在大多数硬件上,所产生的价值是一致的真值模数,其中k取决于字的大小,这是非常重要的。幸运的是,下列法律持有真正的模块化运算:

Fortunately, many of the arithmetical laws you know and love from normal arithmetic still hold in modular arithmetic. We just need to be more precise with our terminology. We say that x is congruent to y modulo k (denoted x ≡k y) if x and y leave the same remainder when divided by k. This is important when working on a physical machine, because when an integer overflow occurs on most hardware, the resulting value is congruent to the true value modulo k, where k depends on the word size. Fortunately, the following laws hold true in modular arithmetic:

例如:

  1. 如果X'当量; <子> K Y和W&当量; <子> K Z,那么x + W&当量; <子> K Y + Z
  2. 如果X'当量; <子> K Y和W&当量; <子> K Z,那么XW&当量;。<子> K YZ
  1. If x ≡k y and w ≡k z, then x + w ≡k y + z
  2. If x ≡k y and w ≡k z, then xw ≡k yz.

这意味着,如果你想通过查找数组元素的总和,并减去预计总计算重复的值,一切都会迎刃而解,即使存在一个整数溢出,因为标准的算术仍然会产生在硬件的相同的值(模k)的。这就是说,你也可以使用XOR为基础的方法,它不需要在所有考虑溢出。 : - )

This means that if you want to compute the duplicate value by finding the total sum of the elements of the array and subtracting out the expected total, everything will work out fine even if there is an integer overflow because standard arithmetic will still produce the same values (modulo k) in the hardware. That said, you could also use the XOR-based approach, which doesn't need to consider overflow at all. :-)

如果你不能保证一个元素是重复的,但可以修改元素的数组,,然后有一个美丽的算法找出重复的值。 <一href="http://stackoverflow.com/questions/5739024/finding-duplicates-in-on-time-and-o1-space/5739336#5739336">This早期的SO质疑介绍了如何做到这一点。直观地说,这个想法是,你可以尝试的顺序进行排序使用桶排序,其中元素的数组本身被回收持有该桶的空间为好。

If you are not guaranteed that exactly one element is duplicated, but you can modify the array of elements, then there is a beautiful algorithm for finding the duplicated value. This earlier SO question describes how to accomplish this. Intuitively, the idea is that you can try to sort the sequence using a bucket sort, where the array of elements itself is recycled to hold the space for the buckets as well.

如果你不能保证一个元素被复制,并且不能修改元素的数组,,然后问题就更难。这是一个典型的(硬!)面试中的问题,据说花了唐Knuth的24小时内解决。关键是要减少的问题,以周期调查通过从数字1处理数组作为一个功能Ñ​​到1-(N-1),然后寻找两个输入起作用的功能。然而,由此产生的算法,称为弗洛伊德的循环查找算法,极为美观简洁。有趣的是,这是同样的算法,你会用它来检测循环链表的线性时间和恒定的空间。我建议你​​看它,因为它周期性地出现在软件的采访。

If you are not guaranteed that exactly one element is duplicated, and you cannot modify the array of elements, then the problem is much harder. This is a classic (and hard!) interview problem that reportedly took Don Knuth 24 hours to solve. The trick is to reduce the problem to an instance of cycle-finding by treating the array as a function from the numbers 1-n onto 1-(n-1) and then looking for two inputs to the function that function. However, the resulting algorithm, called Floyd's cycle-finding algorithm, is extremely beautiful and simple. Interestingly, it's the same algorithm you would use to detect a cycle in a linked list in linear time and constant space. I'd recommend looking it up, since it periodically comes up in software interviews.

有关与分析,正确性证明和Python实现算法的完整描述,检查出的这个实现 ,解决了问题。

For a complete description of the algorithm along with an analysis, correctness proof, and Python implementation, check out this implementation that solves the problem.

希望这有助于!

这篇关于找出重复元件以阵列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆