找出数组中的重复元素 [英] Finding out the duplicate element in an array

查看:44
本文介绍了找出数组中的重复元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有一个大小为 n 的数组,数组中包含的元素在 1 到 n-1 之间,这样每个元素出现一次,只有一个元素出现多次.我们需要找到这个元素.

There is an array of size n and the elements contained in the array are between 1 and n-1 such that each element occurs once and just one element occurs more than once. We need to find this element.

尽管这是一个非常常见的常见问题解答,但我仍然没有找到正确的答案.大多数建议是我应该将数组中的所有元素相加,然后从中减去所有索引的总和,但如果元素数量非常大,这将不起作用.它会溢出.也有关于使用异或门dup = dup ^ arr[i] ^ i的建议,我不清楚.

Though this is a very FAQ, I still haven't found a proper answer. Most suggestions are that I should add up all the elements in the array and then subtract from it the sum of all the indices, but this won't work if the number of elements is very large. It will overflow. There have also been suggestions regarding the use of XOR gate dup = dup ^ arr[i] ^ i, which are not clear to me.

我提出了这个算法,它是加法算法的增强,将在很大程度上减少溢出的机会!

I have come up with this algorithm which is an enhancement of the addition algorithm and will reduce the chances of overflow to a great extent!

for i=0 to n-1
  begin :
    diff = A[i] - i;
    sum  = sum + diff;
  end

diff 包含重复元素,但使用此方法我无法找出重复元素的索引.为此,我需要再次遍历数组,这是不可取的.谁能想出一个更好的解决方案,不涉及加法或异或方法在 O(n) 中工作?

diff contains the duplicate element, but using this method I am unable to find out the index of the duplicate element. For that I need to traverse the array once more which is not desirable. Can anyone come up with a better solution that does not involve the addition method or the XOR method works in O(n)?

推荐答案

根据问题描述的限制,您可以通过多种方式思考此问题.

There are many ways that you can think about this problem, depending on the constraints of your problem description.

如果你知道只有一个元素被复制,那么有很多方法可以解决这个问题.一种特别聪明的解决方案是使用按位异或运算符.XOR 具有以下有趣的特性:

If you know for a fact that exactly one element is duplicated, then there are many ways to solve this problem. One particularly clever solution is to use the bitwise XOR operator. XOR has the following interesting properties:

  1. XOR 是关联的,所以 (x ^ y) ^ z = x ^ (y ^ z)
  2. XOR 是可交换的:x ^ y = y ^ x
  3. XOR 是它自己的逆:x ^ y = 0 iff x = y
  4. XOR 以零为单位:x ^ 0 = x

此处的属性 (1) 和 (2) 意味着在对一组值进行 XOR 运算时,将 XOR 应用于元素的顺序无关紧要.您可以根据需要对元素重新排序或对它们进行分组.属性 (3) 意味着如果将相同的值多次异或,则会返回零,而属性 (4) 意味着如果将任何内容与 0 进行异或,则会返回原始数字.把所有这些性质放在一起,你会得到一个有趣的结果:如果你对一组数字进行异或,结果是该组中出现奇数次的所有数字的异或.这样做的原因是,当您对出现偶数次的数字进行异或运算时,您可以将这些数字的异或分解为一组对.每对通过 (3) 异或到 0,并且所有这些零的第一个组合异或通过 (4) 返回零.因此,所有偶数重数都抵消了.

Properties (1) and (2) here mean that when taking the XOR of a group of values, it doesn't matter what order you apply the XORs to the elements. You can reorder the elements or group them as you see fit. Property (3) means that if you XOR the same value together multiple times, you get back zero, and property (4) means that if you XOR anything with 0 you get back your original number. Taking all these properties together, you get an interesting result: if you take the XOR of a group of numbers, the result is the XOR of all numbers in the group that appear an odd number of times. The reason for this is that when you XOR together numbers that appear an even number of times, you can break the XOR of those numbers up into a set of pairs. Each pair XORs to 0 by (3), and th combined XOR of all these zeros gives back zero by (4). Consequently, all the numbers of even multiplicity cancel out.

要使用它来解决原始问题,请执行以下操作.首先,将列表中的所有数字异或.这给出了出现奇数次的所有数字的异或,最终是从 1 到 (n-1) 的所有数字,但重复项除外.现在,将该值与从 1 到 (n-1) 的所有数字的异或进行异或.然后,这会使 1 到 (n-1) 范围内的所有先前未被抵消的数字抵消,只留下重复的值.此外,这在 O(n) 时间内运行并且仅使用 O(1) 空间,因为所有值的 XOR 适合单个整数.

To use this to solve the original problem, do the following. First, XOR together all the numbers in the list. This gives the XOR of all numbers that appear an odd number of times, which ends up being all the numbers from 1 to (n-1) except the duplicate. Now, XOR this value with the XOR of all the numbers from 1 to (n-1). This then makes all numbers in the range 1 to (n-1) that were not previously canceled out cancel out, leaving behind just the duplicated value. Moreover, this runs in O(n) time and only uses O(1) space, since the XOR of all the values fits into a single integer.

在您的原始帖子中,您考虑了另一种方法,该方法利用从 1 到 n-1 的整数之和为 n(n-1)/2 这一事实.但是,您担心这会导致整数溢出并导致问题.在大多数机器上,这会导致溢出是正确的,但(在大多数机器上)这不是问题,因为算术是使用固定精度整数完成的,通常是 32 位整数.当发生整数溢出时,产生的数字并非毫无意义.相反,它只是您计算实际结果时得到的值,然后丢弃除最低 32 位之外的所有内容.从数学上讲,这称为模算术,计算机中的运算是以 232 为模完成的.不过,更一般地说,我们假设整数存储在某个固定 k 的模 k 中.

In your original post you considered an alternative approach that works by using the fact that the sum of the integers from 1 to n-1 is n(n-1)/2. You were concerned, however, that this would lead to integer overflow and cause a problem. On most machines you are right that this would cause an overflow, but (on most machines) this is not a problem because arithmetic is done using fixed-precision integers, commonly 32-bit integers. When an integer overflow occurs, the resulting number is not meaningless. Rather, it's just the value that you would get if you computed the actual result, then dropped off everything but the lowest 32 bits. Mathematically speaking, this is known as modular arithmetic, and the operations in the computer are done modulo 232. More generally, though, let's say that integers are stored modulo k for some fixed k.

幸运的是,您从普通算术中了解和喜爱的许多算术定律仍然适用于模算术.我们只需要更精确地使用我们的术语.如果 x 和 y 在除以 k 时留下相同的余数,我们说 x 与 y 模 k 一致(表示为 x &k y).这在物理机器上工作时很重要,因为当大多数硬件上发生整数溢出时,结果值与真值模 k 一致,其中 k 取决于字长.幸运的是,以下定律在模算术中适用:

Fortunately, many of the arithmetical laws you know and love from normal arithmetic still hold in modular arithmetic. We just need to be more precise with our terminology. We say that x is congruent to y modulo k (denoted x ≡k y) if x and y leave the same remainder when divided by k. This is important when working on a physical machine, because when an integer overflow occurs on most hardware, the resulting value is congruent to the true value modulo k, where k depends on the word size. Fortunately, the following laws hold true in modular arithmetic:

例如:

  1. 如果 x ≡k y 和 w ≡k z,则 x + w ≡k y + z
  2. 如果 x ≡k y 和 w ≡k z,则 xw ≡k yz.
  1. If x ≡k y and w ≡k z, then x + w ≡k y + z
  2. If x ≡k y and w ≡k z, then xw ≡k yz.

这意味着,如果您想通过查找数组元素的总和并减去预期的总和来计算重复值,即使存在整数溢出,一切都会正常进行,因为标准算术仍然会产生硬件中的相同值(模 k).也就是说,您也可以使用基于 XOR 的方法,它根本不需要考虑溢出.:-)

This means that if you want to compute the duplicate value by finding the total sum of the elements of the array and subtracting out the expected total, everything will work out fine even if there is an integer overflow because standard arithmetic will still produce the same values (modulo k) in the hardware. That said, you could also use the XOR-based approach, which doesn't need to consider overflow at all. :-)

如果你不能保证恰好有一个元素被复制,但你可以修改元素数组,那么有一个漂亮的算法来寻找重复的值.这个较早的SO问题描述了如何完成这个.直观地说,这个想法是您可以尝试使用 bucket sort 对序列进行排序,其中元素数组本身也被回收来为存储桶保留空间.

If you are not guaranteed that exactly one element is duplicated, but you can modify the array of elements, then there is a beautiful algorithm for finding the duplicated value. This earlier SO question describes how to accomplish this. Intuitively, the idea is that you can try to sort the sequence using a bucket sort, where the array of elements itself is recycled to hold the space for the buckets as well.

如果你不能保证只有一个元素被复制,并且你不能修改元素的数组,那么问题就难多了.这是一个经典的(而且很难!)面试问题,据报道,Don Knuth 花了 24 小时来解决.诀窍是通过将数组视为一个循环查找的实例来将问题减少到从数字 1-n 到 1-(n-1) 的函数,然后寻找该函数的两个输入.但是,生成的算法称为 Floyd 的循环查找算法,是非常漂亮和简单.有趣的是,它与您用来在线性时间和恒定空间中检测链表中的循环的算法相同.我建议查一下它,因为它会定期出现在软件面试中.

If you are not guaranteed that exactly one element is duplicated, and you cannot modify the array of elements, then the problem is much harder. This is a classic (and hard!) interview problem that reportedly took Don Knuth 24 hours to solve. The trick is to reduce the problem to an instance of cycle-finding by treating the array as a function from the numbers 1-n onto 1-(n-1) and then looking for two inputs to that function. However, the resulting algorithm, called Floyd's cycle-finding algorithm, is extremely beautiful and simple. Interestingly, it's the same algorithm you would use to detect a cycle in a linked list in linear time and constant space. I'd recommend looking it up, since it periodically comes up in software interviews.

有关算法的完整描述以及分析、正确性证明和 Python 实现,请查看 这个实现解决了这个问题.

For a complete description of the algorithm along with an analysis, correctness proof, and Python implementation, check out this implementation that solves the problem.

希望这有帮助!

这篇关于找出数组中的重复元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆