Remove duplicate elements from an array

Question

I have an unsorted array. What is the best way to remove all duplicates of each element, if any are present?

For example:

a[1,5,2,6,8,9,1,1,10,3,2,4,1,3,11,3]

After that operation, the array should look like this:

a[1,5,2,6,8,9,10,3,4,11]

Answer

Check every element against every other element

The naive solution is to check every element against every other element. This is wasteful and yields an O(n²) solution, even if you only go "forward" (comparing each element only with the ones after it still takes n(n-1)/2 comparisons).
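
A minimal Python sketch of the naive approach (the original answer gives no code; the name `dedupe_naive` is illustrative). It keeps an element only if no earlier element equals it, which preserves first-occurrence order and needs nothing but an equality test:

```python
def dedupe_naive(values):
    """Keep the first occurrence of each value; O(n^2) comparisons."""
    result = []
    for i, v in enumerate(values):
        # Compare v against every element before it. This inner scan is O(n),
        # so the whole loop does about n*(n-1)/2 comparisons in the worst case.
        seen_before = False
        for j in range(i):
            if values[j] == v:
                seen_before = True
                break
        if not seen_before:
            result.append(v)
    return result


a = [1, 5, 2, 6, 8, 9, 1, 1, 10, 3, 2, 4, 1, 3, 11, 3]
print(dedupe_naive(a))  # [1, 5, 2, 6, 8, 9, 10, 3, 4, 11]
```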

A better solution is to sort the array and then compare each element with the one next to it to find duplicates, since equal values end up adjacent after sorting. With an efficient sort, this is O(n log n).
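
A sketch of the sort-based approach in Python, assuming the elements are mutually comparable (`dedupe_sorted` is a hypothetical name). `sorted` returns a new list, so the input is left untouched; note that the output comes back in sorted order, not the original order:

```python
def dedupe_sorted(values):
    """Sort a copy, then drop each element equal to its left neighbour.
    O(n log n) for the sort plus O(n) for the scan."""
    result = []
    for v in sorted(values):
        # After sorting, duplicates are adjacent, so comparing with the
        # last kept element is enough to detect them.
        if not result or result[-1] != v:
            result.append(v)
    return result


a = [1, 5, 2, 6, 8, 9, 1, 1, 10, 3, 2, 4, 1, 3, 11, 3]
print(dedupe_sorted(a))  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]
```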

The disadvantage of the sort-based solution is that the original order is not preserved. An extra step can take care of this, however. Put all entries of the unique sorted array into a hash table, which has (expected) O(1) access. Then iterate over the original array. For each element, check whether it is in the hash table. If it is, add it to the result and delete it from the hash table. You end up with a result array in the original order, with each element appearing where its first occurrence was.
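
The order-restoring variant, sketched in Python with a built-in `set` standing in for the hash table (`dedupe_sort_then_restore` is an illustrative name). Step 1 is the sort-and-compare pass described earlier; steps 2 and 3 follow the paragraph literally:

```python
def dedupe_sort_then_restore(values):
    # Step 1: unique values via sort + comparison with the left neighbour.
    s = sorted(values)
    unique = [v for i, v in enumerate(s) if i == 0 or s[i - 1] != v]
    # Step 2: put the unique values into a hash table (here a Python set).
    pending = set(unique)
    # Step 3: walk the original array. The first copy of each value is still
    # in the set, so emit it and remove it; later copies are then skipped.
    result = []
    for v in values:
        if v in pending:
            result.append(v)
            pending.remove(v)
    return result


a = [1, 5, 2, 6, 8, 9, 1, 1, 10, 3, 2, 4, 1, 3, 11, 3]
print(dedupe_sort_then_restore(a))  # [1, 5, 2, 6, 8, 9, 10, 3, 4, 11]
```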

If you're dealing with integers in some fixed range, you can do even better by using a radix sort. For example, if you assume the numbers are all in the range 0 to 1,000,000, you can allocate a bit vector of 1,000,001 bits. For each element in the original array, set the corresponding bit based on its value (e.g. a value of 13 sets the 14th bit, i.e. bit index 13). Then traverse the original array again and check whether each element's bit is set. If it is, add the element to the result array and clear that bit. This is O(n), plus the one-off cost of allocating the vector, and trades space for time.
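
A bit-vector sketch in Python, assuming all values are integers between 0 and a caller-supplied `max_value` inclusive (the function `dedupe_bitvector` and that parameter are illustrative, not from the original). The bits are packed eight to a byte in a `bytearray`, so a range of 0 to 1,000,000 costs about 125 KB:

```python
def dedupe_bitvector(values, max_value):
    """Assumes every value is an int in 0..max_value (inclusive)."""
    # (max_value + 1) bits, rounded up to whole bytes.
    bits = bytearray(max_value // 8 + 1)
    # Pass 1: set bit v for every value v present.
    for v in values:
        if not 0 <= v <= max_value:
            raise ValueError(f"{v} is outside 0..{max_value}")
        bits[v >> 3] |= 1 << (v & 7)
    # Pass 2: emit a value the first time its bit is still set, then clear it
    # so later copies are skipped.
    result = []
    for v in values:
        mask = 1 << (v & 7)
        if bits[v >> 3] & mask:
            result.append(v)
            bits[v >> 3] &= ~mask & 0xFF  # keep the byte within 0..255
    return result


a = [1, 5, 2, 6, 8, 9, 1, 1, 10, 3, 2, 4, 1, 3, 11, 3]
print(dedupe_bitvector(a, 1_000_000))  # [1, 5, 2, 6, 8, 9, 10, 3, 4, 11]
```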

Which leads us to the best solution of all: the sort, though useful, is actually a distraction. Create a hash table with (expected) O(1) access. Traverse the original list. If an element is not already in the hash table, add it to both the result array and the hash table. If it is already in the hash table, ignore it.
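
In Python the hash table can simply be a built-in `set`, assuming the elements are hashable; a minimal sketch (`dedupe_hash` is an illustrative name):

```python
def dedupe_hash(values):
    """Single pass; expected O(n) time, O(n) extra space."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:  # expected O(1) membership test
            seen.add(v)
            result.append(v)
    return result


a = [1, 5, 2, 6, 8, 9, 1, 1, 10, 3, 2, 4, 1, 3, 11, 3]
print(dedupe_hash(a))  # [1, 5, 2, 6, 8, 9, 10, 3, 4, 11]
```

Since Python 3.7, dicts preserve insertion order, so `list(dict.fromkeys(a))` is an idiomatic one-liner for the same single pass.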

This is by far the best solution. So why the rest? Because problems like this are about adapting knowledge you have (or should have) to the problem and, based on the assumptions you make, refining it into a solution. Evolving a solution and understanding the thinking behind it is far more useful than regurgitating one.

Also, hash tables are not always available. Take an embedded system, or anything else where space is VERY limited. You can implement a quicksort in a handful of opcodes, far fewer than any hash table would need.
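
For the space-constrained case, a sketch that uses no hash table: sort in place with a hand-written quicksort, then compact the unique values to the front of the same array. It is written in Python only for consistency with the other sketches; on an embedded target the same loops would typically be written in C or assembly. The names `quicksort` and `dedupe_in_place` are illustrative. The Lomuto partition used here is short but degrades to O(n²) on inputs with many equal keys (a three-way partition avoids that), and the result is in sorted order, not the original order.

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort (Lomuto partition, last element as pivot)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]  # pivot now sits at its final index i
        # Recurse into the smaller side and loop on the larger one,
        # which bounds the recursion depth to O(log n).
        if i - lo < hi - i:
            quicksort(a, lo, i - 1)
            lo = i + 1
        else:
            quicksort(a, i + 1, hi)
            hi = i - 1


def dedupe_in_place(a):
    """Sort a in place, then move unique values to the front.
    Returns the unique count; a[:count] holds them in sorted order."""
    quicksort(a)
    if not a:
        return 0
    write = 1
    for read in range(1, len(a)):
        if a[read] != a[write - 1]:  # new value: duplicates are adjacent
            a[write] = a[read]
            write += 1
    return write


a = [1, 5, 2, 6, 8, 9, 1, 1, 10, 3, 2, 4, 1, 3, 11, 3]
n = dedupe_in_place(a)
print(a[:n])  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]
```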
