如何从Python中的列表中删除重复项 [英] How to remove duplicate entries from a list in python

查看：133 发布时间：2015/11/30 22:23:26 python algorithm sorting hashmap

本文介绍了如何从Python中的列表中删除重复项的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

近日在接受采访时我被要求写一个python code从列表中删除所有重复的项目。

Recently in an interview I was asked to write a python code to remove all duplicate entries from a list.

例如：

Input List = {1,2,4,5,2,3,1}
Expected Output List = {4,5,3}

在上面的例子中，1的＆安培; 2的多次出现，他们应该被删除。而为了preservation是很重要的。这是提出的问题。

In the above example, 1's & 2's appear more than once and they should be removed. And order preservation is important. This was the question asked.

此外，他不想让我用内置的功能，如设置（），独特的（）等。他是我的测试算法和DS的技能，我猜。他已经在开始做这一点。

Again he did not want me to use inbuilt functions like set(), unique() etc. He was testing my algorithm and ds skills, i guess. He had made this clear in the start.

我想到了2做法当场。 1.）排序（n日志的复杂度（n））的 2）哈希表（更快的排序）

I thought of 2 approaches on the spot. 1.) Sorting (complexity of nlog(n)) 2.) HashTable (faster that sorting)

哈希表的方法：

arr = [1,2,4,5,2,3,1]

//function : to create a hash table with key = arr[i] & value = occurence count
def dataCountTable(arr):
    countTable = {}
    i = 0
    while i<len(arr) :
        if arr[i] in countTable : 
            countTable[arr[i]] += 1
        else :
        countTable[arr[i]] = 1
    i+=1
return countTable

//function : to remove duplicates using the arr & hash table
def rmvAllDuplicate(arr, countTable):
    outList = list()
    i = 0
    while i<len(arr) :
        if countTable[arr[i]] == 1 :
            outList.append(arr[i]);
    i+=1
return outList

print rmvAllDuplicate(arr, dataCountTable(arr))

面试官似乎没有pssed这个答案IM $ P $。它使我在寻找一个更好的aprooach。我could'nt找到一个。

The interviewer did not seem impressed with this answer. And it kept me searching for a better aprooach. I could'nt find one.

如果有人可以帮助我提高我的解决方案或建议一个新的，更好的解决方案，这将是很好！

If someone could help me improve my solution or suggest a new and a better solution, it will be nice!

谢谢！

推荐答案

虽然你的哈希表的实现可以更简洁，易读，习惯，用小提升速度，我怀疑是不是你的面试官感到失望

Although your hash-table implementation could be made more concise, readable, and idiomatic, with a small boost in speed, I suspect that isn't what your interviewer was disappointed in.

更可能的是，他推你，希望更好的解决方案，你会拿出一个说法，为什么你的解决方案是有效的优化，而是，你搜索了一会儿，就放弃了。

More likely, he pushed you for a better solution in hopes that you would come up with an argument why your solution was effectively optimal, but instead, you searched for a while and gave up.

那么，有多个东西来证明这里：

So, there are multiple things to prove here:

任何解决这个问题，就必须是O（n）的时间。
您的解决方案是（摊销，平均和 - 几乎总是）O（N）的时间。
在您的解决方案的时间复杂度的倍数是合理的。
任何解决这一问题的那O（N）时间必须0（M）的空间（其中，M是不同值的数目）。
您的解决方案是O（M）的空间。
在您的解决方案的空间复杂度的乘数是合理的。

即使是容易的，你不会拿出在接受采访时实际的严格证明。而其中的一些，你可能甚至无法做出令人信服的理由，但提高可能的异常，并承认在那里你挥舞着你的手就足够了。例如：

Even the easy ones you're not going to come up with an actual rigorous proof in an interview. And some of them, you may not even be able to make a compelling case—but raising possible exceptions and acknowledging where you're waving your hands may be sufficient. For example:

在Python的字典（和设置）的O（N）最坏情况下的时间;它只有O（1）摊销平均情况。有什么关于你的数据，将使其有可能为O要差一些（1）？或许不会，但是......如果这是有人想拒绝一个服务器的一部分，他们可以送他们想要的任何输入数据？
所有他给你小整数的值。是保证是真的吗？在这种情况下，不使用的字典为您计数，只要使用名单（范围（0，P）），其中 P 为最大数。然后，它的O（P）的空间，这听起来比0（M）更糟糕，除了乘数会小了很多，一个列表需要大约1/3的空间（只值，而不是哈希值，钥匙，和值），因此，如果 P＆LT;＆LT; M / 3 这是一个双赢。而且这也可能对速度的胜利，因为没有必要让哈希值。你还可以用 array.array 做得更好。
在一个Python哈希表是矫枉过正，用于存储两套和类型的字典用小的数值。可以自定义的哈希表切东西显著，还是不够的值得吗？

Python's dict (and set) has O(N) worst-case time; it's only O(1) amortized average-case. Is there anything about your data that would make it likely to be worse than O(1)? Probably not, but… what if this were part of a server that someone wanted to DoS, and they could send any input data they wanted?
All of the values he gave you were small integers. Is that guaranteed to be true? In that case, don't use a dict for your counts, just use list(range(0, P)) where P is the max number. Then it's O(P) space, which sounds worse than O(M), except that the multiplier will be a lot smaller—a list takes about 1/3rd the space (just values, rather than hashes, keys, and values), so if P << M/3 it's a win. And it's also probably a win on speed, because there's no need to keep hashing values. And you can do even better with array.array.
A Python hash table is overkill for storing both sets and dicts with small counts. Could a custom hash table cut things significantly, or not enough to be worth it?

这篇关于如何从Python中的列表中删除重复项的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何从Python中的列表中删除重复项 [英] How to remove duplicate entries from a list in python

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

如何从Python中的列表中删除重复项 [英] How to remove duplicate entries from a list in python

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭