如何从Python中的列表中删除重复项 [英] How to remove duplicate entries from a list in python

查看:133
本文介绍了如何从Python中的列表中删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

近日在接受采访时我被要求写一个python code从列表中删除所有重复的项目。

Recently in an interview I was asked to write a python code to remove all duplicate entries from a list.

例如:

Input List = {1,2,4,5,2,3,1}
Expected Output List = {4,5,3}

在上面的例子中,1的&安培; 2的多次出现,他们应该被删除。而为了preservation是很重要的。这是提出的问题。

In the above example, 1's & 2's appear more than once and they should be removed. And order preservation is important. This was the question asked.

此外,他不想让我用内置的功能,如设置(),独特的()等。他是我的测试算法和DS的技能,我猜。他已经在开始做这一点。

Again he did not want me to use inbuilt functions like set(), unique() etc. He was testing my algorithm and ds skills, i guess. He had made this clear in the start.

我想到了2做法当场。 1.)排序(n日志的复杂度(n))的 2)哈希表(更快的排序)

I thought of 2 approaches on the spot. 1.) Sorting (complexity of nlog(n)) 2.) HashTable (faster that sorting)

哈希表的方法:

arr = [1,2,4,5,2,3,1]

//function : to create a hash table with key = arr[i] & value = occurence count
def dataCountTable(arr):
    countTable = {}
    i = 0
    while i<len(arr) :
        if arr[i] in countTable : 
            countTable[arr[i]] += 1
        else :
        countTable[arr[i]] = 1
    i+=1
return countTable

//function : to remove duplicates using the arr & hash table
def rmvAllDuplicate(arr, countTable):
    outList = list()
    i = 0
    while i<len(arr) :
        if countTable[arr[i]] == 1 :
            outList.append(arr[i]);
    i+=1
return outList

print rmvAllDuplicate(arr, dataCountTable(arr)) 

面试官似乎没有pssed这个答案IM $ P $。它使我在寻找一个更好的aprooach。我could'nt找到一个。

The interviewer did not seem impressed with this answer. And it kept me searching for a better aprooach. I could'nt find one.

如果有人可以帮助我提高我的解决方案或建议一个新的,更好的解决方案,这将是很好!

If someone could help me improve my solution or suggest a new and a better solution, it will be nice!

谢谢!

推荐答案

虽然你的哈希表的实现可以更简洁,易读,习惯,用小提升速度,我怀疑是不是你的面试官感到失望

Although your hash-table implementation could be made more concise, readable, and idiomatic, with a small boost in speed, I suspect that isn't what your interviewer was disappointed in.

更可能的是,他推你,希望更好的解决方案,你会拿出一个说法,为什么你的解决方案是有效的优化,而是,你搜索了一会儿,就放弃了。

More likely, he pushed you for a better solution in hopes that you would come up with an argument why your solution was effectively optimal, but instead, you searched for a while and gave up.

那么,有多个东西来证明这里:

So, there are multiple things to prove here:

  1. 任何解决这个问题,就必须是O(n)的时间。
  2. 您的解决方案是(摊销,平均和 - 几乎总是)O(N)的时间。
  3. 在您的解决方案的时间复杂度的倍数是合理的。
  4. 任何解决这一问题的那O(N)时间必须0(M)的空间(其中,M是不同值的数目)。
  5. 您的解决方案是O(M)的空间。
  6. 在您的解决方案的空间复杂度的乘数是合理的。

即使是容易的,你不会拿出在接受采访时实际的严格证明。而其中的一些,你可能甚至无法做出令人信服的理由,但提高可能的异常,并承认在那里你挥舞着你的手就足够了。例如:

Even the easy ones you're not going to come up with an actual rigorous proof in an interview. And some of them, you may not even be able to make a compelling case—but raising possible exceptions and acknowledging where you're waving your hands may be sufficient. For example:

  • 在Python的字典(和设置)的O(N)最坏情况下的时间;它只有O(1)摊销平均情况。有什么关于你的数据,将使其有可能为O要差一些(1)?或许不会,但是......如果这是有人想拒绝一个服务器的一部分,他们可以送他们想要的任何输入数据?
  • 所有他给你小整数的值。是保证是真的吗?在这种情况下,不使用的字典为您计数,只要使用名单(范围(0,P)),其中 P 为最大数。然后,它的O(P)的空间,这听起来比0(M)更糟糕,除了乘数会小了很多,一个列表需要大约1/3的空间(只值,而不是哈希值,钥匙,和值),因此,如果 P&LT;&LT; M / 3 这是一个双赢。而且这也可能对速度的胜利,因为没有必要让哈希值。你还可以用 array.array 做得更好。
  • 在一个Python哈希表是矫枉过正,用于存储两套和类型的字典用小的数值。可以自定义的哈希表切东西显著,还是不够的值得吗?
  • Python's dict (and set) has O(N) worst-case time; it's only O(1) amortized average-case. Is there anything about your data that would make it likely to be worse than O(1)? Probably not, but… what if this were part of a server that someone wanted to DoS, and they could send any input data they wanted?
  • All of the values he gave you were small integers. Is that guaranteed to be true? In that case, don't use a dict for your counts, just use list(range(0, P)) where P is the max number. Then it's O(P) space, which sounds worse than O(M), except that the multiplier will be a lot smaller—a list takes about 1/3rd the space (just values, rather than hashes, keys, and values), so if P << M/3 it's a win. And it's also probably a win on speed, because there's no need to keep hashing values. And you can do even better with array.array.
  • A Python hash table is overkill for storing both sets and dicts with small counts. Could a custom hash table cut things significantly, or not enough to be worth it?

这篇关于如何从Python中的列表中删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆