How do you remove duplicates from a list whilst preserving order?

Question

Is there a built-in that removes duplicates from a list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

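def uniq(items):
  # Keep the first occurrence of each item, in order.
  # (Parameter renamed from `input` to avoid shadowing the built-in.)
  output = []
  for x in items:
    if x not in output:  # linear scan of a list, so O(n^2) overall
      output.append(x)
  return output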

(Thanks to unwind for that code sample.)

But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

Recommended answer

Here are some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

The fastest one:

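def f7(seq):
    seen = set()
    # Bind the method to a local name to avoid an attribute lookup per item.
    seen_add = seen.add
    # seen_add(x) returns None (falsy); it only runs, and records x, when x is new.
    return [x for x in seq if not (x in seen or seen_add(x))]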

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
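
To check this on your own machine, a rough timing sketch along the following lines may help. The names f7_attr and f7_local and the sample data are made up for illustration. Actual timings depend on the interpreter version, and on recent CPython releases the gap may be small.

```python
import timeit

def f7_attr(seq):
    # Looks up the attribute seen.add on every iteration.
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]

def f7_local(seq):
    # Binds the method once to a local name, as in f7 above.
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

data = [i % 1000 for i in range(100_000)]  # hypothetical sample input

# Both variants must produce the same de-duplicated list.
assert f7_attr(data) == f7_local(data)

if __name__ == "__main__":
    for fn in (f7_attr, f7_local):
        elapsed = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {elapsed:.3f}s for 10 runs")
```

Only the relative difference between the two numbers is meaningful, and it is worth repeating the measurement a few times before drawing conclusions.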

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.
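
The linked recipe is not reproduced here. As a minimal sketch of the same idea, assuming Python 3.7 or later (where dict preserves insertion order), a hypothetical OrderedSet class could wrap a dict to get average-case O(1) insertion, deletion and membership checks:

```python
class OrderedSet:
    """Hypothetical minimal ordered set backed by a dict (Python 3.7+).

    This is an illustration, not the ActiveState recipe.
    """

    def __init__(self, iterable=()):
        # dict keys are unique and keep insertion order.
        self._items = dict.fromkeys(iterable)

    def add(self, item):
        # Re-adding an existing item keeps its original position.
        self._items[item] = None

    def discard(self, item):
        # Removing a missing item is a no-op.
        self._items.pop(item, None)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = OrderedSet([3, 1, 3, 2, 1])
s.add(5)
s.add(3)        # already present, position unchanged
s.discard(1)
print(list(s))  # [3, 2, 5]
```

Because the structure stays de-duplicated as items arrive, there is no need to rerun a function like f7 over the whole dataset each time it changes.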

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
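
To make that note concrete, the condition inside f7 can be spelled out step by step. The helper name is_new below is hypothetical and exists only to illustrate the short-circuit behaviour:

```python
# set.add mutates the set in place and returns None.
probe = set()
assert probe.add("a") is None

def is_new(x, seen):
    # Equivalent to: not (x in seen or seen.add(x))
    if x in seen:
        return False   # `or` short-circuits, so add() is never called
    seen.add(x)        # returns None, which is falsy
    return True        # not (False or None) evaluates to True

seen = set()
result = [x for x in "abracadabra" if is_new(x, seen)]
print(result)  # ['a', 'b', 'r', 'c', 'd']
```

The explicit version is easier to read, while the one-line form in f7 avoids a Python-level function call per element.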
