Remove duplicates in a Python list but remember the index


Question

How can I remove duplicates from a list, keep the original order of the items, and remember the first index of each item in the list?

For example, removing the duplicates from [1, 1, 2, 3] yields [1, 2, 3], but I need to remember the indices [0, 2, 3].

I am using Python 2.7.

Recommended answer

Use enumerate to keep track of the index and a set to keep track of the elements seen:

l = [1, 1, 2, 3]
inds = []     # index of the first occurrence of each element
seen = set()  # elements encountered so far
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
    seen.add(ele)
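
As a quick check, the loop above can be wrapped in a small function. The name first_indices is only an illustrative choice, not part of the original answer, and the sketch assumes the elements are hashable (a set cannot hold lists or dicts).

```python
def first_indices(items):
    """Return the index of the first occurrence of each distinct element."""
    inds = []
    seen = set()
    for i, ele in enumerate(items):
        if ele not in seen:
            inds.append(i)
            seen.add(ele)
    return inds


l = [1, 1, 2, 3]
inds = first_indices(l)
unique = [l[i] for i in inds]  # rebuild the de-duplicated list from the indices
print(inds)    # [0, 2, 3]
print(unique)  # [1, 2, 3]
```

Keeping only the indices is enough, since the unique elements can always be recovered from the original list.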

If you want both the index and the element:

inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)

Or if you want both in separate lists:

l = [1, 1, 2, 3]
inds, unq = [],[]
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
        unq.append(ele)
    seen.add(ele)
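
If a lookup from element to first index is more convenient than two parallel lists, the same single pass can fill an OrderedDict instead. This is a variation rather than part of the original answer: first_index_map is a hypothetical name, and OrderedDict is used here because plain dicts in Python 2.7 do not preserve insertion order.

```python
from collections import OrderedDict


def first_index_map(items):
    """Map each distinct element to the index of its first occurrence."""
    first = OrderedDict()
    for i, ele in enumerate(items):
        if ele not in first:  # hash lookup, no rescan of the list
            first[ele] = i
    return first


m = first_index_map([1, 1, 2, 3])
unq = list(m.keys())     # unique elements in original order
inds = list(m.values())  # their first indices
print(unq)   # [1, 2, 3]
print(inds)  # [0, 2, 3]
```

Unlike building the mapping with list.index, this never searches the list again, so it stays a single linear pass.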

Using a set is by far the best approach:

In [13]: l = [randint(1,10000) for _ in range(10000)]     

In [14]: %%timeit                                         
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)
   ....: 
100 loops, best of 3: 3.08 ms per loop

In [15]: timeit  OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 442 ms per loop

In [16]: l = [randint(1,10000) for _ in range(100000)]      
In [17]: timeit  OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 10.3 s per loop

In [18]: %%timeit                                       
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)
   ....: 
10 loops, best of 3: 22.6 ms per loop

So for 100k elements it is 10.3 seconds versus 22.6 ms. The gap comes from list.index, which rescans the list from the start for every element, whereas a set membership test takes constant time on average. If you try anything larger with fewer duplicates, such as [randint(1,100000) for _ in range(100000)], you will have time to read a book. Creating two lists is marginally slower, but still orders of magnitude faster than using list.index.
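
The IPython session above assumes randint and OrderedDict were already imported. For readers without IPython, a rough equivalent using the standard timeit module might look like the sketch below. The function names with_set and with_index and the small list size are assumptions made for illustration, and the absolute timings will differ by machine.

```python
import random
import timeit
from collections import OrderedDict


def with_set(l):
    """Single pass with a set, as in the answer."""
    inds = []
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            inds.append((i, ele))
        seen.add(ele)
    return inds


def with_index(l):
    """The slow alternative: l.index(x) rescans the list on every call."""
    return OrderedDict((x, l.index(x)) for x in l)


random.seed(0)  # fixed seed so repeated runs use the same data
n = 2000        # kept small here; the gap widens quickly as n grows
data = [random.randint(1, n) for _ in range(n)]

# Both approaches agree on the (index, element) pairs.
same = with_set(data) == [(i, x) for x, i in with_index(data).items()]
print(same)

t_set = timeit.timeit(lambda: with_set(data), number=5)
t_index = timeit.timeit(lambda: with_index(data), number=5)
print("set: %.4fs  list.index: %.4fs" % (t_set, t_index))
```

Raising n should make the list.index version slow down much faster than the set version, in line with the timings reported above.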

If you want to get one value at a time, you can use a generator function:

def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i,ele)
        seen.add(ele)
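
A short usage sketch for the generator, repeated here so the block runs on its own. The variable names gen, first and rest are illustrative only.

```python
def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i, ele)
        seen.add(ele)


gen = yield_un([1, 1, 2, 3])
first = next(gen)  # pull a single (index, element) pair on demand
rest = list(gen)   # consume whatever is left
print(first)  # (0, 1)
print(rest)   # [(2, 2), (3, 3)]
```

Because the generator is lazy, it only scans as much of the list as is needed to produce the pairs actually requested.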
