Getting first n unique elements from Python list


Question

I have a Python list in which elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from the list. So, in this case, if I want the first 5 unique elements, they would be:

[1,2,3,4,5]

I have come up with a solution using generators:

def iterate(itr, upper=5):

    count = 0
    for index, element in enumerate(itr):
        if index==0:
            count += 1
            yield element

        elif element not in itr[:index] and count<upper:
            count += 1
            yield element

In use:

>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]

I doubt that this is the optimal solution. Is there an alternative strategy I can use to write it in a more Pythonic and efficient way?

Recommended answer

I would use a set to remember what has been seen, and return from the generator once you have seen enough:

a = [1,2,2,3,3,4,5,6]

def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    May yield fewer if the data is too short."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return

k = get_unique_N(a, 4)
print(list(k))

Output:

[1, 2, 3, 4]
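
As a variation on the generator above, the "remember what was seen" part and the "stop after N" part can be kept separate by combining a plain de-duplicating generator with itertools.islice. This is only a sketch: the names unique and first_n_unique are hypothetical, and it assumes the elements are hashable so that they can be stored in a set.

```python
from itertools import islice


def unique(iterable):
    """Yield each element of iterable the first time it appears, in order."""
    seen = set()
    for e in iterable:
        if e not in seen:
            seen.add(e)
            yield e


def first_n_unique(iterable, n):
    """Return a list of the first n unique elements (fewer if data runs out)."""
    # islice stops pulling from the generator once n items were produced
    return list(islice(unique(iterable), n))


a = [1, 2, 2, 3, 3, 4, 5, 6]
print(first_n_unique(a, 5))  # [1, 2, 3, 4, 5]
```

Because islice does the counting, asking for 0 elements returns an empty list without any special case.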


According to PEP-479 you should return from generators, not raise StopIteration. Thanks to @khelwood & @iBug for pointing that out in the comments; one never stops learning.

If you still use raise StopIteration inside a generator, Python 3.6 issues a DeprecationWarning and Python 3.7 turns it into a RuntimeError (see the "Transition Plan" section of PEP-479).
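
A small sketch of the difference, assuming Python 3.7 or later. The generator names old_style and new_style are made up for this illustration, and None is used as an arbitrary stop marker.

```python
def old_style(items):
    """Ends the generator the pre-PEP-479 way (do not do this)."""
    for item in items:
        if item is None:
            # On Python 3.7+ this is converted into a RuntimeError
            raise StopIteration
        yield item


def new_style(items):
    """Ends the generator with a plain return, as PEP-479 recommends."""
    for item in items:
        if item is None:
            return
        yield item


try:
    list(old_style([1, 2, None, 3]))
except RuntimeError as exc:
    print("RuntimeError:", exc)

print(list(new_style([1, 2, None, 3])))  # [1, 2]
```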

Your solution, which uses elif element not in itr[:index] and count<upper:, performs O(k) lookups, where k is the length of the slice. Using a set reduces this to O(1) lookups on average, but it uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.

Consider [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For the 6 unique values in this longer list:

  • your solution would have lookups of O(1)+O(2)+...+O(5001)
  • mine would have 5001*O(1) lookups, plus the memory for set({1,2,3,4,5,6})
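
To put rough numbers on the comparison above, the sketch below counts work units instead of measuring time. It assumes one unit per element copied into a slice itr[:index] and one unit per set membership test; the variable names are hypothetical. Since index 0 takes no slice in the question's code, the slice lengths run from 1 to 5000, so the exact count differs slightly from the rough figures above.

```python
data = [1]*1000 + [2]*1000 + [3]*1000 + [4]*1000 + [5]*1000 + [6]

# Slice-based approach: for every index > 0, itr[:index] copies `index`
# elements before the membership test can even start.
slice_work = sum(range(1, len(data)))

# Set-based approach: one (average O(1)) membership test per element.
set_work = len(data)

print(slice_work)  # 12502500
print(set_work)    # 5001
```

The set itself only ever holds the 6 unique values, which is the extra memory the tradeoff refers to.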
