Getting first n unique elements from Python list

Views: 368 | Published: 2020/11/13 3:31:21 | Tags: python python-3.x set generator unique

Question

I have a Python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from the list. So, in this case, if I want the first 5 unique elements, they would be:

[1,2,3,4,5]

I have come up with a solution using generators:

def iterate(itr, upper=5):

    count = 0
    for index, element in enumerate(itr):
        if index==0:
            count += 1
            yield element

        elif element not in itr[:index] and count<upper:
            count += 1
            yield element

In use:

>>> i = iterate(a, 5)
>>> [e for e in i]
[1, 2, 3, 4, 5]

I doubt this is the optimal solution. Is there an alternative strategy that I can implement to write it in a more Pythonic and efficient way?

Recommended answer

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1,2,2,3,3,4,5,6]

def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield fewer if the data is too short."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return

k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))

Output:

[1, 2, 3, 4]
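The same idea can also be split into two reusable pieces. The following is only a sketch, not part of the original answer: `unique` and `first_n_unique` are hypothetical names chosen for illustration, and it assumes the elements are hashable. `itertools.islice` stops pulling from the generator as soon as n items have been produced, so the rest of the input is never visited.

```python
from itertools import islice

def unique(iterable):
    """Yield each element the first time it appears, preserving order.

    Hypothetical helper for illustration; elements must be hashable.
    """
    seen = set()
    for e in iterable:
        if e not in seen:
            seen.add(e)
            yield e

def first_n_unique(iterable, n):
    # islice stops consuming unique() once n items have been produced,
    # so the remainder of the input is never inspected.
    return list(islice(unique(iterable), n))

a = [1, 2, 2, 3, 3, 4, 5, 6]
print(first_n_unique(a, 5))  # [1, 2, 3, 4, 5]
```

Because `islice` handles the cutoff, asking for 0 elements yields an empty list, and a short input simply yields fewer than n elements.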

According to PEP 479 you should return from generators, not raise StopIteration. Thanks to @khelwood & @iBug for that comment; one never stops learning.

With Python 3.6 you get a DeprecationWarning, and from 3.7 on a StopIteration raised inside a generator is turned into a RuntimeError. See the Transition Plan in PEP 479 if you are still using raise StopIteration.
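As a small illustration of the PEP 479 behaviour, the sketch below assumes Python 3.7 or later; `bad_gen` and `good_gen` are hypothetical names used only for this demonstration.

```python
def bad_gen():
    yield 1
    # Under PEP 479 (default since Python 3.7) this does not end the
    # generator quietly; it is converted into a RuntimeError.
    raise StopIteration

def good_gen():
    yield 1
    # A plain return is the supported way to finish a generator early.
    return

try:
    list(bad_gen())
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"

print(outcome)
print(list(good_gen()))
```

On Python 3.7+ the first call ends in a RuntimeError, while the generator that uses return finishes normally.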

Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups, with k being the length of the slice. Using a set reduces this to O(1) lookups but uses more memory, because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.

Consider [1,2,3,4,4,4,4,5] vs. [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For 6 uniques (in the longer list):

you would have lookups of O(1)+O(2)+...+O(5001)
mine would have 5001*O(1) lookups + memory for set({1,2,3,4,5,6})
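To get a feel for the difference on the longer list, both approaches can be timed with timeit. This is only a rough sketch: the two functions are copied from the question and the answer, the names `data`, `slice_time` and `set_time` are introduced here for illustration, and the absolute numbers depend on the machine and Python version.

```python
import timeit

def iterate(itr, upper=5):
    """The question's approach: membership test against a list slice."""
    count = 0
    for index, element in enumerate(itr):
        if index == 0:
            count += 1
            yield element
        elif element not in itr[:index] and count < upper:
            count += 1
            yield element

def get_unique_N(iterable, N):
    """The answer's approach: membership test against a set."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return

# The longer list from the text above: 5001 elements, 6 unique values.
data = [1]*1000 + [2]*1000 + [3]*1000 + [4]*1000 + [5]*1000 + [6]

# Run each approach a few times and report the total time taken.
slice_time = timeit.timeit(lambda: list(iterate(data, 6)), number=5)
set_time = timeit.timeit(lambda: list(get_unique_N(data, 6)), number=5)
print(f"slice lookup: {slice_time:.4f}s, set lookup: {set_time:.4f}s")
```

Both calls return the same six elements; only the cost of each membership test differs.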
