Getting first n unique elements from Python list
Question
I have a Python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n unique elements from the list.
So, in this case, if I want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
    count = 0
    for index, element in enumerate(itr):
        if index==0:
            count += 1
            yield element
        elif element not in itr[:index] and count<upper:
            count += 1
            yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I doubt this is the optimal solution. Is there an alternative strategy that I can use to write it in a more Pythonic and efficient way?
Recommended answer
I would use a set to remember what was seen and return from the generator when you have seen enough:
a = [1,2,2,3,3,4,5,6]

def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield fewer if the data is too short."""
    seen = set()                 # elements already yielded
    for e in iterable:
        if e in seen:            # O(1) average-case membership test
            continue
        seen.add(e)
        yield e
        if len(seen) == N:       # enough unique elements: stop the generator
            return

k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))
Output:
[1, 2, 3, 4]
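To illustrate the docstring's remark about short data, and the fact that the generator stops as soon as it has seen enough, here is a small sketch. The function is repeated so the snippet runs on its own; the names `short` and `lazy` and the use of `itertools.cycle` are illustrative assumptions, not part of the original answer.

```python
from itertools import cycle

def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return

# Fewer than N unique elements available: the generator simply ends early.
short = list(get_unique_N([1, 1, 2], 5))

# Works on an endless iterator too, because it returns once N uniques are seen.
# Caution: this would loop forever if N exceeded the number of distinct values.
lazy = list(get_unique_N(cycle([3, 1, 3, 2]), 3))

print(short)  # [1, 2]
print(lazy)   # [3, 1, 2]
```

Because the function only iterates, it accepts any iterable, not just lists that support slicing.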
According to PEP-479 you should return from generators, not raise StopIteration. Thanks to @khelwood & @iBug for that comment; one never stops learning.
With Python 3.6 you get a DeprecationWarning, and with 3.7 it raises a RuntimeError if you are still using raise StopIteration inside a generator (see the Transition Plan section of PEP-479).
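The following sketch shows what the PEP-479 behaviour looks like on Python 3.7 or later. The generator `bad_first_n` is a hypothetical counter-example written for this illustration; it is not code from the question or the answer.

```python
def bad_first_n(iterable, n):
    """Hypothetical generator that wrongly raises StopIteration to stop."""
    count = 0
    for e in iterable:
        if count == n:
            raise StopIteration  # should be a plain `return`
        count += 1
        yield e

caught = False
message = ""
try:
    list(bad_first_n([1, 2, 3], 2))
except RuntimeError as exc:
    # On Python 3.7+, the StopIteration is converted into a RuntimeError.
    caught = True
    message = str(exc)

print(caught, message)
```

Replacing the `raise StopIteration` line with `return` makes the generator end cleanly after two elements.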
Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups, with k being the length of the slice. Using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.
Consider [1,2,3,4,4,4,4,5] vs. [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:
For 6 uniques (in the longer list):
- you would have lookups of O(1)+O(2)+...+O(5001)
- mine would have 5001*O(1) lookups + memory for set({1,2,3,4,5,6})
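As a rough sketch of that comparison, the snippet below counts work rather than measuring time. It assumes the cost of the slice-based check at each index is dominated by copying `itr[:index]`, and that each set membership test counts as one lookup. The variable names are illustrative.

```python
data = [1]*1000 + [2]*1000 + [3]*1000 + [4]*1000 + [5]*1000 + [6]

# Slice-based approach: for every index > 0, itr[:index] copies `index` elements.
slice_elements_copied = sum(range(1, len(data)))

# Set-based approach: one membership test per element, plus a small set.
seen = set()
set_lookups = 0
for e in data:
    set_lookups += 1
    if e not in seen:
        seen.add(e)

print(slice_elements_copied)  # 12502500
print(set_lookups)            # 5001
print(len(seen))              # 6
```

Under these assumptions the slice-based version touches millions of elements for this input, while the set-based version does a few thousand lookups and stores six values.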