Filter items that occur only once in a very large list


Question

I have a large list (over 1,000,000 items) that contains English words:

tokens = ["today", "good", "computer", "people", "good", ... ]

I'd like to get all the items that occur only once in the list.

Currently, I use:

tokens_once = set(word for word in set(tokens) if tokens.count(word) == 1)

But it's really slow. How can I make this faster?

Recommended answer

You iterate over the unique words, and for each one tokens.count(word) scans the whole list again, which makes it O(N²) in the worst case. If you replace the count with a Counter, you iterate once over the list and then once over the unique elements, which is, in the worst case, O(2N), i.e. O(N).

from collections import Counter

tokens = ["today", "good", "computer", "people", "good"]
# Use .items() on Python 3; .iteritems() exists only on Python 2.
single_tokens = [k for k, v in Counter(tokens).items() if v == 1]
# single_tokens == ['today', 'computer', 'people']
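
If you want a set back, as in the original tokens_once expression, the same idea can be wrapped in a small helper. This is only a sketch: the function name tokens_occurring_once is made up for illustration, and it assumes the items are hashable (strings are). The last lines compare it against the slow expression from the question on the small sample list.

```python
from collections import Counter


def tokens_occurring_once(tokens):
    """Return the set of items that appear exactly once in tokens.

    Hypothetical helper name, not part of any library.
    """
    counts = Counter(tokens)  # one pass over the full list
    # Second pass, over the unique items only.
    return {word for word, count in counts.items() if count == 1}


tokens = ["today", "good", "computer", "people", "good"]

# The quadratic expression from the question, kept for comparison.
naive = set(word for word in set(tokens) if tokens.count(word) == 1)

# Both approaches should agree on the sample data.
assert tokens_occurring_once(tokens) == naive
```

On a list with over a million items, only the Counter version stays practical, since it never rescans the list for each word.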
