Should I use dict or list?
Problem description
I would like to loop through a big two-dimensional list:
authors = [["Bob", "Lisa"], ["Alice", "Bob"], ["Molly", "Jim"], ... ]
and get a list that contains all the names that occur in authors.
When I loop through the list, I need a container to store the names I've already seen. I'm wondering whether I should use a list or a dict:
With a list:
seen = []
for author_list in authors:
    for author in author_list:
        if author not in seen:
            seen.append(author)
result = seen
With a dict:
seen = {}
for author_list in authors:
    for author in author_list:
        if author not in seen:
            seen[author] = True
result = seen.keys()
Which one is faster? Or is there a better solution?
Recommended answer
You really want a set. Sets are faster than lists for this job because they are implemented as hash tables, which is possible because they hold only unique, hashable elements. Hash tables allow membership testing (if element in my_set) in O(1) time on average. This contrasts with lists, where the only way to check whether an element is present is to compare it against each element in turn, which takes O(n) time.
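To make the O(1) versus O(n) difference concrete, here is a minimal timing sketch. The data (10,000 made-up names) and the probe value are assumptions chosen for illustration, and the absolute numbers will vary by machine, so no expected output is shown.

```python
import timeit

# Hypothetical data: 10,000 distinct names, stored both ways
names_list = [f"author{i}" for i in range(10_000)]
names_set = set(names_list)

# Worst case for the list: the probe is absent, so every element is compared
probe = "nobody"

list_time = timeit.timeit(lambda: probe in names_list, number=200)
set_time = timeit.timeit(lambda: probe in names_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

On a typical machine the list lookup should come out far slower, and the gap grows with the size of the container.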
A dict is similar to a set in that both allow only unique keys and both are implemented as hash tables, so both allow O(1) membership testing. The difference is that a set has only keys, while a dict has both keys and values (extra overhead you don't need in this application).
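One case where a dict can still be handy is when the order of first appearance matters: dicts preserve insertion order (guaranteed from Python 3.7), while the iteration order of a set is arbitrary. The sketch below is an aside rather than part of the original answer, and it uses the three sample rows from the question as its data.

```python
from itertools import chain

authors = [["Bob", "Lisa"], ["Alice", "Bob"], ["Molly", "Jim"]]

# dict.fromkeys keeps one key per name, in first-seen order (Python 3.7+)
ordered_unique = list(dict.fromkeys(chain.from_iterable(authors)))
print(ordered_unique)  # ['Bob', 'Lisa', 'Alice', 'Molly', 'Jim']

# A set holds the same names, but without any ordering guarantee
unordered_unique = set(chain.from_iterable(authors))
```

If order is irrelevant, the set remains the simpler choice.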
Using a set, and replacing the nested for loops with itertools.chain() to flatten the 2D list into a single iterable of names:
import itertools
seen = set()
for author in itertools.chain(*authors):
    seen.add(author)
Or shorter:
import itertools
seen = set( itertools.chain(*authors) )
Edit (thanks, @jamylak): more memory-efficient for large lists:
import itertools
seen = set( itertools.chain.from_iterable(authors) )
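The memory saving comes from the fact that chain(*authors) must unpack the whole outer list into function arguments up front, whereas chain.from_iterable() pulls one inner list at a time and so also works with a generator. Here is a small sketch of that, where read_author_lists is a hypothetical generator standing in for a large data source; the final sorted() call is just one way to get the list the question asks for.

```python
from itertools import chain

def read_author_lists():
    """Hypothetical generator standing in for a large data source."""
    yield ["Bob", "Lisa"]
    yield ["Alice", "Bob"]
    yield ["Molly", "Jim"]

# The inner lists are consumed lazily, one at a time
seen = set(chain.from_iterable(read_author_lists()))

# Convert to a list; sorting gives a reproducible order
result = sorted(seen)
print(result)  # ['Alice', 'Bob', 'Jim', 'Lisa', 'Molly']
```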
Example on a list of lists:
>>> import itertools
>>> a = [[1, 2], [1, 2], [1, 2], [3, 4]]
>>> set(itertools.chain(*a))
{1, 2, 3, 4}
P.S.: If, instead of finding all the unique authors, you want to count the number of times you see each author, use a collections.Counter, a special kind of dictionary optimised for counting things.
Here's an example of counting characters in a string:
>>> a = "DEADBEEF CAFEBABE"
>>> import collections
>>> collections.Counter(a)
Counter({'E': 5, 'A': 3, 'B': 3, 'D': 2, 'F': 2, ' ': 1, 'C': 1})
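Applied to the authors data from the question, the same idea might look like the sketch below. It uses only the three sample rows shown in the question, so the counts are illustrative.

```python
from collections import Counter
from itertools import chain

authors = [["Bob", "Lisa"], ["Alice", "Bob"], ["Molly", "Jim"]]

# Count every name across all inner lists
counts = Counter(chain.from_iterable(authors))

print(counts["Bob"])          # 2
print(counts.most_common(1))  # [('Bob', 2)]

# The keys of the Counter are the unique names
unique_names = set(counts)
```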