Python - convert list into dictionary in order to reduce complexity


Question

Let's say I have a big list:

word_list = [elt.strip() for elt in open("bible_words.txt", "r").readlines()] 

// complexity O(n) --> proportional to list length "n"

I have learned that the hash function used for building up dictionaries allows lookups to be much faster, like so:

word_dict = dict((elt, 1) for elt in word_list) 

// complexity O(1) ---> constant.

Using word_list, is there a more efficient, recommended way to reduce the complexity of my code?

Solution

The code from the question does just one thing: fills all words from a file into a list. The complexity of that is O(n).

Filling the same words into any other type of container will still have at least O(n) complexity, because it has to read all of the words from the file and it has to put all of the words into the container.

What is different about a dict?

Finding out whether something is in a list has O(n) complexity, because the algorithm has to go through the list item by item and check whether it is the sought item. The item can be found at position 0, which is fast, or it could be the last item (or not in the list at all), which makes it O(n).
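As a sketch, the linear scan behind `in` on a list looks roughly like this (the word list here is a made-up sample):

```python
def list_contains(items, target):
    # Equivalent of `target in items` for a list: check item by item.
    for item in items:
        if item == target:
            return True
    return False

words = ["genesis", "exodus", "leviticus"]
print(list_contains(words, "exodus"))   # True, found at position 1
print(list_contains(words, "psalms"))   # False, only after scanning all n items
```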

In a dict, data is organized in "buckets". When a key:value pair is saved to a dict, the hash of the key is calculated and that number is used to identify the bucket into which the data is stored. Later on, when the key is looked up, hash(key) is calculated again to identify the bucket, and then only that bucket is searched. There is typically only one key:value pair per bucket, so the search can be done in O(1).
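A minimal illustration of that mechanism (the words are a made-up sample; note that string hashes are randomized per interpreter run, so the hash value itself is not shown):

```python
word_dict = {"genesis": 1, "exodus": 1}

# hash() returns the integer CPython uses to pick the key's bucket;
# lookups recompute it instead of scanning every entry.
bucket_number = hash("genesis")

print("genesis" in word_dict)  # True, found via its hash bucket
print("psalms" in word_dict)   # False, decided without a full scan
```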

For more details, see the article about DictionaryKeys on python.org.

How about a set?

A set is something like a dictionary with only keys and no values. The question contains this code:

word_dict = dict((elt, 1) for elt in word_list) 

That is obviously a dictionary which does not need values, so a set would be more appropriate.

BTW, there is no need to first create word_list as a list and then convert it to a set or dict. The first step can be skipped:

set_of_words = {elt.strip() for elt in open("bible_words.txt", "r").readlines()}
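For instance (using a small in-memory sample instead of bible_words.txt, so the snippet is self-contained):

```python
# Stand-in for open("bible_words.txt", "r").readlines()
lines = ["abraham\n", "moses\n", "abraham\n"]

set_of_words = {elt.strip() for elt in lines}

print(len(set_of_words))        # 2 -- the duplicate "abraham" collapses
print("moses" in set_of_words)  # True, O(1) on average
```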

Are there any drawbacks?

Always ;)

  • A set does not have duplicates. So counting how many times a word is in the set will never return 2. If that is needed, don't use a set.

  • A set is not ordered. There is no way to check which was the first word in the set. If that is needed, don't use a set.

  • Objects saved to sets have to be hashable, which kind-of implies that they are immutable. If it was possible to modify the object, then its hash would change, so it would be in the wrong bucket and searching for it would fail. Anyway, str, int, float, and tuple objects are immutable, so at least those can go into sets.

  • Writing to a set is probably going to be a bit slower than writing to a list. Still O(n), but a slower O(n), because it has to calculate hashes and organize into buckets, whereas a list just dumps one item after another. See timings below.

  • Reading everything from a set is also going to be a bit slower than reading everything from a list.

All of these apply to dict as well as to set.
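If both fast membership tests and word counts are needed, collections.Counter (a dict subclass) is one middle ground; a sketch with made-up words:

```python
from collections import Counter

words = ["abraham", "moses", "abraham"]
counts = Counter(words)

print(counts["abraham"])   # 2 -- counts survive, unlike in a set
print("moses" in counts)   # True, dict-style O(1) membership
```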

Some examples with timings

Writing to list vs. set:

>>> timeit.timeit('[n for n in range(1000000)]', number=10)
0.7802875302271843
>>> timeit.timeit('{n for n in range(1000000)}', number=10)
1.025623542189976

Reading from list vs. set:

>>> timeit.timeit('989234 in values', setup='values=[n for n in range(1000000)]', number=10)
0.19846207875508526
>>> timeit.timeit('989234 in values', setup='values={n for n in range(1000000)}', number=10)
3.5699193290383846e-06

So, writing to a set seems to be about 30% slower, but finding an item in the set is thousands of times faster when there are thousands of items.
