Efficiency of long (str) keys in a Python dictionary

Question

I'm parsing some XML (with Python 3.4) and want to retrieve both the text from a node and its id attribute. Example: <li id="12345"> Some text here </li>. My current code is structured around the text only (I'm now adding the id, but didn't need it before). I loop through a list of texts/sentences and then do some processing on each. So I thought of making a dictionary with the text/sentence as key and the id attribute as value.

However, this doesn't feel very efficient. The text can be a whole paragraph, which makes the key very long, whereas the id is always of fairly limited length (though still of type str, e.g. some alphabetic characters followed by some digits). Making the ids the keys and the text the values would require rewriting some of the code. None of this is very problematic, but it got me wondering: how inefficient would it be to have the text (potentially a whole paragraph) as key, compared to an id like "ulp_887362487687678"?

I could just make two reverse dictionaries (one with id as key, the other with text as key) and compare construction, lookup and so on. I've also found some topics on key length limits (Do Dictionaries have a key length limit?). But I'm mainly wondering what your thoughts are. Is having such long str keys in a dict something you definitely want to avoid, or is it not a big deal? If you could share some pros and cons, that would be great!

Recommended answer

No, Python string length hardly has an impact on dictionary performance. The only influence the string length could have is on the hash() function used to map the key to a hash table slot.
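As a rough illustration of the two layouts from the question, the sketch below builds one dictionary keyed by a long text and one keyed by a short id, then times lookups in each. The names `long_text`, `short_id`, `by_text`, `by_id` and `lookup_times` are made up for this example, and the absolute numbers will vary by machine.

```python
from timeit import timeit

# Hypothetical sample data: a paragraph-sized text and an id like the one in the question.
long_text = "Some text here. " * 500      # roughly 8,000 characters
short_id = "ulp_887362487687678"

by_text = {long_text: short_id}   # long string as key
by_id = {short_id: long_text}     # short string as key


def lookup_times(number=100_000):
    """Return (seconds for long-key lookups, seconds for short-key lookups)."""
    t_text = timeit(lambda: by_text[long_text], number=number)
    t_id = timeit(lambda: by_id[short_id], number=number)
    return t_text, t_id


if __name__ == "__main__":
    t_text, t_id = lookup_times()
    print(f"long key:  {t_text:.4f}s")
    print(f"short key: {t_id:.4f}s")
```

Because the same string object is reused for every lookup, its hash is already stored and CPython can match the key by identity, so the two timings should come out close. A lookup with a different but equal string object would have to hash that object once and compare its characters against the stored key.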

String length has very little impact on the timing of repeated hash() calls:

>>> import random
>>> from timeit import timeit
>>> from string import ascii_letters
>>> generate_text = lambda length: ''.join([random.choice(ascii_letters) for _ in range(length)])
>>> for i in range(8):
...     length = 10 + 10 ** i
...     testword = generate_text(length)
...     timing = timeit('hash(t)', 'from __main__ import testword as t')
...     print('Length: {}, timing: {}'.format(length, timing))
... 
Length: 11, timing: 0.061537027359
Length: 20, timing: 0.0796310901642
Length: 110, timing: 0.0631730556488
Length: 1010, timing: 0.0606122016907
Length: 10010, timing: 0.0613977909088
Length: 100010, timing: 0.0607581138611
Length: 1000010, timing: 0.0672461986542
Length: 10000010, timing: 0.080118894577

I stopped at a string of 10 million characters, because I couldn't be bothered waiting for my laptop to generate a 100 million character string.

The timings are pretty much constant because the hash value is cached on the string object once it has been computed. timeit calls hash(t) a million times by default on the same object, so only the first call has to read the whole string.
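To see the caching effect directly, a minimal sketch can time the first and the second hash() call on a freshly built string. The helper name `time_hash_twice` is hypothetical, and the behaviour described in the comments assumes CPython.

```python
import time


def time_hash_twice(length):
    """Return (first_call_seconds, second_call_seconds, hashes_equal) for a new string."""
    # Build the string at runtime so it is a new object with no hash stored yet.
    text = "ab" * (length // 2)

    start = time.perf_counter()
    first_hash = hash(text)     # has to read the whole string
    middle = time.perf_counter()
    second_hash = hash(text)    # CPython returns the value stored on the object
    end = time.perf_counter()

    return middle - start, end - middle, first_hash == second_hash


if __name__ == "__main__":
    for length in (10, 10_000, 10_000_000):
        first, second, same = time_hash_twice(length)
        print(f"length={length:>10}  first={first:.6f}s  second={second:.6f}s  same={same}")
```

For the longest string the first call should take noticeably longer than the second, which is the one-time cost a long dictionary key pays when it is first hashed.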
