Nested dictionaries or tuples for key?


Question

Suppose there is a structure like this:

{'key1' : { 'key2' : { .... { 'keyn' : 'value' } ... } } }

Using Python, I'm trying to determine the advantages and disadvantages of two different approaches:

{'key1' : { 'key2' : { .... { 'keyn' : 'value' } ... } } } # A. nested dictionary
{('key1', 'key2', ...., 'keyn') : 'value'} # B. a dictionary with a tuple used as the key

Then I'm interested to know which is better (A or B) in terms of:


  • Memory occupancy

  • Complexity of insertion



Recommended answer

Without going into details (which are highly implementation-dependent anyway and may be invalidated by the next genius to come along and tweak the dictionary implementation):


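  • For memory overhead: every object has some overhead (e.g. refcount and type; an empty object is 8 bytes and an empty tuple is 28 bytes), but a hash table must store the hash, key and value of each entry, and it usually keeps more buckets than currently needed in order to avoid collisions. Tuples, on the other hand, can't be resized and have no collisions, i.e. an N-tuple can simply allocate N pointers to the contained objects and be done. This leads to noticeable differences in memory consumption.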

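  • For lookup and insertion complexity (the two are the same in this regard): whether the key is a string or a tuple, collisions are rather unlikely in CPython's dict implementation and are resolved very efficiently. More keys (because you flatten the key space by combining the keys into tuples) may seem to increase the likelihood of collisions, but more keys also lead to more buckets (AFAIK, the current implementation tries to keep the load factor around 2/3), which in turn makes collisions less likely. Also, you don't need more hashing to get to a value (well, one more function call and some C-level code for the tuple hash, but that's negligible).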
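
To make the comparison concrete, here is a minimal sketch of insertion and lookup in both layouts. The helper names nested_set and nested_get are hypothetical, introduced only for illustration; they are not part of the question or of any library.

```python
def nested_set(root, keys, value):
    """Approach A: insert value under a chain of keys, creating inner dicts as needed."""
    node = root
    for key in keys[:-1]:
        # setdefault returns the existing inner dict or stores and returns a new one
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def nested_get(root, keys):
    """Approach A: follow the chain of keys, one dict lookup per level."""
    node = root
    for key in keys:
        node = node[key]
    return node


path = ("key1", "key2", "key3")

# A. nested dictionary: one lookup (and possibly one new dict) per level
nested = {}
nested_set(nested, path, "value")

# B. tuple key: the tuple is hashed (which hashes each element), then a single dict lookup
flat = {}
flat[path] = "value"

print(nested)
print(nested_get(nested, path), flat[path])
```

Both variants do work proportional to the number of key components per access; B simply does it inside one hash computation instead of a Python-level loop.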

You see, there shouldn't be any noticeable difference in performance, although there is some difference in memory. I don't think the latter will be notable, though. A one-element dict is 140 bytes, and a ten-element tuple is 140 bytes as well (according to sys.getsizeof on Python 3.2). So even with ten-level nesting (already unrealistic, says my gut feeling), you'd have slightly more than one kB of difference, possibly less if the nested dicts have multiple items (it depends on the exact load factor). That's too much for a data-crunching application that has hundreds of such data structures in memory, but most objects aren't created that frequently.
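
The figures above are specific to one interpreter build. A rough way to repeat the measurement on your own interpreter is sketched below. It only counts the containers themselves (the dicts and the key tuple), not the key strings or the value, and the helper name nested_size is an assumption made for this example. The exact numbers will differ between Python versions and platforms.

```python
import sys


def nested_size(d):
    """Sum sys.getsizeof over a dict and every dict nested inside it.

    Shallow on purpose: key strings and leaf values are not counted.
    """
    total = sys.getsizeof(d)
    for value in d.values():
        if isinstance(value, dict):
            total += nested_size(value)
    return total


depth = 10
keys = tuple("key%d" % i for i in range(1, depth + 1))

# A. build {'key1': {'key2': {... {'key10': 'value'} ...}}} from the inside out
nested = "value"
for key in reversed(keys):
    nested = {key: nested}

# B. a single dict whose only key is the ten-element tuple
flat = {keys: "value"}

size_a = nested_size(nested)
size_b = sys.getsizeof(flat) + sys.getsizeof(keys)

print("A, nested dicts:", size_a, "bytes")
print("B, tuple key:   ", size_b, "bytes")
```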

You should simply ask yourself which model is more appropriate for your problem. Consider that the second way (B) requires you to have all keys for a value available at once, while the first (A) allows getting there incrementally.
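
The difference in access pattern can be illustrated with a small sketch. The sample data and the helper with_prefix are made up for this example. With nested dicts you can stop at any level and keep the subtree, whereas with tuple keys a partial key means scanning every entry.

```python
nested = {"a": {"x": 1, "y": 2}, "b": {"x": 3}}
flat = {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 3}

# A. step in one level at a time; the intermediate result is a usable dict
subtree = nested["a"]
print(subtree["y"])

# B. the complete key must be known up front
print(flat[("a", "y")])


def with_prefix(d, prefix):
    """B. emulate a partial lookup by scanning all keys (linear in len(d))."""
    n = len(prefix)
    return {key[n:]: value for key, value in d.items() if key[:n] == prefix}


print(with_prefix(flat, ("a",)))
```

If partial lookups like this are common in your program, that is an argument for the nested layout; if you always have the full key, the flat layout is simpler.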
