Is it possible to give a python dict an initial capacity (and is it useful)

Problem description

I am filling a python dict with around 10,000,000 items. My understanding of dicts (or hashtables) is that when too many elements get into them, they need to resize, an operation that costs quite some time.

Is there a way to tell a python dict that you will be storing at least n items in it, so that it can allocate memory from the start? Or will this optimization not do my running speed any good?

(And no, I have not checked whether the slowness of my small script is because of this; I actually wouldn't know how to do that. This is, however, something I would do in Java: set the initial capacity of the HashSet.)
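One rough way to check is simply to time the fill itself. The snippet below is a minimal sketch of that; the 10,000,000-item count comes from the question, while the integer keys and values are arbitrary stand-ins:

```python
import time

N = 10_000_000  # item count taken from the question

start = time.perf_counter()
d = {}
for i in range(N):
    d[i] = i  # arbitrary integer keys and values, just to exercise insertion
elapsed = time.perf_counter() - start

print(f"filled {len(d):,} items in {elapsed:.2f} s "
      f"(~{elapsed / N * 1e9:.0f} ns per insert on average)")
```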

Solution

First off, I've heard rumor that you can set the size of a dictionary at initialization, but I have never seen any documentation or PEP describing how this would be done.

With this in mind, I ran an analysis on your quantity of items, described below. While it may take some time to resize the dictionary each time, I would recommend moving ahead without worrying about it, at least until you can test its performance.

The two rules that concern us in determining resizing are the number of elements and the factor of resizing. A dictionary will resize itself when it is 2/3 full, on the addition of the element that puts it over the 2/3 mark. Below 50,000 elements it will increase by a factor of 4, above that amount by a factor of 2. Using your estimate of 10,000,000 elements (between 2^23 and 2^24), your dictionary will resize itself 15 times (7 times below 50k, 8 times above). Another resize would occur just past 11,100,000.
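To make those numbers concrete, the small sketch below simulates the resize schedule exactly as stated above: an initial table of 8 slots, a resize once an insert passes the 2/3 mark, and a growth factor of 4 below 50,000 elements and 2 above. It models only these stated rules, not CPython's actual implementation, whose details differ between versions:

```python
def resize_points(n_items, slots=8):
    """Simulate the resize schedule described above and return the
    item counts at which a resize is triggered."""
    points = []
    for used in range(1, n_items + 1):
        if used > 2 * slots / 3:              # this insert passes the 2/3 mark
            slots *= 4 if used < 50_000 else 2
            points.append(used)
    return points

points = resize_points(10_000_000)
print(len(points))                            # 15 resizes on the way to 10M items
print(sum(p < 50_000 for p in points))        # 7 of them happen below 50k
print(points[-1])                             # the last one, at item 5,592,406
print(resize_points(11_200_000)[-1])          # the next boundary, just past 11.1M
```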

Resizing and replacing the current elements in the hashtable does take some time, but I wonder if you'd notice it with whatever else you have going on in the code nearby. I just put together a timing suite comparing inserts at five places along each boundary from dictionary sizes of 2^3 through 2^24, and the "border" additions averaged 0.4 nanoseconds longer than the "non-border" additions. That is 0.17% longer... probably acceptable. The minimum for all operations was 0.2085 microseconds, and the maximum was 0.2412 microseconds.
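The original timing suite isn't included here, but a comparison in that spirit could be sketched roughly as below. The boundary value is the one predicted by the simulation above; actual boundaries depend on the Python version and on how the dict built in the setup step grows:

```python
import timeit

def time_block(start, width=1_000, repeats=5):
    """Build a dict with `start` items in setup, then time inserting `width`
    more; if the block crosses a resize boundary, the one-off cost of
    rebuilding the table is included in the measurement."""
    return min(timeit.repeat(
        stmt="for i in range(start, start + width): d[i] = None",
        setup="d = {i: None for i in range(start)}",
        globals={"start": start, "width": width},
        repeat=repeats,
        number=1,
    ))

# 5,592,406 is the boundary predicted by the simulated schedule above;
# 5,000,000 sits well away from any boundary.
crossing = time_block(5_592_000)
plain = time_block(5_000_000)
print(f"with resize: {crossing:.4f}s   without: {plain:.4f}s")
```

min() over several repeats with number=1 is used because each measurement rebuilds the dict from scratch in setup, so the resize cost (when it occurs) is paid exactly once per run.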

Hope this is insightful, and if you do check the performance of your code please follow up with an edit! My primary resource for dictionary internals was the splendid talk given by Brandon Rhodes at PyCon 2010: The Mighty Dictionary
