Keeping large dictionary in Python affects application performance


Problem description





I'm having some difficulties understanding (and ultimately solving) why having a large dictionary in memory makes creation of other dictionaries longer.

Here's the test code that I'm using

import time
def create_dict():
    # return {x:[x]*125 for x in xrange(0, 100000)}
    return {x: (x,) * 125 for x in xrange(0, 100000)}  # UPDATED: (x,) * 125 builds a 125-element tuple; (x) * 125 would just be the int x * 125


class Foo(object):
    @staticmethod
    def dict_init():
        start = time.clock()
        Foo.sample_dict = create_dict()
        print "dict_init in Foo took {0} sec".format(time.clock() - start)

if __name__ == '__main__':
    Foo.dict_init()
    for x in xrange(0, 10):
        start = time.clock()
        create_dict()
        print "Run {0} took {1} seconds".format(x, time.clock() - start)

If I run the code as is (first initializing sample_dict in Foo) and then creating the same dictionary 10 more times in the loop I get the following results:

dict_init in Foo took 0.385263764287 sec
Run 0 took 0.548807949139 seconds
Run 1 took 0.533209452471 seconds
Run 2 took 0.51916067636 seconds
Run 3 took 0.513130722575 seconds
Run 4 took 0.508272050029 seconds
Run 5 took 0.502263872177 seconds
Run 6 took 0.48867601998 seconds
Run 7 took 0.483109299676 seconds
Run 8 took 0.479019713488 seconds
Run 9 took 0.473174195256 seconds
[Finished in 5.6s]

If, however, I do NOT initialize sample_dict in Foo (commenting out Foo.dict_init()), I get almost 20% faster dictionary creation in the loop:

Run 0 took 0.431378921359 seconds
Run 1 took 0.423696636179 seconds
Run 2 took 0.419630475616 seconds
Run 3 took 0.405130343806 seconds
Run 4 took 0.398099686921 seconds
Run 5 took 0.392837169802 seconds
Run 6 took 0.38799598399 seconds
Run 7 took 0.375133006408 seconds
Run 8 took 0.368755297573 seconds
Run 9 took 0.363273701371 seconds
[Finished in 4.0s]

I've noticed that if I turn OFF Python's garbage collector by calling gc.disable(), performance not only improves ~5x, but storing the large dictionary in Foo no longer makes a difference. Here are the results with garbage collection disabled:

dict_init in Foo took 0.0696136982496 sec
Run 0 took 0.113533445358 seconds
Run 1 took 0.111091241489 seconds
Run 2 took 0.111151620212 seconds
Run 3 took 0.110655722831 seconds
Run 4 took 0.111807537706 seconds
Run 5 took 0.11097510318 seconds
Run 6 took 0.110936170451 seconds
Run 7 took 0.111074414632 seconds
Run 8 took 0.110678488579 seconds
Run 9 took 0.111011066463 seconds
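For reference, a minimal sketch of the gc-disabled run (scaled down to 10,000 keys to keep memory modest, and using Python 3 spellings — range and time.time — in place of the question's xrange and time.clock):

```python
import gc
import time

def create_dict():
    # same shape as the question's dict, scaled down: each key maps to a
    # 125-element list, and lists can participate in reference cycles
    return {x: [x] * 125 for x in range(0, 10000)}

gc.disable()  # switch off cyclic collection; reference counting still runs
try:
    for x in range(0, 10):
        start = time.time()
        create_dict()
        print("Run {0} took {1} seconds".format(x, time.time() - start))
finally:
    gc.enable()  # always restore the collector
```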

So I have 2 questions:

  • Why does disabling garbage collection speed up dictionary creation?
  • How can I achieve consistent performance (with the Foo init and without) without disabling garbage collection?

I would appreciate any insight on this.

Thank you!

UPDATED: After Tim Peters mentioned that I'm creating mutable objects, I decided to try creating immutable objects (tuples in my case) and voila - much, much faster results (the same with gc enabled and with it disabled):

dict_init in Foo took 0.017769 sec
Run 0 took 0.017547 seconds
Run 1 took 0.013234 seconds
Run 2 took 0.012791 seconds
Run 3 took 0.013371 seconds
Run 4 took 0.013288 seconds
Run 5 took 0.013692 seconds
Run 6 took 0.013059 seconds
Run 7 took 0.013311 seconds
Run 8 took 0.013343 seconds
Run 9 took 0.013675 seconds

I understand that tuple creation is much faster than list creation, but why doesn't having a dictionary of immutable objects affect the time spent by garbage collection? Are immutable objects not involved in reference cycles?

Thank you.

P.S. As it happens, in my real-world scenario, converting the lists to tuples solved the problem (there was never a need for lists; I just didn't think of using tuples), but I'm still curious as to why it's faster.

Solution

"Dictionary creation" is really a red herring here. What the dictionary creation does in this case that's relevant is that it creates a hundred thousand new 125-element lists. Because lists can be involved in reference cycles, that creates 12.5 million list elements CPython's cyclic garbage collection has to examine whenever it scans a generation containing a dict. It doesn't matter that these lists are in dictionaries, it only matters that they exist.

So what you're timing is largely the time consumed by Python's cyclic garbage collection. It doesn't particularly matter that you keep on creating more dicts, it only matters that you're creating new mutable objects (which can be involved in cycles) much faster than you're destroying old mutable objects. That (an excess of allocations over deallocations) is what triggers CPython's cyclic gc.
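The trigger is visible from Python itself: a generation-0 scan fires when the running excess of container allocations over deallocations crosses a threshold. A small illustrative sketch (the threshold values shown are CPython defaults and may differ across versions):

```python
import gc

# threshold0 is the allocations-minus-deallocations excess of tracked
# container objects that triggers a generation-0 collection
thresholds = gc.get_threshold()
print(thresholds)        # (700, 10, 10) by default in CPython

before = gc.get_count()  # current (gen0, gen1, gen2) counts
junk = [[i] for i in range(100)]  # allocate some tracked containers
after = gc.get_count()
print(before, after)     # the gen-0 count climbs toward threshold0
```

Raising threshold0 with gc.set_threshold() makes collections less frequent without disabling them outright.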

Not much you can do about it, alas. Programs that go through well-delineated phases of creating mounds of new objects can benefit by disabling cyclic gc for the duration. Can't guess whether that applies to you, though.
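One way to express that "disable for the duration" pattern is a small context manager. This is a sketch, not part of the original answer, and the gc_paused name is made up:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    # Pause cyclic gc only around a well-delineated, allocation-heavy phase;
    # reference counting still reclaims non-cyclic garbage meanwhile.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()  # restore the collector no matter what

with gc_paused():
    # mound-of-new-objects phase: many dicts holding mutable list values
    mounds = [{x: [x] * 125 for x in range(1000)} for _ in range(10)]
```

The try/finally guarantees the collector comes back on even if the phase raises.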

Ah, forgot one: the dict in Foo makes such a big difference because Foo "sticks around". All the other dicts you create are thrown away immediately after being constructed (CPython's reference counting is responsible for that), so don't add to the time consumed by cyclic gc. But the dict in Foo does, because that dict doesn't go away. Change your loop to append the new dicts to a list, and you'll see that the time goes up for each dict, then drops a lot, then goes up again, etc. That reflects dicts moving to older generations inside Python's cyclic gc, so getting scanned by cyclic gc less frequently. It gets complicated ;-)
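That experiment is easy to reproduce: keep every new dict alive by appending it to a list. The sketch below scales the sizes down from the question's to keep memory modest, and uses Python 3 spellings (range, time.time) in place of xrange and time.clock; the timings themselves vary by machine and version:

```python
import time

def create_dict():
    # scaled-down version of the question's dict: mutable list values
    return {x: [x] * 125 for x in range(0, 5000)}

kept = []  # keeping the dicts alive is the whole point of the experiment
for x in range(0, 10):
    start = time.time()
    kept.append(create_dict())  # surviving dicts keep getting rescanned by cyclic gc
    print("Run {0} took {1} seconds".format(x, time.time() - start))
```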

More details?

It's hard to be more precise, because the performance of cyclic gc in specific cases depends on mountains of implementation details that can - and do - change across releases.

The general advice to use "immutable objects" when possible is based on a rather deep ;-) understanding of how cyclic gc is implemented in CPython, and how it's evolved over the years.

It so happens that today (most recent versions of Python 2 and Python 3), strong efforts are made to exempt certain tuples and dicts from cyclic gc. That may change. Special-casing such things isn't free, so deciding whether to add more tricks like this is always a difficult balancing act. It's an easier decision for immutable objects, hence the advice to move towards those.

Tuples and dicts weren't special-cased at all until very late 2008, as detailed in this issue (http://bugs.python.org/issue4688) from the Python issue tracker.

And - surprise ;-) - that turned out to have horrible performance consequences in some rare cases, which were fixed by more special-casing in a later tracker issue in mid-2012.

Some good news is that a gc.is_tracked() function was added so you can at least investigate whether cyclic gc is going to scan a specific object. Here are some examples under Python 2.7.5. There's no guarantee they'll always work this way:

"Scalar" objects (no internal pointers) are never tracked - it's impossible for them to be in a cycle:

>>> import gc
>>> gc.is_tracked(4)
False
>>> gc.is_tracked("2323")
False

Tuples are initially tracked:

>>> t1 = ([1],)
>>> t2 = ((1.),)
>>> gc.is_tracked(t1), gc.is_tracked(t2)
(True, True)

However, if cyclic gc runs and determines that a tuple is immutable "all the way down", it untracks that tuple:

>>> gc.collect()
0
>>> gc.is_tracked(t1), gc.is_tracked(t2)
(True, False)

There's nothing that can be done to t2 to make it participate in a cycle, because it, and all its components (on & on, all the way down) are immutable. But t1 still needs to be tracked! Because t1[0] is mutable, t1 may be part of a cycle later:

>>> t1
([1],)
>>> t1[0][0] = t1
>>> t1
([([...],)],)

A different policy happens to be used for dicts: they're created untracked when possible. This one can't be, because its value is a mutable list:

>>> d = {1: [2]}
>>> gc.is_tracked(d)
True

Because that dict has a mutable value, it could become part of a cycle later, so must be tracked by cyclic gc.

>>> d[1][0] = d
>>> d
{1: [{...}]}

But a dict with untracked keys and values is created untracked:

>>> d = {1: 2}
>>> gc.is_tracked(d)
False

This is tricky, because such a dict can still become part of a cycle later!

>>> d[2] = d
>>> gc.is_tracked(d)
True

It's not free to detect such changes.

Then there are combinations of the above:

>>> d = {(1, 2): (4, "abc", 5)}
>>> gc.is_tracked(d)
True
>>> gc.collect()
3
>>> gc.is_tracked(d)
False

In this case, d is tracked at first, because its keys and values (the tuples) are created tracked at first. But after cyclic gc runs the first time, it determines that the keys and values are "immutable all the way down", so untracks the dict.

Like I said, it gets complicated ;-)

BTW,

I understand that tuple creation is much faster than list creation

It should be only slightly slower to create a list. Lists and tuples have very similar implementations in CPython. Tuples require a little less memory (which can be significant, in percentage terms, for very short sequences), and it can be a little faster to index a tuple than a list. But creation time? It's the difference between one malloc()-like call (for a tuple) and two (for a list), independent of the number of elements. That can be significant for very short sequences, but not for large ones.
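A quick way to check this on any interpreter is timeit. Building the sequences from a variable forces a real allocation each time (a tuple literal of constants would be constant-folded by the compiler); no particular winner is assumed here, since the gap is small and varies by version and machine:

```python
import timeit

# one malloc-like call for a tuple vs. two for a list, per the text above
list_time = timeit.timeit("[x, x, x]", setup="x = 1", number=1000000)
tuple_time = timeit.timeit("(x, x, x)", setup="x = 1", number=1000000)
print("list : %.3f sec" % list_time)
print("tuple: %.3f sec" % tuple_time)
```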
