OrderedDict vs Dict in Python


Problem description


In Tim Peters' answer to "Are there any reasons not to use an ordered dictionary?", he says:

OrderedDict is a subclass of dict.

It's not a lot slower, but at least doubles the memory over using a plain dict.

Now, while going through a particular question, I tried some sample checks using ipython and both of them contradict the earlier reasoning:

  1. both dict and OrderedDict are of the same size
  2. operating on an OrderedDict easily takes around 7-8 times longer than operating on a dict (hence it is a lot slower)

Can someone explain to me where I'm going wrong in my reasoning?


Create a large Dict and OrderedDict and compare sizes:

import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

sys.getsizeof(test_dict)
786712

sys.getsizeof(test_ordered_dict)
786712

Check time taken for the insertions using %timeit:

import sys
import random
from collections import OrderedDict

def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()

%timeit for x in range(100): operate_on_ordered_dict(100)
100 loops, best of 3: 9.24 ms per loop

%timeit for x in range(100): operate_on_dict(100)
1000 loops, best of 3: 1.23 ms per loop

Solution

I think the size issue is due to the fact that no __sizeof__ method is defined in the Python 2.x implementation of OrderedDict, so it simply falls back to dict's __sizeof__ method.
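
A quick way to see this fallback (a minimal check, assuming CPython 2.7, which is what this answer is about):

from collections import OrderedDict

# OrderedDict itself defines no __sizeof__ on Python 2.7, so attribute lookup
# falls through to the implementation inherited from dict.
print('__sizeof__' in OrderedDict.__dict__)   # expected: False
print('__sizeof__' in dict.__dict__)          # expected: True -- this is what getsizeof ends up using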

To demonstrate this, I've created a class A that extends list and also adds an extra method foo, to check whether that affects the size.

class A(list):
    def __getitem__(self, k):
        return list.__getitem__(self, k)
    def foo(self):
        print 'abcde'

>>> a = A(range(1000))
>>> b = list(range(1000))

But sys.getsizeof still reports the same size for both:

>>> sys.getsizeof(a), sys.getsizeof(b)
(9120, 9120)

Of course A is going to be slower, because its methods run in Python while list's methods run in pure C.

>>> %%timeit
... for _ in xrange(1000):
...     a[_]
... 
1000 loops, best of 3: 449 µs per loop
>>> %%timeit
... for _ in xrange(1000):
...     b[_]
... 
10000 loops, best of 3: 52 µs per loop
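
The same reasoning applies to the insertion timings in the question: the Python 2 OrderedDict is written in pure Python and maintains linked-list bookkeeping for the key order next to the underlying hash table, so every __setitem__ runs extra Python-level code on top of the C dict insertion. Roughly along these lines (a simplified toy sketch, not the actual CPython implementation):

class SimplifiedOrderedDict(dict):
    # Toy model only: the real OrderedDict keeps a doubly linked list plus a
    # key-to-link map, but the effect is the same -- each insertion executes
    # extra pure-Python code before the fast C dict.__setitem__.
    def __init__(self):
        dict.__init__(self)
        self._order = []                      # insertion-order bookkeeping

    def __setitem__(self, key, value):
        if key not in self:
            self._order.append(key)           # extra Python-level work per insert
        dict.__setitem__(self, key, value)

    def __iter__(self):
        return iter(self._order)

A class like this pays the same kind of per-item Python overhead that the %timeit comparison in the question exposes.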

And this seems to be fixed in Python 3, where there is now a well-defined __sizeof__ method:

def __sizeof__(self):
    sizeof = _sys.getsizeof
    n = len(self) + 1                       # number of links including root
    size = sizeof(self.__dict__)            # instance dictionary
    size += sizeof(self.__map) * 2          # internal dict and inherited dict
    size += sizeof(self.__hardroot) * n     # link objects
    size += sizeof(self.__root) * n         # proxy objects
    return size
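
With that in place, re-running the size check from the question on Python 3 makes the difference visible (a sketch; the exact numbers depend on the Python version and platform):

import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

# On Python 3 the OrderedDict reports a noticeably larger footprint than the
# plain dict, because the extra link bookkeeping is now counted.
print(sys.getsizeof(test_dict))
print(sys.getsizeof(test_ordered_dict))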
