OrderedDict vs Dict in Python

Question
In Tim Peters' answer to "Are there any reasons not to use an ordered dictionary?", he says:
OrderedDict is a subclass of dict.
It's not a lot slower, but at least doubles the memory over using a plain dict.
Now, while going through a particular question, I tried some sample checks using ipython, and both of them contradict the earlier reasoning:

- both dict and OrderedDict are of the same size
- operating on an OrderedDict easily takes around 7-8 times longer than operating on a dict (hence a lot slower)

Can someone explain to me where I'm going wrong in my reasoning?
Create a large Dict and OrderedDict and compare sizes:
import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

sys.getsizeof(test_dict)
786712
sys.getsizeof(test_ordered_dict)
786712
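For what it's worth, this equality no longer holds on Python 3: there OrderedDict defines its own __sizeof__ (and maintains a doubly linked list internally), so the two containers report different sizes. A minimal re-run of the same check on a Python 3 interpreter, as a sketch:

```python
import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

# On Python 3, OrderedDict.__sizeof__ accounts for its linked-list
# bookkeeping, so it reports a noticeably larger footprint than dict.
print(sys.getsizeof(test_dict))
print(sys.getsizeof(test_ordered_dict))
```

The exact byte counts vary between CPython versions; the point is only that the ordered variant reports strictly more.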
Check the time taken for the insertions using %timeit:
import sys
import random
from collections import OrderedDict

def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()

%timeit for x in range(100): operate_on_ordered_dict(100)
100 loops, best of 3: 9.24 ms per loop
%timeit for x in range(100): operate_on_dict(100)
1000 loops, best of 3: 1.23 ms per loop
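The same comparison can be reproduced outside IPython with the standard timeit module (the absolute numbers are machine- and version-dependent; it is the ratio that matters):

```python
import random
import timeit
from collections import OrderedDict

def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()

# timeit.timeit returns the total seconds for `number` calls as a float.
t_dict = timeit.timeit(lambda: operate_on_dict(100), number=100)
t_odict = timeit.timeit(lambda: operate_on_ordered_dict(100), number=100)
print(t_dict, t_odict)
```

Note that on Python 3, where OrderedDict is implemented in C, the gap is much smaller than the 7-8x seen above on Python 2's pure-Python implementation.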
I think the problem with the size is due to the fact that there's no __sizeof__ method defined in the Python 2.x implementation of OrderedDict, so it simply falls back to dict's __sizeof__ method.
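This is easy to verify directly: a name defined on the class itself (rather than inherited) shows up in the class's own __dict__. On Python 2 the name is absent from OrderedDict's class dict, which is exactly the fallback described above; on Python 3 (where this sketch runs) it is present:

```python
from collections import OrderedDict

# A method defined directly on a class (not inherited) appears
# in that class's own __dict__.
print('__sizeof__' in OrderedDict.__dict__)  # True on Python 3
print('__sizeof__' in dict.__dict__)         # dict always defines its own
```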
To prove this, I've created a class A that extends list and adds an extra method foo, to check whether that affects the reported size.
class A(list):
    def __getitem__(self, k):
        return list.__getitem__(self, k)

    def foo(self):
        print 'abcde'
>>> a = A(range(1000))
>>> b = list(range(1000))
But sys.getsizeof still returns the same size for both:
>>> sys.getsizeof(a), sys.getsizeof(b)
(9120, 9120)
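This result makes sense once you note where foo actually lives: methods are stored on the class object, not on each instance, so the per-instance size reported by sys.getsizeof is unaffected by how many methods the class defines. A quick check (Python 3 syntax, reusing the names from the example above):

```python
import sys

class A(list):
    def __getitem__(self, k):
        return list.__getitem__(self, k)

    def foo(self):
        print('abcde')

a = A(range(1000))

# foo is an attribute of the class object, not of the instance:
print('foo' in A.__dict__)   # True
print('foo' in vars(a))      # False: the instance dict stays empty
```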
Of course, A is going to be slow because its methods run in Python while list's methods run in pure C.
>>> %%timeit
... for _ in xrange(1000):
...     a[_]
...
1000 loops, best of 3: 449 µs per loop

>>> %%timeit
... for _ in xrange(1000):
...     b[_]
...
10000 loops, best of 3: 52 µs per loop
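This Python-level vs C-level dispatch overhead can likewise be measured without IPython magics (absolute numbers will vary by machine, but the subclass should be several times slower):

```python
import timeit

class A(list):
    # Overriding __getitem__ forces every index operation through
    # a Python-level call instead of list's C implementation.
    def __getitem__(self, k):
        return list.__getitem__(self, k)

a = A(range(1000))
b = list(range(1000))

t_a = timeit.timeit(lambda: [a[i] for i in range(1000)], number=100)
t_b = timeit.timeit(lambda: [b[i] for i in range(1000)], number=100)
print(t_a, t_b)
```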
And this seems to be fixed in Python 3, where there's a well-defined __sizeof__ method now:
def __sizeof__(self):
    sizeof = _sys.getsizeof
    n = len(self) + 1                       # number of links including root
    size = sizeof(self.__dict__)            # instance dictionary
    size += sizeof(self.__map) * 2          # internal dict and inherited dict
    size += sizeof(self.__hardroot) * n     # link objects
    size += sizeof(self.__root) * n         # proxy objects
    return size
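The effect of defining __sizeof__ is easy to demonstrate with a toy pair of subclasses: sys.getsizeof simply calls the object's __sizeof__ and adds a fixed GC-header overhead, so an override changes the reported number directly. A hypothetical sketch:

```python
import sys

class Plain(list):
    pass

class Padded(list):
    # Claim 100 extra bytes on top of whatever list reports.
    def __sizeof__(self):
        return super().__sizeof__() + 100

p = Plain()
q = Padded()

# Both instances have an identical memory layout; only the reported
# size differs, by exactly the 100 bytes claimed in the override.
print(sys.getsizeof(q) - sys.getsizeof(p))  # 100
```

This is the same mechanism Python 3's OrderedDict uses, except its override computes a real estimate of the linked-list bookkeeping instead of a made-up constant.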