OrderedDict vs defaultdict vs dict [英] OrderedDict vs defaultdict vs dict
问题描述
在python的库中,我们现在有两个Python实现字典,它们分别位于本地 dict
之外的子类 dict
类型。
In python's library, we now have two Python Implementation of dictionaries which subclasses dict
over and above the native dict
type.
Python的倡导者一直喜欢使用 dict.setdefault defaultdict
code>尽可能。即使 doc 也引用这种技术更简单,比使用dict.setdefault()的等效技术更快:
Python's advocates have always preferred to defaultdict
over using dict.setdefault
where possible. Even the doc quotes that This technique is simpler and faster than an equivalent technique using dict.setdefault():
以类似的方式,由于字典不维护顺序,使用 OrderedDict
超过使用 dict
后跟排序的项是首选,有可能替代使用。
In similar ways, as dictionaries does not maintain order, using OrderedDict
over using dict
followed by sorting the items is preferred when ever possible for the alternative usage.
在上述两种情况下,代码绝对清晰,但是以性能损失为代价。
In both the above case, the code is definitely cleaner but at the cost of performance penalty.
在回答并评论基于项目的python唯一列表,当我使用 dict
时,我绊倒了性能损失c $ c> defaultdict 和 OrderedDict
。数据的大小也似乎并不重要,性能优势 dict
解决方案已经超过其他。
While answering and commenting on one of the question python unique list based on item, I stumbled upon the performance penalty over the native dict
when using defaultdict
and OrderedDict
. It also seems the size of the data is also not immaterial to the performance advantage dict
solution has over others.
我相信应该有一个 - 最好只有一个 - 显而易见的方式。
什么是首选方式?
I believe There should be one-- and preferably only one --obvious way to do it.
, so what is the preferred way?
推荐答案
没有一个单一的答案,而不是一个真实而唯一的dict。在许多变量中,它取决于:
There is not one single answer and not one true and only dict. Among many variables, it depends on:
- 数据集的大小;
- 唯一的数量键;
- defaultdict的底层工厂的速度;
- OrderDict与某些后续订购步骤的速度。
- Size of the data set;
- Number of unique keys;
- Speed of the underlying factory for defaultdict;
- Speed of OrderDict vs some later ordering step.
我概括的是,但是这里有一些一般性:
I am loathe to generalize, but here are some generalities:
- 语句
这种技术比使用dict.setdefault()
的等效技术更简单和更快,只是平常错误。这取决于数据; -
setdefault
使用小数据集更快更简单; -
defaultdict
对于具有更均匀密钥集的较大数据集更快; -
setdefault
具有更多异构密钥集的优势; - 这些结果对于Python 3与Python 2是不同的;
-
在所有情况下,OrderedDict
比依赖于顺序和顺序的算法不是重建或排序不容易。
- The statement
This technique is simpler and faster than an equivalent technique using dict.setdefault()
is just flat wrong. It depends on the data; setdefault
is faster and simpler with small data sets;defaultdict
is faster for larger data sets with more homogenous key sets;setdefault
has an advantage with more heterogeneous key sets;- these results are different for Python 3 vs Python 2;
OrderedDict
is slower in all cases other than an algorithm that depends on order and order is not easy to reconstruct or sort.
唯一的事实是:它取决于所有这三种技术都很有用。
The only truth: It Depends! All three technique are useful.
以下是一些时间代码:
from __future__ import print_function
from collections import defaultdict
from collections import OrderedDict
try:
t=unichr(100)
except NameError:
unichr=chr
def f1(li):
'''defaultdict'''
d = defaultdict(list)
for k, v in li:
d[k].append(v)
return d.items()
def f2(li):
'''setdefault'''
d={}
for k, v in li:
d.setdefault(k, []).append(v)
return d.items()
def f3(li):
'''OrderedDict'''
d=OrderedDict()
for k, v in li:
d.setdefault(k, []).append(v)
return d.items()
if __name__ == '__main__':
import timeit
import sys
print(sys.version)
few=[('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
fmt='{:>12}: {:10.2f} micro sec/call ({:,} elements, {:,} keys)'
for tag, m, n in [('small',5,10000), ('medium',20,1000), ('bigger',1000,100), ('large',5000,10)]:
for f in [f1,f2,f3]:
s = few*m
res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
print(st)
s = [(unichr(i%0x10000),i) for i in range(1,len(s)+1)]
res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
print(st)
print()
Python 2.7结果:
Python 2.7 result:
2.7.5 (default, Aug 25 2013, 00:04:04)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]
defaultdict: 10.20 micro sec/call (25 elements, 3 keys)
defaultdict: 21.08 micro sec/call (25 elements, 25 keys)
setdefault: 13.41 micro sec/call (25 elements, 3 keys)
setdefault: 18.24 micro sec/call (25 elements, 25 keys)
OrderedDict: 49.47 micro sec/call (25 elements, 3 keys)
OrderedDict: 102.16 micro sec/call (25 elements, 25 keys)
defaultdict: 28.28 micro sec/call (100 elements, 3 keys)
defaultdict: 79.78 micro sec/call (100 elements, 100 keys)
setdefault: 45.68 micro sec/call (100 elements, 3 keys)
setdefault: 68.66 micro sec/call (100 elements, 100 keys)
OrderedDict: 117.78 micro sec/call (100 elements, 3 keys)
OrderedDict: 343.17 micro sec/call (100 elements, 100 keys)
defaultdict: 1123.60 micro sec/call (5,000 elements, 3 keys)
defaultdict: 4250.44 micro sec/call (5,000 elements, 5,000 keys)
setdefault: 2089.86 micro sec/call (5,000 elements, 3 keys)
setdefault: 3803.03 micro sec/call (5,000 elements, 5,000 keys)
OrderedDict: 4399.16 micro sec/call (5,000 elements, 3 keys)
OrderedDict: 16279.14 micro sec/call (5,000 elements, 5,000 keys)
defaultdict: 5609.39 micro sec/call (25,000 elements, 3 keys)
defaultdict: 25351.60 micro sec/call (25,000 elements, 25,000 keys)
setdefault: 10267.00 micro sec/call (25,000 elements, 3 keys)
setdefault: 24091.51 micro sec/call (25,000 elements, 25,000 keys)
OrderedDict: 22091.98 micro sec/call (25,000 elements, 3 keys)
OrderedDict: 94028.00 micro sec/call (25,000 elements, 25,000 keys)
Python 3.3结果:
Python 3.3 result:
3.3.2 (default, May 21 2013, 11:50:47)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
defaultdict: 8.58 micro sec/call (25 elements, 3 keys)
defaultdict: 21.18 micro sec/call (25 elements, 25 keys)
setdefault: 10.42 micro sec/call (25 elements, 3 keys)
setdefault: 14.58 micro sec/call (25 elements, 25 keys)
OrderedDict: 45.43 micro sec/call (25 elements, 3 keys)
OrderedDict: 92.69 micro sec/call (25 elements, 25 keys)
defaultdict: 20.47 micro sec/call (100 elements, 3 keys)
defaultdict: 77.48 micro sec/call (100 elements, 100 keys)
setdefault: 34.22 micro sec/call (100 elements, 3 keys)
setdefault: 54.86 micro sec/call (100 elements, 100 keys)
OrderedDict: 107.37 micro sec/call (100 elements, 3 keys)
OrderedDict: 318.98 micro sec/call (100 elements, 100 keys)
defaultdict: 714.70 micro sec/call (5,000 elements, 3 keys)
defaultdict: 3892.92 micro sec/call (5,000 elements, 5,000 keys)
setdefault: 1502.91 micro sec/call (5,000 elements, 3 keys)
setdefault: 2888.08 micro sec/call (5,000 elements, 5,000 keys)
OrderedDict: 3912.95 micro sec/call (5,000 elements, 3 keys)
OrderedDict: 14863.02 micro sec/call (5,000 elements, 5,000 keys)
defaultdict: 3649.02 micro sec/call (25,000 elements, 3 keys)
defaultdict: 22313.17 micro sec/call (25,000 elements, 25,000 keys)
setdefault: 7447.28 micro sec/call (25,000 elements, 3 keys)
setdefault: 18426.88 micro sec/call (25,000 elements, 25,000 keys)
OrderedDict: 19202.17 micro sec/call (25,000 elements, 3 keys)
OrderedDict: 85946.45 micro sec/call (25,000 elements, 25,000 keys)
这篇关于OrderedDict vs defaultdict vs dict的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!