OrderedDict vs defaultdict vs dict [英] OrderedDict vs defaultdict vs dict

查看:87
本文介绍了OrderedDict vs defaultdict vs dict的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在python的库中,我们现在有两个Python实现字典,它们分别位于本地 dict 之外的子类 dict 类型。

In python's library, we now have two Python Implementation of dictionaries which subclasses dict over and above the native dict type.

Python的倡导者一直喜欢使用 dict.setdefault defaultdict code>尽可能。即使 doc 也引用这种技术更简单,比使用dict.setdefault()的等效技术更快:

Python's advocates have always preferred to defaultdict over using dict.setdefault where possible. Even the doc quotes that This technique is simpler and faster than an equivalent technique using dict.setdefault():

以类似的方式,由于字典不维护顺序,使用 OrderedDict 超过使用 dict 后跟排序的项是首选,有可能替代使用。

In similar ways, as dictionaries does not maintain order, using OrderedDict over using dict followed by sorting the items is preferred when ever possible for the alternative usage.

在上述两种情况下,代码绝对清晰,但是以性能损失为代价。

In both the above case, the code is definitely cleaner but at the cost of performance penalty.

在回答并评论基于项目的python唯一列表,当我使用 dict 时,我绊倒了性能损失c $ c> defaultdict 和 OrderedDict 。数据的大小也似乎并不重要,性能优势 dict 解决方案已经超过其他。

While answering and commenting on one of the question python unique list based on item, I stumbled upon the performance penalty over the native dict when using defaultdict and OrderedDict. It also seems the size of the data is also not immaterial to the performance advantage dict solution has over others.

我相信应该有一个 - 最好只有一个 - 显而易见的方式。什么是首选方式?

I believe There should be one-- and preferably only one --obvious way to do it., so what is the preferred way?

推荐答案

没有一个单一的答案,而不是一个真实而唯一的dict。在许多变量中,它取决于:

There is not one single answer and not one true and only dict. Among many variables, it depends on:


  1. 数据集的大小;

  2. 唯一的数量键;

  3. defaultdict的底层工厂的速度;

  4. OrderDict与某些后续订购步骤的速度。

  1. Size of the data set;
  2. Number of unique keys;
  3. Speed of the underlying factory for defaultdict;
  4. Speed of OrderDict vs some later ordering step.

我概括的是,但是这里有一些一般性:

I am loathe to generalize, but here are some generalities:


  1. 语句这种技术比使用dict.setdefault()的等效技术更简单和更快,只是平常错误。这取决于数据;

  2. setdefault 使用小数据集更快更简单;

  3. defaultdict 对于具有更均匀密钥集的较大数据集更快;

  4. setdefault 具有更多异构密钥集的优势;

  5. 这些结果对于Python 3与Python 2是不同的;

  6. 在所有情况下,OrderedDict 比依赖于顺序和顺序的算法不是重建或排序不容易。

  1. The statement This technique is simpler and faster than an equivalent technique using dict.setdefault() is just flat wrong. It depends on the data;
  2. setdefault is faster and simpler with small data sets;
  3. defaultdict is faster for larger data sets with more homogenous key sets;
  4. setdefault has an advantage with more heterogeneous key sets;
  5. these results are different for Python 3 vs Python 2;
  6. OrderedDict is slower in all cases other than an algorithm that depends on order and order is not easy to reconstruct or sort.

唯一的事实是:它取决于所有这三种技术都很有用。

The only truth: It Depends! All three technique are useful.

以下是一些时间代码:

from __future__ import print_function
from collections import defaultdict
from collections import OrderedDict

try:
    t=unichr(100)
except NameError:
    unichr=chr

def f1(li):
    '''defaultdict'''
    d = defaultdict(list)
    for k, v in li:
        d[k].append(v)
    return d.items()

def f2(li):
    '''setdefault'''
    d={}
    for k, v in li:
        d.setdefault(k, []).append(v)
    return d.items()

def f3(li):
    '''OrderedDict'''
    d=OrderedDict()
    for k, v in li:
        d.setdefault(k, []).append(v)
    return d.items()      


if __name__ == '__main__':
    import timeit
    import sys
    print(sys.version)
    few=[('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
    fmt='{:>12}: {:10.2f} micro sec/call ({:,} elements, {:,} keys)'
    for tag, m, n in [('small',5,10000), ('medium',20,1000), ('bigger',1000,100), ('large',5000,10)]:
        for f in [f1,f2,f3]:
            s = few*m
            res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
            st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
            print(st)
            s = [(unichr(i%0x10000),i) for i in range(1,len(s)+1)]
            res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
            st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
            print(st)            
        print() 

Python 2.7结果:

Python 2.7 result:

2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]
 defaultdict:      10.20 micro sec/call (25 elements, 3 keys)
 defaultdict:      21.08 micro sec/call (25 elements, 25 keys)
  setdefault:      13.41 micro sec/call (25 elements, 3 keys)
  setdefault:      18.24 micro sec/call (25 elements, 25 keys)
 OrderedDict:      49.47 micro sec/call (25 elements, 3 keys)
 OrderedDict:     102.16 micro sec/call (25 elements, 25 keys)

 defaultdict:      28.28 micro sec/call (100 elements, 3 keys)
 defaultdict:      79.78 micro sec/call (100 elements, 100 keys)
  setdefault:      45.68 micro sec/call (100 elements, 3 keys)
  setdefault:      68.66 micro sec/call (100 elements, 100 keys)
 OrderedDict:     117.78 micro sec/call (100 elements, 3 keys)
 OrderedDict:     343.17 micro sec/call (100 elements, 100 keys)

 defaultdict:    1123.60 micro sec/call (5,000 elements, 3 keys)
 defaultdict:    4250.44 micro sec/call (5,000 elements, 5,000 keys)
  setdefault:    2089.86 micro sec/call (5,000 elements, 3 keys)
  setdefault:    3803.03 micro sec/call (5,000 elements, 5,000 keys)
 OrderedDict:    4399.16 micro sec/call (5,000 elements, 3 keys)
 OrderedDict:   16279.14 micro sec/call (5,000 elements, 5,000 keys)

 defaultdict:    5609.39 micro sec/call (25,000 elements, 3 keys)
 defaultdict:   25351.60 micro sec/call (25,000 elements, 25,000 keys)
  setdefault:   10267.00 micro sec/call (25,000 elements, 3 keys)
  setdefault:   24091.51 micro sec/call (25,000 elements, 25,000 keys)
 OrderedDict:   22091.98 micro sec/call (25,000 elements, 3 keys)
 OrderedDict:   94028.00 micro sec/call (25,000 elements, 25,000 keys)

Python 3.3结果:

Python 3.3 result:

3.3.2 (default, May 21 2013, 11:50:47) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
 defaultdict:       8.58 micro sec/call (25 elements, 3 keys)
 defaultdict:      21.18 micro sec/call (25 elements, 25 keys)
  setdefault:      10.42 micro sec/call (25 elements, 3 keys)
  setdefault:      14.58 micro sec/call (25 elements, 25 keys)
 OrderedDict:      45.43 micro sec/call (25 elements, 3 keys)
 OrderedDict:      92.69 micro sec/call (25 elements, 25 keys)

 defaultdict:      20.47 micro sec/call (100 elements, 3 keys)
 defaultdict:      77.48 micro sec/call (100 elements, 100 keys)
  setdefault:      34.22 micro sec/call (100 elements, 3 keys)
  setdefault:      54.86 micro sec/call (100 elements, 100 keys)
 OrderedDict:     107.37 micro sec/call (100 elements, 3 keys)
 OrderedDict:     318.98 micro sec/call (100 elements, 100 keys)

 defaultdict:     714.70 micro sec/call (5,000 elements, 3 keys)
 defaultdict:    3892.92 micro sec/call (5,000 elements, 5,000 keys)
  setdefault:    1502.91 micro sec/call (5,000 elements, 3 keys)
  setdefault:    2888.08 micro sec/call (5,000 elements, 5,000 keys)
 OrderedDict:    3912.95 micro sec/call (5,000 elements, 3 keys)
 OrderedDict:   14863.02 micro sec/call (5,000 elements, 5,000 keys)

 defaultdict:    3649.02 micro sec/call (25,000 elements, 3 keys)
 defaultdict:   22313.17 micro sec/call (25,000 elements, 25,000 keys)
  setdefault:    7447.28 micro sec/call (25,000 elements, 3 keys)
  setdefault:   18426.88 micro sec/call (25,000 elements, 25,000 keys)
 OrderedDict:   19202.17 micro sec/call (25,000 elements, 3 keys)
 OrderedDict:   85946.45 micro sec/call (25,000 elements, 25,000 keys)

这篇关于OrderedDict vs defaultdict vs dict的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆