Surprising results with Python timeit: Counter() vs defaultdict() vs dict()

Question

I obtained very surprising results with timeit. Can someone tell me if I am doing something wrong? I am using Python 2.7.

These are the contents of the file speedtest_init.py:

import random

to_count = [random.randint(0, 100) for r in range(60)]

These are the contents of speedtest.py:

__author__ = 'BlueTrin'

import timeit

def test_init1():
    print(timeit.timeit('import speedtest_init'))

def test_counter1():
    s = """\
    d = defaultdict(int);
    for i in speedtest_init.to_count:
        d[i] += 1
    """
    print(timeit.timeit(s, 'from collections import defaultdict; import speedtest_init;'))

def test_counter2():
    print(timeit.timeit('d = Counter(speedtest_init.to_count);', 'from collections import Counter; import speedtest_init;'))


if __name__ == "__main__":
    test_init1()
    test_counter1()
    test_counter2()

The console output is:

C:\Python27\python.exe C:/Dev/codility/chlorum2014/speedtest.py
2.71501962931
65.7090444503
91.2953839048

Process finished with exit code 0

I think timeit() runs the code 1000000 times by default, so I need to divide the times by 1000000 to get the time per run. What is surprising is that Counter is slower than defaultdict().

Is that expected?

Also, using a plain dict is faster than a defaultdict(int):

def test_counter3():
    s = """\
    d = {};
    for i in speedtest_init.to_count:
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    """
    print(timeit.timeit(stmt=s, setup='from collections import defaultdict; import speedtest_init;'))

This last version is faster than the defaultdict(int) one, which would mean that unless you care more about readability you should use dict() rather than defaultdict().

Recommended answer

Yes, this is expected; the Counter() constructor uses Counter.update(), which uses self.get() to load initial values rather than relying on __missing__.
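To make that concrete, here is a minimal sketch of the counting loop, modelled on what Counter.update() does for a plain iterable in Python 2.7. SketchCounter and missing_calls are hypothetical names used only for illustration; the real class also handles mappings and keyword arguments.

```python
missing_calls = []  # hypothetical log of every key that reaches __missing__


class SketchCounter(dict):
    """Stripped-down stand-in for collections.Counter (illustration only)."""

    def __missing__(self, key):
        # Only reached through self[key] lookups on absent keys.
        missing_calls.append(key)
        return 0

    def update(self, iterable):
        # Alias the bound method once; dict.get() runs in C and
        # never consults __missing__.
        self_get = self.get
        for elem in iterable:
            self[elem] = self_get(elem, 0) + 1


c = SketchCounter()
c.update("abca")
print(sorted(c.items()))
print(missing_calls)  # stays empty: update() never triggered __missing__
```

Indexing an absent key afterwards, for example c["z"], is what routes through __missing__.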

Moreover, the defaultdict __missing__ factory is handled entirely in C code, especially when using another type like int() that is itself implemented in C. The Counter source is pure Python and as such the Counter.__missing__ method requires a Python frame to execute.
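One rough way to see the cost of a Python-level __missing__ is to time defaultdict(int) against a dict subclass whose hook is written in Python. This is only a sketch: PyMissingDict and tally are hypothetical names, each run builds a fresh mapping (so construction cost is included), and the numbers will vary by machine and Python version.

```python
import random
import timeit
from collections import defaultdict


class PyMissingDict(dict):
    """Hypothetical dict subclass whose __missing__ runs as Python bytecode."""

    def __missing__(self, key):
        return 0


def tally(mapping, items):
    # mapping[item] += 1 falls back to __missing__ the first time a key is seen.
    for item in items:
        mapping[item] += 1
    return mapping


random.seed(0)  # fixed seed so repeated runs count the same data
items = [random.randint(0, 100) for _ in range(60)]

t_c = timeit.timeit(lambda: tally(defaultdict(int), items), number=10000)
t_py = timeit.timeit(lambda: tally(PyMissingDict(), items), number=10000)
print("defaultdict(int): %.4f s" % t_c)
print("Python __missing__: %.4f s" % t_py)
```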

Because dict.get() is still handled in C, the constructor approach is the faster approach for a Counter(), provided you use the same trick Counter.update() uses and alias the self.get lookup as a local first:

>>> import timeit
>>> import random
>>> to_count = [random.randint(0, 100) for r in range(60)]
>>> timeit.timeit('for i in to_count: c[i] += 1',
...               'from collections import Counter; from __main__ import to_count; c = Counter()',
...               number=10000)
0.2510359287261963
>>> timeit.timeit('for i in to_count: c[i] = c_get(i, 0) + 1',
...               'from collections import Counter; from __main__ import to_count; c = Counter(); c_get = c.get',
...               number=10000)
0.20978617668151855

Both defaultdict and Counter are helpful classes built for their functionality, not their performance; not relying on the __missing__ hook can be faster still:

>>> timeit.timeit('for i in to_count: d[i] = d_get(i, 0) + 1',
...               'from __main__ import to_count; d = {}; d_get = d.get',
...               number=10000)
0.11437392234802246

That's a regular dictionary using an aliased dict.get() method for maximum speed. But then you'll also have to re-implement the bag behaviour of Counter, or the Counter.most_common() method. The defaultdict use cases go way beyond counting.
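As an illustration of what that re-implementation involves, here is a sketch of a most_common() equivalent for a plain dict of counts. The helper name most_common_items is hypothetical, and the order among items with equal counts is not something to rely on.

```python
import heapq
from operator import itemgetter


def most_common_items(counts, n=None):
    """Return (key, count) pairs from a plain dict, highest count first."""
    if n is None:
        return sorted(counts.items(), key=itemgetter(1), reverse=True)
    # For small n, a heap avoids sorting the whole dictionary.
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))


# Count with the aliased dict.get() approach from the answer.
counts = {}
counts_get = counts.get
for ch in "abracadabra":
    counts[ch] = counts_get(ch, 0) + 1

print(most_common_items(counts, 1))
```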

In Python 3.2, updating a Counter() got a speed boost by adding a C library that handles this case; see issue 10667. Testing on Python 3.4, the Counter() constructor now beats the aliased dict.get case:

>>> timeit.timeit('Counter(to_count)',
...               'from collections import Counter; from __main__ import to_count',
...               number=100000)
0.8332311600097455
>>> timeit.timeit('for i in to_count: d[i] = d_get(i, 0) + 1',
...               'from __main__ import to_count; d = {}; d_get = d.get',
...               number=100000)
0.961191965994658
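The Python 2.7 runs above use number=10000 and the Python 3.4 runs use number=100000, so the raw totals are only comparable within one pair. A small helper that divides by number makes results easier to compare; this is a sketch, per_call_seconds is a hypothetical name, and unlike the snippets above it times function calls that build a fresh dict each time, so the absolute values will differ.

```python
import random
import timeit
from collections import Counter

to_count = [random.randint(0, 100) for r in range(60)]


def per_call_seconds(func, number=10000, repeat=3):
    """Best-of-`repeat` time for one call of func, in seconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number


def count_with_counter():
    return Counter(to_count)


def count_with_dict_get():
    d = {}
    d_get = d.get  # alias the lookup, as in the answer
    for i in to_count:
        d[i] = d_get(i, 0) + 1
    return d


for func in (count_with_counter, count_with_dict_get):
    print("%s: %.3g s per call" % (func.__name__, per_call_seconds(func)))
```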
