Understanding performance difference


Problem description

Answering this question I faced an interesting situation: two similar code snippets performed quite differently. I'm asking here just to understand the reason for that and to improve my intuition for such cases.

I'll adapt the code snippets for Python 2.7 (in Python 3 the performance differences are the same).

from collections import OrderedDict
from operator import itemgetter
from itertools import izip

items = OrderedDict([('a', 10), ('b', 9), ('c', 4), ('d', 7), ('e', 3), ('f', 0), ('g', -5), ('h', 9)])

def f1():
    return min(items, key=items.get)

def f2():
    return min(items.iteritems(), key=itemgetter(1))[0]


from timeit import Timer
N = 100000

print(Timer(stmt='f1()', setup='from __main__ import f1').timeit(number = N))
print(Timer(stmt='f2()', setup='from __main__ import f2').timeit(number = N))

Output:

0.603327797248
1.21580172899
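
For reference, here is a rough Python 3 equivalent of the snippet above (just a sketch: iteritems and izip do not exist in Python 3, so items() takes their place; per the note above, the relative timings come out about the same):

from collections import OrderedDict
from operator import itemgetter
from timeit import Timer

items = OrderedDict([('a', 10), ('b', 9), ('c', 4), ('d', 7), ('e', 3), ('f', 0), ('g', -5), ('h', 9)])

def f1():
    return min(items, key=items.get)

def f2():
    # items() yields (key, value) pairs, like iteritems() does in Python 2
    return min(items.items(), key=itemgetter(1))[0]

N = 100000
print(Timer(stmt='f1()', setup='from __main__ import f1').timeit(number=N))
print(Timer(stmt='f2()', setup='from __main__ import f2').timeit(number=N))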

The first solution has to do a lookup in the OrderedDict to get the value for each key. The second solution just iterates over the OrderedDict's key-value pairs, which have to be packed into tuples.

The second solution is 2 times slower.

Why is that?

I recently watched this video, where Raymond Hettinger says that Python tends to reuse tuples, so there are no extra allocations.

So, what does this performance issue boil down to?

I want to elaborate a bit on why I'm asking.

The first solution does a dictionary lookup. That means taking the key's hash, then finding the bin for that hash, then getting the key from that bin (hopefully with no key collisions), and then getting the value associated with that key.

The second solution just goes through all the bins and yields all the keys in those bins. It walks the bins one by one, without the overhead of computing which bin to take. Yes, it has to access the values associated with those keys, but the value is only one step away from the key, whereas the first solution has to go through the hash-bin-key-value chain whenever it needs the value. Each solution has to get the value: the first gets it through the hash-bin-key-value chain, the second follows one more pointer when accessing the key. The only overhead of the second solution is that it has to store this value in a tuple temporarily, along with the key. It turns out that this storing is the major reason for the overhead. Still, I don't fully understand why that is, given that there is so-called "tuple reuse" (see the video mentioned above).

To my mind, the second solution has to save the value along with the key, but it spares us the hash-bin-key computation and access needed to get the value for that key.

Solution

The performance difference is mainly caused by OrderedDict.
OrderedDict uses dict's get and __getitem__, but redefines its own __iter__ and iteritems.


    def __iter__(self):
        'od.__iter__() <==> iter(od)'
        # Traverse the linked list in order.
        root = self.__root
        curr = root[1]                                  # start at the first node
        while curr is not root:
            yield curr[2]                               # yield the curr[KEY]
            curr = curr[1]                              # move to next node

    def iteritems(self):
        'od.iteritems -> an iterator over the (key, value) pairs in od'
        for k in self:
            yield (k, self[k])

Look at what we found: self[k].
Your second solution did not help us avoid the hash-bin-key calculation, whereas the iterator generated by a plain dict (more precisely, items.iteritems().next() when items is a dict) does not do that calculation.
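
One quick way to see that extra lookup (a minimal sketch using a hypothetical CountingOrderedDict subclass, Python 2.7):

from collections import OrderedDict

class CountingOrderedDict(OrderedDict):
    """An OrderedDict that counts indexing operations (for illustration only)."""
    lookups = 0
    def __getitem__(self, key):
        CountingOrderedDict.lookups += 1
        return OrderedDict.__getitem__(self, key)

cod = CountingOrderedDict([('a', 10), ('b', 9), ('c', 4)])
list(cod.iteritems())                 # OrderedDict.iteritems does self[k] for every key
print(CountingOrderedDict.lookups)    # 3: one full hash lookup per key

A plain dict's iteritems(), by contrast, is implemented in C and reads each entry's key and value directly, without going back through __getitem__.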

Moreover, iteritems itself is also more expensive.

from timeit import Timer
N = 1000

d = {i:i for i in range(10000)}

def f1():
    for k in d: pass

def f2():
    for k in d.iterkeys(): pass

def f3():
    for v in d.itervalues(): pass

def f4():
    for t in d.iteritems(): pass

print(Timer(stmt='f1()', setup='from __main__ import f1').timeit(number=N))
print(Timer(stmt='f2()', setup='from __main__ import f2').timeit(number=N))
print(Timer(stmt='f3()', setup='from __main__ import f3').timeit(number=N))
print(Timer(stmt='f4()', setup='from __main__ import f4').timeit(number=N))

Output

0.256800375467
0.265079360645
0.260599391822
0.492333103788

Compared with iterkeys' dictiter_iternextkey and itervalues' dictiter_iternextvalue, iteritems' dictiter_iternextitem has extra work to do:


    if (result->ob_refcnt == 1) {
        /* Only the iterator still references its previous result tuple:
           reuse it in place, dropping the old key/value it held. */
        Py_INCREF(result);
        Py_DECREF(PyTuple_GET_ITEM(result, 0));
        Py_DECREF(PyTuple_GET_ITEM(result, 1));
    } else {
        /* Someone still holds the previous tuple, so build a fresh 2-tuple. */
        result = PyTuple_New(2);
        if (result == NULL)
            return NULL;
    }
    di->len--;
    key = ep[i].me_key;
    value = ep[i].me_value;
    Py_INCREF(key);
    Py_INCREF(value);
    PyTuple_SET_ITEM(result, 0, key);
    PyTuple_SET_ITEM(result, 1, value);

I think that this tuple creation could be what decreases the performance.

Python indeed tends to reuse tuples.
tupleobject.c shows:

/* Speed optimization to avoid frequent malloc/free of small tuples */
#ifndef PyTuple_MAXSAVESIZE
#define PyTuple_MAXSAVESIZE     20  /* Largest tuple to save on free list */
#endif
#ifndef PyTuple_MAXFREELIST
#define PyTuple_MAXFREELIST  2000  /* Maximum number of tuples of each size to save */
#endif

This optimization just means that Python does not build some tuples from scratch, but there is still plenty of other work to do.
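
The reuse corresponds to the refcount check shown above: the iterator recycles its result tuple only when nothing else still references it. A minimal sketch of that behaviour (a CPython 2.7 implementation detail, so treat the exact outcome as an assumption rather than a guarantee):

d = {'a': 1, 'b': 2, 'c': 3}
it = d.iteritems()

first = it.next()
first_id = id(first)
del first                       # only the iterator now holds its result tuple (refcount 1)
second = it.next()
print(id(second) == first_id)   # typically True: the same tuple object is handed back

third = it.next()               # 'second' is still referenced, so a fresh 2-tuple is built
print(id(third) == id(second))  # typically False

Even when the tuple object is reused, each step still has to adjust reference counts and store the key and value into it, which is the extra work referred to above.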

If OrderedDict is replaced by dict, I think the second solution is slightly better in general.
Python dictionaries are implemented with hash tables, so lookup is fast: the average time complexity of a lookup is O(1), while the worst case is O(n) [1]. The average time complexity of your first solution is the same as that of your second solution; both are O(n). Therefore, the second solution has no advantage and is sometimes even slower, especially when the input data is small. In that case, the extra cost caused by iteritems cannot be compensated for.

from collections import OrderedDict
from operator import itemgetter
from timeit import Timer
from random import randint, random

N = 100000
xs = [('a', 10), ('b', 9), ('c', 4), ('d', 7), ('e', 3), ('f', 0), ('g', -5), ('h', 9)]

od = OrderedDict(xs)
d = dict(xs)

def f1od_min():
    return min(od, key=od.get)

def f2od_min():
    return min(od.iteritems(), key=itemgetter(1))[0]

def f1d_min():
    return min(d, key=d.get)

def f2d_min():
    return min(d.iteritems(), key=itemgetter(1))[0]

def f1od():
    for k in od: pass

def f2od():
    for t in od.iteritems(): pass

def f1d():
    for k in d: pass

def f2d():
    for t in d.iteritems(): pass

print 'min'
print(Timer(stmt='f1od_min()', setup='from __main__ import f1od_min').timeit(number=N))
print(Timer(stmt='f2od_min()', setup='from __main__ import f2od_min').timeit(number=N))
print(Timer(stmt='f1d_min()', setup='from __main__ import f1d_min').timeit(number=N))
print(Timer(stmt='f2d_min()', setup='from __main__ import f2d_min').timeit(number=N))
print
print 'traverse'
print(Timer(stmt='f1od()', setup='from __main__ import f1od').timeit(number=N))
print(Timer(stmt='f2od()', setup='from __main__ import f2od').timeit(number=N))
print(Timer(stmt='f1d()', setup='from __main__ import f1d').timeit(number=N))
print(Timer(stmt='f2d()', setup='from __main__ import f2d').timeit(number=N))

Output

min
0.398274431527
0.813040903243
0.185168156847
0.249574387248    <-- dict/the second solution

traverse
0.251634216081
0.642283865687
0.0565099754298
0.0958057518483

Then with

N = 50
xs = [(x, randint(1, 100)) for x in range(100000)]

Output

min
1.5148923257
3.47020082161
0.712828585756
0.70823812803    <-- dict/the second solution

traverse
0.975989336634
2.92283956481
0.127676073356
0.253622387762

Now with

N = 10
xs = [(random(), random()) for x in range(1000000)]

Output

min
6.23311265817
10.702984667
4.32852708934
2.87853889251    <-- dict/the second solution

traverse
2.06231783648
9.49360449443
1.33297618831
1.73723008092

Finally, the second solution starts to shine.

The worst case for the first solution: hash collisions

N = 10000
xs = [(2 ** (32 + x) - 2 ** x + 1, 1) for x in range(100)]
# hash(2 ** (32 + x) - 2 ** x + 1) is always 1

Output

min
2.44175265292    <-- lookup is slow
2.76424538594    <-- lookup is slow
2.26508627493    <-- lookup is slow
0.199363955475

traverse
0.200654482623
2.59635966303    <-- lookup is slow
0.0454684184722
0.0733798569371
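
As a sanity check, one can verify on the interpreter at hand that these keys really do collide (a sketch; whether every hash works out to exactly 1 depends on the CPython 2 build, so the comment in the setup snippet reflects the build the timings were taken on):

keys = [2 ** (32 + x) - 2 ** x + 1 for x in range(100)]
print(len(set(hash(k) for k in keys)))   # expected 1 here; a larger number means the worst case is not triggered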


[1] The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys. See TimeComplexity.

