为什么在 Python 中处理排序数组并不比未排序数组快? [英] Why is processing a sorted array not faster than an unsorted array in Python?

查看:43
本文介绍了为什么在 Python 中处理排序数组并不比未排序数组快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在这篇文章中 为什么处理一个排序数组比随机数组快,说明分支预测是排序数组性能提升的原因.

In this post Why is processing a sorted array faster than random array, it says that branch predicton is the reason of the performance boost in sorted arrays.

但我只是尝试了使用 Python 的示例;我认为排序数组和随机数组之间没有区别(我尝试了 bytearray 和 array;并使用 line_profile 来分析计算).

But I just tried the example using Python; and I think there is no difference between sorted and random arrays (I tried both bytearray and array; and use line_profile to profile the computation).

我错过了什么吗?

这是我的代码:

from array import array
import random
array_size = 1024
loop_cnt = 1000
# I also tried 'array', and it's almost the same
a = bytearray(array_size)
for i in xrange(array_size):
    a.append(random.randint(0, 255))
#sorted                                                                         
a = sorted(a)
@profile
def computation():
    sum = 0
    for i in xrange(loop_cnt):
        for j in xrange(size):
            if a[j] >= 128:
                sum += a[j]

computation()
print 'done'

推荐答案

我可能错了,但我看到链接问题和您的示例之间存在根本区别:Python 解释字节码,C++ 编译为本机代码.

I may be wrong, but I see a fundamental difference between the linked question and your example: Python interprets bytecode, C++ compiles to native code.

if 直接转换为 cmp/jl 序列的 C++ 代码中,CPU 分支预测器可以将其视为单个预测点",特定于该周期.

In the C++ code that if translates directly to a cmp/jl sequence, that can be considered by the CPU branch predictor as a single "prediction spot", specific to that cycle.

在 Python 中,比较实际上是几个函数调用,所以有 (1) 更多的开销和 (2) 我认为执行该比较的代码是一个函数,用于解释器中的每个其他整数比较 - 所以它是一个预测点"并非特定于当前块,这使分支预测器更难正确猜测.

In Python that comparison is actually several function calls, so there's (1) more overhead and (2) I suppose the code that performs that comparison is a function into the interpreter used for every other integer comparison - so it's a "prediction spot" not specific to the current block, which gives the branch predictor a much harder time to guess correctly.

编辑:此外,如这篇论文中所述,还有更多解释器内部的间接分支,因此 Python 代码中的这种优化无论如何都可能被解释器本身的分支错误预测所掩盖.

Edit: also, as outlined in this paper, there are way more indirect branches inside an interpreter, so such an optimization in your Python code would probably be buried anyway by the branch mispredictions in the interpreter itself.

这篇关于为什么在 Python 中处理排序数组并不比未排序数组快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆