Python 内置 sum 函数与 for 循环性能 [英] Python built-in sum function vs. for loop performance

查看:104
本文介绍了Python 内置 sum 函数与 for 循环性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我注意到 Python 的内置 sum 函数在对 1 000 000 个整数列表求和时大约比 for 循环快 3 倍:

导入时间定义 sum1():s = 0对于我在范围内(1000000):s += i返回def sum2():返回总和(范围(1000000))打印 'For Loop Sum:', timeit.timeit(sum1, number=10)打印'内置总和:', timeit.timeit(sum2, number=10)# 印刷:# For 循环总和:0.751425027847# 内置总和:0.266746997833

这是为什么?sum 是如何实现的?

解决方案

速度差异实际上大于 3 倍,但是您通过首先创建一个包含 100 万个整数的巨大内存列表来减慢任一版本的速度.将其与计时赛分开:

<预><代码>>>>导入时间>>>def sum1(lst):... s = 0...对于我在 lst:... s += i... 返回 s...>>>def sum2(lst):...返回总和(lst)...>>>值 = 范围(1000000)>>>timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)3.457869052886963>>>timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)0.6696369647979736

现在速度差已经上升到5倍以上.

for 循环作为解释的 Python 字节码执行.sum() 完全在 C 代码中循环.解释的字节码和 C 代码之间的速度差异很大.

此外,如果 C 代码可以将总和保留在 C 类型中,则确保不会创建新的 Python 对象;这适用于 intfloat 结果.

反汇编后的 Python 版本是这样做的:

<预><代码>>>>导入文件>>>定义 sum1():... s = 0...对于我在范围内(1000000):... s += i... 返回 s...>>>dis.dis(sum1)2 0 LOAD_CONST 1 (0)3 STORE_FAST 0 (s)3 6 SETUP_LOOP 30(到 39)9 LOAD_GLOBAL 0(范围)12 LOAD_CONST 2 (1000000)15 CALL_FUNCTION 118 GET_ITER>>19 FOR_ITER 16(到 38)22 STORE_FAST 1 (i)4 25 LOAD_FAST 0 (s)28 LOAD_FAST 1 (i)31 INPLACE_ADD32 STORE_FAST 0 (s)35 绝对跳跃 19>>38 POP_BLOCK5 >>39 LOAD_FAST 0 (s)42 RETURN_VALUE

除了解释器循环比 C 慢之外,INPLACE_ADD 将创建一个新的整数对象(过去 255,CPython 将小的 int 对象缓存为单例).

您可以在Python mercurial 代码存储库,但它在注释中明确指出:

/* 通过在 C 中保存临时总和而不是新的 Python 对象来快速添加.假设所有输入都是相同的类型.如果假设失败,则默认到更一般的例程.*/

I noticed that Python's built-in sum function is roughly 3x faster than a for loop when summing a list of 1 000 000 integers:

import timeit

def sum1():
    s = 0
    for i in range(1000000):
        s += i
    return s

def sum2():
    return sum(range(1000000))

print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)

# Prints:
# For Loop Sum: 0.751425027847
# Built-in Sum: 0.266746997833

Why is that? How is sum implemented?

解决方案

The speed difference is actually greater than 3 times, but you slow down either version by first creating a huge in-memory list of 1 million integers. Separate that out of the time trials:

>>> import timeit
>>> def sum1(lst):
...     s = 0
...     for i in lst:
...         s += i
...     return s
... 
>>> def sum2(lst):
...     return sum(lst)
... 
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736

The speed difference has risen to over 5 times now.

A for loop is executed as interpreted Python bytecode. sum() loops entirely in C code. The speed difference between interpreted bytecode and C code is large.

In addition, the C code makes sure not to create new Python objects if it can keep the sum in C types instead; this works for int and float results.

The Python version, disassembled, does this:

>>> import dis
>>> def sum1():
...     s = 0
...     for i in range(1000000):
...         s += i
...     return s
... 
>>> dis.dis(sum1)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              30 (to 39)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               2 (1000000)
             15 CALL_FUNCTION            1
             18 GET_ITER            
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_FAST                1 (i)
             31 INPLACE_ADD         
             32 STORE_FAST               0 (s)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK           

  5     >>   39 LOAD_FAST                0 (s)
             42 RETURN_VALUE        

Apart from the interpreter loop being slower than C, the INPLACE_ADD will create a new integer object (past 255, CPython caches small int objects as singletons).

You can see the C implementation in the Python mercurial code repository, but it explicitly states in the comments:

/* Fast addition by keeping temporary sums in C instead of new Python objects.
   Assumes all inputs are the same type.  If the assumption fails, default
   to the more general routine.
*/

这篇关于Python 内置 sum 函数与 for 循环性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆