循环/列表理解与其他方法的速度/效率比较 [英] Speed/efficiency comparison for loop vs list comprehension vs other methods

查看:70
本文介绍了循环/列表理解与其他方法的速度/效率比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用下面提供的示例代码,我想更好地了解python速度如何根据我构造给定函数的方式而变化.

Using the example codes I give below, I would like to have a better understanding of how python speed varies depending on how I structure a given function.

我定义的示例函数的工作方式如下:给定两个字符串,它们返回它们不同的位数.我们假定assert len(s1) == len(s2)始终为真.

The example functions I define work as follow: given two strings they return the number of digits that they differ. We assume that assert len(s1) == len(s2) is always true.

第一个函数使用列表推导.

First function uses a list comprehension.

def h_dist1(s1,s2):
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

第二个函数使用经典的for循环.

Second function uses a classic for loop.

def h_dist2(s1,s2):
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

第二个代码的复杂性显然是O(N),其中len(s1)=len(s2)=N.

The complexity of the second code is clearly O(N) where len(s1)=len(s2)=N.

与示例有关的问题:有没有更好的方法来定义此特定功能? h_dist1的复杂性是什么?

Example related question: Is there a better way to define this particular function? What's the complexity of h_dist1?

一般问题:通常,定义函数的最佳方式(就可读性,速度,效率,更pythonic而言)类似于上面示例中给出的方式(即需要遍历字符串/数组/等)?而且,最重要的是,为什么一种最快/最有效的方法?

General question: What's, in general, the best (in terms of: readability, speed, efficiency, more pythonic) way of defining a function that is similar to the ones given in the example above (i.e. that need to loop over a string/array/etc)? And, most important, why is a particular way the most fast/efficient?

注意,我一直在寻找类似的问题,但没有发现任何具体问题,例如在此处中,HYRY表示,要加快代码的开发速度,应在for循环中使用1.局部变量,在2.中使用列表理解. 但是我仍然不明白为什么.当然,任何其他Q/A的引用都欢迎.

Note I have looked for similar questions but I haven't found anything specific, e.g. in here HYRY says that to speed up a code one should use 1. local variables in for loop and 2. use list comprehension. But I still don't understand why. Of course, any reference to other Q/A is welcome.

推荐答案

不要太快地写下卑微的for循环.如果您实际上不需要列表,例如在这种情况下,标准的for循环可能比使用列表理解要快.当然,它具有较少的内存开销.

Don't be too quick to write off the humble for loop. If you don't actually need a list, like in this case, a standard for loop can be faster than using a list comprehension. And of course it has less memory overheads.

这是一个执行时序测试的程序;可以轻松修改它以添加更多测试.

Here's a program to perform timing tests; it can be easily modified to add more tests.

#!/usr/bin/env python

''' Time various implementations of string diff function

    From http://stackoverflow.com/q/28581218/4014959

    Written by PM 2Ring 2015.02.18
'''

from itertools import imap, izip, starmap
from operator import ne

from timeit import Timer
from random import random, seed

def h_dist0(s1,s2):
    ''' For loop '''
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

def h_dist1(s1,s2):
    ''' List comprehension '''
    return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])

def h_dist2(s1,s2):
    ''' Generator expression '''
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

def h_dist3(s1,s2):
    ''' Generator expression with if '''
    return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)

def h_dist3a(s1,s2):
    ''' Generator expression with izip '''
    return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

def h_dist4(s1,s2):
    ''' imap '''
    return sum(imap(ne, s1, s2))

def h_dist5(s1,s2):
    ''' starmap '''
    return sum(starmap(ne, izip(s1, s2)))

funcs = [
    h_dist0,
    h_dist1,
    h_dist2,
    h_dist3,
    h_dist3a,
    h_dist4,
    h_dist5,
]

# ------------------------------------

def check_full():
    print 'Testing all functions with strings of length', len(s1)
    for func in funcs:
        print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))

def check():
    print 'Testing all functions with strings of length', len(s1)
    print [func(s1, s2) for func in funcs], '\n'

def time_test(loops=10000, reps=3):
    ''' Print timing stats for all the functions '''
    slen = len(s1)
    print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)

    for func in funcs:
        #Get function name and docstring
        fname = func.func_name
        fdoc = func.__doc__

        print '\n%s:%s' % (fname, fdoc)
        t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
        results = t.repeat(reps, loops)
        results.sort()
        print results
    print '\n' + '- '*30 + '\n'

def make_strings(n, r=0.5):
    print 'r:', r
    s1 = 'a' * n
    s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
    return s1, s2

# ------------------------------------

seed(37)

s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)

s1, s2 = make_strings(10)
check()
time_test(50000)

s1, s2 = make_strings(1000)
check()
time_test(1000)

下面的结果来自在Linux上运行Python 2.6.6的32位2GHz奔腾4.

The results below are from a 32 bit 2GHz Pentium 4 running Python 2.6.6 on Linux.

输出

r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]

h_dist1: List comprehension 
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]

h_dist2: Generator expression 
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]

h_dist3: Generator expression with if 
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]

h_dist3a: Generator expression with izip 
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]

h_dist4: imap 
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]

h_dist5: starmap 
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]

h_dist1: List comprehension 
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]

h_dist2: Generator expression 
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]

h_dist3: Generator expression with if 
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]

h_dist3a: Generator expression with izip 
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]

h_dist4: imap 
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]

h_dist5: starmap 
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]

h_dist1: List comprehension 
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]

h_dist2: Generator expression 
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]

h_dist3: Generator expression with if 
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]

h_dist3a: Generator expression with izip 
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]

h_dist4: imap 
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]

h_dist5: starmap 
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5] 

Length = 10, Loops = 50000, Repetitions = 3

h_dist0: For loop 
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]

h_dist1: List comprehension 
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]

h_dist2: Generator expression 
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]

h_dist3: Generator expression with if 
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]

h_dist3a: Generator expression with izip 
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]

h_dist4: imap 
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]

h_dist5: starmap 
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506] 

Length = 1000, Loops = 1000, Repetitions = 3

h_dist0: For loop 
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]

h_dist1: List comprehension 
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]

h_dist2: Generator expression 
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]

h_dist3: Generator expression with if 
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]

h_dist3a: Generator expression with izip 
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]

h_dist4: imap 
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]

h_dist5: starmap 
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

这篇关于循环/列表理解与其他方法的速度/效率比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆