Python string 'join' is faster (?) than '+', but what's wrong here?


Question

In an earlier post I asked for the most efficient method for mass dynamic string concatenation, and I was advised to use the join method as the best, simplest and fastest way to do it (as everyone said). But while I was playing with string concatenation, I found some weird (?) results. I'm sure something is going on, but I can't quite figure it out. Here is what I did:

I defined these functions:

import timeit
def x():
    s=[]
    for i in range(100):
        # Other codes here...
        s.append("abcdefg"[i%7])
    return ''.join(s)

def y():
    s=''
    for i in range(100):
        # Other codes here...
        s+="abcdefg"[i%7]
    return s

def z():
    s=''
    for i in range(100):
        # Other codes here...
        s=s+"abcdefg"[i%7]
    return s

def p():
    s=[]
    for i in range(100):
        # Other codes here...
        s+="abcdefg"[i%7]
    return ''.join(s)

def q():
    s=[]
    for i in range(100):
        # Other codes here...
        s = s + ["abcdefg"[i%7]]
    return ''.join(s)

I tried to keep everything other than the concatenation almost the same across the functions. Then I timed them, with the results shown in the comments (using Python 3.1.1 IDLE on a 32-bit Windows machine):

timeit.timeit(x) # 31.54912480500002
timeit.timeit(y) # 23.533029429999942 
timeit.timeit(z) # 22.116181330000018
timeit.timeit(p) # 37.718607439999914
timeit.timeit(q) # 108.60377576499991

This suggests that strng = strng + dyn_strng is the fastest. Although the differences in time are not that significant (except for the last one), I want to know why this is happening. Is it because I am using Python 3.1.1, which makes '+' the most efficient? Should I use '+' as an alternative to join? Or have I done something extremely silly? Or what? Please explain clearly.
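For reading the numbers above: timeit.timeit runs the callable 1,000,000 times by default, so each figure is the total number of seconds for a million calls, not the time of one call. A minimal sketch of getting a per-call figure is shown here, assuming two simplified stand-ins for x() and y(); the names build_with_join, build_with_plus and per_call_seconds are made up for this illustration.

```python
import timeit

def build_with_join(n=100):
    # Same idea as x(): collect the pieces in a list, join once at the end.
    parts = []
    for i in range(n):
        parts.append("abcdefg"[i % 7])
    return ''.join(parts)

def build_with_plus(n=100):
    # Same idea as y(): grow the string in place with +=.
    s = ''
    for i in range(n):
        s += "abcdefg"[i % 7]
    return s

def per_call_seconds(func, number=10000, repeat=5):
    # timeit.repeat returns one total time per round; the minimum is
    # usually the least disturbed by other activity on the machine.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number

if __name__ == "__main__":
    for f in (build_with_join, build_with_plus):
        print(f.__name__, per_call_seconds(f))
```

The absolute values depend on the machine and the Python version, so only the comparison between the two functions on the same setup is meaningful.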

Recommended answer

I have figured out the answer from the answers posted here by experts. Python string concatenation (and the timing measurements) depends on the following, as far as I can see:

  • Number of concatenations
  • Average length of the strings
  • Number of function calls

I have written new code that takes these into account. Thanks to Peter S Magnusson, sepp2k, hughdbrown, David Wolever and others for pointing out important points I had missed earlier. I may also have missed something in this code, so I highly appreciate any replies pointing out errors, suggestions, criticisms, etc. After all, I am here to learn. Here is my new code:

from timeit import timeit

noc = 100
tocat = "a"
def f_call():
    pass

def loop_only():
    for i in range(noc):
        pass

def concat_method():
    s = ''
    for i in range(noc):
        s = s + tocat

def list_append():
    s=[]
    for i in range(noc):
        s.append(tocat)
    ''.join(s)

def list_append_opt():
    s = []
    zap = s.append
    for i in range(noc):
        zap(tocat)
    ''.join(s)

def list_comp():
    ''.join(tocat for i in range(noc))

def concat_method_buildup():
    s=''

def list_append_buildup():
    s=[]

def list_append_opt_buildup():
    s=[]
    zap = s.append

def function_time(f):
    return timeit(f,number=1000)*1000

f_callt = function_time(f_call)

def measure(ftuple,n,tc):
    global noc,tocat
    noc = n
    tocat = tc
    loopt = function_time(loop_only) - f_callt
    buildup_time = function_time(ftuple[1]) -f_callt if ftuple[1] else 0
    total_time = function_time(ftuple[0])
    return total_time, total_time - f_callt - buildup_time - loopt*ftuple[2]

functions ={'Concat Method\t\t':(concat_method,concat_method_buildup,True),
            'List append\t\t\t':(list_append,list_append_buildup,True),
            'Optimized list append':(list_append_opt,list_append_opt_buildup,True),
            'List comp\t\t\t':(list_comp,0,False)}

for i in range(5):
    # Column labels match the piece lengths actually used below: 'a'*10**j for j in 0..2
    print("\n\n%d concatenation\t\t\t\t1'a'\t\t\t\t 10'a'\t\t\t100'a'"%10**i)
    print('-'*80)
    for (f,ft) in functions.items():
        print(f,"\t|",end="\t")
        for j in range(3):
            t = measure(ft,10**i,'a'*10**j)
            print("%.3f %.3f |" % t,end="\t")
        print()

And here is what I got. [Each column shows two (scaled) times: the first is the total function execution time, and the second is the actual (?) concatenation time, after deducting the function call time, the function buildup (initialization) time, and the iteration time. Here I am considering a case where the work can't be done without a loop (say, there are more statements inside it).]

1 concatenation                 1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   2.310 2.168       |  2.298 2.156       |  2.304 2.162
Optimized list append   |   1.069 0.439       |  1.098 0.456       |  1.071 0.413
Concat Method           |   0.552 0.034       |  0.541 0.025       |  0.565 0.048
List append             |   1.099 0.557       |  1.099 0.552       |  1.094 0.552


10 concatenations                1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   3.366 3.224       |  3.473 3.331       |  4.058 3.916
Optimized list append   |   2.778 2.003       |  2.956 2.186       |  3.417 2.639
Concat Method           |   1.602 0.943       |  1.910 1.259       |  3.381 2.724
List append             |   3.290 2.612       |  3.378 2.699       |  3.959 3.282


100 concatenations               1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   15.900 15.758     |  17.086 16.944     |  20.260 20.118
Optimized list append   |   15.178 12.585     |  16.203 13.527     |  19.336 16.703
Concat Method           |   10.937 8.482      |  25.731 23.263     |  29.390 26.934
List append             |   20.515 18.031     |  21.599 19.115     |  24.487 22.003


1000 concatenations               1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   134.507 134.365   |  143.913 143.771   |  201.062 200.920
Optimized list append   |   112.018 77.525    |  121.487 87.419    |  151.063 117.059
Concat Method           |   214.329 180.093   |  290.380 256.515   |  324.572 290.720
List append             |   167.625 133.619   |  176.241 142.267   |  205.259 171.313


10000 concatenations              1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   1309.702 1309.560 |  1404.191 1404.049 |  2912.483 2912.341
Optimized list append   |   1042.271 668.696  |  1134.404 761.036  |  2628.882 2255.804
Concat Method           |   2310.204 1941.096 |  2923.805 2550.803 |  STUCK    STUCK
List append             |   1624.795 1251.589 |  1717.501 1345.137 |  3182.347 2809.233
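The second number in each column comes from subtracting overhead from the total. That idea can be sketched in isolation as follows. This is a simplified illustration, not the code that produced the table: it subtracts only an empty-loop baseline, and the helper names best_ms and net_concat_ms are invented here.

```python
from timeit import repeat

def loop_only(n):
    # Baseline: the same loop with no work inside.
    for _ in range(n):
        pass

def concat_loop(n, piece):
    # The loop under test: n concatenations of `piece` with '+'.
    s = ''
    for _ in range(n):
        s = s + piece
    return s

def best_ms(func, *args, number=200, rounds=5):
    # Best of several rounds, in milliseconds for `number` calls.
    # The lambda adds the same call overhead to every measurement,
    # so it cancels out when two results are subtracted.
    return min(repeat(lambda: func(*args), number=number, repeat=rounds)) * 1000

def net_concat_ms(n, piece):
    total = best_ms(concat_loop, n, piece)
    baseline = best_ms(loop_only, n)
    # (total time, time left after removing the bare loop cost)
    return total, total - baseline

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "%.3f %.3f" % net_concat_ms(n, 'a' * 10))
```

For very small n the net figure can come out close to zero or even slightly negative, because the baseline and the measured loop are timed separately and both carry noise.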

To sum up, these are the conclusions I have drawn for myself:

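  1. If you already have a list of strings available, the string 'join' method is the best and fastest.
  2. If you can use a list comprehension, that is the easiest and is fast as well.
  3. If you need 1 to 10 concatenations (on average) of strings of length 1 to 100, list append and '+' both take (almost) the same time. Note that the times are scaled.
  4. Optimized list append seems very good in most situations.
  5. When the number of concatenations or the string length rises, '+' starts to take more and more time. Note that for 10000 concatenations of 100'a', my PC got stuck!
  6. If you always use list append and 'join', you are safe all the time (as pointed out by Alex Martelli).
  7. But in some situations, say where you need to take user input and print 'Hello user's world!', using '+' is simplest. I think building a list and joining it for this case, as in x = input("Enter user name:") followed by x.join(["Hello ","'s world!"]), is clumsier than "Hello %s's world!"%x or "Hello "+x+"'s world!".
  8. Python 3.1 has improved concatenation performance. But in some implementations, such as Jython, '+' is less efficient.
  9. Premature optimization is the root of all evil (as the experts say). Most of the time you do not need to optimize, so don't waste time chasing optimization (unless you are writing a big or computational project where every micro/millisecond counts).
  10. Use this information and write the code whichever way you like, taking the circumstances into consideration.
  11. If you really need to optimize, use a profiler, find the bottlenecks and try to optimize those.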
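The alternatives mentioned in points 1, 2 and 7 can be put side by side as a small sketch. The function names are made up for this illustration; only the expressions inside them come from the list above.

```python
def join_existing(parts):
    # Point 1: the strings are already in a list, so join them directly.
    return ''.join(parts)

def join_generated(piece, n):
    # Point 2: produce the pieces inline and join them in one call.
    return ''.join(piece for _ in range(n))

def greet_plus(name):
    # Point 7: a handful of fixed pieces, '+' reads naturally.
    return "Hello " + name + "'s world!"

def greet_percent(name):
    # Point 7: %-formatting gives the same result.
    return "Hello %s's world!" % name

def greet_join(name):
    # Point 7: works, but the name acts as the separator between the
    # two fixed pieces, which is harder to read.
    return name.join(["Hello ", "'s world!"])

if __name__ == "__main__":
    print(join_existing(["a", "b", "c"]))
    print(join_generated("ab", 3))
    print(greet_plus("user"))
```

All three greet_* variants return the same string for a plain string name, so for a case this small the choice is about readability rather than speed.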

Finally, I am trying to learn Python more deeply, so it is quite possible that there are mistakes in my observations. Please comment on this and tell me if I am heading in the wrong direction. Thanks to all for participating.
