Python string 'join' is faster (?) than '+', but what's wrong here?

Question

I asked about the most efficient method for mass dynamic string concatenation in an earlier post, and I was advised to use the join method as the best, simplest and fastest way to do it (as everyone said). But while playing with string concatenations, I found some weird(?) results. I'm sure something is going on, but I can't quite get it. Here is what I did:

I defined these functions:
import timeit

def x():
    s = []
    for i in range(100):
        # Other codes here...
        s.append("abcdefg"[i % 7])
    return ''.join(s)

def y():
    s = ''
    for i in range(100):
        # Other codes here...
        s += "abcdefg"[i % 7]
    return s

def z():
    s = ''
    for i in range(100):
        # Other codes here...
        s = s + "abcdefg"[i % 7]
    return s

def p():
    s = []
    for i in range(100):
        # Other codes here...
        s += "abcdefg"[i % 7]
    return ''.join(s)

def q():
    s = []
    for i in range(100):
        # Other codes here...
        s = s + ["abcdefg"[i % 7]]
    return ''.join(s)
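As a quick sanity check (my addition, restating the two extreme variants so the snippet is self-contained), every variant should build the same 100-character string:

```python
# Self-contained restatement of the join-based and '+'-based variants
# from above, just to confirm they produce identical output.
def via_join():
    s = []
    for i in range(100):
        s.append("abcdefg"[i % 7])
    return ''.join(s)

def via_plus():
    s = ''
    for i in range(100):
        s = s + "abcdefg"[i % 7]
    return s

assert via_join() == via_plus()
print(via_join()[:14])  # abcdefgabcdefg
```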
I have tried to keep everything else (except the concatenation) almost the same across the functions. Then I tested with the following, with the results in comments (using Python 3.1.1 IDLE on a 32-bit Windows machine):
timeit.timeit(x) # 31.54912480500002
timeit.timeit(y) # 23.533029429999942
timeit.timeit(z) # 22.116181330000018
timeit.timeit(p) # 37.718607439999914
timeit.timeit(q) # 108.60377576499991
So it shows that strng = strng + dyn_strng is the fastest. Though the differences in time are not that significant (except the last one), I want to know why this is happening. Is it because I am using Python 3.1.1, and it provides '+' as the most efficient? Should I use '+' as an alternative to join? Or have I done something extremely silly? Or what? Please explain clearly.
Answer

I have figured out the answer from the answers posted here by experts. Python string concatenation (and its timing measurement) depends on these (as far as I've seen):
- Number of concatenations
- Average length of the strings
- Number of function calls
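A likely reason '+' does so well in the question's tiny benchmark is a CPython implementation detail (not a language guarantee, so treat this sketch as illustrative only): when the string being extended has no other references, CPython can often resize it in place instead of copying it, which `id()` can hint at:

```python
# Growing a string with += while watching its identity. On CPython the
# in-place resize optimization often keeps the same object id between
# steps; other implementations (e.g. Jython) need not behave this way.
s = ''
distinct_ids = set()
for _ in range(10000):
    s += 'a'
    distinct_ids.add(id(s))

print(len(s))  # 10000
# On CPython, len(distinct_ids) is typically far smaller than 10000.
```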
I have built a new piece of code that relates these. Thanks to Peter S Magnusson, sepp2k, hughdbrown, David Wolever and others for pointing out important points I had missed earlier. Also, I might have missed something in this code, so I highly appreciate any replies pointing out errors, suggestions, criticisms etc. After all, I am here to learn. Here is my new code:
from timeit import timeit

noc = 100
tocat = "a"

def f_call():
    pass

def loop_only():
    for i in range(noc):
        pass

def concat_method():
    s = ''
    for i in range(noc):
        s = s + tocat

def list_append():
    s = []
    for i in range(noc):
        s.append(tocat)
    ''.join(s)

def list_append_opt():
    s = []
    zap = s.append
    for i in range(noc):
        zap(tocat)
    ''.join(s)

def list_comp():
    ''.join(tocat for i in range(noc))

def concat_method_buildup():
    s = ''

def list_append_buildup():
    s = []

def list_append_opt_buildup():
    s = []
    zap = s.append

def function_time(f):
    return timeit(f, number=1000) * 1000

f_callt = function_time(f_call)

def measure(ftuple, n, tc):
    global noc, tocat
    noc = n
    tocat = tc
    loopt = function_time(loop_only) - f_callt
    buildup_time = function_time(ftuple[1]) - f_callt if ftuple[1] else 0
    total_time = function_time(ftuple[0])
    return total_time, total_time - f_callt - buildup_time - loopt * ftuple[2]

functions = {'Concat Method\t\t': (concat_method, concat_method_buildup, True),
             'List append\t\t\t': (list_append, list_append_buildup, True),
             'Optimized list append': (list_append_opt, list_append_opt_buildup, True),
             'List comp\t\t\t': (list_comp, 0, False)}

for i in range(5):
    # 'a' * 10**j for j in 0..2 gives string lengths 1, 10 and 100
    print("\n\n%d concatenation\t\t\t\t1'a'\t\t\t\t 10'a'\t\t\t100'a'" % 10**i)
    print('-' * 80)
    for (f, ft) in functions.items():
        print(f, "\t|", end="\t")
        for j in range(3):
            t = measure(ft, 10**i, 'a' * 10**j)
            print("%.3f %.3f |" % t, end="\t")
        print()
And here is what I have got. [Two (scaled) times are shown in each column: the first is the total function execution time, and the second is the actual(?) concatenation time, after subtracting the function-call time, the function build-up time (initialization) and the iteration time. Here I am considering a case where it can't be done without a loop (say, there are more statements inside).]
1 concatenation 1'a' 10'a' 100'a'
------------------- ---------------------- ------------------- ----------------
List comp | 2.310 2.168 | 2.298 2.156 | 2.304 2.162
Optimized list append | 1.069 0.439 | 1.098 0.456 | 1.071 0.413
Concat Method | 0.552 0.034 | 0.541 0.025 | 0.565 0.048
List append | 1.099 0.557 | 1.099 0.552 | 1.094 0.552
10 concatenations 1'a' 10'a' 100'a'
------------------- ---------------------- ------------------- ----------------
List comp | 3.366 3.224 | 3.473 3.331 | 4.058 3.916
Optimized list append | 2.778 2.003 | 2.956 2.186 | 3.417 2.639
Concat Method | 1.602 0.943 | 1.910 1.259 | 3.381 2.724
List append | 3.290 2.612 | 3.378 2.699 | 3.959 3.282
100 concatenations 1'a' 10'a' 100'a'
------------------- ---------------------- ------------------- ----------------
List comp | 15.900 15.758 | 17.086 16.944 | 20.260 20.118
Optimized list append | 15.178 12.585 | 16.203 13.527 | 19.336 16.703
Concat Method | 10.937 8.482 | 25.731 23.263 | 29.390 26.934
List append | 20.515 18.031 | 21.599 19.115 | 24.487 22.003
1000 concatenations 1'a' 10'a' 100'a'
------------------- ---------------------- ------------------- ----------------
List comp | 134.507 134.365 | 143.913 143.771 | 201.062 200.920
Optimized list append | 112.018 77.525 | 121.487 87.419 | 151.063 117.059
Concat Method | 214.329 180.093 | 290.380 256.515 | 324.572 290.720
List append | 167.625 133.619 | 176.241 142.267 | 205.259 171.313
10000 concatenations 1'a' 10'a' 100'a'
------------------- ---------------------- ------------------- ----------------
List comp | 1309.702 1309.560 | 1404.191 1404.049 | 2912.483 2912.341
Optimized list append | 1042.271 668.696 | 1134.404 761.036 | 2628.882 2255.804
Concat Method | 2310.204 1941.096 | 2923.805 2550.803 | STUCK STUCK
List append | 1624.795 1251.589 | 1717.501 1345.137 | 3182.347 2809.233
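As a smaller cross-check of the numbers above (my addition; variant names and loop sizes are arbitrary), a stripped-down benchmark of just the three basic approaches with plain timeit might look like:

```python
import timeit

# Time 1000 runs of a 1000-step loop per approach; absolute numbers
# depend on the machine, only the relative ordering is interesting.
setup = "chars = 'abcdefg'"
stmts = {
    "join comp": "''.join(chars[i % 7] for i in range(1000))",
    "append":    "s = []\nfor i in range(1000):\n    s.append(chars[i % 7])\n''.join(s)",
    "plus":      "s = ''\nfor i in range(1000):\n    s += chars[i % 7]",
}
results = {name: timeit.timeit(stmt, setup=setup, number=1000)
           for name, stmt in stmts.items()}
for name, t in sorted(results.items(), key=lambda kv: kv[1]):
    print("%-10s %.4f s" % (name, t))
```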
To sum it all up, I have made these decisions for myself:
- If you have a list of strings available, the string 'join' method is best, and fastest.
- If you can use a list comprehension, that is easiest and fast as well.
- If you need 1 to 10 concatenations of (average) length 1 to 100, list append and '+' all take about the same (almost; note that the times are scaled) time.
- Optimized list append seems very good in most situations.
- When the number of concatenations or the string lengths rise, '+' starts to take more and more time. Note that for 10000 concatenations with 100'a', my PC got stuck!
- If you always use list append and 'join', you are safe all the time (as Alex Martelli pointed out).
- But in some situations, say where you need to take user input and print "Hello user's world!", it is simplest to use '+'. Building a list and joining for this case, like x = input("Enter user name:") and then x.join(["Hello ","'s world!"]), is uglier than "Hello %s's world!"%x or "Hello "+x+"'s world".
- Python 3.1 has improved concatenation performance. But in some implementations, like Jython, '+' is less efficient.
- Premature optimization is the root of all evil (as the experts say). Most of the time you do not need optimization, so don't waste time aspiring to optimize (unless you are writing a big or computational project where every micro/millisecond matters).
- Use this information and write in whatever way you like, taking the circumstances into consideration.
- If you really need optimization, use a profiler, find the bottlenecks and try to optimize those.
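For the "Hello user's world!" case in the list above, the three forms really are equivalent (a hypothetical name stands in for the input() call):

```python
x = "Guido"  # stands in for x = input("Enter user name:")

a = "Hello " + x + "'s world!"
b = "Hello %s's world!" % x
c = x.join(["Hello ", "'s world!"])  # the separator goes between the pieces

assert a == b == c
print(a)  # Hello Guido's world!
```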
Finally, I am trying to learn Python more deeply, so it would not be unusual for there to be mistakes (errors) in my observations. Please comment on this and advise me if I am taking a wrong route. Thanks to all for participating.