Why is ''.join() faster than += in Python?


Question

I'm able to find a bevy of information online (on Stack Overflow and otherwise) about how it's a very inefficient and bad practice to use + or += for concatenation in Python.

I can't seem to find WHY += is so inefficient. Outside of a mention here that "it's been optimized for 20% improvement in certain cases" (still not clear what those cases are), I can't find any additional information.

What is happening on a more technical level that makes ''.join() superior to other Python concatenation methods?

Recommended answer

Let's say you have this code to build up a string from three strings:

x = 'foo'
x += 'bar'  # 'foobar'
x += 'baz'  # 'foobarbaz'

In this case, Python first needs to allocate and create 'foobar' before it can allocate and create 'foobarbaz'.

So for each += that gets called, the entire contents of the string and whatever is getting added to it need to be copied into an entirely new memory buffer. In other words, if you have N strings to be joined, you need to allocate approximately N temporary strings and the first substring gets copied ~N times. The last substring only gets copied once, but on average, each substring gets copied ~N/2 times.
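The copy counts above can be checked with a small model. This is only a sketch: it assumes every += writes the whole result into a brand-new buffer (starting from an empty string), which is the worst case described here rather than a measurement of what CPython actually does. The function names are made up for illustration.

```python
def chars_copied_by_repeated_concat(pieces):
    """Model: each += copies the entire result so far into a new buffer."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)   # size of the new string after this +=
        copied += length       # all of it is written into a fresh buffer
    return copied


def chars_copied_by_single_pass(pieces):
    """Model: every piece is copied exactly once into one final buffer."""
    return sum(len(piece) for piece in pieces)


pieces = ['foo', 'bar', 'baz']
naive_copies = chars_copied_by_repeated_concat(pieces)
single_pass_copies = chars_copied_by_single_pass(pieces)
print(naive_copies, single_pass_copies)
```

Under this model, N one-character pieces cost N(N+1)/2 character copies with repeated concatenation versus N with a single pass, which is where the quadratic behaviour comes from.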

With .join, Python can play a number of tricks since the intermediate strings do not need to be created. CPython figures out how much memory it needs up front and then allocates a correctly-sized buffer. Finally, it then copies each piece into the new buffer which means that each piece is only copied once.
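The two-pass idea can be sketched in pure Python. This is not CPython's implementation (that lives in C and works on the internal string representation); `join_sketch` is a hypothetical name, and the UTF-8 encode/decode steps are there only because Python strings are immutable and a `bytearray` is the closest thing to a raw preallocated buffer.

```python
def join_sketch(pieces):
    """Illustrate 'measure first, allocate once, copy each piece once'."""
    # Encoding is an artifact of this sketch, not part of the real algorithm.
    encoded = [piece.encode('utf-8') for piece in pieces]

    # Pass 1: work out the total size up front.
    total = sum(len(chunk) for chunk in encoded)

    # Allocate one correctly sized buffer.
    buf = bytearray(total)

    # Pass 2: copy each piece into its final position exactly once.
    pos = 0
    for chunk in encoded:
        buf[pos:pos + len(chunk)] = chunk
        pos += len(chunk)

    return buf.decode('utf-8')


result = join_sketch(['foo', 'bar', 'baz'])
print(result)
```

In real code, ''.join(pieces) does all of this for you; the sketch only makes the single allocation and single copy per piece visible.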

There are other viable approaches which could lead to better performance for += in some cases. E.g. if the internal string representation is actually a rope or if the runtime is actually smart enough to somehow figure out that the temporary strings are of no use to the program and optimize them away.

However, CPython certainly does not do these optimizations reliably (though it may for a few corner cases) and since it is the most common implementation in use, many best-practices are based on what works well for CPython. Having a standardized set of norms also makes it easier for other implementations to focus their optimization efforts as well.
