Why is ''.join() faster than += in Python?

Question

I'm able to find a bevy of information online (on Stack Overflow and elsewhere) saying that using + or += for concatenation in Python is very inefficient and bad practice.

I can't seem to find WHY += is so inefficient. Outside of a mention here that "it's been optimized for 20% improvement in certain cases" (still not clear what those cases are), I can't find any additional information.

What is happening on a more technical level that makes ''.join() superior to other Python concatenation methods?

Recommended answer

Let's say you have this code to build up a string from three strings:

x = 'foo'
x += 'bar'  # 'foobar'
x += 'baz'  # 'foobarbaz'

In this case, Python first needs to allocate and create 'foobar' before it can allocate and create 'foobarbaz'.

So for each += that gets called, the entire contents of the string and whatever is being added to it need to be copied into an entirely new memory buffer. In other words, if you have N strings to join, you need to allocate approximately N temporary strings, and the first substring gets copied ~N times. The last substring only gets copied once, but on average each substring gets copied ~N/2 times, so the total work grows roughly with N squared.
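The copy counts above can be made concrete with a small model. This is only a sketch: it assumes the naive behavior described here, where every += builds a brand-new string, and it counts characters rather than measuring real memory traffic. The function names are hypothetical.

```python
def chars_copied_naive_concat(pieces):
    """Characters copied if every += allocates a new string (naive model)."""
    total_copied = 0
    current_length = 0
    for index, piece in enumerate(pieces):
        current_length += len(piece)
        if index > 0:
            # Each += copies the old contents plus the new piece.
            total_copied += current_length
    return total_copied


def chars_copied_single_pass(pieces):
    """Characters copied if each piece is copied once into a final buffer."""
    return sum(len(piece) for piece in pieces)


pieces = ['foo', 'bar', 'baz']
print(chars_copied_naive_concat(pieces))  # 6 for 'foobar' + 9 for 'foobarbaz' = 15
print(chars_copied_single_pass(pieces))   # 9
```

Under this model the three-string example copies 15 characters with += against 9 with a single pass, and the gap widens quickly as the number of pieces grows.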

With .join, Python can play a number of tricks, since the intermediate strings do not need to be created. CPython figures out how much memory it needs up front and then allocates a correctly sized buffer. It then copies each piece into the new buffer, which means that each piece is only copied once.
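The two-pass strategy described above can be sketched in pure Python. This is an illustration rather than CPython's actual code, which is written in C: the function name is hypothetical, and it works on bytes with a bytearray because Python strings are immutable and cannot be filled in place.

```python
def join_bytes_sketch(pieces):
    """Join byte strings with one allocation and one copy per piece."""
    # Pass 1: work out the final size before allocating anything.
    total_length = sum(len(piece) for piece in pieces)
    buffer = bytearray(total_length)

    # Pass 2: copy each piece into its slot exactly once.
    offset = 0
    for piece in pieces:
        buffer[offset:offset + len(piece)] = piece
        offset += len(piece)
    return bytes(buffer)


print(join_bytes_sketch([b'foo', b'bar', b'baz']))  # b'foobarbaz'
```

The final bytes(buffer) call makes one extra copy that a C implementation would not need; it is there only to return an immutable result.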

There are other viable approaches that could lead to better performance for += in some cases, for example if the internal string representation is actually a rope, or if the runtime is smart enough to figure out that the temporary strings are of no use to the program and optimize them away.

However, CPython certainly does not do these optimizations reliably (though it may for a few corner cases), and since it is the most common implementation in use, many best practices are based on what works well for CPython. Having a standardized set of norms also makes it easier for other implementations to focus their optimization efforts.
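Because the outcome depends on the implementation and on whether one of those corner cases applies, it can be worth measuring on your own interpreter. The following is a minimal timing sketch using the standard timeit module; the helper names and the piece count are arbitrary choices for illustration, and no particular result is implied.

```python
import timeit


def build_with_plus(pieces):
    """Build the string by repeated +=."""
    result = ''
    for piece in pieces:
        result += piece
    return result


def build_with_join(pieces):
    """Build the string with a single join call."""
    return ''.join(pieces)


pieces = ['x'] * 2000  # arbitrary size, chosen only to keep the run short

plus_time = timeit.timeit(lambda: build_with_plus(pieces), number=20)
join_time = timeit.timeit(lambda: build_with_join(pieces), number=20)

print(f"+=   : {plus_time:.4f} s")
print(f"join : {join_time:.4f} s")
```

Timings vary between interpreters, versions, and workloads, so treat the numbers as a local observation and not as a general rule.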
