Why does naive string concatenation become quadratic above a certain length?


Question


Building a string through repeated string concatenation is an anti-pattern, but I'm still curious why its performance switches from linear to quadratic after string length exceeds approximately 10 ** 6:

# this will take time linear in n with the optimization
# and quadratic time without the optimization
import time

n = 10 ** 7  # length of the string to build
start = time.perf_counter()
s = ''
for i in range(n):
    s += 'a'
total_time = time.perf_counter() - start
time_per_iteration = total_time / n

For example, on my machine (Windows 10, Python 3.6.1):

  • for 10 ** 4 < n < 10 ** 6, the time_per_iteration is almost perfectly constant at 170±10 µs
  • for 10 ** 6 < n, the time_per_iteration is almost perfectly linear, reaching 520 µs at n == 10 ** 7.

Linear growth in time_per_iteration is equivalent to quadratic growth in total_time.

The linear complexity results from the optimization in more recent CPython versions (2.4+) that reuses the original storage when no other references to the original object remain. But I expected the linear performance to continue indefinitely rather than switch to quadratic at some point.

My question is based on this comment. For some odd reason, running

python -m timeit -s"s=''" "for i in range(10**7):s+='a'"

takes incredibly long time (much longer than quadratic), so I never got the actual timing results from timeit. So instead, I used a simple loop as above to obtain performance numbers.
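(An aside, not part of the original question: one way to get timeit to cooperate is to time a single call of a wrapper function with number=1. With the command line above, timeit's default of repeating the statement likely makes things worse, since the setup initializes s only once and each repetition keeps appending to the same ever-longer string.)

```python
import timeit

def build(n):
    """Build an n-character string via repeated concatenation."""
    s = ''
    for _ in range(n):
        s += 'a'
    return s

# number=1 runs the statement exactly once, so s does not
# carry over between repetitions.
n = 10 ** 5
elapsed = timeit.timeit(lambda: build(n), number=1)
print(f"{elapsed / n * 1e9:.0f} ns per append")
```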

Update:

My question might as well have been titled "How can a list-like append have O(1) performance without over-allocation?". From observing constant time_per_iteration on small-size strings, I assumed the string optimization must be over-allocating. But realloc is (unexpectedly to me) quite successful at avoiding memory copy when extending small memory blocks.
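The difference in allocation policy can be seen directly with sys.getsizeof (a CPython-specific sketch; the exact sizes are implementation details and can vary by version):

```python
import sys

# Lists overallocate: getsizeof jumps in steps as capacity grows,
# so most appends do not need to move the buffer.
lst, sizes = [], []
for _ in range(20):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))
print(sizes)  # long plateaus between occasional jumps

# Compact ASCII strings do not overallocate: each extra
# character costs exactly one more byte.
print(sys.getsizeof('a' * 10) - sys.getsizeof('a' * 9))
```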

Solution

In the end, the platform C allocators (like malloc()) are the ultimate source of memory. When CPython tries to reallocate string space to extend its size, it's really the system C realloc() that determines the details of what happens. If the string is "short" to begin with, chances are decent the system allocator finds unused memory adjacent to it, so extending the size is just a matter of the C library's allocator updating some pointers. But after repeating this some number of times (depending again on details of the platform C allocator), it will run out of space. At that point, realloc() will need to copy the entire string so far to a brand new larger block of free memory. That's the source of quadratic-time behavior.
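This can be observed indirectly from Python (again CPython-specific, and dependent on the platform allocator): when realloc() extends the block in place, the string object's address — its id() — stays the same; when the string has to be copied to a new block, the id changes.

```python
s = ''
in_place = 0   # appends where the string was extended where it sat
moves = 0      # appends where realloc() relocated the string
for _ in range(100_000):
    before = id(s)
    s += 'a'   # in-place only while s is the sole reference
    if id(s) == before:
        in_place += 1
    else:
        moves += 1
print(f"extended in place: {in_place}, relocated: {moves}")
```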

Note, e.g., that growing a Python list faces the same tradeoffs. However, lists are designed to be grown, so CPython deliberately asks for more memory than is actually needed at the time. The amount of this overallocation scales up as the list grows, enough to make it rare that realloc() needs to copy the whole list-so-far. But the string optimizations do not overallocate, making cases where realloc() needs to copy far more frequent.
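As for sidestepping the tradeoff altogether: the usual remedy for the concatenation anti-pattern is to collect the pieces in a list (which does overallocate) and join once at the end, which is linear regardless of allocator behavior. This is the standard idiom, not part of the answer above:

```python
n = 10 ** 6

# Accumulate pieces, then pay for one allocation and one copy.
parts = []
for _ in range(n):
    parts.append('a')
s = ''.join(parts)
assert len(s) == n

# For this degenerate single-character case, repetition is simpler still.
assert 'a' * n == s
```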
