迭代字符串追加的时间复杂度实际上是 O(n^2) 还是 O(n)? [英] Is the time-complexity of iterative string append actually O(n^2), or O(n)?

查看:20
本文介绍了迭代字符串追加的时间复杂度实际上是 O(n^2) 还是 O(n)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在解决 CTCI 之外的问题.

I am working on a problem out of CTCI.

第一章的第三题让你取一个字符串,比如

The third problem of chapter 1 has you take a string such as

'约翰史密斯先生'

并要求您用 %20 替换中间空格:

and asks you to replace the intermediary spaces with %20:

'Mr%20John%20Smith'

作者在 Python 中提供了这个解决方案,称之为 O(n):

The author offers this solution in Python, calling it O(n):

def urlify(string, length):
    '''function replaces single spaces with %20 and removes trailing spaces'''
    counter = 0
    output = ''
    for char in string:
        counter += 1
        if counter > length:
            return output
        elif char == ' ':
            output = output + '%20'
        elif char != ' ':
            output = output + char
    return output

我的问题:

我知道从左到右扫描实际字符串是 O(n).但是 Python 中的字符串不是不可变的吗?如果我有一个字符串并使用 + 运算符向其中添加另一个字符串,它是否会分配必要的空间,复制原始字符串,然后复制附加字符串?

I understand that this is O(n) in terms of scanning through the actual string from left to right. But aren't strings in Python immutable? If I have a string and I add another string to it with the + operator, doesn't it allocate the necessary space, copy over the original, and then copy over the appending string?

如果我有 n 个长度为 1 的字符串的集合,那么需要:

If I have a collection of n strings each of length 1, then that takes:

1 + 2 + 3 + 4 + 5 + ... + n = n(n+1)/2

O(n^2) 时间,是吗?还是我对 Python 处理追加的方式有误?

or O(n^2) time, yes? Or am I mistaken in how Python handles appending?

或者,如果您愿意教我如何钓鱼:我将如何为自己找到这个?我尝试 Google 官方来源的尝试没有成功.我找到了 https://wiki.python.org/moin/TimeComplexity 但这没有字符串上的任何东西.

Alternatively, if you'd be willing to teach me how to fish: How would I go about finding this out for myself? I've been unsuccessful in my attempts to Google an official source. I found https://wiki.python.org/moin/TimeComplexity but this doesn't have anything on strings.

推荐答案

在 CPython(Python 的标准实现)中,有一个实现细节通常是 O(n),在 字节码评估循环为 ++= 调用的代码code> 带有两个字符串操作数.如果 Python 检测到左参数没有其他引用,它会调用 realloc 以尝试通过调整字符串大小来避免复制.这不是您应该依赖的东西,因为它是一个实现细节,并且因为如果 realloc 最终需要频繁移动字符串,无论如何性能都会下降到 O(n^2).

In CPython, the standard implementation of Python, there's an implementation detail that makes this usually O(n), implemented in the code the bytecode evaluation loop calls for + or += with two string operands. If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

没有奇怪的实现细节,由于涉及二次复制,算法是 O(n^2).像这样的代码只在具有可变字符串的语言中才有意义,比如 C++,甚至在 C++ 中你也想使用 +=.

Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use +=.

这篇关于迭代字符串追加的时间复杂度实际上是 O(n^2) 还是 O(n)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆