How do I append one string to another in Python?


Question

I want an efficient way to append one string to another in Python, other than the following.

var1 = "foo"
var2 = "bar"
var3 = var1 + var2

Is there any good built-in method to use?

Solution

If you only have one reference to a string and you concatenate another string to the end, CPython now special-cases this and tries to extend the string in place.

The end result is that the operation is amortized O(n).

For example:

s = ""
for i in range(n):
    s += str(i)

This loop used to be O(n^2), but now it is O(n).

From the source (bytesobject.c):

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}


/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}
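The Py_REFCNT(v) != 1 check above is what keeps the resize safe: it is only attempted when nothing else can observe the object. A minimal sketch of what that means from the Python side (the variable names are only for illustration): when a second reference exists, appending must leave the other reference untouched, so strings still behave as immutable.

```python
a = "foo"
b = a            # a second reference to the same string object
a += "bar"       # rebinds a; the object b refers to must not change

print(a)
print(b)

# Whether a new object was allocated or an existing one was resized is an
# implementation detail; the visible values are the same either way.
assert a == "foobar"
assert b == "foo"
```
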

It's easy enough to verify empirically.

$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
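The commands above use xrange, so they are Python 2. As a rough sketch for Python 3, the same measurement can be made with the timeit module from a script. The helper name time_append and the chosen sizes are assumptions for illustration, and the timings will differ by machine and interpreter version.

```python
import timeit


def time_append(n, repeat=3):
    """Return the best wall-clock time in seconds to append n one-character strings."""
    def build():
        s = ""
        for _ in range(n):
            s += "a"
        return s

    # number=1: each measurement builds the string exactly once.
    return min(timeit.repeat(build, number=1, repeat=repeat))


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        t = time_append(n)
        print(f"n={n:>9,}  total={t:.6f}s  per-append={t / n * 1e9:.1f}ns")
```

If the per-append time stays roughly constant as n grows, the loop is scaling linearly; if it grows with n, the interpreter is copying the string on each append.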

It's important to note, however, that this optimisation isn't part of the Python spec. As far as I know, it exists only in the CPython implementation. The same empirical testing on PyPy or Jython, for example, might show the older O(n**2) performance.

$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop

So far so good, but then,

$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop

Ouch, even worse than quadratic. So PyPy is doing something that works well with short strings but performs poorly for larger strings.
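Because the in-place append is an implementation detail, code that has to run well on other interpreters usually avoids depending on it. PEP 8 recommends "".join() for this reason. The sketch below is one common way to do that, not part of the original answer, and the function names are hypothetical: collect the pieces in a list and join once at the end, which is linear regardless of the implementation.

```python
def build_with_join(n):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)


def build_with_concat(n):
    """Same result via repeated +=; fast on CPython, possibly slow elsewhere."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


# Both approaches produce the same string.
assert build_with_join(1000) == build_with_concat(1000)
print(build_with_join(5))
```

For just two strings, as in the question, var1 + var2 is already fine; the join pattern matters when appending many times in a loop.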
