Replacing the empty strings in a string


Question


I accidentally found that in Python, an operation of the form

string1.join(string2)

can be equivalently expressed as

string2.replace('', string1)[len(string1):-len(string1)]

Furthermore, after trying timeit with inputs of a few different sizes, this weird way to join seems to be more than twice as fast.

  1. Why should the join method be slower?
  2. Is replacing the empty string like this a safe/well-defined thing to do?

Solution

So first of all, let's break down why this works.

>>> string1 = "foo"
>>> string2 = "bar"
>>> string1.join(string2)
'bfooafoor'

This is the operation of putting string1 between every item (character) of string2.

Replacing the empty string does something kind of interesting: it treats each gap before, between, and after the characters as an occurrence of the empty string, and therefore does essentially the same task, except with an extra separator at the start and end:

>>> string2.replace('', string1)
'foobfooafoorfoo'

So slicing out these produces the same result as str.join():

>>> string2.replace('', string1)[len(string1):-len(string1)]
'bfooafoor'
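To check that equivalence on more than one input, here is a minimal sketch. The helper name join_via_replace is made up for illustration and is not part of any library. It also shows one case where the trick breaks: with an empty separator the slice becomes [0:-0], which is [0:0], so the result is empty.

```python
def join_via_replace(sep: str, text: str) -> str:
    """Hypothetical helper: emulate sep.join(text) with str.replace and a slice."""
    return text.replace("", sep)[len(sep):-len(sep)]


# The two approaches agree for non-empty separators.
for sep, text in [("foo", "bar"), ("-", "abc"), (", ", "xyz")]:
    assert join_via_replace(sep, text) == sep.join(text)

# Edge case: an empty separator makes the slice [0:0], so everything is lost.
print(repr(join_via_replace("", "bar")))  # ''
print(repr("".join("bar")))               # 'bar'
```

So even where the replace behaviour holds, the slicing step needs a non-empty separator to match str.join().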

Obviously, this solution is much, much less readable than str.join(), and so I'd always recommend against it. str.join() has also been developed to be efficient on all platforms. Replacing the empty string might be far less efficient on some versions of Python (I don't know if that's the case, but it's a possibility - just as repeated concatenation is reasonably fast in CPython, but that's not necessarily the case elsewhere.)

I can't even find anything in the documentation that suggests replacing the empty string should behave this way. The docs for str.replace() simply say:

Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

I see no reason why we should presume that the gaps in between letters should count as an occurrence of the empty string (arguably, you could fit infinite empty strings anywhere in the string), and as such, relying on this behaviour might be a bad idea.
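For what it's worth, here is a small sketch of what CPython 3 actually does with the empty string. This is observed behaviour of one implementation, not something the quoted documentation promises: str.count('') reports one occurrence per gap (len + 1 in total), and the optional count argument of str.replace() limits the insertions from the left.

```python
text = "bar"

# CPython counts one empty-string occurrence per gap: len(text) + 1.
gaps = text.count("")
print(gaps)  # 4

# Replacing the empty string inserts the new text into every gap.
interleaved = text.replace("", "-")
print(interleaved)  # -b-a-r-

# With a count, only the leftmost gaps are filled.
limited = text.replace("", "-", 2)
print(limited)  # -b-ar
```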

This operation is also pretty rare - it's more common to have a sequence of strings to join together - joining individual characters of a string isn't something I have personally had to do often.
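As an illustration of the more common case (the list contents here are made up), str.join() takes any iterable of strings, which the replace trick cannot handle because it only interleaves the characters of a single string:

```python
words = ["alpha", "beta", "gamma"]

# Join a sequence of separate strings with a separator.
joined = ", ".join(words)
print(joined)  # alpha, beta, gamma

# A list has no replace() method, so the trick does not carry over.
print(hasattr(words, "replace"))  # False
```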

Interestingly, this x.replace("", y) appears to be special-cased in the CPython source:

/* Algorithms for different cases of string replacement */

/* len(self)>=1, from="", len(to)>=1, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_interleave(PyStringObject *self,
                   const char *to_s, Py_ssize_t to_len,
                   Py_ssize_t maxcount)
{
...

It may well be that this special-casing is what makes it perform well. Again, as it's not mentioned in the documentation, this is an implementation detail, and assuming it will work as quickly (or at all) in other Python versions would be a mistake.
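If you want to measure this yourself, a rough timeit sketch along these lines should do. The input sizes and the function names with_join and with_replace are arbitrary choices for illustration, and the numbers will depend on your interpreter and version, so I'm not quoting any expected timings.

```python
import timeit

sep = "foo"
text = "bar" * 1000  # arbitrary test input


def with_join() -> str:
    return sep.join(text)


def with_replace() -> str:
    return text.replace("", sep)[len(sep):-len(sep)]


# Sanity check: both approaches must produce the same string.
assert with_join() == with_replace()

for func in (with_join, with_replace):
    seconds = timeit.timeit(func, number=1000)
    print(f"{func.__name__}: {seconds:.4f} s for 1000 calls")
```

Running it on each interpreter you care about is the only reliable way to tell whether the replace version is actually faster there.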
