Python string concatenation idiom: need clarification

Question

From http://jaynes.colorado.edu/PythonIdioms.html:

"Build strings as a list and use ''.join at the end. join is a string method called on the separator, not the list. Calling it from the empty string concatenates the pieces with no separator, which is a Python quirk and rather surprising at first. This is important: string building with + is quadratic time instead of linear! If you learn one idiom, learn this one.

Wrong: for s in strings: result += s

Right: result = ''.join(strings)"

I'm not sure why this is true. If I have some strings I want to join, it doesn't seem intuitively better to me to put them in a list and then call ''.join. Doesn't putting them into a list create some overhead? To clarify:

Python command line:

>>> str1 = 'Not'
>>> str2 = 'Cool'
>>> str3 = ''.join([str1, ' ', str2])  # The more efficient way (A)
>>> print(str3)
Not Cool
>>> str3 = str1 + ' ' + str2  # The bad way (B)
>>> print(str3)
Not Cool

Is A really linear time and B quadratic time?

Recommended answer

Yes. For the example you chose, the importance isn't clear, because you only have a few very short strings, so plain + concatenation would probably be faster.

But every time you do a + b with strings in Python, it allocates a new string and then copies all the bytes from a and b into it. If you do this in a loop with lots of strings, these bytes have to be copied again and again, and each time the amount that has to be copied gets longer. This gives the quadratic behaviour.
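
To put numbers on the copying described above, here is a minimal sketch of a cost model. It only counts characters under the assumption that every concatenation writes out the whole new string; it does not measure a real interpreter, and CPython can sometimes extend a string in place for `r += c`, so real timings may be better than this model suggests. The function names are hypothetical, chosen for illustration.

```python
def chars_copied_by_repeated_concat(pieces):
    """Model: each `result = result + piece` writes out the whole new string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)  # length of the new string after this step
        copied += length      # every character of it is copied once more
    return copied


def chars_copied_by_join(pieces):
    """Model: join allocates once and copies each piece exactly once."""
    return sum(len(piece) for piece in pieces)


pieces = ['a'] * 1000
print(chars_copied_by_repeated_concat(pieces))  # 500500, i.e. 1000 * 1001 / 2
print(chars_copied_by_join(pieces))             # 1000
```

For n one-character pieces the first count is n(n+1)/2, which grows quadratically, while the second grows linearly.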

On the other hand, creating a list of strings doesn't copy the contents of the strings - it just copies the references. This is incredibly fast, and runs in linear time. The join method then makes just one memory allocation and copies each string into the correct position only once. This also takes only linear time.
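
In practice the idiom usually looks like the sketch below: collect the pieces in a list inside the loop, then join once at the end. The function `render_rows` and its input format are made up for this illustration; only `list.append` and `str.join` are the real building blocks.

```python
def render_rows(rows):
    """Build one string from many pieces: append to a list, join once."""
    parts = []
    for name, value in rows:
        # Appending stores references only; no string contents are copied here.
        parts.append(name)
        parts.append('=')
        parts.append(str(value))
        parts.append('\n')
    # A single join produces the final string.
    return ''.join(parts)


print(render_rows([('x', 1), ('y', 2)]))

# join is called on the separator, so a non-empty separator works the same way.
print(', '.join(['a', 'b', 'c']))  # a, b, c
```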

So yes, do use the ''.join idiom if you are potentially dealing with a large number of strings. For just two strings it doesn't matter.

If you need more convincing, try it for yourself by creating a string from 10 million characters:

>>> chars = ['a'] * 10000000
>>> r = ''
>>> for c in chars: r += c
...
>>> print(len(r))

Compare with:

>>> chars = ['a'] * 10000000
>>> r = ''.join(chars)
>>> print(len(r))

The first method takes about 10 seconds. The second takes under 1 second.
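
To reproduce the comparison as a script rather than at the prompt, a sketch using the standard `timeit` module might look like this. The input size and repeat count are arbitrary choices, the helper names are hypothetical, and the numbers you get will depend on your machine and Python version; on recent CPython the gap may be much smaller than the figures quoted above because `+=` on strings is sometimes optimized.

```python
import timeit


def concat_loop(chars):
    """Build the string with repeated += in a loop."""
    r = ''
    for c in chars:
        r += c
    return r


def concat_join(chars):
    """Build the string with a single join."""
    return ''.join(chars)


chars = ['a'] * 100000

# Both approaches must produce the same string.
assert concat_loop(chars) == concat_join(chars)

loop_seconds = timeit.timeit(lambda: concat_loop(chars), number=5)
join_seconds = timeit.timeit(lambda: concat_join(chars), number=5)
print('loop: %.4f s' % loop_seconds)
print('join: %.4f s' % join_seconds)
```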
