Why does Python's cStringIO take more time than StringIO when writing? (performance of string methods)
Problem description
I am profiling string methods in Python so that I can use the fastest one. I have this code to test string concatenation with files, StringIO, cStringIO and a normal string.
```python
#!/usr/bin/env python
#title           : pythonTiming.py
#description     : Will be used to test timing functions in Python
#author          : myusuf
#date            : 19-11-2014
#version         : 0
#usage           : python pythonTiming.py
#notes           :
#python_version  : 2.6.6
#==============================================================================
import time
import cStringIO
import StringIO


class Timer(object):
    """Context manager that stores the elapsed wall-clock time in .interval."""

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start


testbuf = """ Hello This is a General String that will be repeated
This string will be written to a file, StringIO and a regular string, then see the best way to handle strings according to time
""" * 1000

MyFile = open("./testfile.txt", "wb+")
MyStr = ''
MyStrIo = StringIO.StringIO()
MycStrIo = cStringIO.StringIO()


def strWithFiles():
    print "writing string to file"
    for index in range(1000):
        MyFile.write(testbuf)


def strWithStringIO():
    print "writing string to StringIO"
    for index in range(1000):
        MyStrIo.write(testbuf)


def strWithStr():
    # `global` is required here because the function rebinds MyStr
    global MyStr
    print "writing string to str"
    for index in range(500):
        MyStr = MyStr + testbuf


def strWithCstr():
    print "writing string to cStringIO"
    for index in range(1000):
        MycStrIo.write(testbuf)


with Timer() as t:
    strWithFiles()
print('##Request took %.03f sec.' % t.interval)

with Timer() as t:
    strWithStringIO()
print('###Request took %.03f sec.' % t.interval)

with Timer() as t:
    strWithCstr()
print('####Request took %.03f sec.' % t.interval)

# After writing, each object's position is at its end, so rewind with
# seek(0) before reading; otherwise read(-1) returns an empty string.
with Timer() as t:
    MyFile.seek(0)
    read1 = 'x' + MyFile.read(-1)
print('file read ##Request took %.03f sec.' % t.interval)

with Timer() as t:
    MyStrIo.seek(0)  # for StringIO this also joins the pending written strings
    read2 = 'x' + MyStrIo.read(-1)
print('stringIo read ###Request took %.03f sec.' % t.interval)

with Timer() as t:
    MycStrIo.seek(0)
    read3 = 'x' + MycStrIo.read(-1)
print('CString read ####Request took %.03f sec.' % t.interval)

MyFile.close()
```
While the Python documentation says that `cStringIO` is faster than `StringIO`, the results show that `StringIO` performs better at concatenation. Why?

On the other hand, reading from `cStringIO` is faster than reading from `StringIO` (its behavior is similar to a file's). As far as I can tell from the source, both the file implementation and `cStringIO` are written in C, so why is string concatenation slow?

Is there any other way to handle strings faster than these methods?
Answer

The reason `StringIO` performs better is that, behind the scenes, it just keeps a list of all the strings that have been written to it and only combines them when necessary. So a write operation is as simple as appending an object to a list. The `cStringIO` module, however, does not have this luxury: it must copy the data of each string into its buffer, resizing the buffer as and when necessary (which causes a lot of redundant copying when writing large amounts of data).
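The two write strategies described above can be modelled in a few lines. This is a sketch, not the real modules: `StringIO` and `cStringIO` only exist on Python 2, so the hypothetical classes `ListBuffer` (append a reference to a list, join on demand, like Python 2 `StringIO` when writing at the end) and `CopyBuffer` (copy every write into one growable buffer, like `cStringIO`) stand in for them and work on `bytes`, which plays the role of Python 2's `str`.

```python
class ListBuffer(object):
    """Hypothetical append-only writer modelling Python 2 StringIO's strategy."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        # Only a reference to the bytes object is stored; no data is copied.
        self._chunks.append(data)

    def getvalue(self):
        # Pending chunks are joined once, on demand, and the result is cached
        # so that later calls do not join again.
        if len(self._chunks) > 1:
            self._chunks = [b"".join(self._chunks)]
        return self._chunks[0] if self._chunks else b""


class CopyBuffer(object):
    """Hypothetical writer modelling cStringIO's copy-every-write strategy."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data):
        # The bytes are copied into one contiguous buffer right away; when the
        # buffer's capacity is exceeded it is reallocated, which can move
        # (copy) everything written so far.
        self._buf += data

    def getvalue(self):
        return bytes(self._buf)


chunk = b"Hello, this is a general string that will be repeated\n" * 100
list_buf, copy_buf = ListBuffer(), CopyBuffer()
for _ in range(1000):
    list_buf.write(chunk)
    copy_buf.write(chunk)
print(len(list_buf.getvalue()), len(copy_buf.getvalue()))
```

Both produce the same result; the difference is only when, and how often, the bytes get copied.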
Since you are writing lots of fairly large strings, this means there is less work for `StringIO` to do than for `cStringIO`. When reading from a `StringIO` object you have written to, it can optimise the amount of copying needed by computing the sum of the lengths of the strings written to it and preallocating a buffer of that size.
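The "sum the lengths, allocate once, copy each piece once" idea can be sketched as a two-pass join. `join_preallocated` is a hypothetical helper written for illustration; in practice `b"".join` (which `StringIO` relies on) already does this internally in C.

```python
def join_preallocated(chunks):
    """Hypothetical two-pass join: size everything first, then copy once."""
    chunks = list(chunks)               # we iterate twice, so materialise it
    total = sum(len(c) for c in chunks)  # pass 1: total size of the result
    out = bytearray(total)               # single allocation, never resized
    pos = 0
    for c in chunks:                     # pass 2: copy each chunk into place
        out[pos:pos + len(c)] = c        # same-length slice, so no resizing
        pos += len(c)
    # bytes() costs one extra copy here; a C implementation writes directly
    # into the final result object instead.
    return bytes(out)


print(join_preallocated([b"Hello, ", b"", b"world"]))
```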
However, `StringIO` is not the fastest way of joining a series of strings, because it provides additional functionality (seeking to different parts of the buffer and writing data there). If you do not need this functionality and all you want to do is join a list of strings together, then `str.join` is the fastest way to do it.
```python
# testbuf is the string defined in the question's script
joined_string = "".join(testbuf for index in range(1000))

# or build the list of strings to join separately
strings = []
for i in range(1000):
    strings.append(testbuf)
joined_string = "".join(strings)
```
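To check the ranking on your own machine, a `timeit` harness along these lines can help. It is a sketch under assumptions: it targets Python 3, where `io.StringIO` replaces both `StringIO` and `cStringIO`, so it cannot reproduce the Python 2 comparison from the question; it only compares repeated `+`, `io.StringIO` and `str.join`. The sizes are smaller than the question's so it runs quickly, and CPython can sometimes optimise `s = s + t` in place, so timings vary by interpreter version and machine. No results are claimed here.

```python
import io
import timeit

testbuf = "Hello, this is a general string that will be repeated\n" * 10
N = 1000


def concat_plus():
    # Repeated concatenation: may copy the growing string on every step
    s = ""
    for _ in range(N):
        s = s + testbuf
    return s


def concat_stringio():
    # In-memory text stream; value is extracted once at the end
    buf = io.StringIO()
    for _ in range(N):
        buf.write(testbuf)
    return buf.getvalue()


def concat_join():
    # Collect the pieces, then join them in a single pass
    return "".join([testbuf for _ in range(N)])


if __name__ == "__main__":
    for func in (concat_plus, concat_stringio, concat_join):
        # Best of 3 runs, each calling the function 5 times
        best = min(timeit.repeat(func, number=5, repeat=3))
        print("%-16s %.4f sec" % (func.__name__, best))
```

Taking the minimum of several repeats reduces noise from other processes; the absolute numbers matter less than the relative order.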