Python cStringIO takes more time than StringIO in writing (performance of string methods)

Problem description

I am profiling string methods in Python so that I can use the fastest one. I have this code to test string concatenation with files, StringIO, cStringIO and normal strings.

#!/usr/bin/env python
#title           : pythonTiming.py
#description     : Will be used to test timing function in python
#author          : myusuf
#date            : 19-11-2014
#version         : 0
#usage           :python pythonTiming.py
#notes           :
#python_version  :2.6.6  
#==============================================================================

import time
import cStringIO
import StringIO

class Timer(object):

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start

testbuf = """ Hello This is a General String that will be repeated
This string will be written to a file, StringIO and a regular string, then see the best way to handle strings according to time

""" * 1000

MyFile = open("./testfile.txt" ,"wb+")
MyStr  = ''
MyStrIo = StringIO.StringIO()
MycStrIo = cStringIO.StringIO()

def strWithFiles():
    global MyFile
    print "writing string to file "
    for index in range(1000):
        MyFile.write(testbuf)

def strWithStringIO():
    global MyStrIo
    print "writing string to StrinIO "
    for index in range(1000):
        MyStrIo.write(testbuf)

def strWithStr():
    global MyStr
    print "Writing String to STR "
    for index in range(500):
        MyStr =  MyStr +  testbuf

def strWithCstr():
    global MycStrIo
    print "writing String to Cstring"
    for index in range(1000):
        MycStrIo.write(testbuf)

with Timer() as t:
    strWithFiles()
print('##Request took %.03f sec.' % t.interval)

with Timer() as t:                                                                                
    strWithStringIO()
print('###Request took %.03f sec.' % t.interval)  

with Timer() as t:                                                                                
    strWithCstr()
print('####Request took %.03f sec.' % t.interval)  

# Rewind first: the writes leave each position at the end of the data,
# so read(-1) would find nothing left to read.
MyFile.seek(0)
with Timer() as t:
    read1 = 'x' + MyFile.read(-1)
print('file read ##Request took %.03f sec.' % t.interval)

MyStrIo.seek(0)
with Timer() as t:
    read2 = 'x' + MyStrIo.read(-1)
print('stringIo read ###Request took %.03f sec.' % t.interval)

MycStrIo.seek(0)
with Timer() as t:
    read3 = 'x' + MycStrIo.read(-1)
print('CString read ####Request took %.03f sec.' % t.interval)




MyFile.close()

  1. The Python documentation says that cStringIO is faster than StringIO, but my results show that StringIO performs better for concatenation. Why?

  2. On the other hand, reading from cStringIO is faster than reading from StringIO (it behaves much like a file). As far as I have read, both file and cStringIO are implemented in C, so why is string concatenation slow?

  3. Is there any other way to handle strings that is faster than these methods?

Solution

StringIO performs better because, behind the scenes, it just keeps a list of all the strings that have been written to it and only combines them when necessary. A write operation is therefore as simple as appending an object to a list. The cStringIO module does not have this luxury: it must copy the data of each string into its buffer, resizing the buffer as and when necessary (which causes a lot of redundant copying when writing large amounts of data).

Since you are writing lots of large strings, there is less work for StringIO to do than for cStringIO. When you read from a StringIO object you have written to, it can minimise the amount of copying by computing the sum of the lengths of the strings written to it and preallocating a buffer of that size.
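To make the deferred-join idea concrete, here is a minimal sketch of a list-backed buffer. The class name `ListBackedBuffer` and its attributes are hypothetical and chosen for illustration only. It is not the actual StringIO source and leaves out seeking and positioned reads.

```python
class ListBackedBuffer(object):
    """Hypothetical write-only buffer that defers concatenation until read."""

    def __init__(self):
        self._chunks = []   # strings written so far, not yet joined
        self._joined = ""   # data that has already been merged

    def write(self, s):
        # Writing only appends a reference to the list; no data is copied.
        self._chunks.append(s)

    def getvalue(self):
        # Merge lazily: join() works out the total length, allocates the
        # result once, and copies each pending chunk into it.
        if self._chunks:
            self._joined += "".join(self._chunks)
            self._chunks = []
        return self._joined


buf = ListBackedBuffer()
for _ in range(3):
    buf.write("ab")
result = buf.getvalue()
```

With this design each write is a cheap list append, and the cost of copying is paid once, at the first read after a batch of writes.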

However, StringIO is not the fastest way of joining a series of strings. This is because it provides additional functionality (seeking to different parts of the buffer and writing data there). If you do not need this functionality and all you want to do is join a list of strings together, then str.join is the fastest way to do it.

joined_string = "".join(testbuf for index in range(1000))
# or building the list of strings to join separately
strings = []
for i in range(1000):
    strings.append(testbuf)
joined_string = "".join(strings)
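To check which approach is fastest on a given interpreter, the standard `timeit` module is usually more reliable than a hand-rolled timer. The sketch below is an assumption-laden illustration rather than a reproduction of the question's benchmark. It targets Python 3, where the `StringIO` and `cStringIO` modules no longer exist and `io.StringIO` takes their place. The names `CHUNK`, `COUNT` and the three helper functions are made up for this example, and the chunk is much smaller than the question's `testbuf`.

```python
import io
import timeit

CHUNK = "x" * 100   # assumed chunk size, far smaller than the question's testbuf
COUNT = 1000        # number of chunks, matching the loops in the question


def with_join():
    # Single pass: join() builds the result in one allocation.
    return "".join(CHUNK for _ in range(COUNT))


def with_stringio():
    # File-like buffer; io.StringIO is the Python 3 replacement.
    buf = io.StringIO()
    for _ in range(COUNT):
        buf.write(CHUNK)
    return buf.getvalue()


def with_concat():
    # Repeated concatenation, as in strWithStr() from the question.
    s = ""
    for _ in range(COUNT):
        s = s + CHUNK
    return s


# All three strategies must produce the same string.
assert with_join() == with_stringio() == with_concat()

for fn in (with_join, with_stringio, with_concat):
    seconds = timeit.timeit(fn, number=20)
    print("%-14s %.4f sec for 20 runs" % (fn.__name__, seconds))
```

The absolute numbers depend on the interpreter version, the chunk size and the machine, so treat the output as a relative comparison only.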
