Confused about StringIO, cStringIO and BytesIO

Published: 2020/7/10 2:12:44 | Tags: python, stringio, bytesio, cstringio

Question

I have googled and also searched on SO for the difference between these buffer modules. However, I still don't understand it very well, and I think some of the posts I read are out of date.

In Python 2.7.11, I downloaded a binary file of a specific format using r = requests.get(url). Then I passed StringIO.StringIO(r.content), cStringIO.StringIO(r.content) and io.BytesIO(r.content) to a function designed for parsing the content.

All three of them work. I mean, even though the file is binary, StringIO still works. Why?
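In Python 2, str is a byte string, so r.content is already bytes and StringIO.StringIO simply wraps those bytes. In Python 3, r.content is a bytes object and the matching file-like wrapper is io.BytesIO. The sketch below assumes a made-up binary format (a 4-byte magic b"DEMO" followed by a little-endian unsigned 32-bit count); parse_header and the format are hypothetical stand-ins for the asker's parsing function.

```python
import io
import struct

# Hypothetical binary format, for illustration only:
#   4-byte magic b"DEMO", then a little-endian unsigned 32-bit record count.
HEADER = struct.Struct("<4sI")


def parse_header(fileobj):
    """Read the hypothetical header from any binary file-like object."""
    raw = fileobj.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("truncated header")
    magic, count = HEADER.unpack(raw)
    if magic != b"DEMO":
        raise ValueError("bad magic: %r" % (magic,))
    return count


# Stand-in for r.content, which is a bytes object in Python 3.
content = b"DEMO" + struct.pack("<I", 3)
print(parse_header(io.BytesIO(content)))  # 3
```

The parser only calls read(), so it works with a real file opened in "rb" mode as well as with io.BytesIO.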

Another question concerns their efficiency.

In [1]: import StringIO, cStringIO, io

In [2]: from numpy import random

In [3]: x = random.random(1000000)

In [4]: %timeit y = cStringIO.StringIO(x)
1000000 loops, best of 3: 736 ns per loop

In [5]: %timeit y = StringIO.StringIO(x)
1000 loops, best of 3: 283 µs per loop

In [6]: %timeit y = io.BytesIO(x)
1000 loops, best of 3: 1.26 ms per loop

As shown above, in terms of speed: cStringIO > StringIO > BytesIO.

I found someone mentioning that io.BytesIO always makes a new copy, which costs more time. But some other posts mention that this was fixed in later Python versions.
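One rough way to check this on a given interpreter is to time construction at two input sizes: if io.BytesIO copies its initial bytes, construction time grows with the size; if it shares the buffer, it stays roughly flat. This sketch uses the standard timeit module; the sizes and repeat counts are arbitrary choices, and timings are noisy, so treat the ratio only as a hint.

```python
import io
import timeit


def construction_time(size, number=100):
    """Best-of-5 seconds per io.BytesIO(data) call for a bytes object of `size` bytes."""
    data = b"x" * size
    timer = timeit.Timer(lambda: io.BytesIO(data))
    return min(timer.repeat(repeat=5, number=number)) / number


small = construction_time(1000)
large = construction_time(1000000)
# A ratio near 1 suggests the initial bytes are shared; a ratio that grows
# with the size difference suggests a copy is being made.
print("small: %.2e s, large: %.2e s, ratio: %.1f" % (small, large, large / small))
```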

So, can anyone give a thorough comparison of these IO classes in both the latest Python 2.x and 3.x?

Some of the references I found:

https://trac.edgewall.org/ticket/12046

io.StringIO requires a unicode string. io.BytesIO requires a bytes string. StringIO.StringIO allows either unicode or bytes string. cStringIO.StringIO requires a string that is encoded as a bytes string.
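Under Python 3 only the io classes remain, and this split is enforced with a TypeError. A quick check, where accepts is a helper written just for this example:

```python
import io


def accepts(cls, value):
    """Return True if cls(value) succeeds, False if it raises TypeError."""
    try:
        cls(value)
    except TypeError:
        return False
    return True


for cls in (io.StringIO, io.BytesIO):
    for value in (u"abc", b"abc"):
        # Prints e.g. "StringIO str True", "StringIO bytes False", ...
        print(cls.__name__, type(value).__name__, accepts(cls, value))
```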

But cStringIO.StringIO('abc') doesn't raise any error.
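That is expected: in Python 2 a plain 'abc' literal is a byte string (str), which is exactly what cStringIO.StringIO wants; the Python 2 documentation says cStringIO only rejects unicode strings that cannot be encoded as plain ASCII. In Python 3 the same literal is text, so the conversion to bytes has to be explicit. A minimal sketch:

```python
import io

text = "abc"    # Python 3: str is text (Python 2's unicode)
data = b"abc"   # bytes literal (Python 2's plain str)

# Encode text before putting it into a byte buffer.
buf = io.BytesIO(text.encode("ascii"))
assert buf.getvalue() == data
print(type(text).__name__, type(data).__name__)  # str bytes
```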

https://review.openstack.org/#/c/286926/1

The StringIO class is the wrong class to use for this, especially considering that subunit v2 is binary and not a string.

http://comments.gmane.org/gmane.comp.python.devel/148717

cStringIO.StringIO(b'data') didn't copy the data while io.BytesIO(b'data') makes a copy (even if the data is not modified later).
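Whatever an implementation does internally, from the caller's side io.BytesIO behaves as if it holds its own copy: later changes to a mutable source such as a bytearray do not show up in the buffer, and writes to the buffer never touch the source. A small Python 3 check of that behavior:

```python
import io

source = bytearray(b"data")
buf = io.BytesIO(source)

source[0:1] = b"X"     # mutate the original after constructing the buffer
print(buf.getvalue())  # b'data': the buffer kept its own contents

buf.seek(0)
buf.write(b"Z")        # overwrite the first byte inside the buffer
print(buf.getvalue())  # b'Zata'
print(bytes(source))   # b'Xata': the source is unaffected by the write
```

Because bytes objects are immutable, sharing one as the initial value is safe until the buffer is written to, which is why avoiding the copy is possible at all.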

That thread contains a patch from 2014 that fixes this.

There are many more SO posts that I haven't listed here.

Here are the Python 2.7 results for Eric's example:

%timeit cStringIO.StringIO(u_data)
1000000 loops, best of 3: 488 ns per loop
%timeit cStringIO.StringIO(b_data)
1000000 loops, best of 3: 448 ns per loop
%timeit StringIO.StringIO(u_data)
1000000 loops, best of 3: 1.15 µs per loop
%timeit StringIO.StringIO(b_data)
1000000 loops, best of 3: 1.19 µs per loop
%timeit io.StringIO(u_data)
1000 loops, best of 3: 304 µs per loop
# %timeit io.StringIO(b_data)
# error
# %timeit io.BytesIO(u_data)
# error
%timeit io.BytesIO(b_data)
10000 loops, best of 3: 77.5 µs per loop

So on 2.7, cStringIO.StringIO and StringIO.StringIO are far more efficient than the io classes, at least for constructing the object, which is all these timings measure.

Answer

In both Python 2 and 3, you should use io.StringIO for unicode objects and io.BytesIO for bytes objects, for forward compatibility (these two are all Python 3 has to offer).
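A minimal Python 3 sketch of that advice: text goes through io.StringIO, raw bytes through io.BytesIO, and the two meet only at an explicit encode or decode (UTF-8 here is an arbitrary choice for the example).

```python
import io

# Text in, text out: io.StringIO holds str (unicode) objects.
text_buf = io.StringIO()
text_buf.write(u"snowman: \u2603\n")
text_buf.seek(0)
line = text_buf.readline()

# Bytes in, bytes out: io.BytesIO holds bytes objects.
byte_buf = io.BytesIO()
byte_buf.write(line.encode("utf-8"))
raw = byte_buf.getvalue()

print(raw)  # b'snowman: \xe2\x98\x83\n'
```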

Here's a better test (for Python 2 and 3) that doesn't include the cost of converting from numpy to str/bytes:

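import numpy as np
import string
# astype('S1') gives 1 byte per character on both Python 2 and 3
# (on Python 3, tobytes() of the '<U1' array would give 4 bytes per character)
b_data = np.random.choice(list(string.printable), size=1000000).astype('S1').tobytes()
u_data = b_data.decode('ascii')
u_data = u'\u2603' + u_data[1:]  # add a non-ascii character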
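If NumPy isn't at hand, roughly the same test data can be built with the standard library alone. This is an alternative sketch, not the answerer's code, and it assumes Python 3.6+ for random.choices:

```python
import random
import string

N = 1000000
# One ASCII byte per randomly chosen printable character.
b_data = "".join(random.choices(string.printable, k=N)).encode("ascii")
u_data = b_data.decode("ascii")
u_data = u"\u2603" + u_data[1:]  # add a non-ASCII character, as above
```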

Then:

import io
%timeit io.StringIO(u_data)
%timeit io.StringIO(b_data)
%timeit io.BytesIO(u_data)
%timeit io.BytesIO(b_data)
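The %timeit lines need IPython and stop at the first TypeError. Outside IPython, a sketch using the standard timeit module can run all four combinations and record the rejected ones instead of crashing. The small b_sample/u_sample inputs are placeholders chosen for this example; substitute b_data/u_data for a real run.

```python
import io
import timeit

# Placeholder inputs for this sketch.
b_sample = b"hello world" * 1000
u_sample = u"\u2603" + b_sample.decode("ascii")[1:]


def time_or_error(cls, value, number=1000):
    """Seconds per cls(value) call, or the TypeError message if the type is rejected."""
    try:
        cls(value)
    except TypeError as exc:
        return "TypeError: %s" % exc
    return timeit.timeit(lambda: cls(value), number=number) / number


results = {}
for cls in (io.StringIO, io.BytesIO):
    for label, value in (("u_data", u_sample), ("b_data", b_sample)):
        results[(cls.__name__, label)] = time_or_error(cls, value)

for key, value in sorted(results.items()):
    print(key, value)
```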

In Python 2, you can also test:

import StringIO, cStringIO
%timeit cStringIO.StringIO(u_data)
%timeit cStringIO.StringIO(b_data)
%timeit StringIO.StringIO(u_data)
%timeit StringIO.StringIO(b_data)

Some of these will raise errors complaining about non-ASCII characters.
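The non-ASCII failures come from implicit ASCII encoding: as the UnicodeEncodeError in the 2.7 results shows, cStringIO tries to turn a unicode argument into bytes with the ASCII codec, which cannot represent u'\u2603'. Python 3 never encodes implicitly, so the equivalent is to encode explicitly with a codec that covers the text and to decode on the way out. A sketch using UTF-8, which is an assumption; use whatever encoding the consumer of the bytes expects:

```python
import io

u_text = u"\u2603 plus ascii"

try:
    u_text.encode("ascii")  # the same conversion cStringIO attempted implicitly
except UnicodeEncodeError as exc:
    print("ascii fails:", exc.reason)  # ascii fails: ordinal not in range(128)

byte_buf = io.BytesIO(u_text.encode("utf-8"))          # explicit, lossless encoding
reader = io.TextIOWrapper(byte_buf, encoding="utf-8")  # decode when reading back
round_trip = reader.read()
assert round_trip == u_text
```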

Python 3.5 results:

>>> %timeit io.StringIO(u_data)
100 loops, best of 3: 8.61 ms per loop
>>> %timeit io.StringIO(b_data)
TypeError: initial_value must be str or None, not bytes
>>> %timeit io.BytesIO(u_data)
TypeError: a bytes-like object is required, not 'str'
>>> %timeit io.BytesIO(b_data)
The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 344 ns per loop

Python 2.7 results (run on a different machine):

>>> %timeit io.StringIO(u_data)
1000 loops, best of 3: 304 µs per loop
>>> %timeit io.StringIO(b_data)
TypeError: initial_value must be unicode or None, not str
>>> %timeit io.BytesIO(u_data)
TypeError: 'unicode' does not have the buffer interface
>>> %timeit io.BytesIO(b_data)
10000 loops, best of 3: 77.5 µs per loop

>>> %timeit cStringIO.StringIO(u_data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2603' in position 0: ordinal not in range(128)
>>> %timeit cStringIO.StringIO(b_data)
1000000 loops, best of 3: 448 ns per loop
>>> %timeit StringIO.StringIO(u_data)
1000000 loops, best of 3: 1.15 µs per loop
>>> %timeit StringIO.StringIO(b_data)
1000000 loops, best of 3: 1.19 µs per loop
