In Python 2.7, why are strings written faster in text mode than in binary mode?


Question

The following example script writes some strings to a file using either "w" (text) or "wb" (binary) mode:

import itertools as it
from string import ascii_lowercase
import time

characters = it.cycle(ascii_lowercase)
mode = 'w'
# mode = 'wb'  # using this mode takes longer to execute
t1 = time.clock()
with open('test.txt', mode) as fh:
    for __ in xrange(10**7):
        fh.write(''.join(it.islice(characters, 0, 50)))
t2 = time.clock()
print 'Mode: {}, time elapsed: {:.2f}'.format(mode, t2 - t1)

With Python 2, using "w" mode I found it executes in 24.89 +/- 0.02 s while using "wb" it takes 25.67 +/- 0.02 s to execute. These are the specific timings for three consecutive runs for each mode:

mode_w  = [24.91, 24.86, 24.91]
mode_wb = [25.68, 25.64, 25.69]
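
To put the gap in perspective, here is a small sketch that turns the timings above into a per-call figure. The helper name mean and the variable names are only illustrative; the numbers are the ones listed above. The difference comes out at roughly 78 ns per write call, or about 3% of the total run time.

```python
# Timings copied from the three runs listed above (seconds).
mode_w = [24.91, 24.86, 24.91]
mode_wb = [25.68, 25.64, 25.69]
n_writes = 10 ** 7  # number of fh.write() calls made by the script


def mean(values):
    # Illustrative helper: plain arithmetic mean.
    return sum(values) / float(len(values))


gap = mean(mode_wb) - mean(mode_w)    # total gap in seconds
per_write_ns = gap / n_writes * 1e9   # gap per write() call in nanoseconds
relative = gap / mean(mode_w) * 100   # gap as a percentage of the "w" time

print('gap: {:.2f} s, {:.0f} ns per write, {:.1f} %'.format(
    gap, per_write_ns, relative))
```

So whatever causes the difference has to be a small, fixed cost paid on every call rather than something that depends on the amount of data.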

I'm surprised by these results since Python 2 stores its strings as byte strings anyway, so neither "w" nor "wb" needs to perform any encoding work. Text mode, on the other hand, needs to perform additional work such as checking for line endings:

The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading.

So if anything I'd expect text mode "w" to take longer than binary mode "wb". However the opposite seems to be the case. Why is this?

Tested using CPython 2.7.12.

Recommended answer

Looking at the CPython source code for file.write:

Here f->f_binary is set when the mode for open includes "b". In this case Python constructs an auxiliary buffer object from the string object and then gets the data s and length n from that buffer. I suppose this is for compatibility (generality) with other objects that support the buffer interface.
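
The generality mentioned above can be observed from Python itself. The following is a minimal sketch, assuming a temporary file is acceptable as a target: a file opened in binary mode accepts not only byte strings but also other objects that expose their bytes through the buffer interface, such as bytearray. The snippet also runs under Python 3, although files there are implemented by the io module, so the C code discussed in this answer does not apply to it.

```python
import os
import tempfile

# A plain byte string and a bytearray; both expose their bytes
# through the buffer interface.
payloads = [b'abc', bytearray(b'def')]

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'wb') as fh:
        for payload in payloads:
            fh.write(payload)  # binary mode reads the bytes via the buffer
    with open(path, 'rb') as fh:
        content = fh.read()
finally:
    os.remove(path)

print(repr(content))
```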

Here PyArg_ParseTuple(args, "s*", &pbuf) creates the corresponding buffer object. This operation requires additional compute time, whereas in text mode Python simply parses the argument as an object ("O") at almost no cost. Retrieving the data and length via

s = PyString_AS_STRING(text);
n = PyString_GET_SIZE(text);

is also performed when the buffer is created.

This means that when working in binary mode there's an additional overhead associated with creating an auxiliary buffer object from the string object. For that reason the execution time is longer when working in binary mode.
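
One way to check this explanation is to time the write calls alone. The sketch below is an assumption-laden variant of the script from the question, not part of the original measurements: time_writes is a hypothetical helper, the 50-character chunk is built once so that the join/islice work is excluded, and a temporary file is used as the target. It is written to run on both Python 2.7 and Python 3, but only the Python 2.7 numbers relate to the C code discussed here.

```python
import os
import tempfile
from timeit import default_timer


def time_writes(mode, n_writes=10 ** 5, chunk='x' * 50):
    """Return the seconds spent in n_writes calls to fh.write() for mode."""
    # Hypothetical helper. On Python 2 the encode() call returns the same
    # kind of str; on Python 3 it yields the bytes that 'wb' requires.
    payload = chunk.encode('ascii') if 'b' in mode else chunk
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode) as fh:
            write = fh.write
            start = default_timer()
            for _ in range(n_writes):
                write(payload)
            # Stop the clock before the file is closed and flushed.
            elapsed = default_timer() - start
    finally:
        os.remove(path)
    return elapsed


for mode in ('w', 'wb'):
    best = min(time_writes(mode) for _ in range(3))
    print('Mode: {}, best of 3: {:.4f} s'.format(mode, best))
```

Taking the best of several runs is a common way to reduce noise from other processes; the absolute numbers will differ between machines.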
