Real file objects slower than StringIO and cStringIO?

Question

StringIO has the following notes in its code:

Notes:
- Using a real file is often faster (but less convenient).
- There's also a much faster implementation in C, called cStringIO, but
  it's not subclassable.

The "real file is often faster" line seemed really odd to me: how could writing to disk beat writing to memory? I profiled these different cases and got results that contradict those notes, as well as the answer to this question. This other question does explain why cStringIO is slower under some circumstances, though I'm not doing any concatenation here. The test writes a given amount of data to a file, then seeks to the beginning and reads it back out. In the "new" tests I create a new object on each pass; in the "same" tests I truncate and reuse a single object for every repetition, to rule out object creation as a source of overhead. That overhead mattered for tempfiles with small data sizes but not with large ones.
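The benchmark script itself is not reproduced here, so the following is only a rough sketch of the "new" versus "same" procedure described above. It assumes Python 3, where the StringIO and cStringIO modules no longer exist, and uses io.BytesIO as a stand-in for the in-memory case; the function names are made up for illustration.

```python
import io
import tempfile
import timeit


def write_then_read(flo, data):
    """Write data, seek back to the start, and read everything out again."""
    flo.write(data)
    flo.seek(0)
    return flo.read()


def time_new(factory, data, passes):
    """'New' variant: build a fresh file-like object on every pass."""
    def one_pass():
        with factory() as flo:
            write_then_read(flo, data)
    return timeit.timeit(one_pass, number=passes)


def time_same(factory, data, passes):
    """'Same' variant: reuse one object, truncating it before each pass."""
    with factory() as flo:
        def one_pass():
            flo.seek(0)
            flo.truncate()  # truncates at the current position, i.e. to empty
            write_then_read(flo, data)
        return timeit.timeit(one_pass, number=passes)


if __name__ == "__main__":
    data = b"x" * 1024  # 1.0 KiB payload
    for name, factory in [("BytesIO", io.BytesIO),
                          ("tempfile", tempfile.TemporaryFile)]:
        print("New  %-9s %.4f" % (name, time_new(factory, data, 1000)))
        print("Same %-9s %.4f" % (name, time_same(factory, data, 1000)))
```

Timings from this sketch will not match the figures reported here, since both the Python version and the in-memory implementation differ.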

The code is here.

Using 1000 passes with size 1.0KiB
New StringIO:   0.0026 0.0025 0.0034
Same StringIO:  0.0026 0.0023 0.0030
New cStringIO:  0.0009 0.0010 0.0008
Same cStringIO: 0.0009 0.0009 0.0009
New tempfile:   0.0679 0.0554 0.0542
Same tempfile:  0.0069 0.0064 0.0070
==============================================================
Using 1000 passes with size 100.0KiB
New StringIO:   0.0093 0.0099 0.0108
Same StringIO:  0.0109 0.0090 0.0086
New cStringIO:  0.0130 0.0139 0.0120
Same cStringIO: 0.0118 0.0115 0.0124
New tempfile:   0.1006 0.0905 0.0899
Same tempfile:  0.0573 0.0526 0.0523
==============================================================
Using 1000 passes with size 1.0MiB
New StringIO:   0.0727 0.0700 0.0717
Same StringIO:  0.0740 0.0735 0.0712
New cStringIO:  0.1484 0.1399 0.1470
Same cStringIO: 0.1493 0.1393 0.1465
New tempfile:   0.6576 0.6750 0.6821
Same tempfile:  0.5951 0.5870 0.5678
==============================================================
Using 1000 passes with size 10.0MiB
New StringIO:   1.0965 1.1129 1.1079
Same StringIO:  1.1206 1.2979 1.1932
New cStringIO:  2.2532 2.2162 2.2482
Same cStringIO: 2.2624 2.2225 2.2377
New tempfile:   6.8350 6.7924 6.8481
Same tempfile:  6.8424 7.8114 7.8404
==============================================================

The two StringIO implementations were pretty comparable, though cStringIO slowed down significantly for large data sizes. But tempfile.TemporaryFile was always the slowest, typically taking around 3 times as long as the slowest StringIO, or more.

Recommended answer

It all depends on what "often" means. StringIO is implemented by keeping your writes in a list and then joining the list into a string on read. Your test case, a series of writes followed by a read, is its best-case scenario. If I tweak the test case to do 50 random writes/reads in the file, then cStringIO tends to win, with the file system in second place.
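To make the list-then-join idea concrete, here is a toy sketch of that strategy. It is not the actual StringIO source: the class name, attribute names, and the `joins` counter are invented for illustration, and seeks are assumed to stay within the current contents.

```python
class ListBuffer:
    """Toy file-like object that queues appended writes in a list."""

    def __init__(self):
        self.buf = ""       # consolidated contents
        self.pending = []   # appended chunks not yet joined into buf
        self.length = 0     # total length, including pending chunks
        self.pos = 0
        self.joins = 0      # how many times the pending list was joined

    def _consolidate(self):
        if self.pending:
            self.buf += "".join(self.pending)
            self.pending = []
            self.joins += 1

    def seek(self, pos):
        # Assumption: clamp to the current contents to keep the sketch small.
        self.pos = min(pos, self.length)

    def write(self, s):
        if self.pos == self.length:
            # Fast path: appending at the end only grows the list.
            self.pending.append(s)
            self.length += len(s)
            self.pos = self.length
            return
        # Slow path: overwriting in the middle rebuilds the whole string.
        self._consolidate()
        end = self.pos + len(s)
        self.buf = self.buf[:self.pos] + s + self.buf[end:]
        self.length = len(self.buf)
        self.pos = end

    def read(self, n=-1):
        self._consolidate()
        end = self.length if n < 0 else min(self.pos + n, self.length)
        data = self.buf[self.pos:end]
        self.pos = end
        return data
```

In this sketch a run of sequential writes never touches the big string, while every write at a random offset copies the entire buffer, which is in line with the pure-Python StringIO falling behind as the data size grows in the random-access timings.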

The comment seems to reflect a systems programmer's bias toward letting the C libraries and the operating system do file system things, because it's hard to guess in a general sense what performs best under all conditions.

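import random

# `closure` is assumed to be the shared dict from the question's benchmark
# script; closure['test_data'] holds the payload for the current pass.
def write_and_read_test_data(flo):
    fsize = len(closure['test_data'])
    flo.write(closure['test_data'])
    # 50 single-character writes and reads at random offsets within the data
    for _ in range(50):
        flo.seek(random.randint(0, fsize-1))
        flo.write('x')
        flo.read(1)
    # Rewind and read the whole contents back
    flo.seek(0)
    closure['output'] = flo.read()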
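For readers on Python 3, a self-contained variant of the same access pattern might look like the sketch below. It takes the payload as an argument instead of going through `closure`, and it assumes io.BytesIO as the in-memory object since cStringIO is not available there; the function and parameter names are hypothetical.

```python
import io
import random
import tempfile


def write_and_read_randomly(flo, data, touches=50, rng=random):
    """Write data, then overwrite and read one byte at random offsets."""
    flo.write(data)
    for _ in range(touches):
        flo.seek(rng.randint(0, len(data) - 1))
        flo.write(b"x")
        flo.read(1)
    # Rewind and return the full contents
    flo.seek(0)
    return flo.read()


if __name__ == "__main__":
    payload = b"a" * (100 * 1024)  # 100 KiB
    for factory in (io.BytesIO, tempfile.TemporaryFile):
        with factory() as flo:
            out = write_and_read_randomly(flo, payload)
        print(factory.__name__, len(out), out.count(b"x"))
```

Passing a seeded `random.Random` instance as `rng` makes the sequence of offsets repeatable between runs.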

The 10 MiB test case took longer than my attention span...

Using 1000 passes with size 1.0KiB
New StringIO:   0.9551 0.9467 0.9366
Same StringIO:  0.9252 0.9228 0.9207
New cStringIO:  0.3274 0.3280 0.3251
Same cStringIO: 0.3182 0.3231 0.3280
New tempfile:   1.1833 1.1853 1.1650
Same tempfile:  0.9563 0.9414 0.9504
==============================================================
Using 1000 passes with size 100.0KiB
New StringIO:   5.6253 5.6589 5.6025
Same StringIO:  5.5799 5.5608 5.5589
New cStringIO:  0.4157 0.4133 0.4140
Same cStringIO: 0.4078 0.4076 0.4088
New tempfile:   2.0420 2.0391 2.0408
Same tempfile:  1.5722 1.5749 1.5693
==============================================================
Using 1000 passes with size 1.0MiB
New StringIO:   105.2350 106.3904 107.5411
Same StringIO:  108.3744 109.4510 105.6012
New cStringIO:  2.4698 2.4781 2.4165
Same cStringIO: 2.4699 2.4600 2.4451
New tempfile:   6.6086 6.5783 6.5916
Same tempfile:  6.1420 6.1614 6.1366
