加速写入文件 [英] Speed up writing to files

查看：20 发布时间：2021/12/29 12:37:46 python performance file-io

本文介绍了加速写入文件的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我分析了一些我用 cProfile 继承的遗留代码.我已经做了很多有帮助的改变(比如使用 simplejson 的 C 扩展！).

I've profiled some legacy code I've inherited with cProfile. There were a bunch of changes I've already made that have helped (like using simplejson's C extensions!).

基本上，此脚本将数据从一个系统导出到 ASCII 固定宽度文件.每一行都是一条记录，它有很多值.每行有 7158 个字符，包含大量空格.总记录数为 150 万条记录.每行一次生成一个，需要一段时间(每秒 5-10 行).

Basically this script is exporting data from one system to an ASCII fixed-width file. Each row is a record, and it has many values. Each line is 7158 characters and contains a ton of spaces. The total record count is 1.5 million records. Each row is generated one at a time, and takes a while (5-10 rows a second).

随着每一行的生成，它会尽可能简单地写入磁盘.分析表明，大约 19-20% 的总时间花在 file.write() 上.对于 1,500 行的测试用例，即 20 秒.我想减少这个数字.

As each row is generated it's written to disk as simply as possible. The profiling indicates that about 19-20% of the total time is spent in file.write(). For a test case of 1,500 rows that's 20 seconds. I'd like to reduce that number.

现在看来下一个胜利将是减少写入磁盘的时间.如果可能的话，我想减少它.我可以在内存中保存一个记录缓存，但我不能等到最后一次性全部转储.

Now it seems the next win will be reducing the amount of time spent writing to disk. I'd like to reduce it, if possible. I can keep a cache of records in memory, but I can't wait until the end and dump it all at once.

fd = open(data_file, 'w')
for c, (recordid, values) in enumerate(generatevalues()):
        row = prep_row(recordid, values)
        fd.write(row)
        if c % 117 == 0:
                if limit > 0 and c >= limit:
                        break
                sys.stdout.write('
%s @ %s' % (str(c + 1).rjust(7), datetime.now()))
                sys.stdout.flush()

我的第一个想法是将记录缓存在列表中并分批写出.那会更快吗?类似的东西:

My first thought would be to keep a cache of records in a list and write them out in batches. Would that be faster? Something like:

rows = []
for c, (recordid, values) in enumerate(generatevalues()):
        rows.append(prep_row(recordid, values))
        if c % 117 == 0:
            fd.write('
'.join(rows))
            rows = []

我的第二个想法是使用另一个线程，但这让我想死在里面.

My second thought would be to use another thread, but that makes me want to die inside.

加速写入文件 [英] Speed up writing to files

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

加速写入文件 [英] Speed up writing to files

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭