python read() from stdout much slower than reading line by line (slurping?)


Question

I have a Python subprocess call that runs an executable and pipes the output to my subprocess stdout.

In cases where the stdout data is relatively small (~2k lines), the performance of reading line by line and reading as a chunk (stdout.read()) is comparable, with stdout.read() being slightly faster.

Once the data gets to be larger (say 30k+ lines), the performance for reading line by line is significantly better.

Here is my comparison script:

import subprocess
import time

proc = subprocess.Popen(executable, stdout=subprocess.PIPE)
tmp = []
tic = time.perf_counter()  # time.clock() from the original was removed in Python 3.8
for line in iter(proc.stdout.readline, b""):
    tmp.append(line)
print("line by line = %.2f" % (time.perf_counter() - tic))

proc = subprocess.Popen(executable, stdout=subprocess.PIPE)
tic = time.perf_counter()
fullFile = proc.stdout.read()
print("slurped = %.2f" % (time.perf_counter() - tic))

And these are the results for a read of ~96k lines (about 50 MB on disk):

line by line = 5.48
slurped = 153.03

I am unclear why the performance difference is so extreme. My expectation was that the read() version should be faster than storing the results line by line. Of course, I was expecting line by line to win in the practical case, where significant per-line processing can be done during the read.

Can anyone give me insight into the read() performance cost?

Answer

This is not just Python: reading by characters without buffering is always slower than reading in lines or big chunks.
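To see the same effect in the questioner's own setting, here is a minimal, self-contained sketch. The child process (a Python one-liner printing 50k short lines) is a stand-in assumption for the real executable; both readers go through Python's buffered pipe reader, so this illustrates the per-call overhead of line-at-a-time versus chunked draining rather than reproducing the exact timings above.

```python
import subprocess
import sys
import time

# Stand-in child process: prints 50k short lines to stdout.
CHILD = [sys.executable, "-c",
         "import sys\nfor i in range(50000): sys.stdout.write('line %d\\n' % i)"]

def read_lines():
    # Drain the pipe one line at a time.
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    total = 0
    for line in iter(proc.stdout.readline, b""):
        total += len(line)
    proc.wait()
    return total

def read_chunks(chunk_size=64 * 1024):
    # Drain the pipe in fixed-size chunks.
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    total = 0
    while True:
        chunk = proc.stdout.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    proc.wait()
    return total

if __name__ == "__main__":
    tic = time.perf_counter()
    n_lines = read_lines()
    t_lines = time.perf_counter() - tic

    tic = time.perf_counter()
    n_chunks = read_chunks()
    t_chunks = time.perf_counter() - tic

    print("line by line: %d bytes in %.3fs" % (n_lines, t_lines))
    print("chunked read: %d bytes in %.3fs" % (n_chunks, t_chunks))
```

Both readers should report the same byte count; only the number of calls into the buffered reader differs.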

Consider these two simple C programs:

[readchars.c]

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
        FILE* fh = fopen("largefile.txt", "r");
        if (fh == NULL) {
                perror("Failed to open file largefile.txt");
                exit(1);
        }

        int c;
        c = fgetc(fh);
        while (c != EOF) {
                c = fgetc(fh);
        }

        fclose(fh);
        return 0;
}

[readlines.c]

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
        FILE* fh = fopen("largefile.txt", "r");
        if (fh == NULL) {
                perror("Failed to open file largefile.txt");
                exit(1);
        }

        /* keep the buffer pointer separate so it is not lost
           when fgets() returns NULL at end of file */
        char* buf = malloc(120);
        while (fgets(buf, 120, fh) != NULL) {
                /* consume the line */
        }

        free(buf);
        fclose(fh);
        return 0;
}

And their results (YMMV; largefile.txt was a ~200 MB text file):

$ gcc readchars.c -o readchars
$ time ./readchars            
./readchars  1.32s user 0.03s system 99% cpu 1.350 total
$ gcc readlines.c -o readlines
$ time ./readlines            
./readlines  0.27s user 0.03s system 99% cpu 0.300 total
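The same shape of result can be reproduced from Python with a rough mirror of the two C programs. Note the caveat: Python's open() is buffered, so f.read(1) models only the per-call overhead of fgetc(), not truly unbuffered I/O. The file name and line count below are stand-ins created by the sketch itself so it is self-contained.

```python
import os
import time

# Stand-in for largefile.txt, created locally so the sketch runs anywhere.
PATH = "demo_largefile.txt"

def make_file(lines=50000):
    with open(PATH, "w") as f:
        for i in range(lines):
            f.write("some text on line %d\n" % i)

def count_by_char():
    # Analogue of the fgetc() loop: one byte per call.
    n = 0
    with open(PATH, "rb") as f:
        while f.read(1):
            n += 1
    return n

def count_by_line():
    # Analogue of the fgets() loop: one line per call.
    n = 0
    with open(PATH, "rb") as f:
        for line in f:
            n += len(line)
    return n

if __name__ == "__main__":
    make_file()

    tic = time.perf_counter()
    by_char = count_by_char()
    t_char = time.perf_counter() - tic

    tic = time.perf_counter()
    by_line = count_by_line()
    t_line = time.perf_counter() - tic

    print("char: %d bytes in %.3fs" % (by_char, t_char))
    print("line: %d bytes in %.3fs" % (by_line, t_line))
    os.remove(PATH)
```

Both loops see the same bytes; the byte-at-a-time version simply makes vastly more calls, which is where the time goes.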
