Very large input and piping using subprocess.Popen


Question

I have a pretty simple problem. I have a large file that goes through three steps: a decoding step using an external program, some processing in Python, and then re-encoding using another external program. I have been using subprocess.Popen() to try to do this in Python rather than forming Unix pipes. However, all the data is buffered in memory. Is there a Pythonic way of doing this, or am I better off dropping back to a simple Python script that reads from stdin and writes to stdout, with Unix pipes on either side?

import os, sys, subprocess

def main(infile,reflist):
    print infile,reflist
    samtoolsin = subprocess.Popen(["samtools","view",infile],
                                  stdout=subprocess.PIPE,bufsize=1)
    samtoolsout = subprocess.Popen(["samtools","import",reflist,"-",
                                    infile+".tmp"],stdin=subprocess.PIPE,bufsize=1)
    for line in samtoolsin.stdout.read():
        if(line.startswith("@")):
            samtoolsout.stdin.write(line)
        else:
            linesplit = line.split("\t")
            if(linesplit[10]=="*"):
                linesplit[9]="*"
            samtoolsout.stdin.write("\t".join(linesplit))

Recommended answer

Popen has a bufsize parameter that limits the size of the buffer in memory. If you don't want the files in memory at all, you can pass file objects as the stdin and stdout parameters. From the subprocess docs:

bufsize, if given, has the same meaning as the corresponding argument to the built-in open() function: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size. A negative bufsize means to use the system default, which usually means fully buffered. The default value for bufsize is 0 (unbuffered).
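
As an illustration of both ideas, here is a minimal sketch, written for Python 3.7 or later (where bufsize defaults to -1 rather than the 0 quoted above). It rests on one observation about the question's code: samtoolsin.stdout.read() returns the whole output as a single string before the loop starts, and looping over that string yields characters rather than lines. Iterating over the pipe object itself reads one line at a time. The helper names fix_record, stream_through and run_with_files are hypothetical and not part of subprocess. fix_record also strips the trailing newline before comparing fields, which is an assumption about the intended behaviour.

```python
import subprocess
import sys


def fix_record(line):
    """Apply the question's rule to one line: if field 11 is "*",
    set field 10 to "*" as well. Header lines ("@...") pass through."""
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 10 and fields[10] == "*":
        fields[9] = "*"
    return "\t".join(fields) + "\n"


def stream_through(decode_cmd, encode_cmd, transform=fix_record):
    """Pipe decode_cmd's stdout through transform, one line at a time,
    into encode_cmd's stdin. Returns both exit codes."""
    decoder = subprocess.Popen(decode_cmd, stdout=subprocess.PIPE, text=True)
    encoder = subprocess.Popen(encode_cmd, stdin=subprocess.PIPE, text=True)
    # Iterating over the pipe yields lines lazily; .read() would load
    # the entire output into memory first.
    for line in decoder.stdout:
        encoder.stdin.write(transform(line))
    encoder.stdin.close()  # send EOF so the encoder can finish
    decoder.stdout.close()
    return decoder.wait(), encoder.wait()


def run_with_files(cmd, in_path, out_path):
    """Hand file objects straight to the child process, so the data
    never passes through Python at all."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        return subprocess.call(cmd, stdin=src, stdout=dst)
```

With the commands from the question, the call would look like stream_through(["samtools", "view", infile], ["samtools", "import", reflist, "-", infile + ".tmp"]). run_with_files only fits steps that need no processing in Python, since the child reads and writes the files directly.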
