Python: running subprocess in parallel


Question


I have the following code that writes the md5sums to a logfile

import subprocess

logfile = open('md5sums.log', 'w')  # log file name is illustrative
for file in files_output:
    p = subprocess.Popen(['md5sum', file], stdout=logfile)
p.wait()  # note: this waits only for the last process started

  1. Will these be written in parallel? i.e. if md5sum takes a long time for one of the files, will another one be started before waiting for a previous one to complete?

  2. If the answer to the above is yes, can I assume the order of the md5sums written to logfile may differ based upon how long md5sum takes for each file? (some files can be huge, some small)

Answer

All subprocesses run in parallel. (To avoid this, one has to wait explicitly for each one to complete.) They can even write into the log file at the same time, garbling the output. To avoid this, you should let each process write into a different logfile (or capture its output directly) and collect all outputs once all processes have finished.
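That `Popen` itself does not block can be checked with a small timed sketch (assuming a POSIX `sleep` command is available):

```python
import subprocess
import time

start = time.time()
# Popen returns immediately, so all three sleeps run concurrently
procs = [subprocess.Popen(['sleep', '1']) for _ in range(3)]
launch_time = time.time() - start

for p in procs:
    p.wait()  # block until every child has exited
total_time = time.time() - start  # close to 1 second, not 3
```

The loop of `Popen` calls returns almost instantly; only the explicit `wait()` calls block, which is why a single trailing `p.wait()` in the question's code waits for just the last child.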

One way to collect the results is a pool of worker threads fed from a queue:

import queue  # named "Queue" on Python 2
import subprocess
import threading

q = queue.Queue()
result = {}  # maps file name -> md5 checksum

for fileName in fileNames:
    q.put(fileName)

def worker():
    while True:
        fileName = q.get()
        if fileName is None:  # EOF marker?
            return
        # run md5sum for this file and capture its output
        output = subprocess.check_output(['md5sum', fileName], text=True)
        checksum = output.split()[0]
        result[fileName] = checksum  # store it

threads = [threading.Thread(target=worker) for _ in range(20)]
for thread in threads:
    thread.start()
for _ in threads:
    q.put(None)  # one EOF marker for each thread
for thread in threads:
    thread.join()  # wait until all work is done


After this the results should be stored in result.
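The same fan-out-and-collect pattern can also be written more compactly with `concurrent.futures` (Python 3; the input files below are hypothetical and are created only so the sketch is self-contained):

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

def md5_of(fileName):
    # run md5sum for one file and return just the checksum field
    out = subprocess.check_output(['md5sum', fileName], text=True)
    return fileName, out.split()[0]

# hypothetical input files, created here for demonstration
fileNames = []
for data in (b'hello', b'world'):
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.close(fd)
    fileNames.append(path)

with ThreadPoolExecutor(max_workers=20) as pool:
    result = dict(pool.map(md5_of, fileNames))
```

A side benefit: `pool.map` yields results in input order regardless of which checksum finishes first, so the ordering concern from the question disappears.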
