Python, running command line tools in parallel
Question
I am using Python as a scripting language to do some data processing and to call command-line tools for number crunching. I want to run the command-line tools in parallel, since they are independent of each other. When a command-line tool finishes, I can collect its results from its output file. So I also need some synchronization mechanism that notifies my main Python program when a task is finished, so that the result can be parsed in the main program.
Currently, I use os.system(), which works fine with a single thread but cannot be parallelized.
Thanks!
Recommended answer
Use the Pool object from the multiprocessing module. You can then use e.g. Pool.map() to do the processing in parallel. An example is my markphotos script (see below), where one function is called many times in parallel, each call processing one picture.
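Before the full script, the core pattern can be sketched in a few lines. This is only an illustration under stated assumptions: the function name run_tool is made up for this example, and the Python interpreter itself (sys.executable) stands in for the real command-line tool so that the sketch runs anywhere.

```python
import subprocess
import sys
from multiprocessing import Pool


def run_tool(arg):
    """Run one external command and return (arg, exit code, captured output)."""
    # Stand-in command (an assumption for this sketch): the Python interpreter
    # prints the square of its argument. Replace the list with the real tool
    # and its arguments.
    cmd = [sys.executable, "-c",
           "import sys; print(int(sys.argv[1]) ** 2)", str(arg)]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    return arg, proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    # Pool() starts one worker process per CPU core by default.
    with Pool() as pool:
        # map() blocks until every command has finished and returns the
        # results in the same order as the inputs.
        results = pool.map(run_tool, [1, 2, 3, 4])
    for arg, returncode, output in results:
        print(arg, returncode, output)
```

Because each worker only waits for an external process, the worker function has to be defined at module level so that multiprocessing can send it to the worker processes.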
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Adds my copyright notice to photos.
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2012-10-28 17:00:24 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to markphotos.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

import os.path
import subprocess
import sys
from multiprocessing import Pool, Lock
from os import utime
from time import mktime

# Serializes the status messages printed by the worker processes.
globallock = Lock()


def processfile(name):
    """Adds copyright notice to the file.

    Arguments:
    name -- file to modify
    """
    args = ['exiftool', '-CreateDate', name]
    # text=True (Python 3.7+) makes check_output return str instead of bytes.
    createdate = subprocess.check_output(args, text=True)
    # Expected output: "Create Date : YYYY:MM:DD HH:MM:SS"
    fields = createdate.split(":")
    year = int(fields[1])
    cr = "R.F. Smith <rsmith@xs4all.nl> http://rsmith.home.xs4all.nl/"
    cmt = "Copyright © {} {}".format(year, cr)
    # No shell is involved here, so the values must not be wrapped in quotes;
    # the quotes would end up in the tags literally.
    args = ['exiftool', '-Copyright=Copyright (C) {} {}'.format(year, cr),
            '-Comment={}'.format(cmt), '-overwrite_original', '-q', name]
    rv = subprocess.call(args)
    # Reset the file's timestamps to the photo's creation date.
    modtime = int(mktime((year, int(fields[2]), int(fields[3][:2]),
                          int(fields[3][3:]), int(fields[4]), int(fields[5]),
                          0, 0, -1)))
    utime(name, (modtime, modtime))
    with globallock:
        if rv == 0:
            print("File '{}' processed.".format(name))
        else:
            print("Error when processing file '{}'".format(name))


def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- list of commands to pass to subprocess.call.
    """
    if isinstance(args, str):
        args = args.split()
    try:
        # Discard all output; only check that the program can be started.
        subprocess.call(args, stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL)
    except OSError:
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)


def main(argv):
    """Main program.

    Arguments:
    argv -- command line arguments
    """
    if len(argv) == 1:
        binary = os.path.basename(argv[0])
        print("Usage: {} [file ...]".format(binary))
        sys.exit(0)
    checkfor(['exiftool', '-ver'])
    p = Pool()
    # map() blocks until every file has been processed.
    p.map(processfile, argv[1:])
    p.close()
    p.join()


if __name__ == '__main__':
    main(sys.argv)
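The question also asks for a way to be notified when an individual task finishes. One possible approach, sketched here as an assumption rather than as part of the original script, is Pool.imap_unordered(), which hands each result to the main process as soon as its worker is done. The names crunch and jobs are hypothetical, and the Python interpreter again stands in for the real number-crunching tool.

```python
import subprocess
import sys
from multiprocessing import Pool


def crunch(job):
    """Run one external command; return its name, exit code and output."""
    name, seconds = job
    # Hypothetical stand-in for a number-crunching tool: sleep for a while,
    # then print a one-line result.
    code = ("import sys, time; time.sleep(float(sys.argv[2])); "
            "print('done ' + sys.argv[1])")
    cmd = [sys.executable, "-c", code, name, str(seconds)]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    return name, proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    jobs = [("slow", 0.5), ("fast", 0.1), ("medium", 0.3)]
    with Pool(processes=3) as pool:
        # imap_unordered() yields each result when its job completes, not in
        # input order, so this loop body acts as the "task finished" hook.
        for name, returncode, output in pool.imap_unordered(crunch, jobs):
            # In the real program, this is where the tool's output file
            # would be opened and parsed.
            print(name, returncode, output)
```

The loop body runs in the main process, so the parsed results can be stored in ordinary variables without any extra locking.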