Python,并行运行命令行工具 [英] Python, running command line tools in parallel

查看:411
本文介绍了Python,并行运行命令行工具的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用Python作为一个脚本语言来做一些数据处理和调用命令行工具的数字处理。我希望并行运行命令行工具,因为它们是彼此独立的。当一个命令行工具完成后,我可以从输出文件收集其结果。所以我还需要一些同步机制来通知我的主要Python程序,一个任务完成,以便结果可以解析到我的主程序。

I am using Python as a script language to do some data processing and call command-line tools for number crunching. I wish to run command-line tools in parallel since they are independent with each other. When one command-line tool is finished, I can collect its results from the output file. So I also need some synchronization mechanism to notify my main Python program that one task is finished so that the result could be parsed into my main program.

目前,我使用 os.system(),它对单线程正常工作,但不能并行。

Currently, I use os.system(), which works fine for one-thread, but cannot be parallelized.

谢谢!

推荐答案

使用 multiprocessing 模块中创建池对象。然后,您可以使用 Pool.map()做并行处理。一个例子是我的markphotos脚本(见下文),其中一个函数被调用多次并行到每个进程的图片。

Use the Pool object from the multiprocessing module. You can then use e.g. Pool.map() to do parallel processing. An example would be my markphotos script (see below), where a function is called multiple times in parallel to each process a picture.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Adds my copyright notice to photos.
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2012-10-28 17:00:24 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to markphotos.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

import sys
import subprocess
from multiprocessing import Pool, Lock
from os import utime, devnull
import os.path
from time import mktime

globallock = Lock() 

def processfile(name):
    """Adds copyright notice to the file.

    Arguments:
    name -- file to modify
    """
    args = ['exiftool', '-CreateDate', name]
    createdate = subprocess.check_output(args)
    fields = createdate.split(":") #pylint: disable=E1103
    year = int(fields[1])
    cr = "R.F. Smith <rsmith@xs4all.nl> http://rsmith.home.xs4all.nl/"
    cmt = "Copyright © {} {}".format(year, cr)
    args = ['exiftool', '-Copyright="Copyright (C) {} {}"'.format(year, cr),
            '-Comment="{}"'.format(cmt), '-overwrite_original', '-q', name]
    rv = subprocess.call(args)
    modtime = int(mktime((year, int(fields[2]), int(fields[3][:2]),
                          int(fields[3][3:]), int(fields[4]), int(fields[5]),
                          0,0,-1)))
    utime(name, (modtime, modtime))
    globallock.acquire()
    if rv == 0:
        print "File '{}' processed.".format(name)
    else:
        print "Error when processing file '{}'".format(name)
    globallock.release()

def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- list of commands to pass to subprocess.call.
    """
    if isinstance(args, str):
        args = args.split()
    try:
        with open(devnull, 'w') as f:
            subprocess.call(args, stderr=subprocess.STDOUT, stdout=f)
    except:
        print "Required program '{}' not found! exiting.".format(args[0])
        sys.exit(1)

def main(argv):
    """Main program.

    Arguments:
    argv -- command line arguments
    """
    if len(argv) == 1:
        binary = os.path.basename(argv[0])
        print "Usage: {} [file ...]".format(binary)
        sys.exit(0)
    checkfor(['exiftool',  '-ver'])
    p = Pool()
    p.map(processfile, argv[1:])
    p.close()

if __name__ == '__main__':
    main(sys.argv)

这篇关于Python,并行运行命令行工具的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆