How to start multiple jobs in Python and communicate with the main job
Problem description
I am new to Python multithreading/multiprocessing, so please bear with me. I would like to solve the following problem and need some help and suggestions. Let me describe it briefly:
- I would like to start a Python script that first does some work sequentially.
- After the sequential part is over, I would like to start some jobs in parallel.
- Assume there are four parallel jobs I want to start.
- I would also like to start these jobs on other machines using "lsf" on the computing cluster. My initial script is also running on an "lsf" machine.
- The four jobs started on the four machines will each perform two logical steps, A and B, one after the other.
- When a job starts, it begins with logical step A and finishes it.
- After each of the four jobs has finished step A, it should notify the main job that started it. In other words, the main job waits for confirmation from all four jobs.
- Once the main job has received confirmation from all four jobs, it should notify them to perform logical step B.
- Logical step B automatically terminates the job after finishing its task.
- The main job waits for all jobs to finish and then continues with its sequential part.
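An example scenario:
- A Python script running on an "lsf" machine in the cluster starts four "tcl shells" on four "lsf" machines.
- In each tcl shell, a script is sourced to perform logical step A.
- Once step A is done, each shell should somehow inform the Python script, which is waiting for the acknowledgement.
- Once acknowledgements have been received from all four, the Python script should inform them to perform logical step B.
- Logical step B is also a script that is sourced in the tcl shell; this script also closes the tcl shell at the end.
- Meanwhile, the Python script waits for all four jobs to finish.
- After all four jobs have finished, it should continue with the sequential part again and finish later.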
Here are my questions:
- I am confused about whether I should use multithreading or multiprocessing. Which one suits better, and what is the actual difference between the two? I have read about them but could not reach a conclusion.
- What is the Python GIL? I also read somewhere that at any one point in time only one thread executes. I need some explanation here; it gives me the impression that I can't use threads.
- Any suggestions on how I could solve my problem systematically and in a more Pythonic way? I am looking for a step-by-step verbal explanation and some pointers on what to read for each step. Once the concepts are clear, I would like to code it myself.
Thanks.
Recommended answer
In addition to roganjosh's answer, I would add some signaling so that step B starts only after every process has finished step A:
import multiprocessing as mp
import sys

def func_A(process_number, queue, proceed):
    print("Process {} has been created".format(process_number))
    # ... step A would run here ...
    print("Process {} has ended step A".format(process_number))
    sys.stdout.flush()
    queue.put((process_number, "done"))  # tell the master that step A is finished
    proceed.wait()  # wait for the signal to do the second part
    # ... step B would run here ...
    print("Process {} has ended step B".format(process_number))
    sys.stdout.flush()

def multiproc_master():
    queue = mp.Queue()
    proceed = mp.Event()
    # every worker needs both the queue (to report) and the event (to wait on)
    processes = [mp.Process(target=func_A, args=(x, queue, proceed)) for x in range(4)]
    for p in processes:
        p.start()
    # block=True waits until there is something available
    results = [queue.get(block=True) for p in processes]
    proceed.set()  # set continue-flag
    for p in processes:  # wait for all to finish (also in windows)
        p.join()
    return results

if __name__ == '__main__':
    split_jobs = multiproc_master()
    print(split_jobs)
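In the scenario from the question the jobs are external programs (tcl shells), not Python functions. The same two-phase handshake can be sketched with subprocess pipes. This is only an illustration under stated assumptions: the worker here is a small Python child process standing in for a tcl shell, the names WORKER_SOURCE and run_two_phase are hypothetical, and how the jobs are actually submitted to LSF is not covered.

```python
import subprocess
import sys

# Hypothetical stand-in for one external job: it reports that step A is done,
# blocks until the master sends a line on stdin, then runs step B and exits.
WORKER_SOURCE = r"""
import sys
job = sys.argv[1]
print("A done " + job)
sys.stdout.flush()
sys.stdin.readline()          # wait for the go-ahead from the master
print("B done " + job)
sys.stdout.flush()
"""

def run_two_phase(n_jobs=4):
    # Start every job with pipes attached so the master can talk to it.
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(i)],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        for i in range(n_jobs)
    ]
    # Phase 1: block until each job has reported the end of step A.
    step_a = [w.stdout.readline().strip() for w in workers]
    # Phase 2: every job has confirmed, so release them all into step B.
    for w in workers:
        w.stdin.write("go\n")
        w.stdin.flush()
    # Collect the step B reports and wait for the jobs to terminate.
    step_b = [w.stdout.readline().strip() for w in workers]
    for w in workers:
        w.stdin.close()
        w.wait()
        w.stdout.close()
    return step_a, step_b

if __name__ == "__main__":
    print(run_two_phase())
```

Because the master spends its time waiting on pipes and the real work happens in the child processes, the GIL does not limit the parallelism of the jobs in a setup like this.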