Always run a constant number of subprocesses in parallel

Question

I want to use subprocesses to run 20 instances of a script I have written in parallel. Let's say I have a big list of URLs with around 100,000 entries, and my program should make sure that 20 instances of my script are working on that list at all times. I wanted to code it as follows:

urllist = [url1, url2, url3, .. , url100000]
i=0
while number_of_subprocesses < 20 and i<100000:
    subprocess.Popen(['python', 'script.py', urllist[i]])
    i = i+1

My script just writes something into a database or text file. It doesn't output anything and doesn't need any input other than the URL.

My problem is that I wasn't able to find out how to get the number of subprocesses that are active. I'm a novice programmer, so every hint and suggestion is welcome. I was also wondering how, once the 20 subprocesses are started, I can make the while loop check the condition again. I thought of maybe putting another while loop around it, something like:

while i<100000:
   while number_of_subprocesses < 20:
       subprocess.Popen(['python', 'script.py', urllist[i]])
       i = i+1
       if number_of_subprocesses == 20:
           sleep() # wait some time before checking again

Or maybe there's a better way, where the while loop is always checking the number of subprocesses?

I also considered using the multiprocessing module, but I found it really convenient to just call script.py with subprocess instead of calling a function with multiprocessing.

Maybe someone can help me and point me in the right direction. Thanks a lot!

Recommended answer

Taking a different approach from the above, since it seems that the callback can't be sent as a parameter:

import subprocess
import time

# urllist is assumed to be defined as in the question.
NextURLNo = 0
MaxProcesses = 20
MaxUrls = 100000  # Note this would be better to be len(urllist)
Processes = []

def StartNew():
   """ Start a new subprocess if there is work to do """
   global NextURLNo
   global Processes

   if NextURLNo < MaxUrls:
      proc = subprocess.Popen(['python', 'script.py', urllist[NextURLNo]])
      print("Started to Process %s" % urllist[NextURLNo])
      NextURLNo += 1
      Processes.append(proc)

def CheckRunning():
   """ Check any running processes and start new ones if there are spare slots."""
   global Processes
   global NextURLNo

   for p in range(len(Processes) - 1, -1, -1): # Check the processes in reverse order
      if Processes[p].poll() is not None: # poll() returns None while the process is still running
         del Processes[p] # Remove from list - this is why we needed reverse order

   while (len(Processes) < MaxProcesses) and (NextURLNo < MaxUrls): # More to do and some spare slots
      StartNew()

if __name__ == "__main__":
   CheckRunning() # This will start the max processes running
   while (len(Processes) > 0): # Some thing still going on.
      time.sleep(0.1) # You may wish to change the time for this
      CheckRunning()

   print ("Done!")
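
For comparison, the same "never more than 20 children at once" behavior can also be sketched with a thread pool from the standard library's concurrent.futures module. This is not the method used in the answer above, only an alternative under stated assumptions: each worker thread blocks on one child process, so the pool size caps the number of running subprocesses. The names run_one, run_all and command_prefix are hypothetical and chosen for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command_prefix, url):
    """Run one child process for a single URL and return its exit code."""
    # subprocess.run blocks this worker thread until the child exits.
    return subprocess.run(command_prefix + [url]).returncode


def run_all(urls, command_prefix, max_workers=20):
    """Process every URL with at most max_workers children alive at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() gives the next URL to a worker as soon as one is free
        # and returns the exit codes in the same order as the input.
        return list(pool.map(lambda url: run_one(command_prefix, url), urls))
```

With the question's setup this would be called as run_all(urllist, ['python', 'script.py']). The polling loop above gives finer control over when slots are checked and refilled, while the pool version leaves that scheduling to the executor.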
