Always run a constant number of subprocesses in parallel
Problem description
I want to use subprocesses to run 20 instances of an existing script in parallel. Let's say I have a big list of URLs with around 100,000 entries, and my program should make sure that 20 instances of my script are working on that list at all times. I wanted to code it as follows:
urllist = [url1, url2, url3, .. , url100000]
i = 0
while number_of_subprocesses < 20 and i < 100000:
    subprocess.Popen(['python', 'script.py', urllist[i]])
    i = i + 1
My script just writes something into a database or text file. It doesn't output anything and doesn't need any input other than the URL.
My problem is that I wasn't able to find a way to get the number of subprocesses that are currently active. I'm a novice programmer, so every hint and suggestion is welcome. I was also wondering how to make the while loop check the condition again once the 20 subprocesses have been started. I thought of maybe putting another while loop around it, something like
while i < 100000:
    while number_of_subprocesses < 20:
        subprocess.Popen(['python', 'script.py', urllist[i]])
        i = i + 1
    if number_of_subprocesses == 20:
        sleep()  # wait some time before checking again
Or maybe there's a better way, where the while loop is always checking the number of subprocesses?
I also considered using the multiprocessing module, but I found it really convenient to just call script.py with subprocess instead of calling a function with multiprocessing.
Maybe someone can help me and point me in the right direction. Thanks a lot!
Recommended answer
Taking a different approach from the above, since it seems that a callback can't be passed to the subprocess as a parameter:
import subprocess
import time

urllist = []  # Fill in with the URLs to process (the list from the question)

NextURLNo = 0
MaxProcesses = 20
MaxUrls = len(urllist)
Processes = []

def StartNew():
    """Start a new subprocess if there is work to do."""
    global NextURLNo
    if NextURLNo < MaxUrls:
        proc = subprocess.Popen(['python', 'script.py', urllist[NextURLNo]])
        print("Started to process %s" % urllist[NextURLNo])
        NextURLNo += 1
        Processes.append(proc)

def CheckRunning():
    """Check any running processes and start new ones if there are spare slots."""
    for p in range(len(Processes) - 1, -1, -1):  # Check the processes in reverse order
        if Processes[p].poll() is not None:  # poll() returns None while the process is still running
            del Processes[p]  # Remove from list - this is why we needed reverse order
    while (len(Processes) < MaxProcesses) and (NextURLNo < MaxUrls):  # More to do and some spare slots
        StartNew()

if __name__ == "__main__":
    CheckRunning()  # This will start the max processes running
    while len(Processes) > 0:  # Something is still going on
        time.sleep(0.1)  # You may wish to change the time for this
        CheckRunning()
    print("Done!")
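For comparison, here is a minimal sketch of a different technique from the polling loop above: a thread pool from the standard library's concurrent.futures, where each worker thread blocks on one child process. The names run_one, run_all and MAX_PROCESSES are made up for this illustration, and it assumes script.py takes the URL as its only argument, as in the question. It is not a drop-in replacement and has not been tuned for 100,000 URLs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MAX_PROCESSES = 20  # hypothetical name; same limit as in the question


def run_one(url, script="script.py"):
    """Run one child process for one URL and return its exit code."""
    # subprocess.run blocks this worker thread until the child exits,
    # so each thread keeps at most one child process alive at a time.
    completed = subprocess.run([sys.executable, script, url])
    return completed.returncode


def run_all(urls, worker=run_one, max_workers=MAX_PROCESSES):
    """Call worker(url) for every URL, with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() gives the next URL to whichever thread becomes free
        # and returns the results in the same order as the input.
        return list(pool.map(worker, urls))
```

Calling run_all(urllist) returns one exit code per URL once everything has finished. The threads spend their time waiting on child processes, so the pool size effectively caps the number of running subprocesses without a manual sleep-and-poll loop.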