Using multiprocessing in a subprocess

Question

On Windows, code that uses multiprocessing must first check that it is running in the main process; otherwise there will be an infinite loop.

I tried changing the name of the process to the name of the subprocess, so that I could use multiprocessing from within a class or function that I call, but had no luck. Is this even possible? So far I have not managed to use multiprocessing unless it was started from the main process.

If it is possible, could someone provide an example of how to use multiprocessing within a class or function that is called from a higher-level process? Thanks.

Here is an example. The first version works, but everything is done in one file, simplemtexample3.py:

import random
import multiprocessing
import math

def mp_factorizer(nums, nprocs):
    # protect the process
    #print __name__
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):

            p = multiprocessing.Process(
                    target=worker,            
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp=out_q.get()
            index =0
            #print temp
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index +=1

        # Wait for all worker processes to finish
        for p in procs:
            p.join()
            resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []

    for n in nums:
        newnumber= n*2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

l = []
for i in range(80):
    l.append(random.randint(1,8))

print mp_factorizer(l, 4)

However, when I try to call mp_factorizer from another file, it does not work because of the if __name__ == '__main__' check:

simplemtexample.py

import random
import multiprocessing
import math

def mp_factorizer(nums, nprocs):
    # protect the process
    #print __name__
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):

            p = multiprocessing.Process(
                    target=worker,            
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp=out_q.get()
            index =0
            #print temp
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index +=1

        # Wait for all worker processes to finish
        for p in procs:
            p.join()
            resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []

    for n in nums:
        newnumber= n*2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

startsimplemtexample.py

import simplemtexample as smt
import random

l = []
for i in range(80):
    l.append(random.randint(1,8))

print smt.mp_factorizer(l, 4)

Recommended answer

if __name__ == '__main__' is mandatory (at least on Windows) if one wants to use multiprocessing.

On Windows it works like this: for every worker process you start, a new Python process is launched that loads the main script, and all the files it needs, again. However, only in the first process, the one you started yourself, is __name__ equal to '__main__'. This is why guarding the execution of mp_factorizer with if __name__ == '__main__' prevents multiprocessing from creating an infinite loop.
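The re-loading of the main script described above can be observed directly. The following is a minimal sketch in Python 3 syntax (the listings in this answer use Python 2) and is not part of the original answer: the script name spawn_demo.py, the helper run_demo, and the explicit request for the "spawn" start method are assumptions made for illustration. "spawn" is the start method Windows uses, and asking for it explicitly shows the same behavior on other platforms.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical entry script, written to a temporary file so that it is
# launched exactly like a normal "python spawn_demo.py" run.
SCRIPT = textwrap.dedent('''
    import multiprocessing

    # Module-level code: runs in every process that loads this file.
    print("top level, __name__ = " + __name__)

    def worker():
        print("worker ran")

    if __name__ == "__main__":
        # Only the original process gets here and starts a child.
        ctx = multiprocessing.get_context("spawn")
        p = ctx.Process(target=worker)
        p.start()
        p.join()
''')


def run_demo():
    """Run SCRIPT in a fresh interpreter and return its output lines."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "spawn_demo.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.splitlines()


lines = run_demo()
top_level = sorted(line for line in lines if line.startswith("top level"))
print(top_level)
```

The module-level line shows up twice: once from the parent, where __name__ is '__main__', and once from the child, where Python 3 loads the script under the name '__mp_main__'. The guarded block runs only in the parent, so only one child is ever started.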

So essentially, Windows needs to read the file that contains the worker, and every function the worker calls, once for each worker. By guarding mp_factorizer we make sure that no additional workers are created, while Windows can still execute the workers. This is why multiprocessing examples that keep all code in one file guard the creation of the workers directly (as mp_factorizer does here) but not the worker function itself, so that Windows can still execute the worker function. If all the code is in one file and the whole file is guarded, no worker can be created.
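As an illustration of that single-file layout, here is a simplified sketch in Python 3 syntax, under the assumption that the goal is just to double the numbers as the worker in the question does. The helper split_into_chunks is a name introduced here for illustration, and the result handling is simpler than in the original listing. The worker stays at module level so that every child process can find it, and only the call that creates processes sits behind the guard.

```python
import math
import multiprocessing


def worker(nums, out_q):
    """Double each number and push the results, as strings, onto the queue."""
    out_q.put([str(n * 2) for n in nums])


def split_into_chunks(nums, nprocs):
    """Return exactly nprocs slices of nums (trailing slices may be empty)."""
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    return [nums[chunksize * i:chunksize * (i + 1)] for i in range(nprocs)]


def mp_factorizer(nums, nprocs):
    """Start one process per chunk and collect every result from the queue."""
    out_q = multiprocessing.Queue()
    procs = []
    for chunk in split_into_chunks(nums, nprocs):
        p = multiprocessing.Process(target=worker, args=(chunk, out_q))
        procs.append(p)
        p.start()

    results = []
    for _ in procs:
        results.extend(out_q.get())  # drain the queue before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # Only the process-creating call is guarded; worker() stays importable.
    print(sorted(mp_factorizer([1, 2, 3, 4, 5, 6, 7, 8], 4), key=int))
```

Because the definitions are unguarded, a child that re-loads the file still sees worker, while the guard keeps that child from calling mp_factorizer again.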

If the multiprocessing code is located in another module or class and is called from there, if __name__ == '__main__' needs to be placed directly above the call. mpteststart.py:

import random
import mptest as smt

l = []
for i in range(4):
    l.append(random.randint(1,8))
print "random numbers generated"
if __name__ == '__main__':
    print smt.mp_factorizer(l, 4)

mptest.py

import multiprocessing
import math

print "Reading mptest.py file"
def mp_factorizer(nums, nprocs):

    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):

        p = multiprocessing.Process(
                target=worker,            
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultlist = []
    for i in range(nprocs):
        temp=out_q.get()
        index =0
        #print temp
        for i in temp:
            resultlist.append(temp[index][0][0:])
            index +=1

    # Wait for all worker processes to finish
    for p in procs:
        p.join()
        resultlist2 = [x for x in resultlist if x != []]
    return resultlist2

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to double. The results are placed in
        a list that's pushed to a queue.
    """
    print "worker started"
    outlist = []

    for n in nums:
        newnumber= n*2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

In the code above, if __name__ == '__main__' has been removed from mp_factorizer, since it is now in the calling file.

However, the result is somewhat unexpected:

Reading mptest.py file
random numbers generated
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
['1', '1', '4', '1']

The endless execution of multiprocessing is prevented, but the rest of the code is still executed several times (the random number generation, in this case). This not only costs performance but may also lead to other nasty bugs. The solution is to protect the whole main script from being executed repeatedly by Windows whenever multiprocessing is used somewhere down the line. mpteststart.py:

import random
import mptest as smt

if __name__ == '__main__':  
    l = []
    for i in range(4):
        l.append(random.randint(1,8))
    print "random numbers generated"   
    print smt.mp_factorizer(l, 4)

Now all we get back is the desired result; the random numbers are generated only once:

Reading mptest.py file
random numbers generated
Reading mptest.py file
worker started
Reading mptest.py file
worker started
Reading mptest.py file
worker started
Reading mptest.py file
worker started
['1', '6', '2', '1']

Note that in this example, mpteststart.py is the main process. If it is not, if __name__ == '__main__' has to be moved up the calling chain until it is in the main process. Once the main process is protected in this way, there is no more unwanted repeated code execution.
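The reason the guard has to live in the main script, and not inside an imported module as in the question, is that an imported module never has __name__ equal to '__main__'. The sketch below, in Python 3 syntax, shows this with a hypothetical module called guard_demo that is written to a temporary file and imported; the names guarded, module_name and load_demo_module are illustrative only.

```python
import importlib.util
import os
import tempfile

# Hypothetical module mirroring the guard placed inside mp_factorizer.
MODULE_SOURCE = '''
def guarded():
    if __name__ == '__main__':
        return "guard passed"
    # Otherwise the function falls through and returns None.

def module_name():
    return __name__
'''


def load_demo_module():
    """Write MODULE_SOURCE to a temporary file and import it as guard_demo."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "guard_demo.py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)
        spec = importlib.util.spec_from_file_location("guard_demo", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    return module


demo = load_demo_module()
print(demo.module_name())  # guard_demo
print(demo.guarded())      # None
```

Inside the imported module the guard is always false, so a function guarded that way silently returns None. This matches what happens when mp_factorizer from simplemtexample.py is called from startsimplemtexample.py.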
