Script using multiprocessing module does not terminate


Problem description

The following code does not print "here". What is the problem? I tested it on both of my machines (Windows 7, Ubuntu 12.10) and on http://www.compileonline.com/execute_python_online.php; it does not print "here" in any case.

from multiprocessing import Queue, Process


def runLang(que):
    print "start"
    # Build a largish dict and send it back through the queue.
    myDict = dict()
    for i in xrange(10000):
        myDict[i] = i
    que.put(myDict)
    print "finish"


def run(fileToAnalyze):
    que = Queue()
    processList = []
    dicList = []
    langs = ["chi", "eng"]
    for lang in langs:
        p = Process(target=runLang, args=(que,))
        processList.append(p)
        p.start()

    # Wait for both children before reading their results.
    for p1 in processList:
        p1.join()

    print "here"

    for _ in xrange(len(langs)):
        item = que.get()
        print item
        dicList.append(item)


if __name__ == "__main__":
    processList = []
    for fileToAnalyse in ["abc.txt", "def.txt"]:
        p = Process(target=run, args=(fileToAnalyse,))
        processList.append(p)
        p.start()
    for p1 in processList:
        p1.join()

Accepted answer

This is because when you put lots of items into a multiprocessing.Queue, they eventually get buffered in memory once the underlying Pipe is full. The buffer won't get flushed until something starts reading from the other end of the Queue, which will allow the Pipe to accept more data. A Process cannot terminate until the buffers for all of its Queue instances have been entirely flushed to their underlying Pipe. The implication of this is that if you try to join a process without having another process/thread calling get on its Queue, you could deadlock. This is mentioned in the docs:
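Here is a minimal sketch of the same deadlock (the worker name and payload size are illustrative assumptions; any payload larger than the pipe's buffer will do):

from multiprocessing import Process, Queue

def worker(q):
    # A payload much larger than the OS pipe buffer (typically 64 KB),
    # so the feeder thread cannot flush it all until someone reads.
    q.put('x' * (1 << 20))

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    p.join()        # blocks forever: nothing has called q.get() yet
    print(q.get())  # never reached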

Warning

As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.

This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue.
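A manager queue is served by a separate server process rather than a per-process feeder thread writing into a pipe, so the original join-before-get ordering works with it. A sketch of that substitution (only the queue construction changes; the worker is the hypothetical one from above):

from multiprocessing import Process, Manager

def worker(q):
    q.put({i: i for i in range(10000)})

if __name__ == '__main__':
    manager = Manager()
    q = manager.Queue()   # proxied queue: no feeder thread left to flush at exit
    p = Process(target=worker, args=(q,))
    p.start()
    p.join()              # safe here: the child can exit without draining q
    print(q.get())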

You can fix the issue by not calling join until after you empty the Queue in the parent:

for _ in xrange(len(langs)):
    item = que.get()
    print(item)
    dicList.append(item)

# join after emptying the queue.
for p in processList:
    p.join()

print("here")
