Subprocess completes but still doesn't terminate, causing deadlock


Problem Description


Ok, since there are currently no answers, I don't feel too bad doing this. While I'm still interested in what is actually happening behind the scenes to cause this problem, my most urgent questions are those specified in update 2. Those being:

What are the differences between a JoinableQueue and a Manager().Queue() (and when should you use one over the other?). And importantly, is it safe to replace one with the other in this example?


In the following code, I have a simple process pool. Each process is passed the process queue (pq) to pull data to be processed from, and a return-value queue (rq) to pass the returned values of the processing back to the main thread. If I don't append to the return-value queue it works, but as soon as I do, for some reason the processes are blocked from stopping. In both cases the processes' run methods return, so it's not the put on the return queue that's blocking, but in the second case the processes themselves do not terminate, so the program deadlocks when I join on the processes. Why would this be?

Updates:

  1. It seems to have something to do with the number of items in the queue.

    On my machine at least, I can have up to 6570 items in the queue and it actually works, but any more than this and it deadlocks.

  2. It seems to work with Manager().Queue().

    Whether it's a limitation of JoinableQueue or just me misunderstanding the differences between the two objects, I've found that if I replace the return queue with a Manager().Queue(), it works as expected. What are the differences between them, and when should you use one over the other?

  3. The error does not occur if I'm consuming from rq

    Oop. There was an answer here for a moment, and as I was commenting on it, it disappeared. Anyway, one of the things it said was to question whether this error still occurs if I add a consumer. I have tried this, and the answer is: no, it doesn't.

    The other thing it mentioned was this quote from the multiprocessing docs as a possible key to the problem. Referring to JoinableQueue, it says:

    ... the semaphore used to count the number of unfinished tasks may eventually overflow raising an exception.


import multiprocessing

class _ProcSTOP:
    pass

class Proc(multiprocessing.Process):

    def __init__(self, pq, rq):
        self._pq = pq
        self._rq = rq
        super().__init__()
        print('++', self.name)

    def run(self):
        dat = self._pq.get()

        while dat is not _ProcSTOP:
#            self._rq.put(dat)        # uncomment me for deadlock
            self._pq.task_done()
            dat = self._pq.get()

        self._pq.task_done() 
        print('==', self.name)

    def __del__(self):
        print('--', self.name)

if __name__ == '__main__':

    pq = multiprocessing.JoinableQueue()
    rq = multiprocessing.JoinableQueue()
    pool = []

    for i in range(4):
        p = Proc(pq, rq) 
        p.start()
        pool.append(p)

    for i in range(10000):
        pq.put(i)

    pq.join()

    for i in range(4):
       pq.put(_ProcSTOP)

    pq.join()

    while len(pool) > 0:
        print('??', pool)
        pool.pop().join()    # hangs here (if using rq)

    print('** complete')


Sample output, not using return-queue:

++ Proc-1
++ Proc-2
++ Proc-3
++ Proc-4
== Proc-4
== Proc-3
== Proc-1
?? [<Proc(Proc-1, started)>, <Proc(Proc-2, started)>, <Proc(Proc-3, started)>, <Proc(Proc-4, started)>]
== Proc-2
?? [<Proc(Proc-1, stopped)>, <Proc(Proc-2, started)>, <Proc(Proc-3, stopped)>]
-- Proc-3
?? [<Proc(Proc-1, stopped)>, <Proc(Proc-2, started)>]
-- Proc-2
?? [<Proc(Proc-1, stopped)>]
-- Proc-1
** complete
-- Proc-4


Sample output, using return queue:

++ Proc-1
++ Proc-2
++ Proc-3
++ Proc-4
== Proc-2
== Proc-4
== Proc-1
?? [<Proc(Proc-1, started)>, <Proc(Proc-2, started)>, <Proc(Proc-3, started)>, <Proc(Proc-4, started)>]
== Proc-3
# here it hangs

Solution

From the documentation:

Warning

As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.

This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue. See Programming guidelines.

So the JoinableQueue() uses a pipe and will wait until it can flush all data before closing.
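In other words, the worker's run() returns, but the process lingers in its queue feeder thread until the parent drains rq. One way to keep JoinableQueue and avoid the deadlock is to consume the return queue before joining the workers. The following is a sketch of that restructuring, not the asker's original code: it uses a None sentinel in place of the _ProcSTOP class, and the worker/main names are illustrative.

```python
import multiprocessing

def worker(pq, rq):
    # Pull items from pq, push them onto rq, stop on a None sentinel.
    while True:
        dat = pq.get()
        pq.task_done()
        if dat is None:
            break
        rq.put(dat)

def main(n_items=10000, n_workers=4):
    pq = multiprocessing.JoinableQueue()
    rq = multiprocessing.JoinableQueue()
    pool = [multiprocessing.Process(target=worker, args=(pq, rq))
            for _ in range(n_workers)]
    for p in pool:
        p.start()

    for i in range(n_items):
        pq.put(i)
    for _ in range(n_workers):
        pq.put(None)
    pq.join()                                     # every item and sentinel processed

    results = [rq.get() for _ in range(n_items)]  # drain rq BEFORE joining
    for p in pool:
        p.join()                                  # no deadlock: feeder threads are empty
    return results

if __name__ == '__main__':
    print(len(main()))
```

Alternatively, the child could call rq.cancel_join_thread() before returning, which lets it exit without flushing, at the cost of possibly losing whatever is still buffered.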

On the other hand, a Manager.Queue() object uses a completely different approach. The manager runs a separate process that receives all data immediately (and stores it in its own memory).

Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.

...

Queue([maxsize]) Create a shared Queue.Queue object and return a proxy for it.
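So swapping the return queue for a Manager().Queue(), as in update 2, sidesteps the flush entirely: each put travels to the manager's server process immediately, leaving the worker nothing to buffer at exit, so it can be joined even while rq is still full. A sketch of that substitution (again with a None sentinel and illustrative names, not the original code):

```python
import multiprocessing

def worker(pq, rq):
    # Same loop as before; only the type of rq changes.
    while True:
        dat = pq.get()
        pq.task_done()
        if dat is None:
            break
        rq.put(dat)              # lands in the manager's server process at once

def main(n_items=10000, n_workers=4):
    manager = multiprocessing.Manager()
    pq = multiprocessing.JoinableQueue()
    rq = manager.Queue()         # proxy to a queue living in the manager process
    pool = [multiprocessing.Process(target=worker, args=(pq, rq))
            for _ in range(n_workers)]
    for p in pool:
        p.start()

    for i in range(n_items):
        pq.put(i)
    for _ in range(n_workers):
        pq.put(None)
    pq.join()

    for p in pool:
        p.join()                 # joins cleanly even though rq hasn't been drained
    return [rq.get() for _ in range(n_items)]

if __name__ == '__main__':
    print(len(main()))
```

The trade-off is an extra round trip through the manager process per item, so for high-throughput pipelines the JoinableQueue-plus-drain approach may be faster.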
