Python multiprocessing >= 125 list never finishes
Problem description
I am trying to implement this multiprocessing tutorial for my own purposes. At first I thought it did not scale well, but when I made a reproducible example I found that if the list of items goes above 124, it seems to never return an answer. At x = 124 it runs in 0.4 seconds, but when I set it to x = 125 it never finishes. I am running Python 2.7 on Windows 7.
```python
from multiprocessing import Lock, Process, Queue, current_process
import time

class Testclass(object):
    def __init__(self, x):
        self.x = x

def toyfunction(testclass):
    testclass.product = testclass.x * testclass.x
    return testclass

def worker(work_queue, done_queue):
    try:
        for testclass in iter(work_queue.get, 'STOP'):
            print(testclass.counter)
            newtestclass = toyfunction(testclass)
            done_queue.put(newtestclass)
    except:
        print('error')
    return True

def main(x):
    counter = 1
    database = []
    while counter <= x:
        database.append(Testclass(10))
        counter += 1
        print(counter)

    workers = 8
    work_queue = Queue()
    done_queue = Queue()
    processes = []
    start = time.clock()
    counter = 1

    for testclass in database:
        testclass.counter = counter
        work_queue.put(testclass)
        counter += 1
        print(counter)

    print('items loaded')
    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

    for p in processes:
        p.join()

    done_queue.put('STOP')

    newdatabase = []
    for testclass in iter(done_queue.get, 'STOP'):
        newdatabase.append(testclass)

    print(time.clock()-start)
    print("Done")
    return(newdatabase)

if __name__ == '__main__':
    database = main(124)
    database2 = main(125)
```
Recommended answer
OK! From the docs:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children. Note that a queue created using a manager does not have this issue. See Programming guidelines.
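The last point of that warning suggests one alternative: queues created through a manager. The sketch below is only an illustration under assumptions: it targets Python 3 (the `with Manager()` form), and `square_worker` and `run` are hypothetical names that use plain integers instead of the question's `Testclass`. It keeps the question's join-then-drain order but swaps in `Manager().Queue()`, whose `put()` is a round trip to the manager's server process, so the workers have no local buffer left to flush when they exit.

```python
from multiprocessing import Manager, Process


def square_worker(work_queue, done_queue):
    # Hypothetical worker: squares integers until it sees the 'STOP' sentinel.
    for n in iter(work_queue.get, 'STOP'):
        done_queue.put(n * n)


def run(x, workers=8):
    with Manager() as manager:
        # Proxy queues that live in the manager's server process.
        work_queue = manager.Queue()
        done_queue = manager.Queue()
        for n in range(x):
            work_queue.put(n)

        processes = [Process(target=square_worker, args=(work_queue, done_queue))
                     for _ in range(workers)]
        for p in processes:
            p.start()
        # One sentinel per worker so every worker eventually stops.
        for _ in processes:
            work_queue.put('STOP')

        # Same order as the question: join first, drain second. With manager
        # queues the children hold no unflushed data, so join() can return.
        for p in processes:
            p.join()
        done_queue.put('STOP')
        # Build the list while the manager is still running.
        return list(iter(done_queue.get, 'STOP'))


if __name__ == '__main__':
    print(len(run(125)))
```

The trade-off is speed: every `put()` and `get()` goes through the manager process, so for large volumes of small items draining a plain `Queue` before joining is usually the cheaper fix.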
As I noted in a comment earlier, the code attempts to .join() processes before the done_queue Queue is drained. After I changed the code (in a funky way) to be sure done_queue was drained before .join()'ing, it worked fine for a million items.
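The answer does not show its "funky" change, so the following is just one possible restructuring, not necessarily the answerer's. It is a minimal sketch assuming Python 3 (so `time.clock` and the debug prints are dropped). Because the parent knows exactly `x` results are coming, it reads that many from `done_queue` first and only then joins the workers, which removes the need for a `'STOP'` sentinel on `done_queue`.

```python
from multiprocessing import Process, Queue


class Testclass(object):
    def __init__(self, x):
        self.x = x


def toyfunction(testclass):
    testclass.product = testclass.x * testclass.x
    return testclass


def worker(work_queue, done_queue):
    # Process items until the 'STOP' sentinel arrives.
    for testclass in iter(work_queue.get, 'STOP'):
        done_queue.put(toyfunction(testclass))


def main(x, workers=8):
    work_queue = Queue()
    done_queue = Queue()

    for i in range(x):
        item = Testclass(10)
        item.counter = i + 1
        work_queue.put(item)

    processes = []
    for _ in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        # One sentinel per worker so every worker eventually exits.
        work_queue.put('STOP')

    # Drain done_queue BEFORE joining: exactly x results are expected.
    # A worker cannot exit until its buffered results reach the pipe,
    # and that only happens while the parent keeps reading.
    newdatabase = [done_queue.get() for _ in range(x)]

    # Every result has been consumed, so join() no longer risks a deadlock.
    for p in processes:
        p.join()
    return newdatabase


if __name__ == '__main__':
    print(len(main(125)))
```

Results arrive in completion order rather than input order; the `counter` attribute is still on each object if the original order matters.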
So this is a case of pilot error, although quite an obscure one. As to why the behavior depends on the number passed to main(x), it's unpredictable: it depends on how buffering is done internally. Such fun ;-)
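The Python docs describe the mechanism: a `multiprocessing.Queue` pickles each `put()` and a background feeder thread writes the bytes into an OS pipe, and a process does not finish exiting until that thread has flushed everything. Once the pipe's buffer is full the write blocks until someone reads, so whether the question's code hangs comes down to whether the pickled results fit. The sketch below isolates this with a single child. `producer` and `still_alive_after_join` are hypothetical helpers, and the sizes assume a pipe buffer far smaller than 10 MB (true of common Linux defaults, but platform-dependent).

```python
from multiprocessing import Process, Queue


def producer(q, nbytes):
    # Put one string of nbytes characters on the queue, then return.
    q.put('x' * nbytes)


def still_alive_after_join(nbytes, timeout=1.0):
    """Return True if the child is still alive after a join() that runs
    before the parent has read anything from the queue."""
    q = Queue()
    p = Process(target=producer, args=(q, nbytes))
    p.start()
    p.join(timeout)       # join first, as the question's code does
    blocked = p.is_alive()
    q.get()               # drain the queue ...
    p.join()              # ... and now the join completes
    return blocked


if __name__ == '__main__':
    print('small payload blocked:', still_alive_after_join(10, timeout=10))
    print('large payload blocked:', still_alive_after_join(10 ** 7))
```

A small payload fits in the pipe, so the child exits and the first `join()` returns; a large one leaves the child stuck until the parent reads. In the question, 124 results evidently still fit and 125 did not, which is why the cut-off looks arbitrary.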