Why does my multiprocess Python script never end?


Problem description

I tried some multiprocessing examples, mainly http://toastdriven.com/blog/2008/nov/11/brief-introduction-multiprocessing/ , from which I took the 'simple application' that uses multiprocessing to test URLs. When I run it (in Python 3.3, on Windows, in the PyCharm IDE) with some modifications and with a lot of URLs, my script never stops, and I don't see why.

import httplib2
import sys
from multiprocessing import Lock, Process, Queue, current_process

def worker(work_queue, done_queue):
    for url in iter(work_queue.get, 'STOP'):
        try:
            print("In : %s - %s." % (current_process().name, url))
            status_code = print_site_status(url)
            done_queue.put("%s - %s got %s." % (current_process().name, url, status_code))
        except:
            done_queue.put("%s failed on %s with: %s" % (current_process().name, url, str(sys.exc_info()[0])))
    print("Out : %s " % (current_process().name))
    return True

def print_site_status(url):
    http = httplib2.Http(timeout=10)
    headers, content = http.request(url)
    return headers.get('status', 'no response')

def main():
    workers = 8
    work_queue = Queue()
    done_queue = Queue()
    processes = []
    with open("Annu.txt") as f: # file with URLs
        lines = f.read().splitlines()
    for surl in lines:
        work_queue.put(surl)

    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

    for p in processes:
        p.join()
    print("END")
    done_queue.put('STOP')

    for status in iter(done_queue.get, 'STOP'):
        print(status)

if __name__ == '__main__':
    main()

I can see all the URL statuses being tested, and all the 'Out' messages that indicate the end of each process, but I never see my 'END' message. The list of URLs I use is: http://www.pastebin.ca/2946850 .

So... where is my error? Is it a duplicate of: Python multiprocessing threads never join when given large amounts of work?

Some more information: when I remove 'done_queue' everywhere in the code, it works.

Solution

OK, I found the answer in the Python docs ( https://docs.python.org/3.4/library/multiprocessing.html#multiprocessing-programming ):

Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.

So, change this code:

    print("Out : %s " % (current_process().name))
    return True

to:

    print("Out : %s " % (current_process().name))
    done_queue.cancel_join_thread()
    return True
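
For completeness, a more conventional way out (my own variant, not part of the answer above) is to leave the worker untouched and instead drain done_queue in the parent before joining the workers: once the parent has read the buffered results, each child's queue feeder thread can flush and the process exits on its own. A sketch of how the tail of main() could look under that approach, assuming exactly one done_queue message per URL (which holds here, because the worker also reports failures):

    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

    # Read the expected number of results first; this unblocks the
    # children's queue feeder threads, so join() below cannot deadlock.
    for _ in range(len(lines)):
        print(done_queue.get())

    for p in processes:
        p.join()
    print("END")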

I don't understand why the initial code works with a small quantity of URLs, though...
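
The likely explanation (my reading of the multiprocessing docs, not something spelled out in the answer): Queue.put() hands data to a background feeder thread that writes it into an OS pipe. With only a few short status strings everything fits in the pipe buffer, the feeder thread flushes completely, and the child can exit, so join() in the parent returns. With hundreds of URLs the pipe buffer fills up; the feeder thread then blocks until someone reads from the queue, but the parent is itself stuck in join(), which is exactly the deadlock the warning describes. A minimal sketch that reproduces the hang without any HTTP traffic (hypothetical demo code; it deadlocks by design):

from multiprocessing import Process, Queue

def child(q):
    # Put far more data than an OS pipe buffer (typically tens of KB) can hold.
    for i in range(100000):
        q.put('x' * 100)
    # On exit the process waits for its queue feeder thread to flush -> it blocks here.

if __name__ == '__main__':
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    p.join()   # deadlock: the parent never reads from q
    print("never reached")

Reading from q before join(), or calling q.cancel_join_thread() in the child as in the fix above, lets the script terminate.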
