Running multiple threads in Python simultaneously: is it possible?


Question

I'm writing a small crawler that should fetch a URL multiple times, and I want all of the threads to run at the same time (simultaneously).

I've written a small piece of code that should do that.

import thread
import time  # needed for the time.sleep() calls below
from urllib2 import Request, urlopen, URLError, HTTPError


def getPAGE(FetchAddress):
    attempts = 0
    while attempts < 2:
        req = Request(FetchAddress, None)
        try:
            response = urlopen(req, timeout = 8) #fetching the url
            print "fetched url %s" % FetchAddress
        except HTTPError, e:
            print 'The server didn\'t do the request.'
            print 'Error code: ', str(e.code) + "  address: " + FetchAddress
            time.sleep(4)
            attempts += 1
        except URLError, e:
            print 'Failed to reach the server.'
            print 'Reason: ', str(e.reason) + "  address: " + FetchAddress
            time.sleep(4)
            attempts += 1
        except Exception, e:
            print 'Something bad happened in getPAGE.'
            # a generic Exception has no .reason attribute, so print the exception itself
            print 'Reason: ', str(e) + "  address: " + FetchAddress
            time.sleep(4)
            attempts += 1
        else:
            try:
                return response.read()
            except Exception:
                print "there was an error with response.read()"
                return None
    return None

url = ("http://www.domain.com",)

for i in range(1,50):
    thread.start_new_thread(getPAGE, url)

From the Apache logs it doesn't seem like the threads are running simultaneously. There's a small gap between requests; it's almost undetectable, but I can see that the threads are not truly parallel.

I've read about the GIL. Is there a way to bypass it without calling C/C++ code? I don't really understand how threading is possible with the GIL: does Python basically start interpreting the next thread as soon as it finishes with the previous one?

Thanks.

Recommended answer

As you point out, the GIL often prevents Python threads from running in parallel.

However, that's not always the case. One exception is I/O-bound code. A thread that is waiting for an I/O request to complete has typically released the GIL before entering the wait, so other threads can make progress in the meantime.
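To illustrate the point about waiting threads, here is a minimal sketch, written for Python 3 rather than the Python 2 of the question. It assumes that `time.sleep` is an acceptable stand-in for a blocking network call (it also releases the GIL while waiting); `fake_fetch` and `run_concurrently` are hypothetical names that are not part of the original code.

```python
import threading
import time


def fake_fetch(delay, results, index):
    """Hypothetical stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = index


def run_concurrently(num_threads=5, delay=0.2):
    """Start num_threads threads, wait for all of them, return (results, elapsed seconds)."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=fake_fetch, args=(delay, results, i))
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before measuring the elapsed time
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently()
    print("results:", results)
    print("elapsed: %.2f s" % elapsed)
```

If the waits overlap as described, the elapsed time should be close to a single delay rather than the sum of all delays. Small gaps between the start times are still expected, since the threads are started one after another.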

In general, however, the multiprocessing module is the safer bet when true parallelism is required, since each process runs its own interpreter with its own GIL.
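For CPU-bound work, a rough sketch of the multiprocessing approach might look like the following (Python 3). The `count_primes` function and the list of limits are made up for illustration; only `multiprocessing.Pool` and its `map` method come from the standard library.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20000, 30000, 40000, 50000]
    # Each worker is a separate process with its own interpreter and its own GIL.
    with multiprocessing.Pool(processes=4) as pool:
        counts = pool.map(count_primes, limits)
    for limit, count in zip(limits, counts):
        print("primes below %d: %d" % (limit, count))
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, unguarded pool creation would be executed again in every child process.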
