What is the fastest way to send 100,000 HTTP requests in Python?

Question

I am opening a file which has 100,000 url's. I need to send an http request to each url and print the status code. I am using Python 2.6, and so far looked at the many confusing ways Python implements threading/concurrency. I have even looked at the python concurrence library, but cannot figure out how to write this program correctly. Has anyone come across a similar problem? I guess generally I need to know how to perform thousands of tasks in Python as fast as possible - I suppose that means 'concurrently'.

Thank you, Igor

Answer

Twistedless solution:

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200  # number of worker threads

def doWork():
    # Worker loop: pull URLs from the queue until the process exits.
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    # Issue a HEAD request so only the status line and headers are fetched.
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        return res.status, ourl
    except Exception:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

# Bounded queue so the feeder loop never gets far ahead of the workers.
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True  # allow the process to exit even if workers are blocked
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()  # wait until every queued URL has been processed
except KeyboardInterrupt:
    sys.exit(1)

This one is slightly faster than the Twisted solution and uses less CPU.
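For readers on modern Python, the same threaded-queue idea can be sketched with `concurrent.futures` from the Python 3 standard library. This is a hypothetical port, not part of the original answer: `get_status` mirrors the answer's `getStatus`, the pool size and the `urllist.txt` file name follow the answer above, and the 10-second timeout is an added assumption.

```python
# Sketch of a Python 3 equivalent using a thread pool (assumes
# urllist.txt contains one URL per line, as in the answer above).
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
import http.client

CONCURRENT = 200  # worker threads, matching the original answer

def get_status(ourl):
    # HEAD request: fetch only the status line and headers, not the body.
    try:
        url = urlparse(ourl)
        conn = http.client.HTTPConnection(url.netloc, timeout=10)
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        conn.close()
        return res.status, ourl
    except Exception:
        return "error", ourl

def main():
    with open("urllist.txt") as f:
        urls = [line.strip() for line in f if line.strip()]
    # map() distributes the URLs across the pool and yields results in order.
    with ThreadPoolExecutor(max_workers=CONCURRENT) as pool:
        for status, url in pool.map(get_status, urls):
            print(status, url)

# To run: main()
```

`ThreadPoolExecutor` replaces the hand-rolled `Queue` plus daemon-thread plumbing: the `with` block joins the workers automatically, so there is no need for `q.task_done()`/`q.join()` bookkeeping.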
