在 Python 中发送 100,000 个 HTTP 请求的最快方法是什么? [英] What is the fastest way to send 100,000 HTTP requests in Python?

查看:32
本文介绍了在 Python 中发送 100,000 个 HTTP 请求的最快方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在打开一个包含 100,000 个 URL 的文件.我需要向每个 URL 发送一个 HTTP 请求并打印状态代码.我使用的是 Python 2.6,到目前为止,我研究了 Python 实现线程/并发的许多令人困惑的方式.我什至看过 python concurrence 库,但不知道如何正确编写这个程序.有没有人遇到过类似的问题?我想通常我需要知道如何在 Python 中尽可能快地执行数千个任务 - 我想这意味着同时".

I am opening a file which has 100,000 URL's. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and so far looked at the many confusing ways Python implements threading/concurrency. I have even looked at the python concurrence library, but cannot figure out how to write this program correctly. Has anyone come across a similar problem? I guess generally I need to know how to perform thousands of tasks in Python as fast as possible - I suppose that means 'concurrently'.

推荐答案

Twistedless 解决方案:

Twistedless solution:

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200

def doWork():
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)   
        conn.request("HEAD", url.path)
        res = conn.getresponse()
        return res.status, ourl
    except:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

这个比twisted解决方案稍微快一点,并且使用更少的CPU.

This one is slighty faster than the twisted solution and uses less CPU.

这篇关于在 Python 中发送 100,000 个 HTTP 请求的最快方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆