What is the fastest way to send 100,000 HTTP requests in Python?
Problem description
I am opening a file which has 100,000 URLs. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and so far I have looked at the many confusing ways Python implements threading/concurrency. I have even looked at the python concurrence library, but cannot figure out how to write this program correctly. Has anyone come across a similar problem? I guess generally I need to know how to perform thousands of tasks in Python as fast as possible - I suppose that means 'concurrently'.
Recommended answer
Twistedless solution:
from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200

def doWork():
    # Each worker thread pulls URLs off the shared queue until the program exits.
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    # Issue a HEAD request so only the status line and headers are fetched.
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path)
        res = conn.getresponse()
        return res.status, ourl
    except:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

# A bounded queue keeps memory use in check while the daemon worker threads drain it.
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()

try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
This one is slightly faster than the Twisted solution and uses less CPU.
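The answer above targets Python 2.6 (urlparse, httplib, Queue). For readers on Python 3, here is a minimal sketch of the same pattern using only the standard library: http.client for the HEAD requests and concurrent.futures.ThreadPoolExecutor in place of the hand-rolled Queue/Thread pool. The file name 'urllist.txt', the worker count of 200, and the 10-second timeout are assumptions carried over from or added to the answer, not requirements.

# Python 3 sketch, assuming one URL per line in 'urllist.txt'.
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
import http.client

def get_status(ourl):
    # HEAD request so only the status line and headers come back.
    try:
        url = urlparse(ourl)
        conn = http.client.HTTPConnection(url.netloc, timeout=10)  # timeout is an assumption
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        conn.close()
        return res.status, ourl
    except Exception:
        return "error", ourl

if __name__ == "__main__":
    with open('urllist.txt') as f:
        urls = [line.strip() for line in f if line.strip()]
    # ThreadPoolExecutor manages the worker threads and result collection.
    with ThreadPoolExecutor(max_workers=200) as pool:
        for status, url in pool.map(get_status, urls):
            print(status, url)

Note that pool.map yields results in input order; if you only need to print as results arrive, as_completed with submit would also work.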