What is the fastest way to send 100,000 HTTP requests in Python?
Question
I am opening a file that contains 100,000 URLs. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and so far I have looked at the many confusing ways Python implements threading and concurrency. I have even looked at the Python concurrence library, but I cannot figure out how to write this program correctly. Has anyone come across a similar problem? More generally, I need to know how to perform thousands of tasks in Python as fast as possible, which I suppose means concurrently.
Thank you, Igor
Recommended answer
A solution that does not use Twisted:
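from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200  # number of worker threads

def doWork():
    # Worker loop: take a URL off the queue, check it, report the result.
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    # Send a HEAD request and return (status, original URL).
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        # Fall back to "/" because an empty path is not a valid request target.
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        return res.status, ourl
    except:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

# Bounded queue: the file reader blocks once the workers fall behind.
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True  # let the process exit even though workers loop forever
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()  # wait until every queued URL has been processed
except KeyboardInterrupt:
    sys.exit(1)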
This one is slightly faster than the Twisted solution and uses less CPU.
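For readers on Python 3, the same pattern (a bounded queue feeding a fixed set of daemon worker threads) might look roughly like the sketch below. It is not part of the original answer: the names `check_urls` and `get_status`, the `fetch` and `timeout` parameters, the HTTPS handling, and the choice to collect results in a list rather than print them are all assumptions made for illustration.

```python
import http.client
import queue
import threading
from urllib.parse import urlparse

CONCURRENT = 200  # same worker count as the answer above


def get_status(ourl, timeout=10):
    """Send a HEAD request; return (status, url), or ("error", url) on failure."""
    try:
        url = urlparse(ourl)
        # Assumption: pick the connection class from the URL scheme.
        conn_cls = (http.client.HTTPSConnection if url.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(url.netloc, timeout=timeout)
        try:
            conn.request("HEAD", url.path or "/")
            return conn.getresponse().status, ourl
        finally:
            conn.close()
    except Exception:
        return "error", ourl


def check_urls(urls, fetch=get_status, concurrent=CONCURRENT):
    """Run fetch(url) for every URL on a pool of worker threads."""
    q = queue.Queue(concurrent * 2)  # bounded, so the producer cannot run ahead
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            url = q.get()
            try:
                result = fetch(url)
            except Exception:
                # Keep the worker alive even if a custom fetch raises.
                result = ("error", url)
            with lock:
                results.append(result)
            q.task_done()

    for _ in range(concurrent):
        threading.Thread(target=worker, daemon=True).start()

    for url in urls:
        q.put(url.strip())
    q.join()  # wait until every queued URL has been processed
    return results
```

Calling `check_urls(open('urllist.txt'))` and printing each returned `(status, url)` pair would reproduce the output of the original script, although results arrive in completion order rather than file order.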