Python - very simple multithreading parallel URL fetching (without queue)
Question
I spent a whole day looking for the simplest possible multithreaded URL fetcher in Python, but most scripts I found are using queues or multiprocessing or complex libraries.
Finally I wrote one myself, which I am reporting as an answer. Please feel free to suggest any improvement.
I guess other people might have been looking for something similar.
Recommended answer
Simplifying your original version as far as possible:
import threading
import urllib2
import time

start = time.time()
urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com", "http://www.amazon.com", "http://www.facebook.com"]

def fetch_url(url):
    # Runs in its own thread: download the page and report the elapsed time.
    urlHandler = urllib2.urlopen(url)
    html = urlHandler.read()
    print "'%s' fetched in %ss" % (url, (time.time() - start))

# One thread per URL; keep the Thread objects so they can be joined later.
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for thread in threads:
    thread.start()
# join() blocks until each thread has finished.
for thread in threads:
    thread.join()

print "Elapsed Time: %s" % (time.time() - start)
The only new tricks here are:
- Keep track of the threads you create.
- Don't bother with a counter of threads if you just want to know when they're all done; join already tells you that.
- If you don't need any state or external API, you don't need a Thread subclass, just a target function.
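The answer's code uses urllib2 and the print statement, so it targets Python 2. As a rough sketch only, the same three ideas might look like this on Python 3. The names fetch_all, results and opener are my own additions and not part of the original answer; opener exists only so the download call can be swapped out, and the shared dict assumes CPython, where assigning to distinct keys from several threads is safe in practice.

```python
import threading
import time
import urllib.request


def fetch_url(url, results, start, opener=urllib.request.urlopen):
    """Target function: fetch one URL and store (byte_count, elapsed_seconds)."""
    with opener(url) as response:
        body = response.read()
    # Each thread writes to its own key, so no lock is used here (assumption:
    # CPython, where a single dict item assignment is effectively atomic).
    results[url] = (len(body), time.time() - start)


def fetch_all(urls, opener=urllib.request.urlopen):
    """Start one thread per URL, wait for all of them, return the results dict."""
    start = time.time()
    results = {}
    # Keep track of the threads you create.
    threads = [
        threading.Thread(target=fetch_url, args=(url, results, start, opener))
        for url in urls
    ]
    for thread in threads:
        thread.start()
    # No counter needed: join() returns once the thread has finished.
    for thread in threads:
        thread.join()
    return results
```

Calling fetch_all(urls) with the list from the answer should return a dict keyed by URL. A URL whose download raises an exception is simply missing from the dict, because the exception ends that thread without storing a result.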