Multiprocessing useless with urllib2?
Question
I recently tried to speed up a little tool (which uses urllib2 to send requests to the (unofficial) twitter-button-count URL (> 2000 URLs) and parses the results) with the multiprocessing module (and its worker pools). I read several discussions here about multithreading (which slowed the whole thing down compared to a standard, non-threaded version) and multiprocessing, but I couldn't find an answer to a (probably very simple) question:
Can you speed up URL calls with multiprocessing, or is the bottleneck something like the network adapter? I don't see which part of, for example, the urllib2 open method could be parallelized, or how that should work...
This is the request I want to speed up, along with the current multiprocessing setup:
import json
import multiprocessing
import urllib2

urls = ["www.foo.bar", "www.bar.foo", ...]
tw_url = 'http://urls.api.twitter.com/1/urls/count.json?url=%s'

def getTweets(url):
    # Fetch the tweet count for a single URL from the (unofficial) API.
    try:
        tw_que = urllib2.urlopen(tw_url % url)
        jsons = json.loads(tw_que.read())
        return {'url': url, 'date': today, 'tweets': jsons['count']}
    except ValueError:
        print ....
        return None

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    # Pass the function and its argument separately; writing
    # pool.apply_async(getTweets(i,)) would call getTweets in the
    # parent process and submit only its return value to the pool.
    result = [pool.apply_async(getTweets, (i,)) for i in urls]
    tweets = [r.get() for r in result]
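Since this task is I/O-bound (each worker mostly waits on the network rather than computing), a pool of threads overlaps those waits just as well as a pool of processes, without the process-spawning and pickling overhead. A minimal sketch using multiprocessing.dummy, which exposes the same Pool API backed by threads (it reuses the getTweets and urls names from the snippet above):

from multiprocessing.dummy import Pool as ThreadPool

if __name__ == '__main__':
    pool = ThreadPool(20)               # threads are cheap for I/O-bound work
    tweets = pool.map(getTweets, urls)  # same API as multiprocessing.Pool
    pool.close()
    pool.join()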
Answer
Take a look at gevent, and specifically at this example: concurrent_download.py. It will be considerably faster than multiprocessing and multithreading, and it can handle thousands of connections easily.
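For reference, a minimal sketch of that approach, assuming the same tw_url endpoint and a hypothetical sample URL list (the concurrent_download.py shipped with gevent's examples is the authoritative version). Monkey-patching makes the standard socket module cooperative, so thousands of greenlets can wait on the network concurrently inside one process:

import gevent
from gevent import monkey
monkey.patch_all()  # patch sockets before urllib2 is used

import json
import urllib2

tw_url = 'http://urls.api.twitter.com/1/urls/count.json?url=%s'

def fetch(url):
    # Each greenlet fetches one count; urlopen yields to the others
    # while it waits on the network.
    try:
        response = urllib2.urlopen(tw_url % url)
        return {'url': url, 'tweets': json.loads(response.read())['count']}
    except (urllib2.URLError, ValueError):
        return None

if __name__ == '__main__':
    urls = ["www.foo.bar", "www.bar.foo"]  # hypothetical sample list
    jobs = [gevent.spawn(fetch, u) for u in urls]
    gevent.joinall(jobs, timeout=10)
    counts = [job.value for job in jobs if job.value is not None]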