Retrieve multiple URLs at once/in parallel


Problem description

Possible Duplicate:
How can I speed up fetching pages with urllib2 in python?

I have a Python script that downloads a web page, parses it, and returns some values from the page. I need to scrape a few such pages to get the final result. Each page retrieval takes a long time (5-10 s), and I'd prefer to make the requests in parallel to cut down the wait time.
The question is: which mechanism will do this quickly, correctly, and with minimal CPU/memory waste? Twisted, asyncore, threading, or something else? Could you provide some links with examples?
Thanks

UPD: There are a few solutions to this problem; I'm looking for the best compromise between speed and resource usage. If you could share some details from experience, e.g. how fast it runs under load in your view, that would be very helpful.

Recommended answer

multiprocessing.Pool can be a good fit, and there are some useful examples around. For instance, if you have a list of URLs, you can map the content retrieval over them concurrently:

import multiprocessing
import urllib2  # Python 2, as in the original question

def process_url(url):
    # Download the page and extract whatever value you need from it
    page = urllib2.urlopen(url).read()
    return page  # replace with the value you actually parse out of the page

pool = multiprocessing.Pool(processes=4)  # how much parallelism?
results = pool.map(process_url, list_of_urls)  # list_of_urls: your URLs to fetch
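
Since fetching pages is I/O-bound rather than CPU-bound, a thread-backed pool can give a similar speedup with less memory than separate processes. Below is a minimal sketch, assuming the same process_url and list_of_urls as above, using multiprocessing.dummy, which exposes the same Pool API backed by threads:

# multiprocessing.dummy.Pool has the same map() interface as multiprocessing.Pool,
# but uses threads, avoiding the per-process memory cost for network-bound work.
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(processes=8)  # threads are cheap for I/O-bound downloads
results = pool.map(process_url, list_of_urls)
pool.close()
pool.join()

This is the trade-off the UPD asks about: threads use fewer resources, while a process pool better isolates any blocking or CPU-heavy parsing code.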
