Retrieve multiple URLs at once/in parallel


Problem description

Possible Duplicate:
How can I speed up fetching pages with urllib2 in python?

I have a Python script that downloads a web page, parses it, and returns some values from the page. I need to scrape a few such pages to get the final result. Each page retrieval takes a long time (5-10 s), and I'd prefer to make the requests in parallel to cut down the wait time.
The question is: which mechanism will do this quickly, correctly, and with minimal CPU/memory waste? Twisted, asyncore, threading, or something else? Could you provide some links with examples?
Thanks

UPD: There are a few solutions to this problem; I'm looking for the best compromise between speed and resource usage. If you could share some details from experience, e.g. how fast it runs under load in your view, that would be very helpful.

Recommended answer

multiprocessing.Pool can be a good fit, and there are some useful examples around. For instance, if you have a list of URLs, you can map the content retrieval over them concurrently:

import multiprocessing
import urllib2  # Python 2, as in the original question

def process_url(url):
    # Download the page and extract whatever value you need from it
    page = urllib2.urlopen(url).read()
    return page  # replace with the value you actually parse out of the page

pool = multiprocessing.Pool(processes=4)  # how much parallelism?
results = pool.map(process_url, list_of_urls)  # list_of_urls: your URLs to fetch
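
Since fetching pages is I/O-bound rather than CPU-bound, a thread-backed pool can give a similar speedup with less memory than separate processes. Below is a minimal sketch, assuming the same process_url and list_of_urls as above, using multiprocessing.dummy, which exposes the same Pool API backed by threads:

# multiprocessing.dummy.Pool has the same map() interface as multiprocessing.Pool,
# but uses threads, avoiding the per-process memory cost for network-bound work.
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(processes=8)  # threads are cheap for I/O-bound downloads
results = pool.map(process_url, list_of_urls)
pool.close()
pool.join()

This is the trade-off the UPD asks about: threads use fewer resources, while a process pool better isolates any blocking or CPU-heavy parsing code.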
