Downloading a web page and all of its resource files in Python
Question
I want to be able to download a page and all of its associated resources (images, style sheets, script files, etc) using Python. I am (somewhat) familiar with urllib2 and know how to download individual urls, but before I go and start hacking at BeautifulSoup + urllib2 I wanted to be sure that there wasn't already a Python equivalent to "wget --page-requisites http://www.google.com".
Specifically I am interested in gathering statistical information about how long it takes to download an entire web page, including all resources.
Thanks, Mark
Answer
Websucker? See http://effbot.org/zone/websucker.htm
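If websucker doesn't fit, the approach sketched in the question is straightforward to do with the standard library alone. A minimal sketch using `urllib.request` and `html.parser` (the Python 3 counterparts of `urllib2` and a hand-rolled BeautifulSoup substitute), timing the whole download as the question asks; the class and function names here are illustrative, not from any library:

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ResourceParser(HTMLParser):
    """Collect URLs of page requisites: images, scripts, stylesheets."""
    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and "src" in attrs:
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.resources.append(attrs["href"])

def download_page_with_resources(url):
    """Fetch a page and its requisites; return (total_bytes, elapsed_seconds)."""
    start = time.monotonic()
    html = urlopen(url).read()
    parser = ResourceParser()
    parser.feed(html.decode("utf-8", errors="replace"))
    total = len(html)
    for res in parser.resources:
        try:
            # Resolve relative URLs against the page URL before fetching.
            total += len(urlopen(urljoin(url, res)).read())
        except OSError:
            pass  # skip resources that fail to download
    return total, time.monotonic() - start
```

Note this fetches resources sequentially, so the elapsed time measures a naive client rather than a browser, which downloads requisites in parallel; it also ignores resources pulled in by CSS or JavaScript.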