How to download a webpage (MHTML format) using wget in Python

Question

How can I save a webpage, including its content, so that it can be viewed offline, using wget from Python? Currently I am using the following code:

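import wget
from selenium import webdriver

# Open the page in Chrome (this browser session is independent of the download below)
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
# Fetches only the raw HTML; the .mhtml extension does not make it an MHTML archive
wget.download("http://www.yahoo.com", "C:\\Users\\karanjuneja\\Downloads\\kj\\yahoo.mhtml")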

This runs and stores a file with the .mhtml extension in the folder, but when you open the file you see only the page's source code, not the page as it appears online. Any suggestions? Thanks, Karan

Recommended answer

This command creates an offline copy of a site that you can take with you and view even without internet access.

wget --mirror --convert-links --adjust-extension --page-requisites \
    --no-parent http://example.org

--mirror – Makes (among other things) the download recursive.

--convert-links – Converts all links (including links to resources such as CSS stylesheets) to relative links, so the copy is suitable for offline viewing.

--adjust-extension – Adds a suitable extension (html or css) to each filename, depending on its content type.

--page-requisites – Downloads the resources, such as CSS stylesheets and images, required to display the page properly offline.

--no-parent – When recursing, does not ascend to the parent directory. It is useful for restricting the download to only a portion of the site.
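
Since the question asks about Python, here is a minimal sketch of running the same command from a script. It assumes the GNU wget executable is installed and on PATH; the wget package imported in the question is a separate pure-Python downloader and, as far as I know, does not accept these flags. The function names build_mirror_command and mirror_site and the offline_copy directory are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def build_mirror_command(url, dest_dir=None):
    """Return the wget argument list for an offline mirror of `url`.

    Hypothetical helper: the flags are the ones explained above.
    """
    cmd = [
        "wget",
        "--mirror",
        "--convert-links",
        "--adjust-extension",
        "--page-requisites",
        "--no-parent",
    ]
    if dest_dir is not None:
        # --directory-prefix tells wget where to save the downloaded tree
        cmd.append(f"--directory-prefix={dest_dir}")
    cmd.append(url)
    return cmd


def mirror_site(url, dest_dir=None):
    """Run wget to mirror `url`; assumes the wget executable is on PATH."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget executable not found on PATH")
    # check=True raises CalledProcessError if wget exits with a non-zero status
    subprocess.run(build_mirror_command(url, dest_dir), check=True)


# Example call (performs a real download, so it is left commented out):
# mirror_site("http://example.org", "offline_copy")
```

Passing the arguments as a list avoids shell quoting issues with the URL. Note that the result is a folder of linked files rather than a single MHTML file.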

Thanks to Guy Rutenberg for providing this command on his forum; it helped me too.
