How to save a "complete webpage", not just basic HTML, using Python

Views: 965 · Published: 2018/6/13 17:01:55 · Tags: python html python-2.7 urllib2 urllib


Problem description

I am using the following code to save a webpage using Python:

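    # Python 2 code, as posted
    import urllib
    import sys
    from bs4 import BeautifulSoup

    url = 'http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html'
    # Downloads only the HTML document at `url`, not the resources it links to
    f = urllib.urlretrieve(url, 'test.html')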

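Problem: this code saves only the basic HTML, without the JavaScript, images, etc. I want to save the webpage as complete (like the "Save as complete webpage" option in a browser).

Update:
I am now using the code below, hoping to save all the JS/image/CSS files of the webpage so that it is stored as a complete webpage, but my output is still saved as basic HTML: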

    # Python 2 code, as posted
    import pycurl
    import StringIO

    c = pycurl.Curl()
    c.setopt(pycurl.URL, "http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html")

    # Collect the response body in an in-memory buffer
    b = StringIO.StringIO()
    c.setopt(pycurl.WRITEFUNCTION, b.write)
    c.setopt(pycurl.FOLLOWLOCATION, 1)
    c.setopt(pycurl.MAXREDIRS, 5)
    c.perform()
    html = b.getvalue()
    # print html
    fh = open("file.html", "w")
    fh.write(html)
    fh.close()

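Solution

Try emulating your browser with Selenium. This script will pop up the "Save As" dialog for the webpage. You will still have to figure out how to emulate pressing Enter for the download to start, because the file dialog is out of Selenium's reach (how you do it is also OS dependent).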

    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    br = webdriver.Firefox()
    br.get('http://www.google.com/')

    # Send Ctrl+S to open the browser's "Save As" dialog.
    # key_down/key_up are meant for modifier keys, so 's' is sent with send_keys.
    save_me = (ActionChains(br)
               .key_down(Keys.CONTROL)
               .send_keys('s')
               .key_up(Keys.CONTROL))
    save_me.perform()

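Also, I think following @Amber's suggestion of grabbing the linked resources may be simpler, and thus a better solution. Still, I think using Selenium is a good starting point, because br.page_source will give you the entire DOM along with the dynamic content generated by JavaScript.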
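The "grab the linked resources" idea could be sketched roughly as follows. This is an illustration under several assumptions, not the answerer's code: it is written for Python 3 (the question uses Python 2.7), it uses only the standard library, and the names `ResourceCollector`, `find_resources`, `local_name` and `save_complete` are hypothetical helpers made up for this example. It only looks at `<img src>`, `<script src>` and `<link href>`, it does not rewrite the saved HTML to point at the local copies, and it does not follow resources referenced from inside CSS files.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve

# Tag -> attribute that may point at an external resource.
# Note: <link href> also matches non-stylesheet links (icons, canonical, ...).
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collect absolute http(s) URLs of images, scripts and <link> targets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                url = urljoin(self.base_url, value)
                is_http = urlparse(url).scheme in ("http", "https")
                if is_http and url not in self.resources:
                    self.resources.append(url)


def find_resources(html, base_url):
    """Return the de-duplicated resource URLs in document order."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    return parser.resources


def local_name(url, index):
    """Derive a flat local file name; the index prefix avoids name clashes."""
    name = os.path.basename(urlparse(url).path) or "resource"
    return "%03d_%s" % (index, name)


def save_complete(html, base_url, out_dir):
    """Download the linked resources into out_dir and save the HTML beside them."""
    os.makedirs(out_dir, exist_ok=True)
    for index, url in enumerate(find_resources(html, base_url)):
        try:
            urlretrieve(url, os.path.join(out_dir, local_name(url, index)))
        except OSError as exc:  # urllib's URLError/HTTPError derive from OSError
            print("skipped %s: %s" % (url, exc))
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as fh:
        fh.write(html)
```

With a Selenium driver such as `br` above, the rendered DOM could be passed in with something like `save_complete(br.page_source, br.current_url, "saved_page")`, where `"saved_page"` is an arbitrary output directory chosen for the example.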

