Saving the entire page from the WebBrowser control

Problem description

It seems that no one has actually written the code to save the contents of a page, including images, as rendered by the .NET WebBrowser control. And the ShowSaveAsDialog provided by the WebBrowser control sucks: it doesn't return the filename, and it doesn't work. Try saving a Google page with a search filled in, and you get just the Google home page; it doesn't save with the parameters specified.

Now, of course there's this helpful approach (from StackOverflow):

So my approach revised would be:

1. Use System.Net.HttpWebRequest to get the main HTML document as a string or stream (easy).
2. Load this into an HtmlAgilityPack document, where you can easily query the document to get lists of all image elements, stylesheet links, etc.
3. Then make a separate web request for each of these files and save them to a subdirectory.
4. Finally, update all relevant links in the main page to point to the items in the subdirectory.
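
The four steps above can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the .NET code the question asks for: `urllib` stands in for HttpWebRequest and `html.parser` for HtmlAgilityPack. The names `PageRewriter`, `rewrite_links`, `save_page` and the `page_files` subdirectory are made up for this example, and the sketch assumes that only `img`, `script` and stylesheet `link` elements need to be fetched.

```python
import os
import posixpath
import urllib.request
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Attribute that holds the resource URL for each tag handled by this sketch.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class PageRewriter(HTMLParser):
    """Re-emits the HTML while pointing resource links at a local subdirectory."""

    def __init__(self, base_url, subdir):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.subdir = subdir
        self.out = []        # pieces of the rewritten document
        self.resources = {}  # absolute URL -> relative local path

    def _local_path(self, absolute_url):
        if absolute_url not in self.resources:
            name = posixpath.basename(urlparse(absolute_url).path) or "resource"
            # A counter prefix keeps two files with the same name apart.
            # Note: the name is not sanitised for the file system here.
            index = len(self.resources)
            self.resources[absolute_url] = f"{self.subdir}/{index}_{name}"
        return self.resources[absolute_url]

    def _emit_tag(self, tag, attrs, self_closing):
        target = RESOURCE_ATTRS.get(tag)
        is_stylesheet = ("rel", "stylesheet") in attrs
        parts = [tag]
        for name, value in attrs:
            if name == target and value and (tag != "link" or is_stylesheet):
                value = self._local_path(urljoin(self.base_url, value))
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, False)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_links(html, base_url, subdir="page_files"):
    """Steps 2 and 4: find the resources and rewrite their links."""
    rewriter = PageRewriter(base_url, subdir)
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out), rewriter.resources


def save_page(url, out_dir, subdir="page_files"):
    """Steps 1 and 3: fetch the page, then fetch each resource it references."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    new_html, resources = rewrite_links(html, url, subdir)
    os.makedirs(os.path.join(out_dir, subdir), exist_ok=True)
    for absolute_url, local_path in resources.items():
        try:
            urllib.request.urlretrieve(absolute_url, os.path.join(out_dir, local_path))
        except (OSError, ValueError):
            pass  # a resource that fails to download is skipped
    index_path = os.path.join(out_dir, "index.html")
    with open(index_path, "w", encoding=charset, errors="xmlcharrefreplace") as f:
        f.write(new_html)
```

Calling `save_page(url, "saved_page")` with the full URL, query string included, would write `index.html` plus a `page_files` folder. In the WebBrowser case, the HTML string could instead be taken from the control's `DocumentText` property and passed to `rewrite_links` directly, so that the saved page matches what the control actually loaded. Images referenced from inside stylesheets and content generated by scripts are not handled by this sketch.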


But what amazes me is that no one has posted code that does this, at least none that I can find.

Argh. Why is working with WebBrowser such a PITA? Anyways, if someone has some code for saving a web page without using ShowSaveAsDialog, please point me in the right direction.

Marc

Solution

This problem is not actually related to the browser control, or even to a browser as such. You need a browser to render the Web page, but to save a Web page, you don't really need to render it. You need to use the well-known technique of Web scraping:
http://en.wikipedia.org/wiki/Web_scraping

Please see my past answers for further detail:
"get specific data from web page",
"How to get the data from another site".

If you do all that, you can derive a class from the WebBrowser control class (it is not sealed) and add this functionality if you wish; the control itself is not the root of the problem.

—SA

