Downloading a "complete" webpage programmatically?


Question

I am using C# and I want to download a complete web page programmatically, not just the page source.
The result should be the same as what "Save Page As" produces in Firefox, but done from code.

Recommended answer

You need to download and parse the page to find all of its external resources, then download each one in turn. Some resources, such as stylesheets, may reference further resources, so you may need to parse those and download what they point to as well.

For parsing the HTML, I suggest using the HTML Agility Pack. Decide up front which kinds of resources you want to download (images, CSS, JavaScript, etc.) and query the page for those specifically.
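As a rough illustration of querying the page for specific resource types, here is a minimal sketch that uses the HTML Agility Pack (NuGet package `HtmlAgilityPack`). The class name `ResourceFinder` and the choice of elements to look at are assumptions made for this example; a real page may reference resources in other places too.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ResourceFinder
{
    // Element/attribute pairs that commonly reference external resources.
    // This list is an illustrative assumption, not an exhaustive one.
    private static readonly (string XPath, string Attribute)[] Targets =
    {
        ("//img[@src]", "src"),
        ("//script[@src]", "src"),
        ("//link[@rel='stylesheet'][@href]", "href"),
    };

    // Returns the raw attribute values; they may still be relative URLs.
    public static List<string> FindResourceUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        foreach (var (xpath, attribute) in Targets)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes == null) continue;

            foreach (var node in nodes)
            {
                var value = node.GetAttributeValue(attribute, "");
                if (value.Length > 0) urls.Add(value);
            }
        }
        return urls;
    }
}
```

Calling `ResourceFinder.FindResourceUrls(html)` on the downloaded page source gives the list of references to resolve and fetch next.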

Keep in mind that some pages define a <base> element. When resolving links, you have to take it into account along with the page URL, since links can be either relative or absolute.
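The resolution step can be sketched with the framework's `System.Uri` class alone. The class and method names below (`UrlResolver`, `Resolve`) are hypothetical, and the sketch assumes you have already read the base element's `href` from the page, passing an empty string when there is none.

```csharp
using System;

public static class UrlResolver
{
    // pageUrl:   the URL the page was fetched from.
    // baseHref:  the href of the page's <base> element, or null/empty if it has none.
    // reference: a link found in the page, relative or absolute.
    public static Uri Resolve(Uri pageUrl, string baseHref, string reference)
    {
        // The <base> href may itself be relative, so resolve it against the page URL first.
        var baseUri = string.IsNullOrWhiteSpace(baseHref)
            ? pageUrl
            : new Uri(pageUrl, baseHref);

        // An absolute reference is returned as-is; a relative one is combined with the base.
        return new Uri(baseUri, reference);
    }
}
```

For example, with a page at `http://example.com/a/b.html` and no base, `img/x.png` resolves to `http://example.com/a/img/x.png`; with a base of `/static/` it resolves to `http://example.com/static/img/x.png`.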

You may also want to parse the CSS for things like image references.
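One simple way to find such references is to scan the stylesheet text for `url(...)` tokens. The sketch below uses a regular expression rather than a real CSS parser, so it is only an approximation: it will miss `@import "file.css"` written without `url()`, and it may trip over unusual escaping. The class name `CssReferences` is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssReferences
{
    // Matches url(...) with optional single or double quotes around the target.
    private static readonly Regex UrlPattern = new Regex(
        @"url\(\s*(['""]?)(?<target>[^'"")]+)\1\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the raw targets; they are relative to the stylesheet's own URL.
    public static List<string> FindUrls(string css)
    {
        var urls = new List<string>();
        foreach (Match match in UrlPattern.Matches(css))
        {
            var target = match.Groups["target"].Value.Trim();
            // data: URIs embed their content inline, so there is nothing to download.
            if (!target.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
                urls.Add(target);
        }
        return urls;
    }
}
```

Note that references inside a stylesheet are resolved against the stylesheet's URL, not the URL of the page that includes it.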

To finish off, change all of these references to local ones that point to where the resources were downloaded (thanks @Scott M).
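Rewriting the references requires a stable mapping from each remote URL to a local file name. Below is a minimal sketch of such a mapping; the class name `LocalPathMap` and the collision-handling scheme are assumptions for illustration, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class LocalPathMap
{
    private readonly Dictionary<string, string> _byUrl = new Dictionary<string, string>();
    private readonly HashSet<string> _usedNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly string _folder;

    // folder: the resource folder, relative to the saved HTML file.
    public LocalPathMap(string folder)
    {
        _folder = folder;
    }

    // Returns the relative local path for a resource URL, assigning one on first use.
    public string GetLocalPath(Uri resource)
    {
        if (_byUrl.TryGetValue(resource.AbsoluteUri, out var existing))
            return existing;

        var name = Path.GetFileName(resource.AbsolutePath);
        if (string.IsNullOrEmpty(name)) name = "resource";

        // Avoid collisions when different URLs share a file name.
        var candidate = name;
        for (var i = 1; !_usedNames.Add(candidate); i++)
        {
            candidate = Path.GetFileNameWithoutExtension(name) + "_" + i
                        + Path.GetExtension(name);
        }

        var local = _folder + "/" + candidate;
        _byUrl[resource.AbsoluteUri] = local;
        return local;
    }
}
```

The returned path is what you would write back into the corresponding `src` or `href` attribute (with the HTML Agility Pack, via `SetAttributeValue` on the node) before saving the modified document, and it is also where the downloaded bytes should be stored.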
