Downloading a "complete" web page programmatically?


Question

I am using C# and I want to download a complete web page programmatically, meaning not only the page source.
I want the page to be downloaded the same way "Save Page As" does it in Firefox, but I want to do this programmatically.

Solution

You need to download and parse the page to find out all the external resources, then download each in turn (and possibly parse it and download the resources within it).

For parsing the HTML, I suggest using the HTML Agility Pack. You need to keep in mind which resources you want to download (images, CSS, JavaScript, etc.) and query the page for those specifically.
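
As a rough illustration of querying the page for specific resource types, here is a minimal sketch using the HTML Agility Pack. It assumes the HtmlAgilityPack NuGet package is installed. The class name ResourceFinder and the choice of XPath queries are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class ResourceFinder
{
    // XPath expression -> attribute holding the resource URL.
    // This selection is illustrative; extend it for the resources you care about.
    // Note: the rel test is an exact, case-sensitive match on "stylesheet".
    private static readonly (string XPath, string Attribute)[] Queries =
    {
        ("//img[@src]", "src"),
        ("//script[@src]", "src"),
        ("//link[@rel='stylesheet'][@href]", "href"),
    };

    public static List<string> FindResourceUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        foreach (var (xpath, attribute) in Queries)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes == null) continue;

            foreach (var node in nodes)
            {
                string value = node.GetAttributeValue(attribute, "");
                if (value.Length > 0) urls.Add(value);
            }
        }
        return urls;
    }
}
```

The values returned are the raw attribute values, so they may still be relative and need to be resolved before downloading.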

You will need to keep in mind that some pages define a base URL (via the base element), and that you must take it into account, together with the page URL, when resolving relative and absolute links.
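
One possible way to handle this with System.Uri is sketched below. The class name UrlResolver and its parameters are hypothetical. The sketch assumes you have already read the href of the base element from the page, if there is one.

```csharp
using System;

public static class UrlResolver
{
    // pageUrl:  the URL the page was downloaded from.
    // baseHref: the href of the page's base element, or null if there is none.
    // link:     the raw attribute value found in the page (relative or absolute).
    public static Uri Resolve(Uri pageUrl, string baseHref, string link)
    {
        // The base href may itself be relative, so resolve it against the page URL first.
        Uri baseUri = string.IsNullOrWhiteSpace(baseHref)
            ? pageUrl
            : new Uri(pageUrl, baseHref);

        // new Uri(baseUri, relative) returns absolute links unchanged and
        // resolves relative ones against the base.
        return new Uri(baseUri, link);
    }
}
```

For example, calling Resolve with the page URL http://example.com/a/b.html, no base href, and the link img/x.png gives http://example.com/a/img/x.png.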

You may also want to parse the CSS for things like image references.
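
A simple way to pull image references out of a stylesheet is a regular expression over url(...) tokens, as in the sketch below. This is a simplification and not a real CSS parser, and the class name CssReferences is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssReferences
{
    // Matches url(...) with optional single or double quotes around the value.
    // A regular expression is a simplification, not a CSS parser: it does not
    // handle escapes, comments, or @import "file.css" written without url().
    private static readonly Regex UrlPattern = new Regex(
        @"url\(\s*(['""]?)(?<url>[^'"")]+?)\1\s*\)",
        RegexOptions.IgnoreCase);

    public static List<string> FindUrls(string css)
    {
        var urls = new List<string>();
        foreach (Match match in UrlPattern.Matches(css))
        {
            string url = match.Groups["url"].Value.Trim();
            // Inline data: URIs carry their own content; there is nothing to download.
            if (!url.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
                urls.Add(url);
        }
        return urls;
    }
}
```

Note that relative URLs inside a stylesheet are resolved against the URL of the stylesheet itself, not the URL of the page that links to it.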

To finish off, you will want to change all these references to local ones that point to where the resources have been downloaded to (thanks @Scott M).
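
The rewriting step might look roughly like the following sketch, again assuming the HtmlAgilityPack NuGet package is installed. ReferenceRewriter and the localPaths dictionary are hypothetical: the dictionary is assumed to map each original attribute value to the relative path where you saved that resource.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class ReferenceRewriter
{
    // localPaths maps an attribute value exactly as it appears in the page
    // to the relative path of the downloaded copy (hypothetical layout).
    public static string Rewrite(string html, IDictionary<string, string> localPaths)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // A leftover base element would redirect the new local relative paths,
        // so this sketch simply removes it.
        doc.DocumentNode.SelectSingleNode("//base")?.Remove();

        foreach (string attribute in new[] { "src", "href" })
        {
            // Every element that carries the attribute; null when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes("//*[@" + attribute + "]");
            if (nodes == null) continue;

            foreach (var node in nodes)
            {
                string original = node.GetAttributeValue(attribute, "");
                if (localPaths.TryGetValue(original, out string local))
                    node.SetAttributeValue(attribute, local);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

References that are not in the dictionary, such as ordinary links to other pages, are left untouched.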
