How can we create a website copier (all things related in that website) in C#?


Problem description

Suppose I enter the URL www.w3school.org. Then all the pages of w3school should be saved in a folder, with a summary page index.htm created, so that when I click on index it looks like the w3school home page. I know I can use HTTrack, but doing it through C# programming will give me more exposure.

Recommended answer

To do this you can use System.Net.WebClient:

- use it to download the page as one big string
- save the string to a file on the hard disk
- parse the string, using regular expressions, for images and whatever else you want
- download the images, and whatever else you want (a sketch of these steps follows below)
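A minimal sketch of those four steps, assuming a hypothetical start URL and output folder (the URL, folder name, class name, and regular expression here are illustrative, not part of the original answer):

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class PageDownloader
{
    static void Main()
    {
        // Illustrative inputs; substitute your own URL and folder.
        string startUrl = "http://www.w3school.org/";
        string outputDir = "site_copy";
        Directory.CreateDirectory(outputDir);

        using (var client = new WebClient())
        {
            // 1. Download the page as one big string.
            string html = client.DownloadString(startUrl);

            // 2. Save the string to a file on the hard disk.
            File.WriteAllText(Path.Combine(outputDir, "index.htm"), html);

            // 3. Parse the string with a regular expression for <img> sources.
            var imgPattern = new Regex(
                "<img[^>]+src\\s*=\\s*[\"']([^\"']+)[\"']",
                RegexOptions.IgnoreCase);

            // 4. Download each image next to the saved page.
            foreach (Match m in imgPattern.Matches(html))
            {
                // Resolve relative paths against the page URL.
                var imgUri = new Uri(new Uri(startUrl), m.Groups[1].Value);
                string fileName = Path.GetFileName(imgUri.LocalPath);
                if (string.IsNullOrEmpty(fileName)) continue;
                try
                {
                    client.DownloadFile(imgUri, Path.Combine(outputDir, fileName));
                }
                catch (WebException)
                {
                    // Skip images that fail to download.
                }
            }
        }
    }
}
```

A proper HTML parser would be more robust than regular expressions, but the regex keeps the sketch close to the steps above.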

(If you want to get all the pages of the website you will need to also parse for hyperlinks and recursively download all of those too)

Please do keep in mind that the home page has a link to the home page!
To get around cycles like that, use a Dictionary to keep track of what you have and have not downloaded. A dictionary can contain another dictionary; that way you can save index.html in the first one, add a dictionary for asp.net, and in the second dictionary save index.html again.

In the end you can then recursively print the dictionary to give you the sitemap.
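A minimal sketch of that idea, under stated assumptions: a HashSet breaks the home-page-links-to-home-page cycle, nested dictionaries hold the page tree as the answer suggests, and the class name, method names, depth limit, and regular expression are all illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

class SiteCrawler
{
    // URLs we have already downloaded, to break circular links
    // (the home page links to the home page).
    static readonly HashSet<string> visited = new HashSet<string>();

    // Nested dictionaries form a tree: each page maps to the pages
    // it links to, which yields the sitemap at the end.
    static Dictionary<string, object> Crawl(WebClient client, Uri url, int depth)
    {
        var children = new Dictionary<string, object>();
        if (depth <= 0 || !visited.Add(url.AbsoluteUri))
            return children; // already seen, or too deep

        string html;
        try { html = client.DownloadString(url); }
        catch (WebException) { return children; }

        // Parse for hyperlinks and recursively download those too.
        var linkPattern = new Regex(
            "<a[^>]+href\\s*=\\s*[\"']([^\"'#]+)[\"']",
            RegexOptions.IgnoreCase);
        foreach (Match m in linkPattern.Matches(html))
        {
            var child = new Uri(url, m.Groups[1].Value);
            if (child.Host != url.Host) continue; // stay on the same site
            children[child.AbsoluteUri] = Crawl(client, child, depth - 1);
        }
        return children;
    }

    // Recursively print the nested dictionaries as an indented sitemap.
    static void PrintSitemap(Dictionary<string, object> node, int indent)
    {
        foreach (var entry in node)
        {
            Console.WriteLine(new string(' ', indent * 2) + entry.Key);
            PrintSitemap((Dictionary<string, object>)entry.Value, indent + 1);
        }
    }

    static void Main()
    {
        using (var client = new WebClient())
        {
            var root = new Dictionary<string, object>
            {
                ["http://www.w3school.org/"] =
                    Crawl(client, new Uri("http://www.w3school.org/"), 2)
            };
            PrintSitemap(root, 0);
        }
    }
}
```

Running this prints an indented tree of every URL reached within the depth limit, which you could then write out as the index.htm summary page.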

Do keep in mind that this will generate a lot of traffic, and might not always be allowed by the website owners.

Hope this helps you on your way :-)

