Navigating all the pages in a website


Problem Description


Hi everyone,

Does anyone have an idea of how to programmatically navigate through all the pages of a website (probably pages where login is not required) when the index (home URL) of the website is known? Any help or ideas will be highly appreciated.

Thanks
Anurag

Solution

Yeah, there are several articles on Code Project that implement "web crawlers" / "web spiders" (the name for the thing you're trying to do). Read them. The short version is: download the HTML, check it for links, navigate to those links and download that HTML, and repeat until you have no more links to check.
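The download-extract-follow loop above can be sketched as a small breadth-first crawler. This is a minimal sketch, not a production crawler: the `fetch` callable and the three-page `site` dictionary are stand-ins so the example runs without a network; a real crawler would pass a function that issues an HTTP GET (e.g. with `urllib.request.urlopen`) and would also respect robots.txt and stay within the target domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first crawl: download a page, extract its links,
    and follow each one until no unvisited links remain.
    `fetch` is a callable mapping a URL to an HTML string."""
    visited = set()
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable page: skip it and move on
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative hrefs
            if absolute not in visited:
                queue.append(absolute)
    return visited

# Simulated three-page site so the sketch runs without a network.
site = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a>',
    "http://example.com/b.html": "no links here",
}
pages = crawl("http://example.com/", site.__getitem__)
```

The `visited` set is what terminates the loop: a link back to an already-downloaded page (like `/a.html` linking home) is skipped rather than re-queued, so the crawl cannot cycle forever.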


You really need some kind of information source that lists the name and location of all pages within a web site to be able to navigate them programmatically.

For example, Google searches for SiteMap.xml within the web root directory of a web site, and if it is found, it reads the page locations written within it and then crawls the pages to read SEO oriented data from each page.

So, what you need is some kind of SiteMap.xml within your target web site to know the name and location of all pages within the site so that you can programmatically navigate through the pages.
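Reading the page locations out of a sitemap is a short job with the standard library. The sketch below assumes a minimal sitemap in the standard sitemaps.org format; the `example.com` URLs are placeholders for whatever the target site actually lists, and in practice the XML would be downloaded from the site's web root rather than held in a string.

```python
import xml.etree.ElementTree as ET

# A minimal sitemap in the standard sitemaps.org format;
# the URLs are placeholders for the target site's real pages.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about.html</loc></url>
  <url><loc>http://example.com/contact.html</loc></url>
</urlset>"""

def page_locations(sitemap_xml):
    """Return every <loc> entry from a sitemap document."""
    # The sitemaps.org namespace must be given explicitly,
    # or findall() will not match the elements.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.findall("sm:url/sm:loc", ns)]

urls = page_locations(SITEMAP_XML)
```

Each URL returned can then be fed straight into a downloader, with no link extraction needed, which is exactly why a sitemap makes the crawl so much simpler.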


Hi,

In the Page Load event you have to redirect to the related page.

Or

You can try the Application Start event.


