Page Scraping with ASP.NET


Question

Hello Sir,
I have a page with two text boxes (username and password) and one button (login).
I want to open the Gmail inbox using these controls, by means of page scraping. Can you give me any ideas about page scraping?

Thanks.

Please reply.

Recommended Answer



Hi,

The HTML Agility Pack (http://htmlagilitypack.codeplex.com/) is a superb library for scraping web pages.

Once the page is loaded with the pack, you can use XPath queries to grab data, e.g.

//a/@href

will get all the 'href' attributes of all link tags

//div[@class='email']

will get every 'div' element whose class attribute is exactly 'email'

//div[@id='subject']/td

will get all immediate 'td' children of any div tags that have the id 'subject'.

As you can see, extracting data from a web page this way is very straightforward. Many other mechanisms would have you writing nested loops or very complicated recursive functions.
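The three queries above could be run with the HTML Agility Pack roughly as follows. This is only a sketch: the sample markup, the class name `XPathDemo`, and the printed output format are invented for illustration. It also assumes that the pack's `SelectNodes` hands back element nodes, so the links are selected with `//a[@href]` and the attribute is read with `GetAttributeValue` rather than relying on `//a/@href` directly.

```csharp
using System;
using HtmlAgilityPack; // assumed to be installed, e.g. as the HtmlAgilityPack NuGet package

class XPathDemo
{
    static void Main()
    {
        // Hypothetical sample markup, made up to mirror the queries in the answer.
        const string html = @"
<html><body>
  <a href='/inbox'>Inbox</a>
  <a href='/sent'>Sent</a>
  <div class='email'>First message</div>
  <div id='subject'><td>Hello</td><td>World</td></div>
</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so every result is checked before it is used.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // Read the href attribute; fall back to an empty string if it is missing.
                Console.WriteLine("href: " + link.GetAttributeValue("href", string.Empty));
            }
        }

        // Every div whose class attribute is exactly 'email'.
        HtmlNodeCollection emails = doc.DocumentNode.SelectNodes("//div[@class='email']");
        if (emails != null)
        {
            foreach (HtmlNode div in emails)
            {
                Console.WriteLine("email div: " + div.InnerText.Trim());
            }
        }

        // Immediate td children of any div with id 'subject'.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//div[@id='subject']/td");
        if (cells != null)
        {
            foreach (HtmlNode cell in cells)
            {
                Console.WriteLine("td: " + cell.InnerText.Trim());
            }
        }
    }
}
```

The null checks matter in practice: a page whose layout changes will simply stop matching, and an unchecked `foreach` over the result would then throw.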


Search for HttpWebRequest. You can use that class to retrieve the source code (HTML) of a web page and then parse it. There are also several articles here on CodeProject that demonstrate the use of that class.
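A minimal sketch of fetching a page's source with HttpWebRequest might look like this. The URL, the user-agent string, and the class name `FetchDemo` are placeholders rather than anything from the original post, and the example only covers a plain GET of a public page, not a login flow.

```csharp
using System;
using System.IO;
using System.Net;

class FetchDemo
{
    static void Main()
    {
        // Placeholder URL, chosen only for illustration.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/");
        request.Method = "GET";
        request.UserAgent = "ScrapeDemo/1.0"; // hypothetical user-agent string

        // Dispose the response and the reader as soon as the body has been read.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            Console.WriteLine("Status: {0}", response.StatusCode);
            Console.WriteLine("Characters received: {0}", html.Length);
            // The html string can now be handed to a parser of your choice.
        }
    }
}
```

`GetResponse` throws a `WebException` for network failures and for 4xx/5xx status codes, so production code would normally wrap the call in a try/catch.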

