Page Scraping with ASP.NET
Problem description
Hello Sir,
I have a page with two text boxes (username and password) and one button (login).
I want to use these controls to get into the Gmail inbox by page scraping. Can you give me any ideas about page scraping?
Thanks.
Please reply.
Recommended answer
Hi,
The HTML Agility Pack (http://htmlagilitypack.codeplex.com/) is a superb library for scraping web pages.
Once the page is loaded with the pack, you can use XPath queries to grab data - e.g.
//a/@href
will get all the 'href' attributes of all link tags
//div[@class='email']
will get the 'div' element that has the class name 'email'
//div[@id='subject']/td
will get all immediate 'td' children of any div tags that have the id 'subject'.
As you can see, extracting data from a web page this way is very straightforward - many other mechanisms will have you using nested loops or very complicated recursive functions.
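To make the XPath approach concrete, here is a minimal C# sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name ScrapeSketch, the method names, and the sample markup are made up for illustration and are not from the original answer. Note that XPath separates steps with forward slashes, and that this sketch selects the elements with XPath and then reads the attribute or text from each node.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ScrapeSketch
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links;

        foreach (HtmlNode anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        return links;
    }

    // Returns the trimmed inner text of every <div class="email"> element.
    public static List<string> ExtractEmailDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@class='email']");
        if (divs == null) return texts;

        foreach (HtmlNode div in divs)
            texts.Add(div.InnerText.Trim());
        return texts;
    }

    public static void Main()
    {
        // Hypothetical sample markup, used only to exercise the two helpers.
        const string html =
            "<html><body><a href='/inbox'>Inbox</a>" +
            "<div class='email'>first message</div></body></html>";

        foreach (string link in ExtractLinks(html)) Console.WriteLine(link);
        foreach (string text in ExtractEmailDivs(html)) Console.WriteLine(text);
    }
}
```

The null check matters in practice: a query that matches nothing would otherwise throw as soon as the result is enumerated.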
Google HttpWebRequest. You can use that class to retrieve the source code of a web page, and then parse it. There are also several articles here on CodeProject that demonstrate the use of that class.
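As a rough illustration of retrieving a page's source with HttpWebRequest, here is a minimal C# sketch. The class name FetchSketch, the method names, the user-agent string, and the example URL are all assumptions made for illustration. It does a plain GET only and does not handle logins, cookies, or errors.

```csharp
using System;
using System.IO;
using System.Net;

public static class FetchSketch
{
    // Builds a GET request for the given URL without sending it.
    public static HttpWebRequest BuildRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "ScrapeSketch/0.1"; // hypothetical user-agent string
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string DownloadHtml(string url)
    {
        HttpWebRequest request = BuildRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Placeholder URL; substitute the page you actually want to parse.
        string html = DownloadHtml("http://example.com/");
        Console.WriteLine(html.Length);
    }
}
```

The returned string can then be handed to whatever parser you prefer. On newer .NET versions HttpClient is the recommended replacement for HttpWebRequest, so treat this as a sketch for the classic ASP.NET setting of the question.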