Web scraping using VB.NET

Question

I have a URL:

test.com/Search/NumberSearch.aspx

The page has a number of controls, one of which is a textbox. When the user enters a number (roughly six digits) into the textbox and hits Enter, the browser goes to another page:

test.com/Data/DetailsPage.aspx?mynum=123456

That page has a number of textboxes and other controls from which I need to scrape data, plus a number of links that I need to capture in my code.

I have tried using the VB.NET WebRequest class:

' WebRequest.Create needs an absolute URI, so the scheme is included here.
Dim wreq As WebRequest = WebRequest.Create("http://test.com/Data/DetailsPage.aspx?mynum=" & num)
Dim wresp As HttpWebResponse = CType(wreq.GetResponse(), HttpWebResponse)
Dim dStream As Stream = wresp.GetResponseStream()
Dim rdr As New StreamReader(dStream)
Dim respStr As String = rdr.ReadToEnd()

As a result, respStr contains a string of HTML, but that HTML is for test.com/Search/NumberSearch.aspx, not for the resulting details page, test.com/Data/DetailsPage.aspx?mynum=123456.

My goal is to get the HTML of the details page programmatically.

I have also tried WebClient.DownloadString, but got the same result. Can anyone help?

Recommended answer

I would try setting the User-Agent header, since many sites key off it when deciding what to serve:

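' WebRequest.Create needs an absolute URI, so the scheme is included here.
Dim wreq As HttpWebRequest = CType(WebRequest.Create("http://test.com/Data/DetailsPage.aspx?mynum=" & num), HttpWebRequest)
' User-Agent is a restricted header on HttpWebRequest: set it through the
' UserAgent property, because Headers.Add can throw an ArgumentException for it.
wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"

Dim wresp As HttpWebResponse = CType(wreq.GetResponse(), HttpWebResponse)
Dim dStream As Stream = wresp.GetResponseStream()
Dim rdr As New StreamReader(dStream)
Dim respStr As String = rdr.ReadToEnd()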
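
For anyone who wants to try this end to end, here is a minimal sketch of the same idea as a self-contained console module. It is not a definitive implementation. It sets the User-Agent through the HttpWebRequest.UserAgent property, because HttpWebRequest treats User-Agent as a restricted header. It also compares the response's ResponseUri with the requested URL, which should show whether the server still sent you back to the search page. The module name, the helper names BuildDetailsUrl and WasRedirected, and the http:// scheme are assumptions made for illustration; only the test.com paths come from the question.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module DetailsPageFetcher
    ' Hypothetical helper: builds the details-page URL from the question.
    ' The "http://" scheme is an assumption; the question shows no scheme.
    Function BuildDetailsUrl(num As String) As String
        Return "http://test.com/Data/DetailsPage.aspx?mynum=" & Uri.EscapeDataString(num)
    End Function

    ' Hypothetical helper: True when the response came from a different path
    ' than the one requested, for example the search page.
    Function WasRedirected(requested As Uri, actual As Uri) As Boolean
        Return Not String.Equals(requested.AbsolutePath, actual.AbsolutePath,
                                 StringComparison.OrdinalIgnoreCase)
    End Function

    Function FetchDetailsHtml(num As String) As String
        Dim requestUri As New Uri(BuildDetailsUrl(num))
        Dim wreq As HttpWebRequest = CType(WebRequest.Create(requestUri), HttpWebRequest)

        ' Set User-Agent through the property rather than Headers.Add.
        wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"

        Using wresp As HttpWebResponse = CType(wreq.GetResponse(), HttpWebResponse)
            ' ResponseUri is the URI that actually answered, after any redirects.
            If WasRedirected(requestUri, wresp.ResponseUri) Then
                Console.Error.WriteLine("Redirected to " & wresp.ResponseUri.ToString())
            End If

            Using rdr As New StreamReader(wresp.GetResponseStream())
                Return rdr.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Try
            Console.WriteLine(FetchDetailsHtml("123456"))
        Catch ex As WebException
            Console.Error.WriteLine("Request failed: " & ex.Message)
        End Try
    End Sub
End Module
```

If the "Redirected to" line still points at NumberSearch.aspx after the User-Agent is set, then the header alone was not what the site was keying on.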
