Web scraping using VB.NET
Question
I have a URL:
test.com/Search/NumberSearch.aspx
The page has a number of controls, one of which is a textbox. When the user enters a number of roughly six digits into the textbox and presses Enter, the browser navigates to another page:
test.com/Data/DetailsPage.aspx?mynum=123456
That page has a number of textboxes and other controls whose data I need to scrape, plus a number of links that I need to capture in my code.
I have tried using the VB.NET WebRequest class:
Dim wreq As WebRequest = WebRequest.Create("test.com/Data/DetailsPage.aspx?mynum=" & num)
Dim wresp As HttpWebResponse = CType(wreq.GetResponse(), HttpWebResponse)
Dim dStream As Stream = wresp.GetResponseStream()
Dim rdr As New StreamReader(dStream)
Dim respStr As String = rdr.ReadToEnd()
As a result, my respStr contains a string of HTML, but that HTML is for
test.com/Search/NumberSearch.aspx
and not for the
test.com/Data/DetailsPage.aspx?mynum=123456
page with the details.
My goal is to get the HTML of the details page programmatically.
I have also tried using
WebClient.DownloadString
but got the same result. Can anyone help?
Recommended answer
I would try setting the User-Agent header, since many sites decide what to return based on it:
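' User-Agent is a restricted header on .NET Framework: Headers.Add throws ArgumentException,
' so cast to HttpWebRequest and set the UserAgent property instead.
' Note: WebRequest.Create needs an absolute URL including the scheme (http:// or https://).
Dim wreq As HttpWebRequest = CType(WebRequest.Create("test.com/Data/DetailsPage.aspx?mynum=" & num), HttpWebRequest)
wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
Dim respStr As String
' Using blocks dispose the response and reader once the body has been read.
Using wresp As HttpWebResponse = CType(wreq.GetResponse(), HttpWebResponse)
    Using rdr As New StreamReader(wresp.GetResponseStream())
        respStr = rdr.ReadToEnd()
    End Using
End Using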
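If the response still contains the search page, it may help to check where the response actually came from. HttpWebResponse.ResponseUri reports the final URL after any redirects, so comparing it with the requested URL shows whether the server sent you back to NumberSearch.aspx. The sketch below also shares a CookieContainer between a first request to the search page and the request for the details page. That is only a guess at the cause: the question does not say whether the site depends on a session cookie. The helper names (WasRedirected, FetchHtml, Demo), the shortened User-Agent string, and the http:// scheme are assumptions made for illustration.

```vb
Imports System
Imports System.IO
Imports System.Net

Module ScrapeSketch
    ' Hypothetical helper: True when the server answered from a different
    ' path than the one requested (for example, after a redirect).
    Function WasRedirected(requested As Uri, actual As Uri) As Boolean
        Return Not String.Equals(requested.AbsolutePath, actual.AbsolutePath,
                                 StringComparison.OrdinalIgnoreCase)
    End Function

    ' Hypothetical helper: GET a page, sharing cookies through the given
    ' container, and report the final URI the response came from.
    Function FetchHtml(url As String, cookies As CookieContainer, ByRef finalUri As Uri) As String
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        req.CookieContainer = cookies
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1)"
        Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
            finalUri = resp.ResponseUri
            Using rdr As New StreamReader(resp.GetResponseStream())
                Return rdr.ReadToEnd()
            End Using
        End Using
    End Function

    ' Example flow: visit the search page first so any session cookie is stored,
    ' then request the details page and check where the response came from.
    ' The http:// scheme is assumed; the question gives the URLs without one.
    Sub Demo(num As String)
        Dim cookies As New CookieContainer()
        Dim landed As Uri = Nothing
        FetchHtml("http://test.com/Search/NumberSearch.aspx", cookies, landed)
        Dim detailsUrl As String = "http://test.com/Data/DetailsPage.aspx?mynum=" & num
        Dim html As String = FetchHtml(detailsUrl, cookies, landed)
        If WasRedirected(New Uri(detailsUrl), landed) Then
            Console.WriteLine("Redirected to " & landed.ToString())
        Else
            Console.WriteLine(html.Length.ToString() & " characters of details HTML")
        End If
    End Sub
End Module
```

If Demo reports a redirect even with the cookies in place, the details page probably depends on something else from the search step, such as the form post itself, and a plain GET will not be enough.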