Scrape data from web page source where URL doesn't change
Question
I need to do the following:
Select "Special Hospital" and "All Ambulatory Care Facilities **NOTE #2"
I have 2 questions:
- I don't know how to select the "Special Hospital" and "All Ambulatory Care Facilities **NOTE #2"
- When I manually select those 2 types and then click on some of the hospitals, the URL doesn't become selection specific. It becomes http://healthapps.state.nj.us/facilities/acFacilityList.aspx after I select the 2 types, then stays that way when I click on the hospitals. Therefore, I'm not able to write the code that will scrape those pages because I don't know how to specify the URL for each hospital.
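(Editor's note: the reason the URL never changes is that this is an ASP.NET WebForms page. Each hospital link is a `javascript:__doPostBack('target','argument')` call, and the server works out which hospital you clicked from hidden form fields posted back to the same URL, not from the address bar. As a rough illustration, the target can be pulled out of such a link with a small parser; the control ID below is made up, not taken from the real site.)

```python
import re

def parse_postback(href):
    """Extract the (target, argument) pair from a WebForms postback link.

    Links on pages like this look like
    javascript:__doPostBack('ctl00$grid$lnk','') rather than ordinary
    URLs, which is why the address bar never changes per hospital.
    """
    m = re.search(r"__doPostBack\('([^']*)','([^']*)'\)", href)
    return (m.group(1), m.group(2)) if m else None

# Hypothetical href, for illustration only:
print(parse_postback("javascript:__doPostBack('ctl00$grid$lnkHosp3','')"))
```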
I apologize, this has to be a very basic question, but I wasn't able to find anything useful on Google about it for Access VBA.
Here's the code that pulls data from a page. I haven't written the loops yet, so this is just a basic pull of the source data behind a page:
'Requires a reference to "Microsoft HTML Object Library" (mshtml)
Public Function btnGetWebData_Click()
    Dim HTML_Content As HTMLDocument

    'Create HTMLDocument object to hold the response
    Set HTML_Content = New HTMLDocument

    'Get the web page content into the HTMLDocument object
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://healthapps.state.nj.us/facilities/acFacilityList.aspx", False
        .send
        HTML_Content.body.innerHTML = .responseText
        Debug.Print .responseText
        Debug.Print HTML_Content.body.innerHTML
    End With
End Function
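(Editor's note: once `responseText` is in hand, the facility links can be picked out of the raw HTML. Purely as an illustration of the idea, here is a minimal Python sketch using the standard-library HTML parser; the HTML fragment and control IDs are invented, not copied from the real site.)

```python
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    """Collect the text of postback links, the way one would pull
    facility names out of the fetched page source."""
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href", "")
        if tag == "a" and "doPostBack" in href:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.names.append(data.strip())

# A made-up fragment mimicking the result table, for illustration:
sample = ("<table id='main_table'><tr><td>"
          "<a href=\"javascript:__doPostBack('ctl00$l1','')\">Hospital A</a>"
          "</td></tr><tr><td>"
          "<a href=\"javascript:__doPostBack('ctl00$l2','')\">Hospital B</a>"
          "</td></tr></table>")
p = LinkTextParser()
p.feed(sample)
print(p.names)
```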
Answer
It navigates to each result page, and back to the results list in between, so as to trigger the postback links through clicks.
'Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Option Explicit

Public Sub VisitPages()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#middleContent_cbType_5").Click   'tick the two facility-type checkboxes
            .querySelector("#middleContent_cbType_12").Click
            .querySelector("#middleContent_btnGetList").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim list As Object, i As Long
        Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")

        For i = 0 To list.Length - 1
            list.item(i).Click
            While .Busy Or .readyState < 4: DoEvents: Wend

            Application.Wait Now + TimeSerial(0, 0, 3) '<== Delete me later. Demo only; note Application.Wait is Excel-specific, so in Access use the Sleep API or remove this line

            'do stuff with new page

            .Navigate2 .document.URL '<== back to the results page
            While .Busy Or .readyState < 4: DoEvents: Wend

            Set list = .document.querySelectorAll("#main_table [href*=doPostBack]") 'reset list (often required in these scenarios)
        Next

        Stop '<== Delete me later
        '.Quit '<== Remember to quit application
    End With
End Sub
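(Editor's note: in principle the same postback can be replayed without a browser by POSTing the standard WebForms hidden fields back to the page URL. This is only a sketch of the request body; the control ID is hypothetical, and the real `__VIEWSTATE` and `__EVENTVALIDATION` values would have to be scraped from the live page first.)

```python
from urllib.parse import urlencode

# Standard WebForms hidden fields; the actual __VIEWSTATE and
# __EVENTVALIDATION values must be copied from the page's hidden inputs.
fields = {
    "__EVENTTARGET": "ctl00$grid$lnkHosp3",  # hypothetical control ID
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": "...",        # placeholder, taken from the live page
    "__EVENTVALIDATION": "...",  # likewise
}
body = urlencode(fields)  # the form-encoded body for the POST request
print(body)
```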
The same thing, but executing the postbacks directly instead of clicking:
Option Explicit

Public Sub VisitPages()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#middleContent_cbType_5").Click
            .querySelector("#middleContent_cbType_12").Click
            .querySelector("#middleContent_btnGetList").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim list As Object, i As Long, col As Collection
        Set col = New Collection
        Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")

        For i = 0 To list.Length - 1
            col.Add CStr(list.item(i)) 'store the postback hrefs before navigation replaces the DOM
        Next

        For i = 1 To col.Count
            .document.parentWindow.execScript col.item(i)
            While .Busy Or .readyState < 4: DoEvents: Wend

            'Do stuff with page

            .Navigate2 .document.URL
            While .Busy Or .readyState < 4: DoEvents: Wend
        Next

        Stop '<== Delete me later
        '.Quit '<== Remember to quit application
    End With
End Sub