Scrape data from web page source where URL doesn't change


Question

I need to do the following:

  • Select "Special Hospital" and "All Ambulatory Care Facilities **NOTE #2"

  • Click Search

  • Loop through all the hospitals in the list

  • Click on each hospital

  • Grab some data from the hospital page

I have 2 problems:

1. I don't know how to select "Special Hospital" and "All Ambulatory Care Facilities **NOTE #2".
2. When I manually select those 2 types and then click on some of the hospitals, the URL doesn't become selection-specific. It becomes http://healthapps.state.nj.us/facilities/acFacilityList.aspx after I select the 2 types, and then stays that way when I click on the hospitals. Therefore, I can't write the code that scrapes those pages, because I don't know how to specify the URL for each hospital.


I apologize; this must be a very basic question, but I wasn't able to google anything useful on it for Access VBA.


Here's the code that pulls data from a page. I haven't written the loops yet, so this is just a basic pull of the page's source data:

    Public Function btnGetWebData_Click() 
        Dim strURL
        Dim HTML_Content As HTMLDocument
        Dim dados As Object
    
        'Create HTMLFile Object
        Set HTML_Content = New HTMLDocument
    
        'Get the WebPage Content to HTMLFile Object
        With CreateObject("msxml2.xmlhttp")
            .Open "GET", "http://healthapps.state.nj.us/facilities/acFacilityList.aspx", False
            'http://healthapps.state.nj.us/facilities/acFacilityList.aspx
            .Send
            HTML_Content.Body.innerHTML = .responseText
            Debug.Print .responseText
            Debug.Print HTML_Content.Body.innerHTML
        End With
    End Function
    


Answer


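The hospital links in the results table are ASP.NET postBack links (javascript:__doPostBack(...)), which submit the page's form back to the same URL. That is why no hospital-specific URL ever appears. Instead of building URLs, the code below automates Internet Explorer (it needs a reference to Microsoft Internet Controls): it ticks the two facility types, clicks Search, then clicks each hospital link in turn, navigating back to the results page in between so the next postBack link can be clicked.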

    Option Explicit
    Public Sub VisitPages()
        Dim IE As New InternetExplorer
        With IE
            .Visible = True
            .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"
    
            While .Busy Or .readyState < 4: DoEvents: Wend
    
            With .document
                .querySelector("#middleContent_cbType_5").Click
                .querySelector("#middleContent_cbType_12").Click
                .querySelector("#middleContent_btnGetList").Click
            End With
    
            While .Busy Or .readyState < 4: DoEvents: Wend
    
            Dim list As Object, i  As Long
            Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")
            For i = 0 To list.Length - 1
                list.item(i).Click
    
                While .Busy Or .readyState < 4: DoEvents: Wend
    
                Dim t As Single: t = Timer              '<== Delete me later. This 3-second pause only demos the page change;
                Do While Timer < t + 3: DoEvents: Loop  '    Application.Wait is Excel-only and doesn't exist in Access
                'do stuff with new page
                .Navigate2 .document.URL             '<== back to homepage
                While .Busy Or .readyState < 4: DoEvents: Wend
                Set list = .document.querySelectorAll("#main_table [href*=doPostBack]") 'reset list (often required in these scenarios)
            Next
            Stop                                     '<== Delete me later
            '.Quit '<== Remember to quit application
        End With
    End Sub
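The wait loop "While .Busy Or .readyState < 4: DoEvents: Wend" is repeated after every navigation above, and it spins forever if a page never finishes loading. A minimal sketch of factoring it into one helper with a timeout follows; the name WaitForPage and the 30-second default are hypothetical choices, not part of the original answer, and it assumes the same Microsoft Internet Controls reference.

```vba
' Hypothetical helper (not from the original answer): replaces the repeated
' "While .Busy Or .readyState < 4: DoEvents: Wend" loop and adds a timeout.
' Assumes a reference to "Microsoft Internet Controls", like the code above.
Public Function WaitForPage(ByVal browser As InternetExplorer, _
                            Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim started As Double
    started = Timer
    Do While browser.Busy Or browser.readyState < READYSTATE_COMPLETE   ' READYSTATE_COMPLETE = 4
        DoEvents
        If Timer < started Then started = started - 86400               ' Timer wrapped past midnight
        If Timer - started > timeoutSeconds Then Exit Function          ' timed out: returns False
    Loop
    WaitForPage = True                                                  ' page finished loading
End Function
```

Each wait line could then become "If Not WaitForPage(IE) Then Exit Sub", so a hung page stops the run instead of freezing Access.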
    








The same thing, but executing the postBack scripts directly instead of clicking the links:

    Option Explicit
    Public Sub VisitPages()
        Dim IE As New InternetExplorer
        With IE
            .Visible = True
            .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"
    
            While .Busy Or .readyState < 4: DoEvents: Wend
    
            With .document
                .querySelector("#middleContent_cbType_5").Click
                .querySelector("#middleContent_cbType_12").Click
                .querySelector("#middleContent_btnGetList").Click
            End With
    
            While .Busy Or .readyState < 4: DoEvents: Wend
    
            Dim list As Object, i As Long, col As Collection
            Set col = New Collection
            Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")
            For i = 0 To list.Length - 1
                col.Add CStr(list.item(i))            'store each link's href as a string; element references go stale after navigation
            Next
            For i = 1 To col.Count
                .document.parentWindow.execScript col.item(i)   'run the javascript:__doPostBack(...) call directly
                While .Busy Or .readyState < 4: DoEvents: Wend
                'Do stuff with page
                .Navigate2 .document.URL              '<== back to the results page
                While .Busy Or .readyState < 4: DoEvents: Wend
            Next
            Stop                                     '<== Delete me later
            '.Quit '<== Remember to quit application
        End With
    End Sub
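The second version relies on each stored href being a javascript:__doPostBack('target','argument') call. If you want to know which postBack target each scraped page came from (for example, as a key when saving the data), a small helper can extract it. This is a hypothetical sketch, not part of the original answer; it assumes the single-quoted two-argument form that ASP.NET link buttons usually emit.

```vba
' Hypothetical helper (not from the original answer): extracts the event target
' from an href such as javascript:__doPostBack('ctl00$main$lnk1','')
' Assumes the single-quoted __doPostBack('target','argument') form and
' returns "" when the href does not match that pattern.
Public Function PostBackTarget(ByVal href As String) As String
    Const MARKER As String = "__doPostBack('"
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, href, MARKER, vbTextCompare)
    If startPos = 0 Then Exit Function                 ' not a postBack link

    startPos = startPos + Len(MARKER)                  ' first character of the target
    endPos = InStr(startPos, href, "'")                ' closing quote of the target
    If endPos = 0 Then Exit Function                   ' malformed href

    PostBackTarget = Mid$(href, startPos, endPos - startPos)
End Function
```

Inside the second loop, "Debug.Print PostBackTarget(col.item(i))" just before execScript would log which target is being posted.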
    
