VBA - HTML抓取问题 [英] VBA - HTML scraping problems

查看:245
本文介绍了VBA - HTML抓取问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从网站上查看拍卖数据(





您可以使用以下VBA代码检索如上所述的信息。



BTW,同样的方法应用于以下答案: 1 2 3 4 5 6 7


I'm attempting to scrape auction data from a website (https://www.rbauction.com/heavy-equipment-auctions). My current attempt was to use the below code to pull the website's HTML into VBA and then sparce through it and keep only the items I wanted (auction name, number of days, number of items).

Sub RBA_Auction_Scrape()

Dim S_Sheet As Worksheet: Set S_Sheet = ActiveWorkbook.ActiveSheet
Dim Look_String As String

On Error GoTo ERR_LABEL:

Dim Web_HTML As String
Dim HTTP_OBJ As New MSXML2.XMLHTTP60

    Web_HTML = ""
    HTTP_OBJ.Open "GET", "https://www.rbauction.com/heavy-equipment auctions", False
    HTTP_OBJ.Send

On Error Resume Next

Select Case HTTP_OBJ.Status
   Case 0: Web_HTML = HTTP_OBJ.responseText
   Case 200: Web_HTML = HTTP_OBJ.responseText
   Case Else: GoTo ERR_LABEL:
End Select

Debug.Print (Web_HTML)

It successfully pulls in the data, but the 'upcoming heavy equipment auction' section that has all of the names and sizes of the auctions does not get pulled into VBA. I'm not very good with HTML in general but I was hoping someone could offer a solution or at least an explanation as to when I search through the website HTML that is pulled into VBA, the articles that I want are not found.

Please help!!!

解决方案

The webpage source HTML by the link provided https://www.rbauction.com/heavy-equipment-auctions doesn't contain the necessary data, it uses AJAX. The website https://www.rbauction.com has an API available. Response is returned in JSON format. Navigate the page e. g. in Chrome, then open Developer Tools window (F12), Network tab, reload (F5) the page and examine logged XHRs. Most relevant data is JSON string returned by the URL https://www.rbauction.com/rba-api/calendar/v1?e1=true:

You may use the below VBA code to retrieve info as described above. Import JSON.bas module into the VBA project for JSON processing.

Option Explicit

Sub Test_www_rbauction_com()

    Const Transposed = False ' Output option

    Dim sResponse As String
    Dim vJSON
    Dim sState As String
    Dim i As Long
    Dim aRows()
    Dim aHeader()

    ' Retrieve JSON data
    XmlHttpRequest "GET", "https://www.rbauction.com/rba-api/calendar/v1?e1=true", "", "", "", sResponse
    ' Parse JSON response
    JSON.Parse sResponse, vJSON, sState
    If sState <> "Object" Then
        MsgBox "Invalid JSON response"
        Exit Sub
    End If
    ' Pick core data
    vJSON = vJSON("auctions")
    ' Extract selected properties for each item
    For i = 0 To UBound(vJSON)
        Set vJSON(i) = ExtractKeys(vJSON(i), Array("eventId", "name", "date", "itemCount"))
        DoEvents
    Next
    ' Convert JSON structure to 2-d arrays for output
    JSON.ToArray vJSON, aRows, aHeader
    ' Output
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
        If Transposed Then
            Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
            Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        Else
            OutputArray .Cells(1, 1), aHeader
            Output2DArray .Cells(2, 1), aRows
        End If
        .Columns.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub XmlHttpRequest(sMethod As String, sUrl As String, arrSetHeaders, sFormData, sRespHeaders As String, sContent As String)

    Dim arrHeader

    'With CreateObject("Msxml2.ServerXMLHTTP")
    '    .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
    With CreateObject("MSXML2.XMLHTTP")
        .Open sMethod, sUrl, False
        If IsArray(arrSetHeaders) Then
            For Each arrHeader In arrSetHeaders
                .SetRequestHeader arrHeader(0), arrHeader(1)
            Next
        End If
        .send sFormData
        sRespHeaders = .GetAllResponseHeaders
        sContent = .responseText
    End With

End Sub

Function ExtractKeys(oSource, aKeys, Optional oDest = Nothing) As Object

    Dim vKey

    If oDest Is Nothing Then Set oDest = CreateObject("Scripting.Dictionary")
    For Each vKey In aKeys
        If oSource.Exists(vKey) Then
            If IsObject(oSource(vKey)) Then
                Set oDest(vKey) = oSource(vKey)
            Else
                oDest(vKey) = oSource(vKey)
            End If
        End If
    Next
    Set ExtractKeys = oDest

End Function

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

The output for me is as follows:

BTW, the same approach applied in the following answers: 1, 2, 3, 4, 5, 6 and 7.

这篇关于VBA - HTML抓取问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆