VBA - HTML抓取问题 [英] VBA - HTML scraping problems
问题描述
我正在尝试从网站上查看拍卖数据(
您可以使用以下VBA代码检索如上所述的信息。 将
BTW,同样的方法应用于以下答案: 1 , 2 , 3 , 4 , 5 , 6 和 7 。
I'm attempting to scrape auction data from a website (https://www.rbauction.com/heavy-equipment-auctions). My current attempt was to use the below code to pull the website's HTML into VBA and then sparce through it and keep only the items I wanted (auction name, number of days, number of items).
Sub RBA_Auction_Scrape()
Dim S_Sheet As Worksheet: Set S_Sheet = ActiveWorkbook.ActiveSheet
Dim Look_String As String
On Error GoTo ERR_LABEL:
Dim Web_HTML As String
Dim HTTP_OBJ As New MSXML2.XMLHTTP60
Web_HTML = ""
HTTP_OBJ.Open "GET", "https://www.rbauction.com/heavy-equipment auctions", False
HTTP_OBJ.Send
On Error Resume Next
Select Case HTTP_OBJ.Status
Case 0: Web_HTML = HTTP_OBJ.responseText
Case 200: Web_HTML = HTTP_OBJ.responseText
Case Else: GoTo ERR_LABEL:
End Select
Debug.Print (Web_HTML)
It successfully pulls in the data, but the 'upcoming heavy equipment auction' section that has all of the names and sizes of the auctions does not get pulled into VBA. I'm not very good with HTML in general but I was hoping someone could offer a solution or at least an explanation as to when I search through the website HTML that is pulled into VBA, the articles that I want are not found.
Please help!!!
The webpage source HTML by the link provided https://www.rbauction.com/heavy-equipment-auctions doesn't contain the necessary data, it uses AJAX. The website https://www.rbauction.com has an API available. Response is returned in JSON format. Navigate the page e. g. in Chrome, then open Developer Tools window (F12), Network tab, reload (F5) the page and examine logged XHRs. Most relevant data is JSON string returned by the URL https://www.rbauction.com/rba-api/calendar/v1?e1=true:
You may use the below VBA code to retrieve info as described above. Import JSON.bas module into the VBA project for JSON processing.
Option Explicit
Sub Test_www_rbauction_com()
Const Transposed = False ' Output option
Dim sResponse As String
Dim vJSON
Dim sState As String
Dim i As Long
Dim aRows()
Dim aHeader()
' Retrieve JSON data
XmlHttpRequest "GET", "https://www.rbauction.com/rba-api/calendar/v1?e1=true", "", "", "", sResponse
' Parse JSON response
JSON.Parse sResponse, vJSON, sState
If sState <> "Object" Then
MsgBox "Invalid JSON response"
Exit Sub
End If
' Pick core data
vJSON = vJSON("auctions")
' Extract selected properties for each item
For i = 0 To UBound(vJSON)
Set vJSON(i) = ExtractKeys(vJSON(i), Array("eventId", "name", "date", "itemCount"))
DoEvents
Next
' Convert JSON structure to 2-d arrays for output
JSON.ToArray vJSON, aRows, aHeader
' Output
With ThisWorkbook.Sheets(1)
.Cells.Delete
If Transposed Then
Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
Else
OutputArray .Cells(1, 1), aHeader
Output2DArray .Cells(2, 1), aRows
End If
.Columns.AutoFit
End With
MsgBox "Completed"
End Sub
Sub XmlHttpRequest(sMethod As String, sUrl As String, arrSetHeaders, sFormData, sRespHeaders As String, sContent As String)
Dim arrHeader
'With CreateObject("Msxml2.ServerXMLHTTP")
' .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
With CreateObject("MSXML2.XMLHTTP")
.Open sMethod, sUrl, False
If IsArray(arrSetHeaders) Then
For Each arrHeader In arrSetHeaders
.SetRequestHeader arrHeader(0), arrHeader(1)
Next
End If
.send sFormData
sRespHeaders = .GetAllResponseHeaders
sContent = .responseText
End With
End Sub
Function ExtractKeys(oSource, aKeys, Optional oDest = Nothing) As Object
Dim vKey
If oDest Is Nothing Then Set oDest = CreateObject("Scripting.Dictionary")
For Each vKey In aKeys
If oSource.Exists(vKey) Then
If IsObject(oSource(vKey)) Then
Set oDest(vKey) = oSource(vKey)
Else
oDest(vKey) = oSource(vKey)
End If
End If
Next
Set ExtractKeys = oDest
End Function
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
The output for me is as follows:
BTW, the same approach applied in the following answers: 1, 2, 3, 4, 5, 6 and 7.
这篇关于VBA - HTML抓取问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!