How to extract data from HTML divs into Excel
I am trying to extract the details from this webpage. They seem to sit under certain divs with the class names "selection-left" and "selection-right". I haven't found a way to pull them out successfully yet. This is the URL: https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/
I want to copy the competition name, plus each participant and score.
I have tried QHar's approach from this link: How to extract values from nested divs using VBA. But I'm getting an error on this line: ReDim results(1 To countries.Length / 2, 1 To 4)
Here is the code I've been trying to make work:
Option Explicit
Public Sub GetData()
Dim html As HTMLDocument, ws As Worksheet, countries As Object, scores As Object, results(), i As Long, r As Long
Set ws = ThisWorkbook.Worksheets("Sheet1"): Set html = New HTMLDocument: r = 1
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/", False
.send
html.body.innerHTML = .responseText
End With
Set participant = html.querySelectorAll(".market-content .selection-left"): Set scores = html.querySelectorAll("..market-content .selection-right")
ReDim results(1 To countries.Length / 2, 1 To 4)
For i = 0 To participant.Length - 1 Step 2
results(r, 1) = participant.item(i).innerText: results(r, 2) = "'" & scores.item(i).innerText
r = r + 1
Next
ws.Cells(1, 1).Resize(1, 4) = Array("Competition", "Participant", "Score")
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
I need help getting this code to work.
The content is added dynamically, so it is not present in the response to your current XMLHTTP request; hence your error, as querySelectorAll returns a NodeList of Length 0. You could try making the same POST requests the page does, but that doesn't look like a quick and easy bit of coding. If this is a small project, I would go with browser automation, so that the JavaScript can run on the page and you can click the "show more" button. You will need a wait condition to make sure the page has properly loaded; I use the presence of the show more button.
Option Explicit
' Requires a reference to Microsoft Internet Controls (VBE > Tools > References)
Public Sub GetOddsIE()
Dim d As InternetExplorer, odds As Object, names As Object, i As Long
Dim ws As Worksheet, results(), competition As String
Set d = New InternetExplorer
Set ws = ThisWorkbook.Worksheets("Sheet1")
Const URL = "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/"
With d
.Visible = False
.Navigate2 URL
While .Busy Or .ReadyState <> 4: DoEvents: Wend
With .Document.getElementsByClassName("expandable-below-container-button")
Do
DoEvents
Loop While .Length = 0 'wait for element to be present
.Item(0).Click 'click on show more
End With
Set names = .Document.getElementsByClassName("selection-left-selection-name")
Set odds = .Document.getElementsByClassName("odds-convert")
competition = .Document.getElementsByClassName("league")(0).innerText
ReDim results(1 To names.Length, 1 To 3)
For i = 0 To names.Length - 1
results(i + 1, 1) = competition
results(i + 1, 2) = names.Item(i).innerText
results(i + 1, 3) = "'" & odds.Item(i).innerText
Next
.Quit
End With
ws.Cells(1, 1).Resize(1, 3) = Array("Competition", "Participant", "Score")
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
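One caveat with the wait loop above: it never gives up if the show more button does not appear, for example if the site changes its class names. A minimal sketch of a wait with a timeout follows. WaitForClass and its parameters are hypothetical names, not part of the original answer, and the sketch assumes the document object supports getElementsByClassName as in the code above.

```vb
Option Explicit

' Hypothetical helper (not from the original answer): polls the document
' until at least one element with the given class name exists, or until
' timeoutSeconds have elapsed. Returns the element collection, or Nothing
' on timeout. doc is late bound, so no extra reference is needed here.
Public Function WaitForClass(ByVal doc As Object, ByVal className As String, _
                             Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim startTime As Double, elapsed As Double, nodes As Object

    startTime = Timer
    Do
        DoEvents
        Set nodes = doc.getElementsByClassName(className)
        If nodes.Length > 0 Then
            Set WaitForClass = nodes
            Exit Function
        End If
        elapsed = Timer - startTime
        ' Timer resets at midnight, so correct a negative difference
        If elapsed < 0 Then elapsed = elapsed + 86400
    Loop While elapsed < timeoutSeconds

    Set WaitForClass = Nothing   ' timed out without finding the element
End Function
```

Inside the With d block, this could stand in for the Do ... Loop, for example: Set btn = WaitForClass(.Document, "expandable-below-container-button"), followed by a check such as If btn Is Nothing Then .Quit: Exit Sub before calling btn.Item(0).Click. Here btn is an assumed Object variable, and the 10-second default is an arbitrary choice.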