How to extract data from HTML divs into Excel

Problem description

I am trying to extract the details from this webpage. They seem to sit under certain divs with the classes "selection-left" and "selection-right". I haven't found a way to pull them out successfully yet. This is the URL - https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/

And here is an image of what I want to extract. I want to copy the competition name and each participant and score.

I have tried using QHar's approach from this link - How to extract values from nested divs using VBA. But I'm getting an error on this line - ReDim results(1 To countries.Length / 2, 1 To 4)

Here is the code I've been trying to make work

Option Explicit

Public Sub GetData()
Dim html As HTMLDocument, ws As Worksheet, countries As Object, scores As Object, results(), i As Long, r As Long

Set ws = ThisWorkbook.Worksheets("Sheet1"): Set html = New HTMLDocument: r = 1

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/", False
    .send
    html.body.innerHTML = .responseText
End With

Set participant = html.querySelectorAll(".market-content .selection-left"): Set scores = html.querySelectorAll("..market-content .selection-right")
ReDim results(1 To countries.Length / 2, 1 To 4)

For i = 0 To participant.Length - 1 Step 2
    results(r, 1) = participant.item(i).innerText: results(r, 2) = "'" & scores.item(i).innerText

    r = r + 1
Next
ws.Cells(1, 1).Resize(1, 4) = Array("Competition", "Participant", "Score")
ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

I need help making this code work.

Solution

The content is added dynamically, so it is not present in the response to your current request; hence your error, as you have a nodeList of Length 0. You could try making POST requests as the page does, but that doesn't look like a quick and easy bit of coding. If this is a small project, I would go with browser automation, so that the page's JavaScript can run and you can click the show more button. You will need a wait condition to be sure the page has loaded properly; I use the presence of the show more button.
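To illustrate the Length 0 point, here is a minimal sketch of checking the node list before sizing the array. It assumes a reference to the Microsoft HTML Object Library; the procedure name and the static HTML string (a stand-in for a response that lacks the dynamically added markup) are made up for illustration.

```vba
Option Explicit

' Hypothetical example; requires a reference to Microsoft HTML Object Library.
Public Sub CheckNodeListBeforeReDim()
    Dim html As HTMLDocument, nodes As Object, results()

    Set html = New HTMLDocument
    ' Assumed stand-in for a response without the dynamically added content
    html.body.innerHTML = "<div class=""market-content""></div>"

    Set nodes = html.querySelectorAll(".market-content .selection-left")

    If nodes.Length = 0 Then
        ' ReDim results(1 To 0, ...) would raise a run-time error, so stop here
        Debug.Print "No matching nodes in the response"
        Exit Sub
    End If

    ReDim results(1 To nodes.Length, 1 To 2)
End Sub
```

With the stand-in HTML above, the selector matches nothing, so the procedure exits before the ReDim is reached.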

Option Explicit

' Requires a reference to Microsoft Internet Controls (VBE > Tools > References)
Public Sub GetOddsIE()
    Dim d As InternetExplorer, odds As Object, names As Object, i As Long
    Dim ws As Worksheet, results(), competition As String

    Set d = New InternetExplorer
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Const URL = "https://sports.ladbrokes.com/en-af/betting/golf/golf-all-golf/us-masters/2020-us-masters/228648232/"

    With d
        .Visible = False
        .Navigate2 URL
        'wait for the initial page load to finish
        While .Busy Or .ReadyState <> 4: DoEvents: Wend
        With .Document.getElementsByClassName("expandable-below-container-button")
            Do
                DoEvents
            Loop While .Length = 0  'wait for element to be present
            .Item(0).Click 'click on show more
        End With

        'collect the participant names, the odds and the competition title
        Set names = .Document.getElementsByClassName("selection-left-selection-name")
        Set odds = .Document.getElementsByClassName("odds-convert")
        competition = .Document.getElementsByClassName("league")(0).innerText

        ReDim results(1 To names.Length, 1 To 3)

        For i = 0 To names.Length - 1
            results(i + 1, 1) = competition
            results(i + 1, 2) = names.Item(i).innerText
            results(i + 1, 3) = "'" & odds.Item(i).innerText 'leading apostrophe stores the odds as text
        Next
        .Quit
    End With
    'write the header row, then the results array, to the sheet in one go
    ws.Cells(1, 1).Resize(1, 3) = Array("Competition", "Participant", "Score")
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
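
The wait loop above keeps spinning if the show more button never appears. A bounded variant could look like the sketch below. The helper name WaitForElements and the 10 second default are assumptions for illustration, not part of the original answer. Like the original loop, it relies on getElementsByClassName returning a live collection.

```vba
Option Explicit

' Hypothetical helper: returns True once the collection has at least one
' element, or False if timeoutSeconds elapse first.
Public Function WaitForElements(ByVal elements As Object, _
                                Optional ByVal timeoutSeconds As Single = 10) As Boolean
    Dim startTime As Single
    startTime = Timer   'seconds elapsed since midnight

    Do While elements.Length = 0
        DoEvents
        'Timer resets at midnight; this sketch ignores that edge case
        If Timer - startTime > timeoutSeconds Then Exit Function 'result stays False
    Loop

    WaitForElements = True
End Function

' Possible usage inside GetOddsIE, replacing the inner With block:
'   Dim buttons As Object
'   Set buttons = .Document.getElementsByClassName("expandable-below-container-button")
'   If WaitForElements(buttons) Then buttons.Item(0).Click
```

Returning a Boolean lets the caller decide what to do when the button is missing, for example quitting the browser and reporting the problem instead of hanging.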

