VBA Web Scraping: Can't access table on web page


Problem description

I tried to scrape the price table on https://www.energylive.cloud/ the same way I have done on other sites, but I can't get it to work (I don't have much scraping experience). Thanks in advance!

Sub ej()
    Dim XMLrequest As New MSXML2.XMLHTTP60
    Dim HTMLdoc As New MSHTML.HTMLDocument
    Dim HTMLtable As MSHTML.IHTMLElement
    'Dim HTMLi As MSHTML.IHTMLElementCollection
    Dim url As String

    url = "https://www.energylive.cloud/"

    XMLrequest.Open "GET", url, False
    XMLrequest.send

    If XMLrequest.Status <> 200 Then
        MsgBox XMLrequest.Status & XMLrequest.statusText
    End If

    HTMLdoc.body.innerHTML = XMLrequest.responseText

    'Debug.Print HTMLdoc.body.innerText    'I checked here but the table is not here

    Set HTMLtable = HTMLdoc.getElementById("price_table")

    'Debug.Print HTMLtable.ID
End Sub

Recommended answer

The content you are looking for is not in that page's HTML. It is added dynamically after the page loads. The data comes from a separate link (the mainUrl in the script below) that returns static content, which you can fetch with XHR. To find that link, open Chrome DevTools or a similar tool, select the Network tab, and reload the page. The request should show up in the network activity under the All or XHR filter.
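Once a candidate link turns up in the Network tab, a quick preview of what it returns can confirm it is the right one before any parsing is written. The following is only a minimal sketch: the procedure name PreviewEndpoint is hypothetical, and the URL is the one used by the script further down.

```vba
' Hypothetical helper: print the status, content type, and the first part
' of the response to the Immediate window (Ctrl+G in the VBA editor).
Sub PreviewEndpoint()
    Const candidateUrl As String = "https://www.energylive.cloud/pwr-hour/get-index-averages?callback=%3F"

    ' Late binding, so no library reference is required.
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", candidateUrl, False
        .send
        Debug.Print .Status, .getResponseHeader("Content-Type")
        ' Only the first 500 characters, enough to see the field names.
        Debug.Print Left$(.responseText, 500)
    End With
End Sub
```

If the printed text contains the values shown in the table on the page, the link is the one to scrape.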

Parsing the required content out of a JSON response is not that easy in VBA, because there is no built-in library for it. The more common approach is to use a third-party JSON converter.
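For comparison, the converter route might look roughly like the sketch below. It rests on several assumptions that are not part of the original answer: the third-party VBA-JSON module (JsonConverter.bas) has been imported into the project with a reference to Microsoft Scripting Runtime set, the response is a JSON array of objects that may be wrapped in a JSONP callback, and the keys are the same ones the regex patterns in the script below look for. The procedure name FetchWithJsonConverter is hypothetical.

```vba
' Sketch only. Requires the third-party VBA-JSON module (JsonConverter.bas)
' and a reference to "Microsoft Scripting Runtime".
Sub FetchWithJsonConverter()
    Const mainUrl As String = "https://www.energylive.cloud/pwr-hour/get-index-averages?callback=%3F"
    Dim s As String, firstPos As Long, lastPos As Long
    Dim records As Object, rec As Object, r As Long
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", mainUrl, False
        .send
        s = .responseText
    End With

    ' Assumption: the payload is a JSON array, possibly wrapped in a
    ' callback such as ?([...]). Keep only the outermost [...] part.
    firstPos = InStr(s, "[")
    lastPos = InStrRev(s, "]")
    If firstPos = 0 Or lastPos < firstPos Then Exit Sub
    s = Mid$(s, firstPos, lastPos - firstPos + 1)

    ' ParseJson returns a Collection of Dictionary objects for an array of objects.
    Set records = JsonConverter.ParseJson(s)

    r = 1
    For Each rec In records
        r = r + 1
        ' Key names are assumed from the regex patterns in the answer's script.
        ws.Cells(r, 1).Value = rec("Index")
        ws.Cells(r, 2).Value = rec("Value")
        ws.Cells(r, 3).Value = rec("Perc") & "%"
        ws.Cells(r, 4).Value = rec("Month-to-date")
        ws.Cells(r, 5).Value = rec("Year-to-date")
    Next rec
End Sub
```

The trade-off is an extra module and a library reference in exchange for not depending on the exact text layout of the response.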

However, I've used regex here, which seems to have grabbed the data flawlessly. When you run the script, you should get all the tabular content almost instantly.

Sub FetchTabularData()
    ' Endpoint that serves the table data (found via the browser's Network tab).
    Const mainUrl$ = "https://www.energylive.cloud/pwr-hour/get-index-averages?callback=%3F"
    Dim I&, S$, Elem As Object, subElemName As Object
    Dim subElemChange As Object, subElemPrice As Object
    Dim subElemMtd As Object, subElemYtd As Object, R As Long: R = 1
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")

    ' Late-bound XHR request, so no library reference is needed.
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", mainUrl, False
        .send
        ' Stop early if the request failed.
        If .Status <> 200 Then
            MsgBox .Status & " " & .statusText
            Exit Sub
        End If
        S = .responseText
    End With

    ' Header row.
    ws.Range("A1:E1") = [{"Index","Value","Changes","Month To Date","Year To Date"}]

    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True

        ' One match per {...} record in the response.
        .Pattern = "\[?{[\s\S]+?\},?"
        Set Elem = .Execute(S)
        For I = 0 To Elem.Count - 1
            ' Pull each field out of the current record.
            .Pattern = "Index""\:""(.*?)"","
            Set subElemName = .Execute(Elem(I).Value)
            .Pattern = "Value""\:""(.*?)"","
            Set subElemPrice = .Execute(Elem(I).Value)
            .Pattern = "Perc""\:""(.*?)"","
            Set subElemChange = .Execute(Elem(I).Value)
            .Pattern = "Month-to-date""\:""(.*?)"","
            Set subElemMtd = .Execute(Elem(I).Value)
            .Pattern = "Year-to-date""\:""(.*?)"""
            Set subElemYtd = .Execute(Elem(I).Value)

            ' Write one worksheet row per record, starting at row 2.
            R = R + 1: ws.Cells(R, 1) = subElemName(0).submatches(0)
            ws.Cells(R, 2) = subElemPrice(0).submatches(0)
            ws.Cells(R, 3) = subElemChange(0).submatches(0) & "%"
            ws.Cells(R, 4) = subElemMtd(0).submatches(0)
            ws.Cells(R, 5) = subElemYtd(0).submatches(0)
        Next I
    End With
End Sub

PS: You don't need to add any library references to run the script above. Just make sure your Excel workbook has a worksheet named Sheet1.
