VBA Web Scraping: Can't access table on web page
Question
I tried to scrape the price table on https://www.energylive.cloud/ the same way I have on other sites, but I can't get it to work (I don't have much scraping experience). Thanks in advance!
Sub ej()
    Dim XMLrequest As New MSXML2.XMLHTTP60
    Dim HTMLdoc As New MSHTML.HTMLDocument
    Dim HTMLtable As MSHTML.IHTMLElement
    'Dim HTMLi As MSHTML.IHTMLElementCollection
    Dim url As String

    url = "https://www.energylive.cloud/"
    XMLrequest.Open "GET", url, False
    XMLrequest.send

    If XMLrequest.Status <> 200 Then
        MsgBox XMLrequest.Status & XMLrequest.statusText
    End If

    HTMLdoc.body.innerHTML = XMLrequest.responseText
    'Debug.Print HTMLdoc.body.innerText 'I checked here but the table is not here
    Set HTMLtable = HTMLdoc.getElementById("price_table")
    'Debug.Print HTMLtable.ID
End Sub
Recommended answer
The content you are looking for is not in that page's HTML; it is added dynamically after the page loads. The data comes from a separate endpoint (the URL used in the script below), which returns static content that you can fetch with an XHR request. To find that URL, open Chrome DevTools or a similar tool, select the Network tab, and reload the page while watching the requests under the All or XHR filter, where the link should appear.
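A quick way to confirm that the table is injected after load is to search the raw response for something you can see in the rendered page. The following is only a rough sketch: `ResponseContains` is a hypothetical helper name, and `price_table` is the element id from the question. An empty container with that id may still be present in the static HTML, so a cell value you can see in the browser is a more reliable marker.

```vba
' Hypothetical helper: True when the raw (unrendered) response contains marker.
Function ResponseContains(ByVal url As String, ByVal marker As String) As Boolean
    Dim html As String

    ' Late binding, so no library references are required.
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        html = .responseText
    End With

    ' InStr returns 0 when the substring is absent.
    ResponseContains = (InStr(1, html, marker, vbTextCompare) > 0)
End Function

Sub CheckStaticHtml()
    If ResponseContains("https://www.energylive.cloud/", "price_table") Then
        Debug.Print "Marker found in the static HTML."
    Else
        Debug.Print "Marker not found; the content is probably loaded by a separate request."
    End If
End Sub
```

If the marker is missing from the raw response but visible in the browser, the Network tab is the place to look for the request that delivers it.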
Parsing the required fields out of a JSON response is not easy in VBA, because there is no built-in JSON library. The more common approach is to use a third-party JSON converter.
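For comparison, here is a minimal sketch of the third-party converter route, assuming the VBA-JSON module (`JsonConverter.bas`) has been imported into the project and a reference to Microsoft Scripting Runtime is set. It also assumes, without having verified it, that the payload is a JSON array of objects (possibly wrapped in a JSONP callback) whose keys match the names used in the regex patterns in this answer.

```vba
' Assumes the third-party VBA-JSON module (JsonConverter.bas) is in the project
' and that "Microsoft Scripting Runtime" is referenced.
Sub FetchWithJsonConverter()
    Const mainUrl As String = "https://www.energylive.cloud/pwr-hour/get-index-averages?callback=%3F"
    Dim raw As String, firstPos As Long, lastPos As Long
    Dim entries As Object, entry As Object

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", mainUrl, False
        .send
        raw = .responseText
    End With

    ' Assumption: the data is a JSON array, possibly wrapped in a callback such as ?( ... ).
    ' Keep only the text from the first "[" to the last "]".
    firstPos = InStr(raw, "[")
    lastPos = InStrRev(raw, "]")
    If firstPos = 0 Or lastPos <= firstPos Then
        Debug.Print "No JSON array found in the response."
        Exit Sub
    End If

    ' ParseJson returns a Collection for arrays and a Dictionary for objects.
    Set entries = JsonConverter.ParseJson(Mid$(raw, firstPos, lastPos - firstPos + 1))

    For Each entry In entries
        ' Key names are assumed from the regex patterns, not confirmed against the live response.
        Debug.Print entry("Index"), entry("Value"), entry("Perc")
    Next entry
End Sub
```

The trade-off is an extra dependency in exchange for not having to maintain hand-written patterns when the response layout changes.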
However, I used regular expressions here, and they appear to extract the data correctly. When you run the script, all of the table content should be written to the sheet almost instantly.
Sub FetchTabularData()
    Const mainUrl$ = "https://www.energylive.cloud/pwr-hour/get-index-averages?callback=%3F"
    Dim I&, S$, Elem As Object, subElemName As Object
    Dim subElemChange As Object, subElemPrice As Object
    Dim subElemMtd As Object, subElemYtd As Object, R As Long: R = 1
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")

    ' Fetch the raw response from the endpoint found in the Network tab
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", mainUrl, False
        .send
        S = .responseText
    End With

    ' Header row
    ws.Range("A1:E1") = [{"Index","Value","Changes","Month To Date","Year To Date"}]

    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True

        ' Split the response into one match per {...} record
        .Pattern = "\[?{[\s\S]+?\},?"
        Set Elem = .Execute(S)

        For I = 0 To Elem.count - 1
            ' Pull each field out of the current record
            .Pattern = "Index""\:""(.*?)"","
            Set subElemName = .Execute(Elem(I))
            .Pattern = "Value""\:""(.*?)\"","
            Set subElemPrice = .Execute(Elem(I))
            .Pattern = "Perc""\:""(.*?)"","
            Set subElemChange = .Execute(Elem(I))
            .Pattern = "Month-to-date""\:""(.*?)"","
            Set subElemMtd = .Execute(Elem(I))
            .Pattern = "Year-to-date""\:""(.*?)"""
            Set subElemYtd = .Execute(Elem(I))

            ' Write one row per record, starting at row 2
            R = R + 1: ws.Cells(R, 1) = subElemName(0).submatches(0)
            ws.Cells(R, 2) = subElemPrice(0).submatches(0)
            ws.Cells(R, 3) = subElemChange(0).submatches(0) & "%"
            ws.Cells(R, 4) = subElemMtd(0).submatches(0)
            ws.Cells(R, 5) = subElemYtd(0).submatches(0)
        Next I
    End With
End Sub
PS: You don't need to add any library references to run the script above, because it creates its objects with CreateObject (late binding). Just make sure your Excel workbook has a sheet named Sheet1.
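One thing to be aware of: indexing a match collection with `(0)` raises a run-time error if a field is ever missing from a record. A possible guard, sketched here with a hypothetical helper named `FirstSubMatch`, returns a fallback string instead. The sample record in the demo is made up purely for illustration and is not taken from the real response.

```vba
' Hypothetical helper: first capture group of the first match, or fallback if nothing matches.
' Assumes pat contains at least one capture group.
Function FirstSubMatch(ByVal source As String, ByVal pat As String, _
                       Optional ByVal fallback As String = "") As String
    Dim matches As Object

    With CreateObject("VBScript.RegExp")
        .Global = False
        .Pattern = pat
        Set matches = .Execute(source)
    End With

    If matches.Count > 0 Then
        FirstSubMatch = matches(0).SubMatches(0)
    Else
        FirstSubMatch = fallback
    End If
End Function

Sub DemoFirstSubMatch()
    ' Made-up sample record, for illustration only.
    Const sample As String = "{""Index"":""DEMO"",""Value"":""1.23""}"

    Debug.Print FirstSubMatch(sample, "Index""\:""(.*?)""")        ' field present
    Debug.Print FirstSubMatch(sample, "Perc""\:""(.*?)""", "n/a")  ' field absent, fallback used
End Sub
```

In the main loop, each `.Execute` / `(0).submatches(0)` pair could then be replaced by a single call to this helper, at the cost of creating a RegExp object per call.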