VBA-Web抓取无法获取HTMLElement innerText [英] VBA - web scraping can not get HTMLElement innerText

查看:180
本文介绍了VBA-Web抓取无法获取HTMLElement innerText的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用excel VBA取消汇率,但无法获取所需的innerText值.我不明白为什么,因为在其他网站上也可以使用相同的技术.

I'm trying to scrap the exchange rates using excel VBA but I can not get the innerText value I need. I don't understand why because the same technique works on the other sites.

URL- https://www.nbs .rs/export/sites/default/internet/english/scripts/kl_srednji.html

Sub GetCurr()

Dim tempHTMLDoc As New MSHTML.HTMLDocument
Dim HTMLCurrency As MSHTML.IHTMLElementCollection
Dim HTMLRows As MSHTML.IHTMLElementCollection
Dim HTMLDate As MSHTML.IHTMLElementCollection
Dim HTMLElem As MSHTML.IHTMLElement
Dim connectionTest As Boolean
Dim EUR, CZK, HRK, HUF, PLN, RON, RSD As String
Dim myURL As String
Dim i As Long

connectionTest = True
myURL = "https://www.nbs.rs/export/sites/default/internet/english/scripts/kl_srednji.html"

Call WebConnection(tempHTMLDoc, connectionTest, myURL)
If connectionTest = False Then Exit Sub

Set HTMLDate = tempHTMLDoc.getElementsByTagName("span")
'Debug.Print HTMLDate.Length

For Each HTMLElem In HTMLDate 'I am looking for which element contains the date (can not find)
  Debug.Print HTMLElem.innerText
Next HTMLElem

'I am trying to get the necessary currencies
Set HTMLRows = tempHTMLDoc.getElementsByTagName("tr")

Debug.Print HTMLRows.Length

For i = 0 To HTMLRows.Length - 1 'If lenght > 0

    Set HTMLCurrency = HTMLRows(i).getElementsByTagName("td")

    If HTMLCurrency.Length > 4 Then 'each currency contains 5 "td" tags

        Select Case HTMLCurrency(2).innerText
            Case "EUR"
                EUR = HTMLCurrency(4).innerText
            Case "HRK"
                HRK = HTMLCurrency(4).innerText
            Case "HUF"
                HUF = HTMLCurrency(4).innerText
            Case "PLN"
                PLN = HTMLCurrency(4).innerText
            Case "RON"
                RON = HTMLCurrency(4).innerText
            Case "CZK"
                CZK = HTMLCurrency(4).innerText
        End Select

    End If

Next i

Debug.Print "EUR - ", EUR; vbNewLine; "HRK - ", HRK; vbNewLine; "HUF - ", HUF; vbNewLine; "PLN - ", PLN; vbNewLine; _
            "RON - ", RON; vbNewLine; "CZK - ", CZK

End Sub

'============================================================================

Sub WebConnection(HTMLDoc As MSHTML.HTMLDocument, ConnTest As Boolean, URL As String)

Dim XMLPage As New MSXML2.XMLHTTP60
Dim errorMsg As VbMsgBoxResult

On Error GoTo CONNECTION_ERROR

XMLPage.Open "GET", URL, False
XMLPage.send

On Error GoTo 0

If XMLPage.Status <> 200 Then
    errorMsg = MsgBox("There is something wrong with webpage. Do you want to try to continue?", vbYesNo + vbCritical, "ERROR")
    If errorMsg = vbNo Then
        ConnTest = False
        Exit Sub
    End If
End If

HTMLDoc.body.innerHTML = XMLPage.responseText
Exit Sub

CONNECTION_ERROR:
MsgBox "There is something wrong with the connection.", vbCritical, "ERROR"
ConnTest = False
Exit Sub

End Sub

我尝试使用id(index:srednjiKursList:tbody_element)或类名(tableCell),但它不起作用.该网站的构建方式不同

I tried to use id (index:srednjiKursList:tbody_element) or class name(tableCell) but it doesn't work. This website is built in a different way

推荐答案

您的原始链接(称为登录页面)是动态加载的.您的GET请求太快了,无法检索所需的信息.

Your original link, let's call it the landing page, is dynamically loaded. Your GET request is too quick to retrieve the required info.

您可以使用其他网址.

当您转到登录页面时,您会看到它实际上发出了 XMLHTTP GET 请求到以下页面:

When you go to the landing page you show it actually issues an XMLHTTP GET request to the following page:

以上内容来自使用 fiddler ,但是您可以使用以下方法检查网络流量,Chrome开发人员工具( F12 ).

The above is from using fiddler but you could inspect the web traffic with, for example, Chrome dev tools (F12).

您可以将该URL直接输入到您的代码中,并且效果很好.

You can input that URL directly into your code and it works perfectly.

整个表格:

您还可以按以下方式获取整个表格:

You can also grab the whole table as follows:

Option Explicit
Public Sub GetInfo()
    Dim html As New HTMLDocument, hTable As HTMLTable, clipboard As Object
    Const URL = "https://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=eng"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

    Set hTable = html.getElementById("index:srednjiKursLista")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    clipboard.SetText hTable.outerHTML
    clipboard.PutInClipboard
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
End Sub


结果示例:

仅列出货币:

您还可以根据表结构使用一些数学运算来获取列出的那些元素.

You could also use a little maths, based on table structure, to get just those elements you listed.

Option Explicit
Public Sub GetInfo()
    Dim html As New HTMLDocument, hTable As HTMLTable, clipboard As Object
    Const URL = "https://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=eng"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

    Set hTable = html.getElementById("index:srednjiKursLista")
 
    Dim list As Object, i As Long
    Dim EUR As Double, CZK As Double, HRK As Double, HUF As Double, PLN As Double, RON As Double, RSD As Double
    Set list = hTable.querySelectorAll("td")
    For i = 2 To list.Length - 1 Step 5
        Select Case list.item(i).innerText
        Case "EUR"
            EUR = list.item(i + 2).innerText
        Case "HRK"
            HRK = list.item(i + 2).innerText
        Case "HUF"
            HUF = list.item(i + 2).innerText
        Case "PLN"
            PLN = list.item(i + 2).innerText
        Case "RON"
            RON = list.item(i + 2).innerText
        Case "CZK"
            CZK = list.item(i + 2).innerText
        End Select
    Next
 
    Debug.Print "EUR - ", EUR; vbNewLine; "HRK - ", HRK; vbNewLine; "HUF - ", HUF; vbNewLine; "PLN - ", PLN; vbNewLine; _
                                                                                                      "RON - ", RON; vbNewLine; "CZK - ", CZK
End Sub


使用剪贴板:

以下行:

GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

向Microsoft Forms对象库添加了后期绑定引用,以便您可以访问剪贴板.

adds a late bound reference to Microsoft Forms Object Library so you can access the clipboard.

您还可以在项目中添加用户表单,也可以进入VBE>工具>参考> Microsoft Forms对象库具有访问权限:

You could also either add a userform to your project or go VBE > Tools > references > Microsoft Forms Object Library to have access:

这篇关于VBA-Web抓取无法获取HTMLElement innerText的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆