Use getElementById on HTMLElement instead of HTMLDocument

Question

I've been playing around with scraping data from web pages using VBS/VBA.

If it were JavaScript I'd be away, as it's easy, but it doesn't seem to be quite as straightforward in VBS/VBA.

This is an example I made for an answer. It works, but I had planned on accessing the child nodes using getElementsByTagName and could not figure out how to use it! The HTMLElement object does not seem to have those methods.

' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library".
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer

    Browser.navigate "http://www.hsbc.com/about-hsbc/leadership"

    ' Keep waiting while the browser is busy OR the page has not finished loading.
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document

    Set Elements = Document.getElementsByClassName("profile-col1")

    ' The name and title sit in the first two children of each match's second child.
    For Each Element In Elements
        Debug.Print "[  name] " & Trim(Element.Children(1).Children(0).innerText)
        Debug.Print "[ title] " & Trim(Element.Children(1).Children(1).innerText)
    Next Element

    Browser.Quit ' close the hidden IE instance before releasing the references
    Set Document = Nothing
    Set Browser = Nothing
End Sub

I have been looking at the HTMLElement.document property to see whether it is like a fragment of the document, but it's either difficult to work with or just isn't what I think it is.

Dim Fragment As HTMLDocument
Set Element = Document.getElementById("example") ' This works
Set Fragment = Element.document ' This doesn't

This also seems a long-winded way to do it (although that's usually the way with VBA, IMO). Does anyone know if there is a simpler way to chain functions?

Document.getElementById("target").getElementsByTagName("tr") would be awesome...

Recommended answer

I don't like it either.

So use JavaScript:

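Public Function GetJavaScriptResult(doc As HTMLDocument, jsString As String) As String

    Dim el As Object            ' late-bound so that .Value resolves on the INPUT element
    Dim host As IHTMLDOMNode    ' the body, viewed through the interface that has appendChild
    Dim nd As IHTMLDOMNode

    ' Create a hidden INPUT whose ID is not already used in the document.
    Set el = doc.createElement("INPUT")
    Do
        el.ID = GenerateRandomAlphaString(100)
    Loop Until doc.getElementById(el.ID) Is Nothing
    el.Style.display = "none"
    Set host = doc.body
    Set nd = host.appendChild(el)

    ' Let the page's own script engine evaluate jsString and store the result in the INPUT.
    doc.parentWindow.execScript "document.getElementById('" & el.ID & "').value = " & jsString

    GetJavaScriptResult = el.Value

    ' Remove the temporary INPUT again.
    host.removeChild nd

End Function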


Function GenerateRandomAlphaString(Length As Long) As String

    Dim i As Long
    Dim Result As String

    Randomize Timer

    For i = 1 To Length
        Result = Result & Chr(Int(Rnd(Timer) * 26 + 65 + Round(Rnd(Timer)) * 32))
    Next i

    GenerateRandomAlphaString = Result

End Function

Let me know if you have any problems with this; I've changed the context from a method to a function.
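For what it's worth, here is a rough sketch of how the function above might be called to get the kind of chained lookup the question asks for. The Sub name PrintRowCount is made up for illustration, the "target" ID is the one from the question, and it assumes doc is an already loaded HTMLDocument whose page allows script execution.

```vba
' Hypothetical caller: counts the <tr> elements under the element with id "target".
' Assumes GetJavaScriptResult (above) is in the same project and doc is fully loaded.
Public Sub PrintRowCount(doc As HTMLDocument)
    Dim js As String

    ' The chaining happens inside the page's script engine, not in VBA.
    js = "document.getElementById('target').getElementsByTagName('tr').length"

    ' The result comes back as a String because it travels through an INPUT's value.
    Debug.Print GetJavaScriptResult(doc, js)
End Sub
```

Note that whatever the expression evaluates to is coerced to text on the way back, so this suits counts and strings better than element references.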

By the way, what version of IE are you using? I suspect you're on < IE8. If you upgrade to IE8, I presume it will update shdocvw.dll to ieframe.dll, and you will be able to use document.querySelector and document.querySelectorAll.
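If querySelectorAll is available, the lookup from the question could probably be written as a single CSS selector. This is only a sketch: PrintTargetRows is a made-up name, "#target tr" is built from the ID in the question, and it assumes IE8 or later is installed and the page is rendered in a document mode that supports the selectors API.

```vba
' Sketch only: prints the text of every <tr> inside the element with id "target".
' Assumes doc is a loaded HTMLDocument and that querySelectorAll is supported.
Public Sub PrintTargetRows(doc As HTMLDocument)
    Dim rows As Object   ' late-bound node list
    Dim i As Long

    Set rows = doc.querySelectorAll("#target tr")

    ' Index-based loop; the list is zero-based.
    For i = 0 To rows.Length - 1
        Debug.Print rows.Item(i).innerText
    Next i
End Sub
```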

Edit

Comment response which isn't really a comment: basically, the way to do this in VBA is to traverse the child nodes. The problem is that you don't get the correct return types. You could fix this by writing your own classes that (separately) implement IHTMLElement and IHTMLElementCollection, but that's WAY too much of a pain for me to do without getting paid :). If you're determined, go and read up on the Implements keyword for VB6/VBA.

Public Function getSubElementsByTagName(el As IHTMLElement, tagname As String) As Collection

    Dim descendants As New Collection
    Dim results As New Collection
    Dim i As Long

    getDescendants el, descendants

    ' HTML elements report their tag name in upper case, so compare case-insensitively.
    For i = 1 To descendants.Count
        If StrComp(descendants(i).tagname, tagname, vbTextCompare) = 0 Then
            results.Add descendants(i)
        End If
    Next i

    ' Returning an object reference requires Set.
    Set getSubElementsByTagName = results

End Function

' Recursively adds nd and all of its descendant elements to the supplied collection.
Public Sub getDescendants(nd As IHTMLElement, ByRef descendants As Collection)
    Dim i As Long
    descendants.Add nd
    ' The children collection is zero-based.
    For i = 0 To nd.Children.Length - 1
        getDescendants nd.Children.Item(i), descendants
    Next i
End Sub
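As a possible shortcut before writing your own classes, it may be worth trying late binding: if the element is held in a variable declared As Object rather than As IHTMLElement, the call is resolved at run time, and getElementsByTagName (which the type library declares on IHTMLElement2, not IHTMLElement) should then be reachable. The sketch below assumes doc is a loaded HTMLDocument containing an element with the ID "target" from the question; PrintTargetRowsLateBound is a made-up name.

```vba
' Sketch only: chains getElementById and getElementsByTagName via late binding.
' Assumes doc is fully loaded; "target" is the example ID used in the question.
Public Sub PrintTargetRowsLateBound(doc As HTMLDocument)
    Dim target As Object   ' deliberately late-bound
    Dim rows As Object
    Dim i As Long

    Set target = doc.getElementById("target")
    If target Is Nothing Then Exit Sub   ' no such element on this page

    Set rows = target.getElementsByTagName("tr")
    For i = 0 To rows.Length - 1
        Debug.Print rows.Item(i).innerText
    Next i
End Sub
```

The trade-off is that you lose IntelliSense and compile-time checking on those variables, so typos in member names only show up when the code runs.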
