Use getElementById on HTMLElement instead of HTMLDocument
Question
I've been playing around with scraping data from web pages using VBS/VBA.
If it were JavaScript I'd be away, as it's easy, but it doesn't seem to be quite as straightforward in VBS/VBA.
This is an example I made for an answer. It works, but I had planned on accessing the child nodes using `getElementsByTagName` and could not figure out how to use it: the `HTMLElement` object does not have those methods.
```vba
' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer
    Browser.navigate "http://www.hsbc.com/about-hsbc/leadership"
    ' Keep waiting while IE is busy OR the page is not yet complete
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document
    Set Elements = Document.getElementsByClassName("profile-col1")
    For Each Element In Elements
        Debug.Print "[ name] " & Trim(Element.Children(1).Children(0).innerText)
        Debug.Print "[ title] " & Trim(Element.Children(1).Children(1).innerText)
    Next Element

    Set Document = Nothing
    Browser.Quit          ' close the hidden IE instance
    Set Browser = Nothing
End Sub
```
I have been looking at the `HTMLElement.document` property to see whether it behaves like a fragment of the document, but it's either difficult to work with or just isn't what I think it is:
```vba
Dim Fragment As HTMLDocument
Set Element = Document.getElementById("example") ' This works
Set Fragment = Element.document                  ' This doesn't
```
This also seems a long-winded way to do it (although that's usually the way for VBA, IMO). Does anyone know if there is a simpler way to chain functions? Something like `Document.getElementById("target").getElementsByTagName("tr")` would be great...
Recommended answer
I don't like it either.
So use JavaScript:
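```vba
' Evaluates a JavaScript expression in the page and returns its value as a string.
' A hidden INPUT is added to the page, the script writes the result into it,
' VBA reads it back, and the INPUT is removed again.
Public Function GetJavaScriptResult(doc As HTMLDocument, jsString As String) As String
    Dim el As IHTMLElement
    Dim inputEl As IHTMLInputElement
    Dim bodyNode As IHTMLDOMNode
    Dim nd As IHTMLDOMNode

    Set el = doc.createElement("INPUT")
    ' Pick a random id that is not already used on the page
    Do
        el.id = GenerateRandomAlphaString(100)
    Loop Until doc.getElementById(el.id) Is Nothing
    el.Style.display = "none"

    ' Append the hidden INPUT to <body>
    Set bodyNode = doc.body
    Set nd = bodyNode.appendChild(el)
    doc.parentWindow.execScript "document.getElementById('" & el.id & "').value = " & jsString

    Set inputEl = el    ' an INPUT's value is exposed through IHTMLInputElement
    GetJavaScriptResult = inputEl.Value
    bodyNode.removeChild nd
End Function

' Builds a string of random upper- and lower-case ASCII letters.
Function GenerateRandomAlphaString(Length As Long) As String
    Dim i As Long
    Dim Result As String
    Randomize
    For i = 1 To Length
        ' 65-90 is A-Z; adding 32 (half of the time) gives a-z
        Result = Result & Chr(Int(Rnd * 26) + 65 + Int(Rnd * 2) * 32)
    Next i
    GenerateRandomAlphaString = Result
End Function
```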
Let me know if you have any problems with this; I've changed the context from a method to a function.
By the way, what version of IE are you using? I suspect you're on something older than IE8. If you upgrade to IE8, I presume it will update shdocvw.dll to ieframe.dll, and you will be able to use `document.querySelector`/`querySelectorAll`.
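If the page is rendered by IE8 or later, the chain from the question can be collapsed into a single CSS selector. The following is a minimal sketch, assuming the document supports `querySelectorAll` (it is called late-bound through `Object`, so the code compiles whichever MSHTML type library version is referenced). `SelectTexts` is a hypothetical helper name, not part of the original answer. The returned node list is indexed from 0, so an index loop is used.

```vba
' Sketch: assumes IE8+ MSHTML, where documents support querySelectorAll.
' SelectTexts is a hypothetical helper; doc is late-bound (Object).
Public Function SelectTexts(doc As Object, selector As String) As Collection
    Dim nodes As Object
    Dim texts As New Collection
    Dim i As Long

    ' e.g. selector = "#target tr" stands in for getElementById + getElementsByTagName
    Set nodes = doc.querySelectorAll(selector)

    ' The node list is zero-based
    For i = 0 To nodes.Length - 1
        texts.Add Trim(nodes.Item(i).innerText)
    Next i

    Set SelectTexts = texts
End Function
```

For example, `SelectTexts(Browser.Document, "#target tr")` would collect the text of every row inside the element with id `target`.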
Edit
A comment response that isn't really a comment: basically, the way to do this in VBA is to traverse the child nodes. The problem is that you don't get the correct return types. You could fix this by writing your own classes that (separately) implement IHTMLElement and IHTMLElementCollection, but that's WAY too much of a pain for me to do without getting paid :). If you're determined, read up on the `Implements` keyword in VB6/VBA.
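```vba
' Returns every descendant of el (not el itself) whose tag name matches tagname,
' in document order, roughly like el.getElementsByTagName(tagname) in JavaScript.
Public Function getSubElementsByTagName(el As IHTMLElement, tagname As String) As Collection
    Dim descendants As New Collection
    Dim results As New Collection
    Dim i As Long

    getDescendants el, descendants
    For i = 1 To descendants.Count
        ' tagName is reported in upper case (e.g. "TR"), so compare case-insensitively
        If UCase$(descendants(i).tagName) = UCase$(tagname) Then
            results.Add descendants(i)
        End If
    Next i
    Set getSubElementsByTagName = results   ' object return values need Set
End Function

' Depth-first walk that adds every descendant element of nd to descendants.
Public Sub getDescendants(nd As IHTMLElement, ByRef descendants As Collection)
    Dim i As Long
    Dim child As IHTMLElement

    ' The children collection is zero-based
    For i = 0 To nd.Children.Length - 1
        Set child = nd.Children.Item(i)
        descendants.Add child
        getDescendants child, descendants
    Next i
End Sub
```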
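A different route, which is not part of the original answer: in the MSHTML type library, the element object returned by `getElementById` also implements `IHTMLElement2`, and that interface exposes `getElementsByTagName`. Assigning the element to a variable declared `As IHTMLElement2` makes VBA query the same object for that interface, so the native method can be used without walking the children by hand. This is a minimal sketch, assuming a reference to "Microsoft HTML Object Library"; `GetElementsByTagNameUnder` is a hypothetical helper name.

```vba
' Sketch: assumes a reference to "Microsoft HTML Object Library" (MSHTML).
' GetElementsByTagNameUnder is a hypothetical helper name.
Public Function GetElementsByTagNameUnder(doc As HTMLDocument, id As String, _
                                          tagname As String) As IHTMLElementCollection
    Dim container As IHTMLElement2

    ' getElementById returns IHTMLElement; Set-ing the result into an
    ' IHTMLElement2 variable asks the same object for its IHTMLElement2 interface
    Set container = doc.getElementById(id)
    If container Is Nothing Then Exit Function   ' no such id: return Nothing

    Set GetElementsByTagNameUnder = container.getElementsByTagName(tagname)
End Function
```

For example, `GetElementsByTagNameUnder(Document, "target", "tr").Length` would give the number of rows inside the element with id `target`. In VBScript, where every call is late-bound, the chained call from the question should work as written, because it is resolved against the element's full dispatch interface at run time.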