How to hold a reference to the items matched by `querySelectorAll`, in a variable, that allows you to access its methods?


Problem description

Intro:

Some of you may have noticed that something has broken in relation to the querySelectorAll method of MSHTML.HTMLDocument from mshtml.dll (via a reference to the Microsoft HTML Object Library). I believe this happened within the last month. It may not affect all users, and I will update this Q&A as I get more information on which versions are affected. Please feel free to comment below with your set-up and whether it works for both late-bound and early-bound variables (as per the code in the answer).


Accessing DispStaticNodeList methods:

Traditionally, at least in my experience, it has been the norm to hold a reference to the DispStaticNodeList, which is what querySelectorAll returns, in a generic late-bound Object type:

E.g.

Dim nodeList1 As Object

Set nodeList1 = html.querySelectorAll("a")

where html is an instance of MSHTML.HTMLDocument.

In the Locals window, nodeList1 then shows up as the expected node list.

You could then access the list of document elements that match the specified group of selectors with .item(index), and get the number of matched items with .Length. E.g.

Debug.Print nodeList1.item(0).innerText
Debug.Print nodeList1.Length


What happens now?

Attempts to access these members through a late-bound Object variable, and its underlying interfaces, now lead either to an "Object required" error when calling .item(), or to Null when reading .Length. E.g.

Debug.Print nodeList1.item(0).innerText ' => Run-time error '424': Object required
Debug.Print nodeList1.Length            ' => Null

This happens when you hold the reference by assigning it to a variable declared As Object.


What you can do:

You can use With and work directly off html, avoiding the Object variable:

Dim i As Long

With html.querySelectorAll("a")
    For i = 0 To .Length - 1
        Debug.Print .Item(i).innerText
    Next
End With

However, this is not desirable, as you parse, and re-parse, the same HTML during the loop, losing much of the efficiency that is gained by choosing css selectors over traditional methods, e.g. getElementsByClassName. Those traditional methods remain intact.

So, I think the problem is very much about the Object data type and its underlying interfaces. Possibly something has broken in relation to MSHTML and, most likely, the now unsupported Internet Explorer, which sits in the background.


Why do some of us care?

Modern browsers (and even IE8 onwards) support faster node matching through use of css selectors. It seems reasonable to assume that this carried over into the DOM parser behind MSHTML.HTMLDocument. So, you have faster matching, combined with a more expressive and concise syntax (none of those long chained method calls, e.g. getElementsByClassName("abc")(0).getElementsByTagName("def")(0).....), and the ability to return more of the desired nodes without repeated calls. In the prior example you only reach def elements under the first element with class abc, rather than all elements with tag def under all elements with class abc, which is what querySelectorAll(".abc def") returns. Without css selectors, you also lose the flexibility to specify much more complex and specific patterns for node matching, e.g. querySelectorAll(".abc > def + #ghi"). For those interested, you can read more about those selectors on MSDN.
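To make the difference concrete, here is a minimal sketch, assuming a reference to the Microsoft HTML Object Library. The procedure name CompareMatching and the HTML fragment are invented for illustration, with p standing in for the def tag used above. The selector result is consumed inside a With block, which is the form reported as still working earlier in this question.

```vba
' Illustrative only: CompareMatching and the HTML fragment are assumptions.
' Requires a reference to the Microsoft HTML Object Library.
Public Sub CompareMatching()
    Dim html As MSHTML.HTMLDocument
    Dim i As Long

    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = "<div class='abc'><p>one</p></div>" & _
                          "<div class='abc'><p>two</p><p>three</p></div>"

    ' Traditional chain: only the first p inside the first .abc element.
    Debug.Print html.getElementsByClassName("abc")(0).getElementsByTagName("p")(0).innerText

    ' One selector: every p that is a descendant of any .abc element.
    With html.querySelectorAll(".abc p")
        For i = 0 To .Length - 1
            Debug.Print .Item(i).innerText
        Next i
    End With
End Sub
```

With this fragment, the chain should reach only the first paragraph, whereas the single selector should reach all three.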


Question:

So, how does one avoid re-parsing, and hold the reference to the returned list of matched nodes? Despite quite a bit of searching, I have found nothing on the internet that documents this recent change in behaviour. It is also a very recent change, and one that likely affects only a small user base.

I hope the above satisfies the need to demonstrate research into the problem.


My set-up:

OS Name Microsoft Windows 10 Pro
Version 10.0.19042 Build 19042
System Type x64-based PC
Microsoft® Excel® 2019 MSO (16.0.13929.20206) 32-bit (Microsoft Office Professional Plus)
Version 2104 Build 13929.20373
mshtml.dll info as per image


Not affected (TBD):

  1. Office Professional Plus 2013, Win 7, 32-bit, MSHTML.dll 11.0.9600.19597

Solution

Do not despair, VBA web-scrapers (I know there are a few!). We can still have the luxury of css selectors and the benefits they bring, though these are admittedly somewhat limited in VBA.

To the rescue:

MSHTML, thanks to IE, offers a number of scripting object interfaces. One of these is the IHTMLDOMChildrenCollection interface, which inherits from IDispatch, and which:

provides methods to access items in the collection.

This includes the .Length property and access to items via .item(index).

Dim nodeList2 As MSHTML.IHTMLDOMChildrenCollection

Set nodeList2 = html.querySelectorAll("a")
Debug.Print nodeList2.Length                 ' => n 
Debug.Print nodeList2.Item(0).innerText

This interface is supported on Windows XP and later clients, and on Windows 2000 Server and later servers.
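As a minimal sketch of what holding the reference buys you, the typed variable can be filled once and then reused without running the selector again. The procedure name ReuseNodeList and the inline HTML are hypothetical; querySelectorAll, .Length and .Item are used exactly as in the snippet above.

```vba
' Illustrative only: ReuseNodeList and the HTML fragment are assumptions.
' Requires a reference to the Microsoft HTML Object Library.
Public Sub ReuseNodeList()
    Dim html As MSHTML.HTMLDocument
    Dim links As MSHTML.IHTMLDOMChildrenCollection
    Dim i As Long

    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = "<a href='#one'>first</a><a href='#two'>second</a>"

    ' Run the selector once and keep the early-bound reference.
    Set links = html.querySelectorAll("a")

    ' First pass: link text.
    For i = 0 To links.Length - 1
        Debug.Print links.Item(i).innerText
    Next i

    ' Second pass over the same list: no second querySelectorAll call.
    For i = 0 To links.Length - 1
        Debug.Print links.Item(i).getAttribute("href")
    Next i
End Sub
```

The point of the design is simply that the selector runs once; every later pass works off the stored list.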


VBA:

Public Sub ReviewingNodeListMethods()
    ' References (VBE > Tools > References):
    '   Microsoft HTML Object Library
    '   Microsoft XML library (v6.0 for me)

    ' XMLHTTP60 is for Excel 2016. Change according to your version, e.g. XMLHTTP for 2013.
    Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument

    Set http = New MSXML2.XMLHTTP60: Set html = New MSHTML.HTMLDocument

    ' Fetch the page synchronously and load the response into the document.
    With http
        .Open "GET", "http://books.toscrape.com/", False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim nodeList1 As Object, nodeList2 As MSHTML.IHTMLDOMChildrenCollection

    Set nodeList1 = html.querySelectorAll("a")   ' late-bound
    Set nodeList2 = html.querySelectorAll("a")   ' early-bound

    Debug.Print nodeList1.Length                 ' => Null
    Debug.Print nodeList2.Length                 ' => 94

    Debug.Print nodeList2.Item(0).innerText

    '    Dim i As Long
    '
    '    With html.querySelectorAll("a")
    '        For i = 0 To .Length - 1
    '            Debug.Print .Item(i).innerText
    '        Next
    '    End With

    ' ================ Warning: This will crash Excel ================
    '    Dim node As MSHTML.IHTMLDOMNode
    '
    '    For Each node In nodeList2
    '        Debug.Print node.innerText
    '    Next
    ' ================ Warning: This will crash Excel ================

End Sub


N.B. The underlying problem with the collection's enumeration method remains: Excel crashes if you attempt a For Each over the list, e.g.

Dim node As MSHTML.IHTMLDOMNode

For Each node In nodeList2
    Debug.Print node.innerText
Next
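One way to sidestep the enumerator, sketched here under the assumption that index-based access keeps working as shown earlier, is to copy the nodes into an ordinary VBA Collection. NodeListToCollection is a hypothetical helper name, not part of MSHTML.

```vba
' Illustrative only: NodeListToCollection is a hypothetical helper name.
' Requires a reference to the Microsoft HTML Object Library.
Public Function NodeListToCollection( _
        ByVal nodeList As MSHTML.IHTMLDOMChildrenCollection) As Collection
    Dim result As Collection
    Dim i As Long

    Set result = New Collection

    ' Index-based access only; the list's own enumerator is never touched.
    For i = 0 To nodeList.Length - 1
        result.Add nodeList.Item(i)
    Next i

    Set NodeListToCollection = result
End Function
```

A loop such as For Each node In NodeListToCollection(nodeList2), with node declared As Object, then iterates a plain VBA Collection rather than the MSHTML list, so it should not hit the crashing enumeration path.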


Updating your old Questions/Answers:

  1. You can use this SEDE query to identify potential candidates for revision. Enter your userid and the search term "querySelectorAll"
  2. Or simply use the following in the search bar: querySelectorAll user:<userid> is:answer ; querySelectorAll user:<userid> is:question
