Excel VBA HTML Nested QuerySelector


Question


Consider this extract of an html page:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<div class="BoxBody">
<span class="txt">20 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] &nbsp;1&nbsp;, <a class="page" href="javascript:paginacao('paginar','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar','last');">Last</a>]</span></p>
<br>
<span class="txt">25 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] &nbsp;1&nbsp;, <a class="page" href="javascript:paginacao('paginar2','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar2','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar2','last');">Last</a>]</span></p>
</div>
</body>
</html>

I am trying to get the anchor tag that holds the "Next page" href (if there is one).

I tried this in the console using Firefox and it works:

document.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")

I also put together sample VBA code using querySelector, but it fails with an "Invalid argument" error.

Sub test()

    Dim oFSO As Object, paginator As Object
    Dim oFS As Object, sText As String

    ' Read the saved page into a string
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\example.html")

    Do Until oFS.AtEndOfStream
        sText = oFS.ReadAll()
    Loop

    ' HTMLDocument requires a reference to the Microsoft HTML Object Library
    Dim html As HTMLDocument, html2 As Object
    Set html = New HTMLDocument
    Set html2 = html
    html2.Write sText

    ' This is the line that fails with "Invalid argument"
    Set paginator = html.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")

End Sub

What is causing this? Is it the p:nth-child(2) selector? How should I go about extracting that element using VBA?

Solution

:nth-child(2) is not supported by the HTML library used from VBA here, and it is indeed what causes the error message. You can't use :nth-child() or :nth-of-type(); very few pseudo-classes are implemented in the libraries available to you. Interestingly, :first-child does work. You will also find that you are limited in which objects you can chain querySelector on.

Dim ele As Object, iText As String
Set ele = html.querySelector(".BoxBody > p > span:first-child > a[title='Next page']")

' ele is Nothing when no matching anchor exists, so reading href would raise an error
On Error Resume Next
iText = ele.href
On Error GoTo 0

' This assumes a matching anchor always has a non-empty href; otherwise use an
' On Error GoTo handler that prints "No href"
If iText = vbNullString Then
    Debug.Print "No href"
Else
    Debug.Print iText
End If
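
The question's page has two paginators, and the selector above only returns the first match. As a rough sketch of one way to reach the others without positional pseudo-classes, the snippet below collects every matching anchor with querySelectorAll and picks by index. It assumes html is the populated HTMLDocument from the question, that the Microsoft HTML Object Library is referenced, and that the installed MSHTML version exposes Length and item on the returned list; PrintNextPageLinks is a hypothetical helper name, not part of the original answer.

```vba
' Sketch only: PrintNextPageLinks is a hypothetical helper name.
' Assumes a reference to the Microsoft HTML Object Library.
Sub PrintNextPageLinks(ByVal html As HTMLDocument)
    Dim anchors As Object, i As Long

    ' Child combinators and attribute selectors avoid :nth-child entirely
    Set anchors = html.querySelectorAll(".BoxBody > p > span.txt > a[title='Next page']")

    If anchors.Length = 0 Then
        Debug.Print "No next-page link found"
        Exit Sub
    End If

    ' Index 0 is the first paginator on the page, index 1 the second, and so on
    For i = 0 To anchors.Length - 1
        Debug.Print i, anchors.item(i).href
    Next i
End Sub
```

Calling PrintNextPageLinks html after html2.Write sText in the question's test routine should list one href per paginator. An index loop is used rather than For Each because iterating this kind of node list with For Each has been reported to be unreliable from VBA.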


EDIT (29/5/21): As of some point in the last month (?), it has become possible to use element.querySelector widely, as well as most of the standard pseudo-class selectors (at least on Windows 10 with MSHTML.DLL 11.00.19041.985, date modified 12/5/21).
