VBA extracting only select info between <div> tags

Problem description

I'm trying to check if the html tag:

<nobr>Target</nobr>

exists on the page, and if it does, search for the text between the html tag:

<div style='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;'> ... </div>

The text between the div tags looks messy, like this:

ABC [HSA: 
<a href="...">...</a>
] [KO:
<a href="...">...</a>
]
<br />
GHI-JK [JKI:
...    

I want to get the items and print them to my spreadsheet, however many there are, but I only want the item names (in the example above there are 2 items: ABC and GHI-JK).

Of course, my code below doesn't work. I don't think I'm using querySelector correctly, and I'm also not sure how to grab only the item names instead of everything between the tags.

If IE.document.querySelector("nobr").innerHTML = "Target" Then
    If IE.document.querySelector("div[style^='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;']") <> 0 Then
         Cells(1, 15).Value = IE.document.querySelector("div[style^='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;']").innerText
    End If
End If

Solution

CSS selector:

You can use a CSS selector combination to target the element of interest.

The data is in a div that is inside an element with class td51.

You can write a CSS selector combination to target this pattern of:

.td51 div

This selects div elements that have an ancestor element with class td51 (not necessarily a direct parent). The "." is the class selector.

The pattern of two selectors separated by a space is known as a descendant combinator.
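Because a space matches descendants at any depth, `.td51 div` also picks up divs nested inside other divs, which is one reason the list of matches is long. A small sketch comparing it with the child combinator `>`, assuming an invented HTML fragment (not the KEGG page) and the hypothetical helper names `CountMatches` and `CompareCombinators`:

```vba
' Compares the descendant combinator (space) with the child combinator (>).
' The HTML fragment below is invented for illustration only.
' Requires: Microsoft HTML Object Library (VBE > Tools > References)
Option Explicit

Public Function CountMatches(ByVal htmlText As String, ByVal selector As String) As Long
    Dim doc As New HTMLDocument
    doc.body.innerHTML = htmlText
    ' querySelectorAll returns a NodeList; Length is the number of matches
    CountMatches = doc.querySelectorAll(selector).Length
End Function

Public Sub CompareCombinators()
    Const SAMPLE As String = "<table><tr><td class='td51'><div><div>nested</div></div></td></tr></table>"
    Debug.Print CountMatches(SAMPLE, ".td51 div")    ' descendant: outer and inner div
    Debug.Print CountMatches(SAMPLE, ".td51 > div")  ' child: outer div only
End Sub
```

If you ever need only the divs sitting directly under the td51 element, `.td51 > div` is the tighter selector.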


CSS query results:

This pattern matches multiple elements, and the one you want is at index 6 (zero-based, so the seventh match).

Because multiple items are retrieved, use querySelectorAll to apply the CSS selector. It returns a NodeList that you index into to get the item of interest.
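Index 6 depends on the page layout, so if KEGG changes its markup the index may move. A minimal sketch of a debugging helper (the name `ListMatches` is hypothetical) that prints every match with its zero-based NodeList index, so you can see which one holds the data:

```vba
' Hypothetical helper: prints each element matched by selector together with
' its zero-based NodeList index and the first 80 characters of its text.
' Requires: Microsoft HTML Object Library (VBE > Tools > References)
Option Explicit

Public Sub ListMatches(ByVal doc As HTMLDocument, ByVal selector As String)
    Dim nodes As Object, i As Long
    Set nodes = doc.querySelectorAll(selector)
    For i = 0 To nodes.Length - 1
        ' Concatenating vbNullString turns a Null innerText into an empty string
        Debug.Print i, Left$(nodes.Item(i).innerText & vbNullString, 80)
    Next i
End Sub
```

For example, calling `ListMatches html, ".td51 div"` right after `.body.innerHTML = sResponse` in the XHR version prints the candidates to the Immediate window.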

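As you only want part of the information retrieved, you can use Split to "slice" out what you need: split the text into lines, then keep the part of each line before the first "[". Note that the Kit item comes out as Kit (CD117), not Kit alone, because everything before the "[" is kept.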
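The slicing step can be isolated in a small function so it can be tested without a browser. A sketch, assuming the text has one item per line in the shape shown in the question; the function name `ItemNames` is hypothetical. It also skips blank lines, where `Split(line, "[")(0)` would otherwise raise a "Subscript out of range" error because splitting an empty string returns an empty array.

```vba
' Hypothetical helper: returns the item names from text shaped like
'   "ABC [HSA: ...] [KO: ...]" & vbCrLf & "GHI-JK [JKI: ...]"
' i.e. the trimmed part of each line before the first "[". Blank lines are skipped.
Option Explicit

Public Function ItemNames(ByVal rawText As String) As Collection
    Dim names As New Collection, lines() As String, i As Long, nameText As String
    ' Normalise CRLF and lone CR to LF so Split only needs one separator
    rawText = Replace(Replace(rawText, vbCrLf, vbLf), vbCr, vbLf)
    lines = Split(rawText, vbLf)
    For i = LBound(lines) To UBound(lines)
        nameText = lines(i)
        ' InStr returns 0 when there is no "[", in which case the whole line is kept
        If InStr(nameText, "[") > 0 Then nameText = Left$(nameText, InStr(nameText, "[") - 1)
        nameText = Trim$(nameText)
        If Len(nameText) > 0 Then names.Add nameText
    Next i
    Set ItemNames = names
End Function
```

With the element from the answer, `For Each v In ItemNames(ele.innerText)` (with `v` declared `As Variant`) walks the same names the answer prints.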


XMLHttpRequest (XHR):

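Option Explicit

' Requires: Microsoft HTML Object Library (VBE > Tools > References)
Public Sub GetInfo()
    Dim sResponse As String, i As Long, html As New HTMLDocument, arr() As String, ele As Object
    Dim itemName As String

    ' Download the KEGG DRUG entry page synchronously
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.kegg.jp/dbget-bin/www_bget?dr:D01441", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    With html
        .body.innerHTML = sResponse
        ' Index 6 (zero-based) holds the Target list on this page; ignore the error if it is missing
        On Error Resume Next
        Set ele = .querySelectorAll(".td51 div").Item(6)
        On Error GoTo 0
        If ele Is Nothing Then Exit Sub
        ' One item per line; drop carriage returns so only line feeds remain
        arr = Split(Replace(ele.innerText, vbCr, vbNullString), vbLf)
    End With

    For i = LBound(arr) To UBound(arr)
        ' Keep the text before the first "["; appending "[" avoids an error on blank lines
        itemName = Trim$(Split(arr(i) & "[", "[")(0))
        If Len(itemName) > 0 Then Debug.Print itemName
    Next i
End Sub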


References (VBE > Tools > References):

  1. Microsoft HTML Object Library


Internet Explorer:

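Option Explicit

' Requires: Microsoft Internet Controls and Microsoft HTML Object Library
Public Sub GetInfo()
    Dim ie As New InternetExplorer, html As HTMLDocument, arr() As String, ele As Object, i As Long
    Dim itemName As String

    With ie
        .Visible = True
        .navigate "https://www.kegg.jp/dbget-bin/www_bget?dr:D01441"

        ' Wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend

        Set html = .document
        On Error Resume Next
        Set ele = html.querySelectorAll(".td51 div").Item(6)   ' index 6 is zero-based
        On Error GoTo 0

        If ele Is Nothing Then Exit Sub
        ' One item per line; drop carriage returns so only line feeds remain
        arr = Split(Replace(ele.innerText, vbCr, vbNullString), vbLf)

        For i = LBound(arr) To UBound(arr)
            ' Keep the text before the first "["; appending "[" avoids an error on blank lines
            itemName = Trim$(Split(arr(i) & "[", "[")(0))
            If Len(itemName) > 0 Then Debug.Print itemName
        Next i
        '.Quit '<== Remember to quit application
    End With
End Sub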


References:

  1. Microsoft Internet Controls
  2. Microsoft HTML Object Library
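Both versions go straight to the `.td51 div` elements and skip the question's `<nobr>Target</nobr>` check. In the question's code, `querySelector("nobr")` returns only the first `<nobr>` on the page, and an element should be tested with `Is Nothing` rather than compared with `<> 0`. If you still want that guard, a sketch that checks every `<nobr>` (the function name `HasNobrText` and its parameters are hypothetical):

```vba
' Hypothetical helper: returns True if any <nobr> element's text equals wanted.
' querySelector("nobr") would only return the first <nobr>, so all matches are checked.
' Requires: Microsoft HTML Object Library (VBE > Tools > References)
Option Explicit

Public Function HasNobrText(ByVal doc As HTMLDocument, ByVal wanted As String) As Boolean
    Dim nodes As Object, i As Long
    Set nodes = doc.querySelectorAll("nobr")
    For i = 0 To nodes.Length - 1
        ' Concatenating vbNullString guards against a Null innerText
        If Trim$(nodes.Item(i).innerText & vbNullString) = wanted Then
            HasNobrText = True
            Exit Function
        End If
    Next i
End Function
```

In the IE version this could wrap the extraction as `If HasNobrText(html, "Target") Then ... End If`.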


EDIT:

This has become rather long, but following our debugging, here it is merged with your other code:

Option Explicit

' Requires: Microsoft HTML Object Library (for the HTMLDocument type)
Public Sub ht()
    Dim ie As Object, ele As Object, ele2 As Object, i As Long
    Dim sourceSheet As Worksheet, lastRow As Long, rawString() As String, rowIndex As Long
    Dim arrayOfValues() As Variant, html As HTMLDocument, arr() As String, itemName As String
    Const URL As String = "https://www.genome.jp/kegg/drug/"

    Set sourceSheet = Worksheets("Sheet1")
    lastRow = sourceSheet.Range("A30000").End(xlUp).Row
    arrayOfValues = sourceSheet.Range("A1:A" & lastRow)

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True

        For rowIndex = 1 To lastRow
            ' Each cell is expected to look like "something: searchterm"; skip rows without ": "
            rawString = VBA.Strings.Split(VBA.Strings.LCase$(arrayOfValues(rowIndex, 1)), ": ", -1, vbBinaryCompare)
            If UBound(rawString) < 1 Then GoTo NextLink

            .navigate URL
            Do While .readyState <> 4 Or .Busy: DoEvents: Loop

            ' Search KEGG DRUG for the term
            .document.querySelector("input[name=q]").Value = rawString(1)
            .document.querySelector("input[value=Go]").Click
            Do While .readyState <> 4 Or .Busy: DoEvents: Loop

            ' Open the first drug entry; reset first so an object from a previous row is never reused
            Set ele2 = Nothing
            On Error Resume Next
            Set ele2 = .document.querySelector("a[href^='/dbget-bin/www_bget?dr:']")
            On Error GoTo 0
            If ele2 Is Nothing Then GoTo NextLink
            ele2.Click
            Do While .readyState <> 4 Or .Busy: DoEvents: Loop

            ' Same extraction as GetInfo: match at index 6, text before "[" on each line
            Set html = .document
            Set ele = Nothing
            On Error Resume Next
            Set ele = html.querySelectorAll(".td51 div").Item(6)
            On Error GoTo 0

            If Not ele Is Nothing Then
                arr = Split(Replace(ele.innerText, vbCr, vbNullString), vbLf)
                For i = LBound(arr) To UBound(arr)
                    itemName = Trim$(Split(arr(i) & "[", "[")(0))
                    If Len(itemName) > 0 Then Debug.Print itemName
                Next i
            End If
NextLink:
        Next rowIndex
        .Quit
    End With
End Sub
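The question asks for the names on the spreadsheet rather than in the Immediate window. A minimal sketch, assuming you want each row's names written across that same row starting in column B; the layout and the helper name `WriteItemNames` are assumptions, not part of the answer:

```vba
' Hypothetical helper: writes each item name (the trimmed text before the first "["
' on each line of rawText) across one row, starting at startCell. Blank lines are skipped.
Option Explicit

Public Sub WriteItemNames(ByVal rawText As String, ByVal startCell As Range)
    Dim lines() As String, i As Long, col As Long, itemName As String
    ' One item per line; remove carriage returns so only line feeds separate lines
    lines = Split(Replace(rawText, vbCr, vbNullString), vbLf)
    For i = LBound(lines) To UBound(lines)
        ' Appending "[" guarantees Split returns at least one element, even for blank lines
        itemName = Trim$(Split(lines(i) & "[", "[")(0))
        If Len(itemName) > 0 Then
            startCell.Offset(0, col).Value = itemName
            col = col + 1
        End If
    Next i
End Sub
```

Inside `ht`, the `If Not ele Is Nothing Then` block could then call `WriteItemNames ele.innerText, sourceSheet.Cells(rowIndex, 2)` instead of (or as well as) the Debug.Print loop.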
