VBA extracting only select info between <div> tags
I'm trying to check whether the HTML tag:
<nobr>Target</nobr>
exists on the page and, if it does, search for the text between the HTML tags:
<div style='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;'> ... </div>
The text between the div tags looks messy, like:
ABC [HSA:<a href="...">...</a>] [KO:<a href="...">...</a>]<br />
GHI-JK [JKI:...
I want to get every item and print it to my spreadsheet, however many there are, but I only want the item names (in the example above there are 2 items: ABC and GHI-JK).
Of course, my code below doesn't work. I don't think I'm using querySelector correctly, and I'm also not sure how to grab only the item names rather than everything between the tags.
If IE.document.querySelector("nobr").innerHTML = "Target" Then
    If IE.document.querySelector("div[style^='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;']") <> 0 Then
        Cells(1, 15).Value = IE.document.querySelector("div[style^='width: 555px; -ms-overflow-x: auto; -ms-overflow-y: hidden;']").innerText
    End If
End If
CSS selector:
You can use a CSS selector combination to target the element of interest.
The data is in a div that is inside an element with class td51.
You can write a CSS selector combination to target this pattern:
.td51 div
This selects elements with a div tag that have an ancestor with class td51. The "." is a class selector.
The "element space element" pattern is known as a descendant combinator.
CSS query results:
This pattern matches multiple elements, and you want the item at index 6.
Because multiple items are retrieved, you use querySelectorAll to apply the CSS selector and return a nodeList, which you index into to get the item of interest.
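The indexing step can also be done with an explicit bounds check instead of relying on error handling. The following is only a sketch: NthMatch is a hypothetical helper name, not part of any library, and it assumes a reference to Microsoft HTML Object Library so that HTMLDocument is available.

```vba
Option Explicit

' Hypothetical helper (not a library function).
' Requires a reference to Microsoft HTML Object Library.
' Returns the match at zeroBasedIndex, or Nothing when the
' selector matched too few elements.
Public Function NthMatch(ByVal doc As HTMLDocument, ByVal cssSelector As String, _
                         ByVal zeroBasedIndex As Long) As Object
    Dim nodes As Object
    Set nodes = doc.querySelectorAll(cssSelector)
    ' The nodeList is zero-based, so valid indexes run from 0 to Length - 1
    If zeroBasedIndex >= 0 And zeroBasedIndex < nodes.Length Then
        Set NthMatch = nodes.Item(zeroBasedIndex)
    End If
End Function
```

With this, a call such as NthMatch(html, ".td51 div", 6) would return Nothing when the page has fewer than seven matching div elements, and the caller can test the result with Is Nothing.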
As you only want part of the information retrieved, you can use Split to "slice" out the required info. Note that the name returned for Kit is not "Kit" alone but "Kit (CD117)".
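To make the slicing step concrete, here is a minimal sketch that works on plain strings, so it can be tried without loading a page. ItemName and ItemNames are hypothetical helper names, and the sample inputs in the comments are made up to mirror the shape of the text shown in the question.

```vba
Option Explicit

' Hypothetical helper: returns the text before the first "[" on one line,
' trimmed. For example "ABC [HSA:...] [KO:...]" gives "ABC".
Public Function ItemName(ByVal lineText As String) As String
    Dim pos As Long
    pos = InStr(1, lineText, "[")
    If pos > 0 Then lineText = Left$(lineText, pos - 1)
    ItemName = Trim$(lineText)
End Function

' Hypothetical helper: splits a block of text into lines and collects
' the non-empty item names.
Public Function ItemNames(ByVal blockText As String) As Collection
    Dim result As New Collection, parts() As String
    Dim i As Long, itemText As String
    ' innerText may use CR+LF or LF alone, so normalise to LF first
    parts = Split(Replace(blockText, vbCrLf, vbLf), vbLf)
    For i = LBound(parts) To UBound(parts)
        itemText = ItemName(parts(i))
        If Len(itemText) > 0 Then result.Add itemText
    Next i
    Set ItemNames = result
End Function
```

Returning a Collection keeps the names together, so the caller can decide whether to print them with Debug.Print or loop over them and write each one to a cell.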
XMLHTTPRequest XHR:
Option Explicit

Public Sub GetInfo()
    Dim sResponse As String, i As Long, html As New HTMLDocument, arr() As String, ele As Object

    ' Fetch the page without opening a browser
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.kegg.jp/dbget-bin/www_bget?dr:D01441", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    With html
        .body.innerHTML = sResponse
        ' Indexing past the end of the nodeList raises an error, so trap it
        On Error Resume Next
        Set ele = .querySelectorAll(".td51 div")(6)
        On Error GoTo 0
        If ele Is Nothing Then Exit Sub
        ' One item per line
        arr = Split(ele.innerText, Chr$(10))
    End With

    ' Keep only the text before the first "[" on each line
    For i = LBound(arr) To UBound(arr)
        Debug.Print Split(arr(i), "[")(0)
    Next i
End Sub
References (VBE > Tools > References):
- Microsoft HTML Object Library
Internet Explorer:
Option Explicit

Public Sub GetInfo()
    Dim ie As New InternetExplorer, html As HTMLDocument, arr() As String, ele As Object, i As Long

    With ie
        .Visible = True
        .navigate "https://www.kegg.jp/dbget-bin/www_bget?dr:D01441"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set html = .document
        On Error Resume Next
        Set ele = html.querySelectorAll(".td51 div")(6)
        On Error GoTo 0
        If ele Is Nothing Then Exit Sub

        arr = Split(ele.innerText, Chr$(10))
        For i = LBound(arr) To UBound(arr)
            Debug.Print Split(arr(i), "[")(0)
        Next i
        '.Quit '<== Remember to quit application
    End With
End Sub
References:
- Microsoft Internet Controls
- Microsoft HTML Object Library
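The wait loop in the Internet Explorer version spins until the page reports that it has loaded, so it never returns if the page hangs. A hedged sketch of a wait with a timeout follows. SecondsSince and WaitForPage are hypothetical helper names, and the timeout value passed in is up to the caller.

```vba
Option Explicit

' Hypothetical helper: seconds elapsed between two Timer readings,
' allowing for Timer resetting to 0 at midnight.
Public Function SecondsSince(ByVal startTimer As Single, ByVal nowTimer As Single) As Single
    If nowTimer >= startTimer Then
        SecondsSince = nowTimer - startTimer
    Else
        SecondsSince = nowTimer - startTimer + 86400!
    End If
End Function

' Hypothetical helper: returns True once the browser reports the page as
' loaded, or False if timeoutSeconds passes first.
' ie is assumed to be an InternetExplorer application object.
Public Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Single) As Boolean
    Dim started As Single
    started = Timer
    Do While ie.Busy Or ie.readyState < 4
        If SecondsSince(started, Timer) > timeoutSeconds Then Exit Function
        DoEvents
    Loop
    WaitForPage = True
End Function
```

A caller could then write If Not WaitForPage(ie, 30) Then Exit Sub in place of the bare loop, with 30 seconds being an arbitrary choice.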
EDIT:
This has become rather long, but following our debugging, here is a version merged with your other code:
Option Explicit

Public Sub ht()
    Dim ie As Object, ele As Object, ele2 As Object, i As Long
    Dim sourceSheet As Worksheet, lastRow As Long, rawString() As String, rowIndex As Long
    Dim arrayOfValues() As Variant, html As HTMLDocument, arr() As String
    Const URL As String = "https://www.genome.jp/kegg/drug/"

    Set sourceSheet = Worksheets("Sheet1")
    lastRow = sourceSheet.Range("A30000").End(xlUp).Row
    arrayOfValues = sourceSheet.Range("A1:A" & lastRow)

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        For rowIndex = 1 To lastRow
            ' Clear matches from the previous row so a failed lookup is not mistaken for a hit
            Set ele = Nothing
            Set ele2 = Nothing

            .navigate URL
            Do While .readyState <> 4 Or .Busy: DoEvents: Loop

            ' The search term is the part of the cell after ": "
            rawString = VBA.Strings.Split(VBA.Strings.LCase$(arrayOfValues(rowIndex, 1)), ": ", -1, vbBinaryCompare)
            'MsgBox rawString(1)
            .document.querySelector("input[name=q]").Value = rawString(1)
            .document.querySelector("input[value=Go]").Click
            Do While .readyState <> 4 Or .Busy: DoEvents: Loop

            ' Follow the first drug entry link in the results, if there is one
            On Error Resume Next
            Set ele2 = .document.querySelector("a[href^='/dbget-bin/www_bget?dr:']")
            On Error GoTo 0
            If ele2 Is Nothing Then GoTo NextLink
            ele2.Click
            Do While .readyState <> 4 Or .Busy: DoEvents: Loop

            Set html = .document
            On Error Resume Next
            Set ele = html.querySelectorAll(".td51 div")(6)
            On Error GoTo 0
            If Not ele Is Nothing Then
                arr = Split(ele.innerText, Chr$(10))
                For i = LBound(arr) To UBound(arr)
                    Debug.Print Split(arr(i), "[")(0)
                Next i
            End If
NextLink:
        Next rowIndex
        .Quit
    End With
End Sub
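The merged code reads rawString(1) directly, which raises a "subscript out of range" error for any cell that does not contain ": ". A small guard can be sketched as a separate function. SearchTerm is a hypothetical helper name, and the example cell values in the comment are invented for illustration.

```vba
Option Explicit

' Hypothetical helper: returns the lower-cased text after the first ": "
' in a cell value, or an empty string when the delimiter is missing.
' For example "Name: ABC" gives "abc", and "ABC" gives "".
Public Function SearchTerm(ByVal cellText As String) As String
    Dim parts() As String
    parts = Split(LCase$(cellText), ": ")
    ' UBound is -1 for an empty string and 0 when there is no delimiter
    If UBound(parts) >= 1 Then SearchTerm = parts(1)
End Function
```

Inside the loop, the caller could skip a row when SearchTerm returns an empty string rather than submitting a blank search.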