< frameset>中的Excel VBA Web抓取表格元素.和< frame> [英] Excel VBA Web Scraping Table Elements from a <frameset> and a <frame>
问题描述
我正在尝试将网站中的一些表格项粘贴到Excel中.
I am trying to scrape some table-looking items from a website into Excel.
尽管我在Excel方面对VBA还是很陌生,但我对编码通常并不陌生:)
I'm no stranger to coding in general, though I'm pretty new to VBA in an Excel sense :)
我尝试使用Excel的数据>从Web界面,它无法识别表格.我猜是因为它是使用(或至少是我的Google-Fu让我理解的)构建的.
I have tried using Excel's Data>From Web interface, it's not recognizing the table. I'm guessing it's because it's built using (or at least that's what my Google-Fu has lead me to understand).
<html>
<frame title="links" ...>...</frame>
<frame title="queue">
#document
<head>...</head>
<body>
<div id="container>
<script>...</script>
<div>
<table id="oTable">
<colgroup>...</colgroup>
<thead>...</thead>
<tbody>
<tr onclick="changeHighlight( 'eid0' )" id="eid0" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.5599976.5599976');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">12345</a></td>
<td nowrap=""><a href="`" target="_Blank">28/08/2018 17:00:49</a></td>
<td nowrap=""><a href="URL" target="_Blank">11/09/2018 16:28:39</a></td>
<td nowrap=""><a href="URL" target="_Blank">5,599,976</a></td>
<td nowrap=""><a href="URL" target="_Blank">dijm</a></td></tr>
<tr onclick="changeHighlight( 'eid1' )" id="eid1" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443276.6443276');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443276.6443276','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid2' )" id="eid2" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443287.6443287');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443287.6443287','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">23456</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:24</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:35:30</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,287</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid3' )" id="eid3" class="queryunshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443339.6443339');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443339.6443339','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">78901</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:02</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:39</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,339</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
<tr onclick="changeHighlight( 'eid4' )" id="eid4" class="queryshaded">
<td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443344.6443344');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a> <a onclick="javascript:window.open('URL;id=3.6443344.6443344','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">34567</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:17</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:43</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,344</a></td>
<td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
我尝试了各种看起来像这样的解决方案:使用vba从网站抓取数据
I have tried various solutions that look somewhat like this: https://www.ozgrid.com/forum/forum/other-software-applications/excel-and-web-browsers-help/131683-extracting-data-from-a-grid-on-webpage and Scraping data from website using vba
并尝试定义框架本身以尝试从那里获取信息?(再次:是Excel VBA的新功能)
and trying to define the frames themselves to try and get the info from there? (again: new to Excel VBA)
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = myHTMLDoc.Frames(1).Document
使用上面的代码块,我得到了运行时错误'438'没有上面的块,我会收到运行时错误'1004'
With the above block of code I'm getting a "Run-time error '438' Without the above block I'm getting a "Run-time error '1004'
我最终想要的信息在每一行中:
The info I eventually want is in each row:
</td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
<td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
<td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
<td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
理想情况下,我想将每个元素转储到单元格中
Ideally I'd like to dump each element into a cell
67890 |25/06/2019 11:01:01 |09/09/2019 10:32:32 |6,443,276
67890 | 25/06/2019 11:01:01 | 09/07/2019 10:32:32 | 6,443,276
每页上有20行(有一个按钮可以按下以转到下一页,稍后我会弄清楚...希望是哈哈)
There's 20 of these rows on each page (there's a button to press to get to the next page which I'll figure out later...hopefully haha)
大量提示,感谢任何可以帮助您的人:)
Massive premptive Thank You to anyone who can help :)
-编辑-这是我目前正在使用的代码(对此并不珍贵:P)
-EDIT- This is the code that I'm currently working with (not precious about it :P )
Private Sub CommandButton1_Click()
Dim ie As Object
Dim html As Object
Dim objElementTR As Object
Dim objTR As Object
Dim objElementsTD As Object
Dim objTD As Object
Dim result As String
Dim intRow As Long
Dim intCol As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "URL"
ie.Visible = True ' loop until page is loaded
Do Until (ie.ReadyState = 4 And Not ie.Busy)
DoEvents
Loop
'set myHTMLDoc to the main pages IE document
Dim myHTMLDoc As HTMLDocument
Set myHTMLDoc = ie.Document
'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
Dim myHTMLFrame2 As HTMLDocument
Set myHTMLFrame2 = ie.Document.querySelector("[title=queue]").contentDocument.getElementById("oTable")
result = myHTMLFrame2
Set html = CreateObject("htmlfile")
myHTMLFrame2 = result
Set objElementTR = html.getElementsByTagName("tr")
ReDim myarray(0 To objElementTR.Length, 0 To 10)
For Each objTR In objElementTR
intRow = intRow + 1
Set objElementsTD = objTR.getElementsByTagName("td")
For Each objTD In objElementsTD
myarray(intRow, intCol) = objTD.innerText
intCol = intCol + 1
Next objTD
intCol = 0
Next objTR
With Sheets(1).Cells(1, 1).Cells(Rows.Count, "A").End(xlUp).Offset(1, 0)
.Resize(UBound(myarray), UBound(myarray, 2)).Value = myarray
End With
End Sub
推荐答案
您可以尝试通过其title属性隔离框架,然后通过contentDocument并通过id获取表
You could try isolating the frame by its title attribute, then go via contentDocument and get the table by id
ie.document.querySelector("[title=queue]").contentDocument.querySelector("#oTable")
然后将结尾 .querySelector(#oTable")
与 .getElementById("oTable")
然后我将通过剪贴板转储表的表的 .outerHTML
,以便粘贴将表直接放入工作表.
I would then dump the .outerHTML
of the table via clipboard so as to paste table direct into sheet.
这篇关于< frameset>中的Excel VBA Web抓取表格元素.和< frame>的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!