< frameset>中的Excel VBA Web抓取表格元素.和< frame> [英] Excel VBA Web Scraping Table Elements from a <frameset> and a <frame>

查看:68
本文介绍了< frameset>中的Excel VBA Web抓取表格元素.和< frame>的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将网站中的一些表格项粘贴到Excel中.

I am trying to scrape some table-looking items from a website into Excel.

尽管我在Excel方面对VBA还是很陌生,但我对编码通常并不陌生:)

I'm no stranger to coding in general, though I'm pretty new to VBA in an Excel sense :)

我尝试使用Excel的数据>从Web界面,它无法识别表格.我猜是因为它是使用(或至少是我的Google-Fu让我理解的)构建的.

I have tried using Excel's Data>From Web interface, it's not recognizing the table. I'm guessing it's because it's built using (or at least that's what my Google-Fu has lead me to understand).

第二张表的截图

<html>

<frame title="links" ...>...</frame>

<frame title="queue">
#document

<head>...</head>
<body>
<div id="container>
<script>...</script>
<div>

<table id="oTable">

<colgroup>...</colgroup>

<thead>...</thead>


<tbody>
  <tr onclick="changeHighlight( 'eid0' )" id="eid0" class="queryshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.5599976.5599976');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">12345</a></td>
    <td nowrap=""><a href="`" target="_Blank">28/08/2018 17:00:49</a></td>
    <td nowrap=""><a href="URL" target="_Blank">11/09/2018 16:28:39</a></td>
    <td nowrap=""><a href="URL" target="_Blank">5,599,976</a></td>
    <td nowrap=""><a href="URL" target="_Blank">dijm</a></td></tr>
  <tr onclick="changeHighlight( 'eid1' )" id="eid1" class="queryunshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443276.6443276');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443276.6443276','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
  <tr onclick="changeHighlight( 'eid2' )" id="eid2" class="queryshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443287.6443287');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443287.6443287','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">23456</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:24</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:35:30</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,287</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
  <tr onclick="changeHighlight( 'eid3' )" id="eid3" class="queryunshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443339.6443339');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443339.6443339','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">78901</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:02</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:39</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,339</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>
  <tr onclick="changeHighlight( 'eid4' )" id="eid4" class="queryshaded">
    <td nowrap=""><a onclick="javascript:window.open('IWViewer.jsp?id=3.6443344.6443344');" title="Open Image" href="javascript:doNothing();"><img title="Open Image" border="0" alt="Open Image" src="URL.gif"></a>&nbsp;<a onclick="javascript:window.open('URL;id=3.6443344.6443344','_newtab');" title="Open Workitem" href="javascript:doNothing();"><img title="Open Workitem" border="0" alt="Open Workitem" src="URL.gif"></a>
    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">34567</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:06:17</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:40:43</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,344</a></td>
    <td nowrap=""><a href="URL" target="_Blank"></a></td></tr>

我尝试了各种看起来像这样的解决方案:使用vba从网站抓取数据

I have tried various solutions that look somewhat like this: https://www.ozgrid.com/forum/forum/other-software-applications/excel-and-web-browsers-help/131683-extracting-data-from-a-grid-on-webpage and Scraping data from website using vba

并尝试定义框架本身以尝试从那里获取信息?(再次:是Excel VBA的新功能)

and trying to define the frames themselves to try and get the info from there? (again: new to Excel VBA)

    'set myHTMLDoc to the main pages IE document
    Dim myHTMLDoc As HTMLDocument
    Set myHTMLDoc = ie.Document

    'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
    Dim myHTMLFrame2 As HTMLDocument
    Set myHTMLFrame2 = myHTMLDoc.Frames(1).Document

使用上面的代码块,我得到了运行时错误'438'没有上面的块,我会收到运行时错误'1004'

With the above block of code I'm getting a "Run-time error '438' Without the above block I'm getting a "Run-time error '1004'

我最终想要的信息在每一行中:

The info I eventually want is in each row:

    </td><td scope="row" nowrap=""><a href="URL" target="_Blank">67890</a></td>
    <td nowrap=""><a href="URL" target="_Blank">25/06/2019 11:01:01</a></td>
    <td nowrap=""><a href="URL" target="_Blank">09/07/2019 10:32:32</a></td>
    <td nowrap=""><a href="URL" target="_Blank">6,443,276</a></td>

理想情况下,我想将每个元素转储到单元格中

Ideally I'd like to dump each element into a cell

67890 |25/06/2019 11:01:01 |09/09/2019 10:32:32 |6,443,276

67890 | 25/06/2019 11:01:01 | 09/07/2019 10:32:32 | 6,443,276

每页上有20行(有一个按钮可以按下以转到下一页,稍后我会弄清楚...希望是哈哈)

There's 20 of these rows on each page (there's a button to press to get to the next page which I'll figure out later...hopefully haha)

大量提示,感谢任何可以帮助您的人:)

Massive premptive Thank You to anyone who can help :)

-编辑-这是我目前正在使用的代码(对此并不珍贵:P)

-EDIT- This is the code that I'm currently working with (not precious about it :P )

Private Sub CommandButton1_Click()


    Dim ie     As Object
    Dim html   As Object
    Dim objElementTR As Object
    Dim objTR  As Object
    Dim objElementsTD As Object
    Dim objTD  As Object
    Dim result As String
    Dim intRow As Long
    Dim intCol As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Navigate "URL"
    ie.Visible = True     ' loop until page is loaded
    Do Until (ie.ReadyState = 4 And Not ie.Busy)
        DoEvents
    Loop

    'set myHTMLDoc to the main pages IE document
    Dim myHTMLDoc As HTMLDocument
    Set myHTMLDoc = ie.Document

    'set myHTMLFrame2 as the 2nd frame of the main page (index starts at 0)
    Dim myHTMLFrame2 As HTMLDocument
    Set myHTMLFrame2 = ie.Document.querySelector("[title=queue]").contentDocument.getElementById("oTable")

    result = myHTMLFrame2
    Set html = CreateObject("htmlfile")
    myHTMLFrame2 = result
    Set objElementTR = html.getElementsByTagName("tr")
    ReDim myarray(0 To objElementTR.Length, 0 To 10)
    For Each objTR In objElementTR
        intRow = intRow + 1
        Set objElementsTD = objTR.getElementsByTagName("td")
        For Each objTD In objElementsTD
            myarray(intRow, intCol) = objTD.innerText
            intCol = intCol + 1
        Next objTD
        intCol = 0
    Next objTR
    With Sheets(1).Cells(1, 1).Cells(Rows.Count, "A").End(xlUp).Offset(1, 0)
        .Resize(UBound(myarray), UBound(myarray, 2)).Value = myarray
    End With



End Sub

推荐答案

您可以尝试通过其title属性隔离框架,然后通过contentDocument并通过id获取表

You could try isolating the frame by its title attribute, then go via contentDocument and get the table by id

ie.document.querySelector("[title=queue]").contentDocument.querySelector("#oTable")

然后将结尾 .querySelector(#oTable") .getElementById("oTable")

然后我将通过剪贴板转储表的表的 .outerHTML ,以便粘贴将表直接放入工作表.

I would then dump the .outerHTML of the table via clipboard so as to paste table direct into sheet.

这篇关于&lt; frameset&gt;中的Excel VBA Web抓取表格元素.和&lt; frame&gt;的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆