VBA: Scraping information from HTML Table


Question

I'm trying to pull information from an HTML table. I want to add each element within the table to a collection. This is what I have so far.

Dim htmlTable As Object
Dim coll2 As Collection
Set coll2 = New Collection
Set IE = New InternetExplorerMedium

With IE
'.AddressBar = False
'.MenuBar = False
.Navigate ("PASSWORDED SITE")
.Visible = True
End With

Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set tableItem = IE.Document.getElementsByTagName("td")
With coll2
For Each tableItem In htmlTable.innerHTML
   .Add tableItem
Next
End With

I have a problem with this line: For Each tableItem In htmlTable.innerText. I tried different variations of htmlTable.innerText, each throwing different errors.

This is the HTML extract for the table.

<table class="Grid" id="ctl00_ContentPlaceHolder1_gvExtract" style="border-collapse: collapse;" border="1" rules="all" cellspacing="0">
        <tbody><tr class="GridHeader" style="font-weight: bold;">
            <th scope="col">Delete</th><th scope="col">Download</th><th scope="col">Extract Date</th><th scope="col">User Id Owner</th>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2942854)'>Delete</a></td>
            <td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942854")'>Work Order Inquiry - Work Order</a></td>
            <td>06/20/2017 07:50:37</td>
            <td>MBMAYO</td>
        </tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2942836)'>Delete</a></td>
            <td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942836")'>Work Order Inquiry - Work Order</a></td>
            <td>06/20/2017 07:39:29</td>
            <td>MBMAYO</td>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941835)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941835")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:23:54</td><td>MBMAYO</td>
        </tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941827)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941827")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:16:16</td><td>MBMAYO</td>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941822)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941822")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:14:06</td><td>MBMAYO</td>
        </tr>
    </tbody></table>

The goal is to store each <td> as an item in a collection and then retrieve the date from it, for example <td>06/20/2017 07:50:37</td>. This table grows, so I think an array is out of the question?

Edit from comments:

I have been trying to call this function, and I'm getting an "object does not support this method" error:

Public Function htmlCell(id As String) As String 
    htmlCell = IE.getElementById("ctl00_ContentPlaceHolder1_gvExtract") _
                    .getElementsByTagName("td")(id).innerHTML
End Function
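
The error is consistent with calling getElementById on the InternetExplorer object itself; that method belongs to the document, so it is normally reached through IE.Document. The sketch below shows one way the function might be rewritten. GetCellHtml, ie, and cellIndex are illustrative names that do not appear in the original post. It assumes the page has finished loading and that the cell is addressed by a zero-based numeric position rather than a string.

```vba
' Hypothetical variant of htmlCell; GetCellHtml, ie and cellIndex are illustrative names.
' Assumes ie is an InternetExplorer object whose page has already finished loading.
Public Function GetCellHtml(ByVal ie As Object, ByVal cellIndex As Long) As String
    Dim tbl As Object
    Dim cells As Object

    ' getElementById is a method of the document, not of the browser object.
    Set tbl = ie.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
    If tbl Is Nothing Then Exit Function   ' table not found: returns ""

    Set cells = tbl.getElementsByTagName("td")

    ' The element collection is zero-based; guard against an out-of-range index.
    If cellIndex >= 0 And cellIndex < cells.Length Then
        GetCellHtml = cells(cellIndex).innerHTML
    End If
End Function
```

With the table shown above, index 2 should correspond to the first date cell.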


Answer

What you probably need is something like this. HTH

' The MSHTML types below require a reference to the Microsoft HTML Object Library.
Dim htmlTable As MSHTML.htmlTable
Dim htmlTableCells As MSHTML.IHTMLElementCollection
Dim htmlTableCell As MSHTML.htmlTableCell
Dim htmlAnchor As MSHTML.HTMLAnchorElement

' Find the table by id, then take every <td> inside it.
Set htmlTable = ie.document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set htmlTableCells = htmlTable.getElementsByTagName("td")
With coll2
    For Each htmlTableCell In htmlTableCells
        If VBA.TypeName(htmlTableCell.FirstChild) = "HTMLAnchorElement" Then
            ' The cell wraps a link: store the link text rather than the <a> markup.
            Set htmlAnchor = htmlTableCell.FirstChild
            .Add htmlAnchor.innerHTML
        Else
            ' Plain cell: store its content directly.
            .Add htmlTableCell.innerHTML
        End If
    Next
End With
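
Navigate returns before the page has loaded, so getElementById can return Nothing if the code runs too early. The following is a minimal late-bound sketch of the same idea with a wait loop added. CollectTableCells and its parameters are hypothetical names, and it assumes that innerText is acceptable in place of the anchor check. It also assumes the table is present once the browser reports the page as complete, which may not hold on a site that logs in or posts back.

```vba
' Sketch only: CollectTableCells, ie and tableId are hypothetical names.
' Late binding is used, so no reference to the Microsoft HTML Object Library is needed.
Public Function CollectTableCells(ByVal ie As Object, ByVal tableId As String) As Collection
    Dim result As New Collection
    Dim tbl As Object
    Dim cell As Object

    ' Navigate is asynchronous: wait until the browser reports the page as loaded.
    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop

    Set tbl = ie.Document.getElementById(tableId)
    If Not tbl Is Nothing Then
        ' innerText yields the visible text, including the text of a nested link.
        For Each cell In tbl.getElementsByTagName("td")
            result.Add cell.innerText
        Next cell
    End If

    ' An empty collection is returned when the table is not found.
    Set CollectTableCells = result
End Function
```

A call might look like Set coll2 = CollectTableCells(IE, "ctl00_ContentPlaceHolder1_gvExtract"). Because a Collection grows as items are added, the changing number of table rows needs no special handling.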




Result



Dim el
For Each el In coll2
    Debug.Print el
Next el

Output:

Delete
Work Order Inquiry - Work Order
06/20/2017 07:50:37
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:39:29
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:23:54
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:16:16
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:14:06
MBMAYO
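
Since the stated goal is to retrieve the dates, one possible follow-up is to filter the collection for items that parse as dates. This is a sketch under assumptions: ExtractDates is a hypothetical helper, and IsDate and CDate follow the machine's regional settings, so the mm/dd/yyyy strings are assumed to be interpreted as intended on the machine running the code.

```vba
' Sketch only: ExtractDates is a hypothetical helper, not part of the original answer.
Public Function ExtractDates(ByVal cells As Collection) As Collection
    Dim dates As New Collection
    Dim item As Variant

    For Each item In cells
        ' Keep only the entries VBA can interpret as a date/time value.
        If IsDate(item) Then dates.Add CDate(item)
    Next item

    Set ExtractDates = dates
End Function
```

Calling ExtractDates(coll2) on the collection printed above would be expected to keep only the timestamp entries. Because each row has four cells, another option is to step through coll2 four items at a time and read the third item of each group.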
