Extract table from webpage using VBA

Question

I would like to extract a table from a web page's HTML into Excel using VBA.

I have tried the following code several times, changing parts of it each time, but I keep getting an error.

Sub GrabTable()

    'dimension (set aside memory for) our variables
    Dim objIE As InternetExplorer
    Dim ele As Object
    Dim y As Integer

    'start a new browser instance
    Set objIE = New InternetExplorer
    'make browser visible
    objIE.Visible = False

    'navigate to page with needed data
    objIE.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
    'wait for page to load
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'we will output data to excel, starting on row 1
    y = 1

    'look at all the 'tr' elements in the 'table' with id 'InputTable2',
    'and evaluate each, one at a time, using 'ele' variable
    For Each ele In objIE.document.getElementByClassName("InputTable2").getElementsByTagName("tr")
        'show the text content of 'td' element being looked at
        Debug.Print ele.textContent
        'each 'tr' (table row) element contains 2 children ('td') elements
        'put text of 1st 'td' in col A
        Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
        'put text of 2nd 'td' in col B
        Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent

        y = y + 1
    'repeat until last ele has been evaluated
    Next


End Sub

Recommended answer

Here are two methods:

  1. Using IE: the data is inside an iframe, which has to be navigated into.

  2. Using an XMLHTTP request: much faster, and no browser needs to open. It uses the first part of the iframe document URL, which is what the iframe navigates to.

In both cases I access the element containing the company name and then the main disclosure info table. For the main disclosure table I copy its outerHTML to the clipboard and paste it into Excel, which avoids looping over all the rows and columns. You could instead loop over the tr (table row) and td (table cell) elements within it.
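For anyone who prefers the loop over the clipboard, here is a minimal sketch of walking the tr and td elements. It is not part of the original answer: `WriteTableToSheet` and `DemoWriteTable` are hypothetical names, and the two-row table in the demo is made-up sample markup standing in for the real `InputTable2` table. It assumes the table element is an MSHTML object exposing the usual `rows` and `cells` collections.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): copies every cell of an
' HTML table element to a worksheet, with the first cell landing on topLeft.
Public Sub WriteTableToSheet(ByVal tbl As Object, ByVal topLeft As Range)
    Dim r As Long, c As Long
    Dim tr As Object

    For r = 0 To tbl.Rows.Length - 1              ' each <tr> in the table
        Set tr = tbl.Rows.Item(r)
        For c = 0 To tr.Cells.Length - 1          ' each <td>/<th> in the row
            topLeft.Offset(r, c).Value = Trim$(tr.Cells.Item(c).innerText)
        Next c
    Next r
End Sub

' Demo on made-up sample HTML; in practice pass the table found on the page.
Public Sub DemoWriteTable()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")            ' late-bound MSHTML document
    doc.body.innerHTML = "<table class='InputTable2'>" & _
        "<tr><td>Type</td><td>Announcement</td></tr>" & _
        "<tr><td>Subject</td><td>OTHERS</td></tr></table>"

    WriteTableToSheet doc.getElementsByTagName("table")(0), _
        ThisWorkbook.Worksheets("Sheet1").Range("A2")
End Sub
```

In either method you would pass the table element you already located in place of the clipboard lines. The trade-off is that the loop writes plain text only, whereas pasting the outerHTML lets Excel interpret the table markup, including merged cells.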

IE:

Option Explicit
' Requires a reference to Microsoft Internet Controls (VBE > Tools > References)
Public Sub GetInfo()
    Dim IE As New InternetExplorer, clipboard As Object
    With IE
        .Visible = True
        .navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"

        While .Busy Or .readyState < 4: DoEvents: Wend

        ' Late-bound MSForms DataObject, used to reach the clipboard
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        With .document.getElementById("bm_ann_detail_iframe").contentDocument
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .getElementsByClassName("company_name")(0).innerText
            clipboard.SetText .getElementsByTagName("table")(1).outerHTML
            clipboard.PutInClipboard
        End With

        ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
        .Quit
    End With
End Sub


XMLHTTP:

You can extract a different URL from the first part of the iframe URL and use it as shown below.

The section of your original HTML containing the iframe shows the associated URL, which is the one requested in the code below.

Option Explicit
' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References)
Public Sub GetTable()
    Dim sResponse As String, html As HTMLDocument, clipboard As Object
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    html.body.innerHTML = sResponse

    With html
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .querySelector(".company_name").innerText
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText .querySelector(".InputTable2").outerHTML
        clipboard.PutInClipboard
    End With

    ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial

End Sub
