使用VBA从网页中提取表格 [英] Extract table from webpage using VBA
问题描述
我想使用VBA将表格从html代码提取到Excel中.
I would like to extract the table from html code into Excel using VBA.
我尝试了以下代码,并更改了一些代码,但不断出错.
I have tried the following code several times with changing some of the code but keep on getting error.
Sub GrabTable()
'dimension (set aside memory for) our variables
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer
'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = False
'navigate to page with needed data
objIE.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'we will output data to excel, starting on row 1
y = 1
'look at all the 'tr' elements in the 'table' with id 'InputTable2',
'and evaluate each, one at a time, using 'ele' variable
For Each ele In objIE.document.getElementByClassName("InputTable2").getElementsByTagName("tr")
'show the text content of 'td' element being looked at
Debug.Print ele.textContent
'each 'tr' (table row) element contains 2 children ('td') elements
'put text of 1st 'td' in col A
Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
'put text of 2nd 'td' in col B
Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
y = y + 1
'repeat until last ele has been evaluated
Next
End Sub
推荐答案
我向您展示了两种方法:
I show you two methods:
-
使用IE:数据位于需要协商的iframe内
Using IE: The data is inside an iframe which needs to be negotiated
使用XMLHTTP请求-更快,而且无需打开浏览器.它使用iframe文档网址的第一部分,即iframe导航到的内容.
Using XMLHTTP request - much faster and without browser opening. It uses the first part of the iframe document URL which is what the iframe is navigating to.
在两种情况下,我都访问包含公司名称的表,然后访问公开信息表.对于公开的主要信息表,我将externalHTML复制到剪贴板,然后粘贴到Excel,以避免循环所有行和列.您只需在其中设置循环tr(表行)和td(表单元格)即可.
In both cases I access the tables containing the company name and then the disclosure info table. For the disclosure main info table I copy the outerHTML to the clipboard and paste to Excel to avoid looping all the rows and columns. You can simply set loop the tr (table rows) and td (table cells) within instead.
IE:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, clipboard As Object
With IE
.Visible = True
.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
While .Busy Or .readyState < 4: DoEvents: Wend
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
With .document.getElementById("bm_ann_detail_iframe").contentDocument
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .getElementsByClassName("company_name")(0).innerText
clipboard.SetText .getElementsByTagName("table")(1).outerHTML
clipboard.PutInClipboard
End With
ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
.Quit
End With
End Sub
XMLHTTP:
您可以从iframe网址的前端提取另一个网址,并按如下所示使用该网址.
You can extract a different URL from the front-end of the iframe URL and use that as shown below.
这是原始HTML的一部分,其中显示了iframe和相关的新网址信息:
Here is the section of your original HTML that shows the iframe and the associated new URL info:
Option Explicit
Public Sub GetTable()
Dim sResponse As String, html As HTMLDocument, clipboard As Object
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
html.body.innerHTML = sResponse
With html
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .querySelector(".company_name").innerText
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText .querySelector(".InputTable2").outerHTML
clipboard.PutInClipboard
End With
ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
End Sub
这篇关于使用VBA从网页中提取表格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!