Webscrape VBA-列表 [英] Webscrape VBA - List
问题描述
我正在尝试设置一个抓网VBA代码,以从以下网站将数据导入Excel: https://www.thewindpower.net/windfarms_list_en.php
I am trying to set up a webscraping VBA code to import data into Excel from this website: https://www.thewindpower.net/windfarms_list_en.php
我希望启动此网页,选择一个国家/地区,然后从下表中抓取数据(包括名称列中的网址).
I wish to launch this webpage, select a country and then scrape the data from the table below (including url from the name column).
但是,我有几点要坚持:
Yet, I am stuck with several points:
- 如何在VBA代码中选择所需的国家/地区?
- 由于标签中没有id或class,我该如何选择表?
- 如何导入名称列中包含的URL?
这是我已经准备好的代码(基于网络上的一些研究:
Here is the code I have already prepared (based on some research on the web:
Sub Grabdata()
'dimension (set aside memory for) our variables
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer
'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = True
'navigate to page with needed data
objIE.navigate "https://www.thewindpower.net/windfarms_list_en.php"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'we will output data to excel, starting on row 1
y = 1
'look at all the 'tr' elements in the 'table' with id 'myTable',
'and evaluate each, one at a time, using 'ele' variable
For Each ele In objIE.document.getElementById("myTable").getElementsByTagName("tr")
'show the text content of 'tr' element being looked at
Debug.Print ele.textContent
'each 'tr' (table row) element contains 4 children ('td') elements
'put text of 1st 'td' in col A
Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
'put text of 2nd 'td' in col B
Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
'put text of 3rd 'td' in col C
Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
'put text of 4th 'td' in col D
Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
'increment row counter by 1
y = y + 1
'repeat until last ele has been evaluated
Next
'save the Excel workbook
ActiveWorkbook.Save
结束子
推荐答案
大多数可抓取页面几乎总是具有静态页面布局,因此使用它们的索引选择元素是相当安全的.
Most scrapeable pages will almost always have a static page layout so it's fairly safe to select elements using their index.
下面的代码选择ID为 bloc_texte
的容器元素,然后选择其中的第二个表.
The code below selects the container element with id bloc_texte
and then selects the second table inside.
如果您打算按照评论的建议进行大量请求,则应添加一些代码来减慢请求的速度( Application.wait
类型处理).在请求之后解除请求很可能会惹恼主机.
If you're planning on doing a lot of requests as your comment suggests, you should add some code to slow down your requests (Application.wait
type deal). Firing off request after request is likely to annoy the host.
' Required References
' Microsoft HTML Object Library
' Microsoft XML, v6.0
Sub Main()
GetData ("GB")
End Sub
Sub GetData(ByVal Location As String)
Dim Request As MSXML2.ServerXMLHTTP60: Set Request = New MSXML2.ServerXMLHTTP60
Dim Result As HTMLDocument: Set Result = New HTMLDocument
Request.Open "POST", "https://www.thewindpower.net/windfarms_list_en.php", False
Request.setRequestHeader "content-type", "application/x-www-form-urlencoded"
Request.send "action=submit&pays=" & Location
Result.body.innerHTML = Request.responseText
Dim oRows As MSHTML.IHTMLElementCollection
Dim oRow As MSHTML.IHTMLElement
Dim oCells As MSHTML.IHTMLElementCollection
Dim oCell As MSHTML.IHTMLElement
Dim oLinks As MSHTML.IHTMLElementCollection
Set oRows = Result.getElementById("bloc_texte").getElementsByTagName("table")(2).getElementsByTagName("tr")
Dim iRow As Integer 'output row counter
Dim iColumn As Integer 'output column counter
Dim Sheet As Worksheet 'output sheet
Set Sheet = ThisWorkbook.Worksheets("Sheet1")
iRow = 1
iColumn = 1
For Each oRow In oRows
If Not oRow.className = "puce_texte" Then
Set oCells = oRow.getElementsByTagName("td")
For Each oCell In oCells
Set oLinks = oCell.getElementsByTagName("a")
If oLinks.Length = 0 Then
Sheet.Cells(iRow, iColumn).Value = oCell.innerText
Else
Sheet.Cells(iRow, iColumn).Value = Replace(oLinks(0).getAttribute("href"), "about:", "")
End If
iColumn = iColumn + 1
Next oCell
iRow = iRow + 1
iColumn = 1
End If
Next oRow
End Sub
这篇关于Webscrape VBA-列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!