Webscrape VBA-列表 [英] Webscrape VBA - List

查看:31
本文介绍了Webscrape VBA-列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试设置一个抓网VBA代码,以从以下网站将数据导入Excel: https://www.thewindpower.net/windfarms_list_en.php

I am trying to set up a webscraping VBA code to import data into Excel from this website: https://www.thewindpower.net/windfarms_list_en.php

我希望启动此网页,选择一个国家/地区,然后从下表中抓取数据(包括名称列中的网址).

I wish to launch this webpage, select a country and then scrape the data from the table below (including url from the name column).

但是,我有几点要坚持:

Yet, I am stuck with several points:

  • 如何在VBA代码中选择所需的国家/地区?
  • 由于标签中没有id或class,我该如何选择表?
  • 如何导入名称列中包含的URL?

这是我已经准备好的代码(基于网络上的一些研究:

Here is the code I have already prepared (based on some research on the web:

Sub Grabdata()

'dimension (set aside memory for) our variables
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer

'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = True

'navigate to page with needed data
objIE.navigate "https://www.thewindpower.net/windfarms_list_en.php"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

'we will output data to excel, starting on row 1
y = 1

'look at all the 'tr' elements in the 'table' with id 'myTable',
'and evaluate each, one at a time, using 'ele' variable
For Each ele In objIE.document.getElementById("myTable").getElementsByTagName("tr")
    'show the text content of 'tr' element being looked at
    Debug.Print ele.textContent
    'each 'tr' (table row) element contains 4 children ('td') elements
    'put text of 1st 'td' in col A
    Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
    'put text of 2nd 'td' in col B
    Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
    'put text of 3rd 'td' in col C
    Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
    'put text of 4th 'td' in col D
    Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
    'increment row counter by 1
    y = y + 1
'repeat until last ele has been evaluated
Next

'save the Excel workbook
ActiveWorkbook.Save

结束子

推荐答案

大多数可抓取页面几乎总是具有静态页面布局,因此使用它们的索引选择元素是相当安全的.

Most scrapeable pages will almost always have a static page layout so it's fairly safe to select elements using their index.

下面的代码选择ID为 bloc_texte 的容器元素,然后选择其中的第二个表.

The code below selects the container element with id bloc_texte and then selects the second table inside.

如果您打算按照评论的建议进行大量请求,则应添加一些代码来减慢请求的速度( Application.wait 类型处理).在请求之后解除请求很可能会惹恼主机.

If you're planning on doing a lot of requests as your comment suggests, you should add some code to slow down your requests (Application.wait type deal). Firing off request after request is likely to annoy the host.

' Required References
' Microsoft HTML Object Library
' Microsoft XML, v6.0

Sub Main()
    GetData ("GB")
End Sub

Sub GetData(ByVal Location As String)

Dim Request As MSXML2.ServerXMLHTTP60: Set Request = New MSXML2.ServerXMLHTTP60

Dim Result As HTMLDocument: Set Result = New HTMLDocument

Request.Open "POST", "https://www.thewindpower.net/windfarms_list_en.php", False
Request.setRequestHeader "content-type", "application/x-www-form-urlencoded"
Request.send "action=submit&pays=" & Location

Result.body.innerHTML = Request.responseText

Dim oRows As MSHTML.IHTMLElementCollection
Dim oRow As MSHTML.IHTMLElement

Dim oCells As MSHTML.IHTMLElementCollection
Dim oCell As MSHTML.IHTMLElement

Dim oLinks As MSHTML.IHTMLElementCollection

Set oRows = Result.getElementById("bloc_texte").getElementsByTagName("table")(2).getElementsByTagName("tr")

Dim iRow As Integer 'output row counter
Dim iColumn As Integer 'output column counter
Dim Sheet As Worksheet 'output sheet

Set Sheet = ThisWorkbook.Worksheets("Sheet1")
iRow = 1
iColumn = 1

For Each oRow In oRows
    If Not oRow.className = "puce_texte" Then
        Set oCells = oRow.getElementsByTagName("td")
        For Each oCell In oCells
            Set oLinks = oCell.getElementsByTagName("a")
            If oLinks.Length = 0 Then
                Sheet.Cells(iRow, iColumn).Value = oCell.innerText
            Else
                Sheet.Cells(iRow, iColumn).Value = Replace(oLinks(0).getAttribute("href"), "about:", "")
            End If
            iColumn = iColumn + 1
        Next oCell
        iRow = iRow + 1
        iColumn = 1
    End If
Next oRow

End Sub

这篇关于Webscrape VBA-列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆