Web scraping using Excel and VBA
Problem description
I wrote my VBA code in an Excel sheet as shown below, but it does not scrape the data for me and I don't know why. Can anyone please help? It gave me only "click here to read more" as the result. I want to scrape the entire data, such as first name, last name, state, zip code and so on.
Sub extractTablesData()
    Dim IE As Object, obj As Object
    Dim myState As String
    Dim r As Integer, c As Integer, t As Integer
    Dim elemCollection As Object
    Set IE = CreateObject("InternetExplorer.Application")
    myState = InputBox("Enter the city where you wish to work")
    With IE
        .Visible = True
        .navigate ("http://www.funeralhomes.com/go/listing/Search?name=&city=&state=&country=USA&zip=&radius=")
        While IE.readyState <> 4
            DoEvents
        Wend
        For Each obj In IE.document.all.item("state").Options
            If obj.innerText = myState Then
                obj.Selected = True
            End If
        Next obj
        IE.document.getElementsByValue("Search").item.Click
        Do While IE.Busy: DoEvents: Loop
        ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents
        Set elemCollection = IE.document.getElementsByTagName("TABLE")
        For t = 0 To (elemCollection.Length - 1)
            For r = 0 To (elemCollection(t).Rows.Length - 1)
                For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                    ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                Next c
            Next r
        Next t
    End With
    Set IE = Nothing
End Sub
Using the same URL as the answer already given, you could alternatively use CSS selectors to pick out the elements of interest, and use Split to extract just the name and address parts from the text. We can also do away with the browser altogether to get faster results from the first results page.
Business name:
You can get the name with the following selector (using paid listing example):
div.paid-listing .listing-title
This selects: [image: sample view of the selected elements]
Address info:
The associated descriptive information can be retrieved with the selector:
div.paid-listing .address-summary
And then using split we can parse this into just the address information.
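The Split step can be sketched in isolation as below. This is only an illustration: the helper name FirstLine and the sample text are hypothetical and not taken from the site, and it assumes the part you want is the text before the first line feed in the element's innerText.

```vba
Option Explicit

' Hypothetical helper: returns the text before the first line feed,
' or the whole string when it contains no line feed.
Public Function FirstLine(ByVal rawText As String) As String
    If Len(rawText) = 0 Then
        ' Split on an empty string returns an empty array, so indexing (0) would fail
        FirstLine = vbNullString
    Else
        FirstLine = Split(rawText, vbLf)(0)
    End If
End Function

Public Sub DemoFirstLine()
    ' Made-up sample text standing in for an element's innerText
    Debug.Print FirstLine("123 Example St, New York, NY" & vbLf & "More details")
End Sub
```

Running DemoFirstLine prints only the first line of the sample text to the Immediate window. The guard for an empty string matters if an element has no text at all.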
Code:
Option Explicit

' Requires a reference to Microsoft HTML Object Library
' (VBE > Tools > References) for the early-bound HTMLDocument type.
Public Sub GetTitleAndAddress()
    Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
    Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"

    Set oHtml = New HTMLDocument

    ' Fetch the page without a browser and load the response into the document
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", URL, False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    ' Business names and their associated descriptive text
    Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
    Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")

    With Worksheets("Sheet3")
        .UsedRange.ClearContents
        For i = 0 To nodeList1.Length - 1
            .Range("A" & i + 1) = nodeList1.Item(i).innerText
            ' Keep only the text before the first line feed (the address part)
            .Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
        Next i
    End With
End Sub
Example output: [image: worksheet with names in column A and addresses in column B]