Web scraping using Excel and VBA


Problem description


I wrote the VBA code below in an Excel sheet, but it does not scrape the data and I don't know why. The only result it gives me is "click here to read more". I want to scrape all of the data, such as first name, last name, state, zip code and so on. Can anyone help?

Sub extractTablesData()
    Dim IE As Object, obj As Object
    Dim myState As String
    Dim r As Integer, c As Integer, t As Integer
    Dim elemCollection As Object

    Set IE = CreateObject("InternetExplorer.Application")

    myState = InputBox("Enter the city where you wish to work")

    With IE

        .Visible = True
        .navigate ("http://www.funeralhomes.com/go/listing/Search?  name=&city=&state=&country=USA&zip=&radius=")

        While IE.readyState <> 4
            DoEvents
        Wend

        For Each obj In IE.document.all.item("state").Options
            If obj.innerText = myState Then
                obj.Selected = True
            End If
        Next obj

        IE.document.getElementsByValue("Search").item.Click

        Do While IE.Busy: DoEvents: Loop

        ThisWorkbook.Sheets("Sheet1").Range("A1:K1500").ClearContents

        Set elemCollection = IE.document.getElementsByTagName("TABLE")

        For t = 0 To (elemCollection.Length - 1)

            For r = 0 To (elemCollection(t).Rows.Length - 1)
                For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                    ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                Next c
            Next r
        Next t

    End With
    Set IE = Nothing
End Sub

Solution

Using the same URL as the answer already given, you could instead use CSS selectors to pick out the elements of interest, and use Split to keep just the name and address parts of the text. We can also do away with the browser altogether, which returns the first page of results faster.
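As a rough sketch of the browser-free request, the helper below fetches a page with WinHttpRequest and only returns the body when the server answers with status 200. FetchHtml and DemoFetch are hypothetical names introduced here for illustration, and the status check is an addition that the code in this answer does not perform.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer):
' returns the response body, or an empty string on a non-200 status.
Public Function FetchHtml(ByVal targetUrl As String) As String
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", targetUrl, False   ' False = synchronous request
        .send
        If .Status = 200 Then FetchHtml = .responseText
    End With
End Function

Public Sub DemoFetch()
    Dim html As String
    ' Same listing URL as the one used in this answer's code.
    html = FetchHtml("http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York")
    If Len(html) = 0 Then
        Debug.Print "Request failed or returned no content"
    Else
        Debug.Print "Received " & Len(html) & " characters"
    End If
End Sub
```

Because no browser window is started, nothing has to be polled for readiness; the synchronous call simply returns once the response has arrived.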


Business name:

You can get the name with the following selector (using paid listing example):

div.paid-listing .listing-title

This selects the title element of each paid listing.
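To see how the selector behaves in isolation, here is a small sketch that runs it against a hand-written HTML fragment. The markup, the business names and the free-listing class are invented for illustration and are only assumed to resemble the real page. The code needs a reference to the Microsoft HTML Object Library (Tools > References).

```vba
Option Explicit

' Requires a reference to: Microsoft HTML Object Library
Public Sub DemoTitleSelector()
    Dim doc As HTMLDocument, titles As Object, i As Long

    Set doc = New HTMLDocument
    ' Invented sample markup; the real page structure is assumed, not verified.
    doc.body.innerHTML = _
        "<div class=""paid-listing""><span class=""listing-title"">Example Funeral Home</span></div>" & _
        "<div class=""free-listing""><span class=""listing-title"">Another Listing</span></div>"

    ' Matches .listing-title elements that sit inside a div.paid-listing only.
    Set titles = doc.querySelectorAll("div.paid-listing .listing-title")

    For i = 0 To titles.Length - 1
        Debug.Print titles.Item(i).innerText
    Next i
End Sub
```

The space between the two parts of the selector is a descendant combinator, so the title inside the second div is not matched.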


Address info:

The associated descriptive information can be retrieved with the selector:

div.paid-listing .address-summary

Splitting that text on the line feed character and keeping the first piece then leaves just the address information.
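The split step can be tried on its own with a sample string. This is a minimal sketch: FirstLine is a hypothetical helper, and the sample text is made up rather than taken from the site.

```vba
Option Explicit

' Hypothetical helper: returns the first line of a multi-line string.
Public Function FirstLine(ByVal rawText As String) As String
    Dim parts() As String
    If Len(rawText) = 0 Then Exit Function      ' Split("") returns an empty array
    parts = Split(rawText, vbLf)                ' vbLf is the same character as Chr$(10)
    ' Drop a stray carriage return in case the text used CR+LF line endings.
    FirstLine = Trim$(Replace(parts(0), vbCr, vbNullString))
End Function

Public Sub DemoFirstLine()
    Dim sample As String
    ' Invented sample text for illustration only.
    sample = "123 Example Street, New York, NY 10001" & vbLf & "Phone: 555-0100"
    Debug.Print FirstLine(sample)   ' prints: 123 Example Street, New York, NY 10001
End Sub
```

Guarding against an empty string matters because indexing element 0 of the empty array that Split returns would raise a "Subscript out of range" error.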


Code:

Option Explicit

' Requires a reference to: Microsoft HTML Object Library (Tools > References)
Public Sub GetTitleAndAddress()
    Dim oHtml As HTMLDocument, nodeList1 As Object, nodeList2 As Object, i As Long
    Const URL As String = "http://www.funeralhomes.com/go/listing/ShowListing/USA/New%20York/New%20York"
    Set oHtml = New HTMLDocument

    ' Fetch the page without a browser and load the response into the parser
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", URL, False    ' False = synchronous request
        .send
        oHtml.body.innerHTML = .responseText
    End With

    ' One node list for the business names, one for the descriptive text
    Set nodeList1 = oHtml.querySelectorAll("div.paid-listing .listing-title")
    Set nodeList2 = oHtml.querySelectorAll("div.paid-listing .address-summary")

    With Worksheets("Sheet3")
        .UsedRange.ClearContents
        For i = 0 To nodeList1.Length - 1
            ' Column A: business name
            .Range("A" & i + 1) = nodeList1.Item(i).innerText
            ' Column B: first line of the summary text, i.e. the address
            .Range("B" & i + 1) = Split(nodeList2.Item(i).innerText, Chr$(10))(0)
        Next i
    End With
End Sub


