Scrape using XMLHTTP throws error at specific class name

Question

I am trying to scrape a site with this code to extract names and contacts:

Sub Test()
Dim htmlDoc         As Object
Dim htmlDoc2        As Object
Dim elem            As Variant
Dim tag             As Variant
Dim dns             As String
Dim pageSource      As String
Dim pageSource2     As String
Dim url             As String
Dim row             As Long

row = 2
dns = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", dns, True
    .send

    While .readyState <> 4: DoEvents: Wend

    If .statusText <> "OK" Then
        MsgBox "ERROR" & .Status & " - " & .statusText, vbExclamation
        Exit Sub
    End If

    pageSource = .responseText
End With

Set htmlDoc = CreateObject("htmlfile")
htmlDoc.body.innerHTML = pageSource

Dim xx
'Got error here
Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")

Set htmlDoc = Nothing
Set htmlDoc2 = Nothing
End Sub

When I try to use this line:

Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")

I get the error "Object doesn't support this property or method" (438). Can you help me, please? I am not very experienced with scraping.

Recommended answer

To get the names and their corresponding phone numbers, you can try the snippet below:

Sub GetProfileInfo()
    Const URL$ = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page="
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, R&, P&

    For P = 1 To 3 'set the upper bound to the last page you want to traverse
        'Fetch the page synchronously and load its markup into the document
        With Http
            .Open "GET", URL & P, False
            .send
            Html.body.innerHTML = .responseText
        End With

        'Loop over every element with the class "ldb-contact-summary"
        For Each post In Html.getElementsByClassName("ldb-contact-summary")
            'A name starts a new row in column A
            With post.querySelectorAll(".ldb-contact-name a")
                If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
            End With

            'The phone number, when present, goes in column B of the same row
            With post.getElementsByClassName("ldb-phone-number")
                If .Length Then Cells(R, 2) = .item(0).innerText
            End With
        Next post
    Next P
End Sub

References to add (Tools > References in the VBA editor) to run the above script:

Microsoft XML, v6.0
Microsoft HTML Object Library
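
If adding these references is not an option, the late-bound approach from the question can probably be kept by avoiding getElementsByClassName altogether. A likely cause of error 438 is that the late-bound "htmlfile" object does not expose that method, while getElementsByTagName is available on it. The following is only a minimal sketch under that assumption: the names GetProfileInfoLateBound and HasClass are hypothetical, it reads a single page, and it writes the whole text of each matching block to column A instead of splitting name and phone.

```vba
Sub GetProfileInfoLateBound()
    'Hypothetical late-bound variant: no library references are required
    Const URL As String = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"
    Dim http As Object, html As Object
    Dim div As Object, r As Long

    'Synchronous request, so no readyState polling is needed
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", URL, False
    http.send
    If http.Status <> 200 Then
        MsgBox "ERROR " & http.Status & " - " & http.statusText, vbExclamation
        Exit Sub
    End If

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = http.responseText

    'Filter the div elements by class name manually
    'instead of calling getElementsByClassName
    For Each div In html.getElementsByTagName("div")
        If HasClass(div.className, "ldb-contact-summary") Then
            r = r + 1
            Cells(r, 1).Value = div.innerText
        End If
    Next div
End Sub

'Returns True when className is one of the space-separated
'tokens in classAttr (hypothetical helper)
Function HasClass(ByVal classAttr As String, ByVal className As String) As Boolean
    HasClass = InStr(1, " " & classAttr & " ", " " & className & " ", vbTextCompare) > 0
End Function
```

Padding both strings with spaces in HasClass keeps a class such as "ldb-contact-summary-extra" from matching by accident. The early-bound version in the answer remains the shorter route when the two references can be added.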
