Scraping with XMLHTTP throws an error at a specific class name
Problem description
I am trying to scrape a site with this code to extract names and contacts:
Sub Test()
Dim htmlDoc As Object
Dim htmlDoc2 As Object
Dim elem As Variant
Dim tag As Variant
Dim dns As String
Dim pageSource As String
Dim pageSource2 As String
Dim url As String
Dim row As Long
row = 2
dns = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", dns, True
.send
While .readyState <> 4: DoEvents: Wend
If .statusText <> "OK" Then
MsgBox "ERROR" & .Status & " - " & .statusText, vbExclamation
Exit Sub
End If
pageSource = .responseText
End With
Set htmlDoc = CreateObject("htmlfile")
htmlDoc.body.innerHTML = pageSource
Dim xx 'Got error here
Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")
Set htmlDoc = Nothing
Set htmlDoc2 = Nothing
End Sub
When I run this line:
Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")
I get run-time error 438, "Object doesn't support this property or method". Can you please help me? I am not very experienced with scraping.
Recommended answer
To get the names and their corresponding phone numbers, you can try the snippet below:
Sub GetProfileInfo()
    ' Requires the references "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Const URL$ = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page="
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, R&, p&

    For p = 1 To 3 'put here the highest page number you want to traverse
        With Http
            .Open "GET", URL & p, False   ' synchronous request, so no readyState loop is needed
            .send
            Html.body.innerHTML = .responseText
        End With

        ' The early-bound HTMLDocument exposes getElementsByClassName
        For Each post In Html.getElementsByClassName("ldb-contact-summary")
            R = R + 1   ' one row per contact card, so name and phone stay on the same row
            With post.querySelectorAll(".ldb-contact-name a")
                If .Length Then Cells(R, 1) = .item(0).innerText
            End With
            With post.getElementsByClassName("ldb-phone-number")
                If .Length Then Cells(R, 2) = .item(0).innerText
            End With
        Next post
    Next p
End Sub
References to add (Tools > References in the VBA editor) to run the above script:
Microsoft XML, v6.0
Microsoft HTML Object Library
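If you would rather keep the question's late-bound CreateObject("htmlfile") approach and avoid adding references, one possible workaround is sketched here. It rests on two assumptions that have not been verified against this page: that error 438 comes from the late-bound document not exposing getElementsByClassName, and that the contact cards are div elements (the answer's HTMLDivElement declaration suggests this). The helper GetByClassName is hypothetical; it walks getElementsByTagName and matches the class name by hand.

```vba
' Hypothetical helper: late-bound replacement for getElementsByClassName.
' Assumes the target elements share the tag name passed in tagName.
Function GetByClassName(doc As Object, tagName As String, cls As String) As Collection
    Dim result As New Collection
    Dim el As Object
    For Each el In doc.getElementsByTagName(tagName)
        ' Pad with spaces so "ldb-contact-summary" matches as a whole class token,
        ' not as part of a longer class such as "ldb-contact-summary-x".
        ' InStr defaults to a binary (case-sensitive) comparison, like HTML class names.
        If InStr(" " & el.className & " ", " " & cls & " ") > 0 Then
            result.Add el
        End If
    Next el
    Set GetByClassName = result
End Function

' Usage sketch on static HTML, so it runs without any network request
Sub DemoGetByClassName()
    Dim htmlDoc As Object, hits As Collection
    Set htmlDoc = CreateObject("htmlfile")
    htmlDoc.body.innerHTML = "<div class='ldb-contact-summary x'>A</div><div class='other'>B</div>"
    Set hits = GetByClassName(htmlDoc, "div", "ldb-contact-summary")
    Debug.Print hits.Count   ' number of matching div elements
End Sub
```

In the question's code, the failing line would then become Set xx = GetByClassName(htmlDoc, "div", "ldb-contact-summary"). The trade-off is a full scan of every element with that tag, which is usually acceptable for a single results page.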