Get data from listings on a website to Excel VBA
Question
I am trying to find a way to get data from yelp.com.
I have a spreadsheet containing several keywords and locations. I want to extract data from Yelp listings based on the keywords and locations already in my spreadsheet.
I have written the following code, but it returns absurd data rather than the exact information I am looking for.
I want to get the business name, address and phone number, but I am not getting any of those. Could anyone here help me solve this problem?
Sub find()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ie.Visible = False
        ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
        ' Don't show window
        ie.Visible = False
        'Wait until IE is done loading page
        Do While ie.Busy
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
        Loop
        ' Make a string from IE content
        Set mDoc = ie.Document
        peopleData = mDoc.body.innerText
        ActiveSheet.Cells(1, 1).Value = peopleData
    End With
    peopleData = "" 'Nothing
    Set mDoc = Nothing
End Sub
Recommended answer
If you right-click in IE and choose View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web scraping.
I open the page in Google Chrome and inspect the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser; you will need to add a reference to the Microsoft HTML Object Library.
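As a rough sketch of what that reference gives you, assuming it has been added under Tools > References: the variables below are early bound to MSHTML types, so the editor can offer IntelliSense on them. The procedure name ParseSnippet and the HTML string are made up for illustration and are not Yelp's actual markup.

```vba
' Sketch only: requires a reference to Microsoft HTML Object Library.
' ParseSnippet and the sample HTML are hypothetical.
Sub ParseSnippet()
    Dim doc As MSHTML.HTMLDocument
    Dim items As MSHTML.IHTMLElementCollection
    Dim el As MSHTML.IHTMLElement

    Set doc = New MSHTML.HTMLDocument
    ' Load a small stand-in fragment instead of a live page
    doc.body.innerHTML = "<ul><li class=""listing"">First shop</li><li>Other</li></ul>"

    ' Walk every <li> and print its class name and visible text
    Set items = doc.getElementsByTagName("li")
    For Each el In items
        Debug.Print el.className, el.innerText
    Next el
End Sub
```

Running it prints one line per list item to the Immediate window, which is a quick way to check class names before pointing the same loop at a real page.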
I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
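A minimal sketch of that idea, assuming you already have the loaded document (for example ie.Document). The procedure name ListDivsByClass and the class name "search-result" are placeholders for illustration; inspect the live page to find the class Yelp actually uses.

```vba
' Sketch only: ListDivsByClass and "search-result" are hypothetical.
' html is expected to be an HTMLDocument, e.g. ie.Document.
Sub ListDivsByClass(ByVal html As Object)
    Dim divs As Object   'IHTMLElementCollection
    Dim d As Object      'IHTMLElement
    Dim r As Long

    Set divs = html.getElementsByTagName("DIV")
    For Each d In divs
        ' className holds the whole class attribute, so test for a substring
        If InStr(1, d.className, "search-result", vbTextCompare) > 0 Then
            ActiveSheet.Range("A1").Offset(r, 0).Value = d.innerText
            r = r + 1
        End If
    Next d
End Sub
```

Checking className rather than innerHTML avoids matching a container that merely has a matching element somewhere inside it.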
I made some revisions to my original answer; this should print each record in a new cell:
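Option Explicit

' Sleep is a Windows API call; 64-bit Office (VBA7) requires PtrSafe
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If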
Sub find()
    'Uses late binding, or add reference to Microsoft HTML Object Library
    ' and change variable Types to use intellisense
    Dim ie As Object 'InternetExplorer.Application
    Dim html As Object 'HTMLDocument
    Dim Listings As Object 'IHTMLElementCollection
    Dim l As Object 'IHTMLElement
    Dim r As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = False ' Don't show window
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
        'Wait until IE is done loading page (readyState 4 = complete)
        Do While .readyState <> 4
            Application.StatusBar = "Downloading information, Please wait..."
            DoEvents
            Sleep 200
        Loop
        Set html = .Document
    End With
    Application.StatusBar = False ' restore the default status bar

    Set Listings = html.getElementsByTagName("LI") ' ## returns the list
    For Each l In Listings
        '## make sure this list item looks like the listings Div Class:
        ' then, build the string to put in your cell
        If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
            Range("A1").Offset(r, 0).Value = l.innerText
            r = r + 1
        End If
    Next

    ie.Quit ' close the hidden IE instance so it does not linger
    Set html = Nothing
    Set ie = Nothing
End Sub