Get data from listings on a website into Excel with VBA

Question

I am trying to find a way to get data from yelp.com.

I have a spreadsheet that contains several keywords and locations. I want to extract data from Yelp listings based on the keywords and locations already in that spreadsheet.

I have written the following code, but it returns absurd data rather than the exact information I am looking for.

I want to get the business name, address and phone number, but I am getting none of them. I would appreciate it if anyone here could help me solve this problem.

Sub find()

    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ' Don't show window
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"

        ' Wait until IE is done loading page
        Do While .Busy
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
        Loop

        ' Make a string from IE content
        Set mDoc = .Document
        peopleData = mDoc.body.innerText
        ActiveSheet.Cells(1, 1).Value = peopleData
    End With

    peopleData = "" 'Nothing
    Set mDoc = Nothing
End Sub


Recommended answer

If you right-click in IE and choose View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and reading .Body.innerText is really too simple an approach for most web scraping.

I open the page in Google Chrome and inspect the elements to get an idea of what I'm really looking for and how to find it using a DOM/HTML parser; you will need to add a reference to the Microsoft HTML Object Library.

I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
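As a minimal sketch of that idea, the helper below walks every element with a given tag and keeps those whose class attribute contains a given fragment. The name ElementsWithClass and its parameters are made up for illustration, and the tag and class to pass in depend on whatever markup the page actually serves when you inspect it.

```vba
' Hypothetical helper (not part of the original answer): returns every element
' with the given tag whose class attribute contains classFragment.
' doc is expected to be an HTMLDocument, e.g. ie.Document after the page loads.
Function ElementsWithClass(ByVal doc As Object, _
                           ByVal tagName As String, _
                           ByVal classFragment As String) As Collection
    Dim matches As Collection
    Dim el As Object 'IHTMLElement

    Set matches = New Collection
    For Each el In doc.getElementsByTagName(tagName)
        ' className holds the element's full class attribute as one string
        If InStr(1, el.className, classFragment, vbTextCompare) > 0 Then
            matches.Add el
        End If
    Next el

    Set ElementsWithClass = matches
End Function
```

For example, ElementsWithClass(html, "DIV", "media-block") would return the <DIV> elements whose class contains "media-block". That class name is taken from the listing code in this answer and may no longer match what the site serves.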

I made some revisions to my original answer; this should print each record in a new cell:

Option Explicit

' Sleep comes from the Windows API; PtrSafe is required on VBA7 (Office 2010+)
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub find()
    'Uses late binding, or add reference to Microsoft HTML Object Library
    '  and change variable Types to use intellisense
    Dim ie As Object 'InternetExplorer.Application
    Dim html As Object 'HTMLDocument
    Dim Listings As Object 'IHTMLElementCollection
    Dim l As Object 'IHTMLElement
    Dim r As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ' Don't show window
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"

        ' Wait until IE is done loading page (readyState 4 = complete)
        Do While .readyState <> 4
            Application.StatusBar = "Downloading information, Please wait..."
            DoEvents
            Sleep 200
        Loop
        Set html = .Document
    End With

    Set Listings = html.getElementsByTagName("LI") ' ## returns every <LI> element
    For Each l In Listings
        '## make sure this list item looks like the listings Div Class:
        '   then, build the string to put in your cell
        If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
            Range("A1").Offset(r, 0).Value = l.innerText
            r = r + 1
        End If
    Next

    ' Clean up: restore the status bar and close the hidden IE instance
    Application.StatusBar = False
    ie.Quit
    Set html = Nothing
    Set ie = Nothing
End Sub
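
The question mentions keywords and locations stored in the spreadsheet. One way to feed them into the routine is to build the search URL from two cells. The sketch below assumes that the find_desc and find_loc query parameters seen in the URL above are all the search needs, and that Excel 2013 or later is available, since it relies on WorksheetFunction.EncodeURL. The function name BuildSearchUrl is hypothetical.

```vba
' Hypothetical helper: builds a search URL from a keyword and a location.
' Assumes the find_desc / find_loc parameters from the URL used above.
' Requires Excel 2013+ for WorksheetFunction.EncodeURL.
Function BuildSearchUrl(ByVal keyword As String, ByVal location As String) As String
    BuildSearchUrl = "http://www.yelp.com/search" & _
        "?find_desc=" & Application.WorksheetFunction.EncodeURL(keyword) & _
        "&find_loc=" & Application.WorksheetFunction.EncodeURL(location)
End Function
```

With a keyword in A2 and a location in B2 (cell addresses chosen only for illustration), BuildSearchUrl(Range("A2").Value, Range("B2").Value) could be passed to .Navigate in place of the hard-coded URL.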
