Pull data from a website using Excel VBA (multiple class names)


Problem description

I know this has been asked many times, but I haven't seen a clear answer on looping through a div and finding tags with the same class name.

My first question:

If I have something like this:

<div id="carousel">
   <div id="images">

       <div class="imageElement">
          <img src="img/image1.jpg">
       </div>

       <div class="imageElement">
          <img src="img/image2.jpg">
       </div>

       <div class="imageElement">
           <img src="img/image3.jpg">
       </div>

   </div>

</div>

So I want to get the src of every img in the div "images", along with other content inside the imageElement elements, and copy them to some cells in Excel.

Second question: I've seen two ways of pulling web content with VBA: one using IE, and another using something other than a browser.

Private Sub pullData_Click()

    Dim htm As Object

    ' Late-bound HTML document: no library reference needed
    Set htm = CreateObject("htmlFile")

    ' Synchronous GET; the response HTML is loaded into the document for parsing
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://website.html", False
        .send
        htm.body.innerHTML = .responsetext
    End With

End Sub

And second way:

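' Early bound: needs a reference to Microsoft Internet Controls
' (which also defines READYSTATE_COMPLETE)
Dim ie As InternetExplorer
Dim objHTML As Object
Dim elementONE As Object
Dim elementTWO As String
Dim i As Long

Set ie = New InternetExplorer
With ie
    .navigate "http://eoddata.com/stockquote/NASDAQ/AAPL.htm"
    .Visible = False
    ' Wait until the page and all its resources have finished loading
    While .Busy Or .readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend
    Set objHTML = .document
    DoEvents
End With

Set elementONE = objHTML.getElementsByTagName("TD")
' The collection is zero-based; stop one short so Item(i + 1) always exists
For i = 0 To elementONE.Length - 2
    elementTWO = elementONE.Item(i).innerText
    If elementTWO = "08/10/12" Then
        MsgBox elementONE.Item(i + 1).innerText
        Exit For
    End If
Next i

DoEvents
ie.Quit
DoEvents
Set ie = Nothing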

Which one is better and why?

If you can help me, I'd appreciate it.

Thank you in advance.

Solution

Your first option is usually preferable, since it is much faster than the second method: it sends a request directly to the web server and returns the response. This is much more efficient than automating Internet Explorer (the second option). Automating IE is very slow, since you are effectively just browsing the site. It inevitably results in more downloads, as it must load all the resources in the page (images, scripts, CSS files, etc.). It will also run any JavaScript on the page. All of this is usually not useful, and you have to wait for it to finish before parsing the page.

This, however, is a bit of a double-edged sword. Whilst much slower, automating Internet Explorer is substantially easier than the first method if you are not familiar with HTML requests, especially when elements are generated dynamically or the page relies on AJAX. It is also easier to automate IE when you need to access data on a site that requires you to log in, since it will handle the relevant cookies for you. This is not to say that web scraping cannot be done with the first method, rather that it requires a deeper understanding of web technologies and the architecture of the site.
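When content is generated by JavaScript or loaded via AJAX, `readyState` can reach "complete" before the element you need exists. Below is a minimal sketch of polling for an element with a timeout, assuming a late-bound `InternetExplorer.Application` instance. The helper name `WaitForElementById`, the id `"images"` and the 10-second timeout in the demo are hypothetical, and `http://www.someurl.com` is the placeholder URL used elsewhere on this page.

```vba
' Hypothetical helper: polls an IE document until an element with the given
' id exists, or until timeoutSeconds elapses. Returns the element, or
' Nothing on timeout.
Function WaitForElementById(ByVal ie As Object, ByVal elementId As String, _
                            ByVal timeoutSeconds As Single) As Object
    Dim started As Single, elapsed As Single
    Dim el As Object

    started = Timer
    Do
        ' Only query the document once IE is idle (readyState 4 = complete)
        If Not ie.Busy And ie.readyState = 4 Then
            Set el = ie.document.getElementById(elementId)
            If Not el Is Nothing Then
                Set WaitForElementById = el
                Exit Function
            End If
        End If
        DoEvents

        elapsed = Timer - started
        ' Timer resets to 0 at midnight; correct for the wrap-around
        If elapsed < 0 Then elapsed = elapsed + 86400
    Loop While elapsed < timeoutSeconds

    Set WaitForElementById = Nothing
End Function

' Usage sketch (placeholder URL and id)
Sub DemoWait()
    Dim ie As Object, images As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate "http://www.someurl.com"

    Set images = WaitForElementById(ie, "images", 10)
    If images Is Nothing Then
        MsgBox "Timed out waiting for the element"
    Else
        Debug.Print images.innerHTML
    End If

    ie.Quit
    Set ie = Nothing
End Sub
```

Polling for a specific element is more reliable than waiting on `readyState` alone, because scripts may keep changing the page after the initial load has finished.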

An improvement on the first method would be to use a different object to handle the request and response: the WinHTTP library offers more resilience than the MSXML library and will generally handle any cookies automatically as well.
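Below is a minimal sketch of the WinHTTP request wrapped in a reusable function, with explicit timeouts and a status check. The function name `GetHtml`, the timeout values, and the choice to raise an error on any status other than 200 are assumptions made for illustration. The object is late bound, so no reference is needed.

```vba
' Hypothetical helper: fetch a page with WinHTTP and return its HTML.
' Raises an error for any HTTP status other than 200.
Function GetHtml(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' Resolve, connect, send and receive timeouts, in milliseconds
    http.SetTimeouts 5000, 5000, 10000, 10000
    http.Open "GET", url, False          ' False = synchronous request
    http.send

    If http.Status <> 200 Then
        Err.Raise vbObjectError + 513, "GetHtml", _
                  "HTTP " & http.Status & " " & http.StatusText & " for " & url
    End If

    GetHtml = http.responseText
End Function
```

For example, `oHtml.body.innerHTML = GetHtml("http://www.someurl.com")` would replace the `With ... End With` block in the code below.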

As for parsing the data, your first approach used late binding to create the HTML object (htmlfile). Whilst this removes the need for a reference, it also reduces functionality. For example, with late binding you miss out on the features added when the user has IE9 installed, specifically in this case the getElementsByClassName function.

As such, a third option (and my preferred method):

Sub ListImageSources()
    Dim oHtml       As HTMLDocument
    Dim oElement    As Object

    Set oHtml = New HTMLDocument

    ' Fetch the page with WinHTTP and load the HTML into the document
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.someurl.com", False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    ' getElementsByClassName is only available when IE9 or later is installed
    For Each oElement In oHtml.getElementsByClassName("imageElement")
        Debug.Print oElement.Children(0).src
    Next oElement

    'IE 8 alternative
    'For Each oElement In oHtml.getElementsByTagName("div")
    '    If oElement.className = "imageElement" Then
    '        Debug.Print oElement.Children(0).src
    '    End If
    'Next oElement
End Sub

This requires a reference to the Microsoft HTML Object Library. getElementsByClassName will fail if the user does not have IE9 installed, but this can be handled (as in the commented IE 8 alternative), and it is becoming increasingly less relevant.
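Below is a sketch that answers the original question end to end without the early-bound reference. It fetches the page and limits the search to the div with id `images`. It matches `imageElement` even when an element carries several class names; a plain `className = "imageElement"` comparison, as in the IE 8 alternative, would miss `class="imageElement active"`. It then writes each image source down column A of the active sheet. The names `HasClass` and `CopyImageSources`, the output location, and the placeholder URL are assumptions. The `getAttribute("src", 2)` call asks MSHTML for the attribute exactly as written in the source, because the `.src` property may return a resolved URL instead (for relative paths, possibly one prefixed with `about:`).

```vba
' Hypothetical helper: True if a (possibly space-separated) class attribute
' contains className as a whole word.
Function HasClass(ByVal classAttr As String, ByVal className As String) As Boolean
    Dim part As Variant
    If Len(classAttr) = 0 Then Exit Function      ' returns False
    ' Treat tabs like spaces; empty parts from repeated spaces never match
    For Each part In Split(Replace(classAttr, vbTab, " "), " ")
        If part = className Then
            HasClass = True
            Exit Function
        End If
    Next part
End Function

' Hypothetical example: copy every img src inside div#images > .imageElement
' to column A of the active sheet, starting at row 1.
Sub CopyImageSources()
    Dim doc As Object, container As Object, el As Object, img As Object
    Dim r As Long

    Set doc = CreateObject("htmlfile")            ' late bound: no reference needed
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "http://www.someurl.com", False   ' placeholder URL
        .send
        doc.body.innerHTML = .responseText
    End With

    ' Restrict the search to the div with id "images"
    Set container = doc.getElementById("images")
    If container Is Nothing Then Exit Sub

    r = 1
    For Each el In container.getElementsByTagName("div")
        If HasClass(el.className, "imageElement") Then
            For Each img In el.getElementsByTagName("img")
                ' Flag 2 = return the attribute value as written in the HTML
                ActiveSheet.Cells(r, 1).Value = img.getAttribute("src", 2)
                r = r + 1
            Next img
        End If
    Next el
End Sub
```

Other content from each `imageElement` (for example `el.innerText`) could be written to the neighbouring column in the same loop.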
