使用 VBA 对屏蔽 URL 进行 Web 抓取 [英] Web scraping of masked URL using VBA

查看:19
本文介绍了使用 VBA 对屏蔽 URL 进行 Web 抓取的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从一个网站

您需要通过 VBA 重现该请求并解析 HTML 响应.有一个例子展示了如何做到这一点:

选项显式子测试()' 添加引用' Microsoft HTML 对象库' 微软 XML,v6.0将 sResp 调暗为字符串将 rOutputCell 调暗为范围暗淡元素将 cElements 调暗为 IHTMLElementCollectionDim oTableRowDioTableCell' 从网站检索 HTML使用新的 MSXML2.XMLHTTP60' 发送请求.打开POST",https://dps.psx.com.pk/webpages/SL_main_page.php",真.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded".发送symbolCode=EFOODS"直到.ReadyState = 4:DoEvents:循环sResp = .ResponseText结束于' 解析响应并输出使用新的 HTMLDocument' 将响应 HTML 加载到 DOM.body.innerHTML = sResp' 清除输出的第一个工作表ThisWorkbook.Sheets(1).Cells.Delete' 解析 SL_cmpInfo 表并输出设置 rOutputCell = ThisWorkbook.Sheets(1).Cells(1, 1)设置 oElememnt = .getElementsByClassName("SL_cmpText")(0)rOutputCell.Value = oElememnt.innerText' 解析 SL_mktStats1 表并输出设置 rOutputCell = Cells(3, 1)设置 cElements = .getElementsByClassName("SL_mktStats1")对于数组中的每个元素 (cElements(1), cElements(2), cElements(3))对于 oElememnt.getElementsByTagName("tr") 中的每个 oTableRow对于 oTableRow.getElementsByTagName("td") 中的每个 oTableCellrOutputCell.Value = oTableCell.innerText设置 rOutputCell = rOutputCell.Offset(0, 1)下一个设置 rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)下一个下一个' 解析 SL_announce 表并输出设置 rOutputCell = rOutputCell.Offset(1, 0)设置 oElememnt = .getElementsByClassName("SL_announce")(0)对于 oElememnt.getElementsByTagName("tr") 中的每个 oTableRow对于 oTableRow.getElementsByTagName("td") 中的每个 oTableCellrOutputCell.Value = oTableCell.innerText设置 rOutputCell = rOutputCell.Offset(0, 1)下一个设置 rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)下一个结束于MsgBox "完成"结束子

不要忘记添加必要的引用:

我的输出如下:

根据需要:

I want to scrape some stock data from a website https://dps.psx.com.pk/ using VBA in Excel, but the problem is the URL of this website does not change.

When I click on market summary as highlighted in image#1 Image#1

that will return the whole market summary, I just need to scrape data in Excel using VBA as highlighted in the image#2. Image#2

I tried to examine the network with fiddler as shown in image#3 Image#3

and develop the following code in VBA.

Option Explicit

Sub Test()

    ' Add references
    ' Microsoft HTML Object Library
    ' Microsoft XML, v6.0

    Dim sResp As String
    Dim rOutputCell As Range
    Dim oElememnt
    Dim cElements As IHTMLElementCollection
    Dim oTableRow
    Dim oTableCell



    ' Retrieve HTML from website
    With New MSXML2.XMLHTTP60
        ' Send request
        .Open "GET", "https://dps.psx.com.pk/webpages/mktSummary.php?r=REG", True
        Do Until .ReadyState = 4: DoEvents: Loop
        sResp = .ResponseText
    End With
    
    
    ' Parse response and output
    With New HTMLDocument
        ' Load response HTML into DOM
        .body.innerHTML = sResp
        ' Clear first worksheet for output
        ThisWorkbook.Sheets(1).Cells.Delete
        
        Set rOutputCell = Cells(3, 1)
        Set oElememnt = .getElementsByClassName("tableHead")(0)
        For Each oTableRow In oElememnt.getElementsByTagName("tr")
            For Each oTableCell In oTableRow.getElementsByTagName("td")
                rOutputCell.Value = oTableCell.innerText
                Set rOutputCell = rOutputCell.Offset(0, 1)
            Next
            Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
        Next
    
      End With

    MsgBox "Completed"

End Sub

But when I run this code it just shows running but nothing happens even after waiting for sometime. I don't know whether it got stuck in Event Loop or some other problem is there please help.

解决方案

All the necessary info to scrape that data you may find in captured by Fiddler request which is logged after you made an input of the quote symbol in a browser manual:

You need to reproduce that request via VBA and parse HTML response. There is the example showing how that might be done:

Option Explicit

Sub Test()

    ' Add references
    ' Microsoft HTML Object Library
    ' Microsoft XML, v6.0

    Dim sResp As String
    Dim rOutputCell As Range
    Dim oElememnt
    Dim cElements As IHTMLElementCollection
    Dim oTableRow
    Dim oTableCell

    ' Retrieve HTML from website
    With New MSXML2.XMLHTTP60
        ' Send request
        .Open "POST", "https://dps.psx.com.pk/webpages/SL_main_page.php", True
        .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .Send "symbolCode=EFOODS"
        Do Until .ReadyState = 4: DoEvents: Loop
        sResp = .ResponseText
    End With
    ' Parse response and output
    With New HTMLDocument
        ' Load response HTML into DOM
        .body.innerHTML = sResp
        ' Clear first worksheet for output
        ThisWorkbook.Sheets(1).Cells.Delete
        ' Parse SL_cmpInfo table and output
        Set rOutputCell = ThisWorkbook.Sheets(1).Cells(1, 1)
        Set oElememnt = .getElementsByClassName("SL_cmpText")(0)
        rOutputCell.Value = oElememnt.innerText
        ' Parse SL_mktStats1 tables and output
        Set rOutputCell = Cells(3, 1)
        Set cElements = .getElementsByClassName("SL_mktStats1")
        For Each oElememnt In Array(cElements(1), cElements(2), cElements(3))
            For Each oTableRow In oElememnt.getElementsByTagName("tr")
                For Each oTableCell In oTableRow.getElementsByTagName("td")
                    rOutputCell.Value = oTableCell.innerText
                    Set rOutputCell = rOutputCell.Offset(0, 1)
                Next
                Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
            Next
        Next
        ' Parse SL_announce table and output
        Set rOutputCell = rOutputCell.Offset(1, 0)
        Set oElememnt = .getElementsByClassName("SL_announce")(0)
        For Each oTableRow In oElememnt.getElementsByTagName("tr")
            For Each oTableCell In oTableRow.getElementsByTagName("td")
                rOutputCell.Value = oTableCell.innerText
                Set rOutputCell = rOutputCell.Offset(0, 1)
            Next
            Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
        Next
    End With
    MsgBox "Completed"

End Sub

Don't forget to add the necessary references:

The output for me is as follows:

As required:

这篇关于使用 VBA 对屏蔽 URL 进行 Web 抓取的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆