使用VBA进行Web屏蔽URL的刮 [英] Web scraping of masked URL using VBA

查看:54
本文介绍了使用VBA进行Web屏蔽URL的刮的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从网站

您需要通过VBA重现该请求并解析HTML响应.有一个示例显示了如何执行此操作:

  Option Explicit子测试()'添加参考'Microsoft HTML对象库'Microsoft XML,v6.0Dim sResp作为字符串Dim rOutputCell作为范围点心将cElements设置为IHTMLElementCollectionDim oTableRowDim oTableCell'从网站检索HTML使用新的MSXML2.XMLHTTP60' 发送请求.打开"POST","https://dps.psx.com.pk/webpages/SL_main_page.php",True.SetRequestHeader内容类型",应用程序/x-www-form-urlencoded".发送"symbolCode = EFOODS"做到.ReadyState = 4:DoEvents:循环sResp = .ResponseText结束于解析响应和输出使用新的HTMLDocument'将响应HTML加载到DOM中.body.innerHTML = sResp'清除第一个工作表以进行输出ThisWorkbook.Sheets(1).Cells.Delete'解析SL_cmpInfo表并输出设置rOutputCell = ThisWorkbook.Sheets(1).Cells(1,1)设置oElememnt = .getElementsByClassName("SL_cmpText")(0)rOutputCell.Value = oElememnt.innerText'解析SL_mktStats1表并输出设置rOutputCell = Cells(3,1)设置cElements = .getElementsByClassName("SL_mktStats1")对于数组中的每个元素(cElements(1),cElements(2),cElements(3))对于oElememnt.getElementsByTagName("tr")中的每个oTableRow对于oTableRow.getElementsByTagName("td")中的每个oTableCellrOutputCell.Value = oTableCell.innerText设置rOutputCell = rOutputCell.Offset(0,1)下一个设置rOutputCell = rOutputCell.Offset(1、0).EntireRow.Cells(1、1)下一个下一个'解析SL_announce表并输出设置rOutputCell = rOutputCell.Offset(1,0)设置oElememnt = .getElementsByClassName("SL_announce")(0)对于oElememnt.getElementsByTagName("tr")中的每个oTableRow对于oTableRow.getElementsByTagName("td")中的每个oTableCellrOutputCell.Value = oTableCell.innerText设置rOutputCell = rOutputCell.Offset(0,1)下一个设置rOutputCell = rOutputCell.Offset(1、0).EntireRow.Cells(1、1)下一个结束于MsgBox已完成"结束子 

别忘了添加必要的参考文献:

对我来说输出如下:

根据需要:

I want to scrape some stock data from a website https://dps.psx.com.pk/ using VBA in Excel, but the problem is the URL of this website does not change.

When I click on market summary as highlighted in image#1 Image#1

that will return the whole market summary, I just need to scrape data in Excel using VBA as highlighted in the image#2. Image#2

I tried to examine the network with fiddler as shown in image#3 Image#3

and develop the following code in VBA.

Option Explicit

Sub Test()

    ' Add references
    ' Microsoft HTML Object Library
    ' Microsoft XML, v6.0

    Dim sResp As String
    Dim rOutputCell As Range
    Dim oElememnt
    Dim cElements As IHTMLElementCollection
    Dim oTableRow
    Dim oTableCell



    ' Retrieve HTML from website
    With New MSXML2.XMLHTTP60
        ' Send request
        .Open "GET", "https://dps.psx.com.pk/webpages/mktSummary.php?r=REG", True
        Do Until .ReadyState = 4: DoEvents: Loop
        sResp = .ResponseText
    End With
    
    
    ' Parse response and output
    With New HTMLDocument
        ' Load response HTML into DOM
        .body.innerHTML = sResp
        ' Clear first worksheet for output
        ThisWorkbook.Sheets(1).Cells.Delete
        
        Set rOutputCell = Cells(3, 1)
        Set oElememnt = .getElementsByClassName("tableHead")(0)
        For Each oTableRow In oElememnt.getElementsByTagName("tr")
            For Each oTableCell In oTableRow.getElementsByTagName("td")
                rOutputCell.Value = oTableCell.innerText
                Set rOutputCell = rOutputCell.Offset(0, 1)
            Next
            Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
        Next
    
      End With

    MsgBox "Completed"

End Sub

But when I run this code it just shows running but nothing happens even after waiting for sometime. I don't know whether it got stuck in Event Loop or some other problem is there please help.

解决方案

All the necessary info to scrape that data you may find in captured by Fiddler request which is logged after you made an input of the quote symbol in a browser manual:

You need to reproduce that request via VBA and parse HTML response. There is the example showing how that might be done:

Option Explicit

Sub Test()

    ' Add references
    ' Microsoft HTML Object Library
    ' Microsoft XML, v6.0

    Dim sResp As String
    Dim rOutputCell As Range
    Dim oElememnt
    Dim cElements As IHTMLElementCollection
    Dim oTableRow
    Dim oTableCell

    ' Retrieve HTML from website
    With New MSXML2.XMLHTTP60
        ' Send request
        .Open "POST", "https://dps.psx.com.pk/webpages/SL_main_page.php", True
        .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .Send "symbolCode=EFOODS"
        Do Until .ReadyState = 4: DoEvents: Loop
        sResp = .ResponseText
    End With
    ' Parse response and output
    With New HTMLDocument
        ' Load response HTML into DOM
        .body.innerHTML = sResp
        ' Clear first worksheet for output
        ThisWorkbook.Sheets(1).Cells.Delete
        ' Parse SL_cmpInfo table and output
        Set rOutputCell = ThisWorkbook.Sheets(1).Cells(1, 1)
        Set oElememnt = .getElementsByClassName("SL_cmpText")(0)
        rOutputCell.Value = oElememnt.innerText
        ' Parse SL_mktStats1 tables and output
        Set rOutputCell = Cells(3, 1)
        Set cElements = .getElementsByClassName("SL_mktStats1")
        For Each oElememnt In Array(cElements(1), cElements(2), cElements(3))
            For Each oTableRow In oElememnt.getElementsByTagName("tr")
                For Each oTableCell In oTableRow.getElementsByTagName("td")
                    rOutputCell.Value = oTableCell.innerText
                    Set rOutputCell = rOutputCell.Offset(0, 1)
                Next
                Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
            Next
        Next
        ' Parse SL_announce table and output
        Set rOutputCell = rOutputCell.Offset(1, 0)
        Set oElememnt = .getElementsByClassName("SL_announce")(0)
        For Each oTableRow In oElememnt.getElementsByTagName("tr")
            For Each oTableCell In oTableRow.getElementsByTagName("td")
                rOutputCell.Value = oTableCell.innerText
                Set rOutputCell = rOutputCell.Offset(0, 1)
            Next
            Set rOutputCell = rOutputCell.Offset(1, 0).EntireRow.Cells(1, 1)
        Next
    End With
    MsgBox "Completed"

End Sub

Don't forget to add the necessary references:

The output for me is as follows:

As required:

这篇关于使用VBA进行Web屏蔽URL的刮的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆