使用VBA抓取AJAX页面 [英] Scraping an AJAX page using VBA

查看:308
本文介绍了使用VBA抓取AJAX页面的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试抓取 entire HTML正文并将其分配为字符串变量,然后再处理该字符串以填充excel文件-这将在aa循环上完成,以更新每个日期每隔5分钟.

I've been trying to Scrape the entire HTML body and assign it as a string variable before manipulating that string to populate an excel file - this will be done on a a loop to update the date every 5 minute interval.

这些页面是AJAX页面,因此运行类似于JavaScript的代码(尽管我对JS一点都不熟悉).

These pages are AJAX pages, so run what looks like JavaScript (I'm not familiar with JS at all though).

我尝试使用XMLHttpRequest对象(下面的代码),但是t返回了JS调用:

I've tried using the XMLHttpRequest object (code below) but t returns the JS Calls:

Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE", False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Debug.Print XMLHTTP.ResponseText

我尝试使用以下代码创建IE对象,但是同样,出现同样的问题:

I've tried creating an IE object with the below code but, again, same issue:

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE"
While IE.Busy Or IE.ReadyState <> 4: DoEvents: Wend
Set HTMLdoc = IE.Document
Debug.Print = HTMLdoc.Body.innerHTML

我想要的是当我按F12键并进入检查器选项卡时完全可用的文本(即,下面黄色部分中的全部文本)-如果可以(完全展开),我可以从那里.任何帮助将不胜感激.

What I want it exactly text available to me when I hit F12 and got to the inspector tab (ie. the entirety of the text within the yellow section below) - If I could get this (full expanded) I could work from there. Any help would be massively appreciated.

在上面的示例(Google财经)中,索引价格异步更新-我想在分配字符串时捕获这些价格.

In the above example (Google finance), the index prices update asynchronously - I want to capture these at the time at which I assign the string.

推荐答案

对于任何动态加载的数据,您只需检查网页执行的XHR,找到包含相关数据的数据,并进行相同的XHR(两个站点都提供API或不提供API)并解析响应,或者在IE自动化的情况下,您添加了额外的等待循环,直到可以访问目标元素为止,然后从DOM中检索它.

For any dynamically loaded data you just inspect XHRs the webpage does, find the one containing the relevant data, make the same XHR (either site provides API or not) and parse response, or in case of IE automation you add extra wait loop until a target element becomes accessible, then retrieve it from DOM.

在某些情况下,您可以通过Google财经API获取数据.

In that certain case you can get the data via Google Finance API.

方法1.

要发出请求,您必须知道股票代码,可以在网页HTML内容或e中轻松找到该股票代码. G.如果单击CAC 40,则在打开的页面中将显示标题CAC 40(INDEXEURO:PX1).

To make the request you have to know stock symbols, which could be easily find within webpage HTML content, or e. g. if you click on CAC 40, in opened page there will be a title CAC 40 (INDEXEURO:PX1).

该页面的世界市场"表中有以下股票和证券交易所代码:

There are the following stock and stock exchange symbols in the World markets table on that page:

Shanghai            SHA:000001
S&P 500             INDEXSP:.INX
Nikkei 225          INDEXNIKKEI:NI225
Hang Seng Index     INDEXHANGSENG:HSI
TSEC                TPE:TAIEX
EURO STOXX 50       INDEXSTOXX:SX5E
CAC 40              INDEXEURO:PX1
S&P TSX             INDEXTSI:OSPTX
S&P/ASX 200         INDEXASX:XJO
BSE Sensex          INDEXBOM:SENSEX
SMI                 INDEXSWX:SMI
ATX                 INDEXVIE:ATX
IBOVESPA            INDEXBVMF:IBOV
SET                 INDEXBKK:SET
BIST100             INDEXIST:XU100
IBEX                INDEXBME:IB
WIG                 WSE:WIG
TASI                TADAWUL:TASI
MERVAL              BCBA:IAR
IPC                 INDEXBMV:ME
IDX Composite       IDX:COMPOSITE

将其放入网址:

响应包含JSON数据,如下所示:

The response contains JSON data, like this:

[
    {
        "id": "7521596",
        "t": "000001",
        "e": "SHA",
        "l": "3,222.51",
        "l_fix": "3222.51",
        "l_cur": "CN¥3,222.51",
        "s": "0",
        "ltt": "3:01PM GMT+8",
        "lt": "Mar 31, 3:01PM GMT+8",
        "lt_dts": "2017-03-31T15:01:15Z",
        "c": "+12.28",
        "c_fix": "12.28",
        "cp": "0.38",
        "cp_fix": "0.38",
        "ccol": "chg",
        "pcls_fix": "3210.2368"
    },
    ...
]

您可以使用下面的VBA代码解析响应并输出结果.它要求将 JSON.bas 模块导入到VBA项目中以进行JSON处理.

You may use the below VBA code to parse response and output result. It requires JSON.bas module to be imported to VBA project for JSON processing.

Sub GoogleFinanceData()

    Dim sJSONString As String
    Dim vJSON As Variant
    Dim sState As String
    Dim aData()
    Dim aHeader()

    ' Retrieve Google Finance data
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://finance.google.com/finance/info?q=SHA:000001,INDEXSP:.INX,INDEXNIKKEI:NI225,INDEXHANGSENG:HSI,TPE:TAIEX,INDEXSTOXX:SX5E,INDEXEURO:PX1,INDEXTSI:OSPTX,INDEXASX:XJO,INDEXBOM:SENSEX,INDEXSWX:SMI,INDEXVIE:ATX,INDEXBVMF:IBOV,INDEXBKK:SET,INDEXIST:XU100,INDEXBME:IB,WSE:WIG,TADAWUL:TASI,BCBA:IAR,INDEXBMV:ME,IDX:COMPOSITE", False
        .Send
        If .Status <> 200 Then Exit Sub
        sJSONString = .responseText
    End With
    ' Trim extraneous chars
    sJSONString = Mid(sJSONString, InStr(sJSONString, "["))
    ' Parse JSON string
    JSON.Parse sJSONString, vJSON, sState
    If sState = "Error" Then Exit Sub
    ' Convert to table format
    JSON.ToArray vJSON, aData, aHeader
    ' Results output
    With Sheets(1)
        .Cells.Delete
        .Cells.WrapText = False
        If UBound(aHeader) >= 0 Then OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aData
        .Columns.AutoFit
    End With

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

因此,您需要的数据位于l_fixc_fixcp_fix列中.

As a result the data you need is located in l_fix, c_fix, cp_fix columns.

方法2.

此外,您还可以通过类似于CAC 40的URL来制作XHR:

Also you can make XHR by the URL like this one for CAC 40:

https://www.google.co.uk/finance/getprices?q=PX1&x=INDEXEURO&i=120&p=20m&f=d,c,v ,o,h,l

尤其是URL用于PX1股票和INDEXEURO股票交易所代码,间隔120秒,周期20分钟,响应数据d,c,v,o,h,l用于DATE(UNIX时间戳),CLOSE,VOLUME,OPEN ,高,低.

Particularly that URL is for PX1 stock and INDEXEURO stock exchange symbols, 120 sec interval, 20 minutes period, response data d,c,v,o,h,l is for DATE (UNIX TimeStamp), CLOSE, VOLUME, OPEN, HIGH, LOW.

响应格式如下:

EXCHANGE%3DINDEXEURO
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=1050
INTERVAL=120
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=120
a1491405000,5098.75,5099.92,5098.75,5099.92,0
1,5100.51,5100.51,5098.09,5098.09,0
2,5099.63,5101.2,5099.29,5100.68,0
3,5099.83,5100.04,5099.07,5099.28,0
4,5098.19,5098.9,5097.71,5098.9,0
5,5098.56,5099.24,5097.99,5099.24,0
6,5097.34,5098.2,5096.14,5098.2,0
7,5096.52,5097.38,5095.66,5097.38,0
8,5093.27,5095.39,5093.27,5095.39,0
9,5094.43,5094.43,5092.07,5093.17,0
10,5088.18,5092.72,5087.68,5092.72,0

应该对列表中的每个股票代码执行XHR,然后将结果合并到表中.

The XHR should be done for each stock symbol in the list, then results should be consolidated into table.

这篇关于使用VBA抓取AJAX页面的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆