使用VBA抓取AJAX页面 [英] Scraping an AJAX page using VBA
问题描述
我一直在尝试抓取 entire HTML正文并将其分配为字符串变量,然后再处理该字符串以填充excel文件-这将在aa循环上完成,以更新每个日期每隔5分钟.
I've been trying to Scrape the entire HTML body and assign it as a string variable before manipulating that string to populate an excel file - this will be done on a a loop to update the date every 5 minute interval.
这些页面是AJAX页面,因此运行类似于JavaScript的代码(尽管我对JS一点都不熟悉).
These pages are AJAX pages, so run what looks like JavaScript (I'm not familiar with JS at all though).
我尝试使用XMLHttpRequest
对象(下面的代码),但是t返回了JS调用:
I've tried using the XMLHttpRequest
object (code below) but t returns the JS Calls:
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE", False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Debug.Print XMLHTTP.ResponseText
我尝试使用以下代码创建IE
对象,但是同样,出现同样的问题:
I've tried creating an IE
object with the below code but, again, same issue:
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "https://www.google.co.uk/finance?ei=bQ_iWLnjOoS_UeWcqsgE"
While IE.Busy Or IE.ReadyState <> 4: DoEvents: Wend
Set HTMLdoc = IE.Document
Debug.Print = HTMLdoc.Body.innerHTML
我想要的是当我按F12键并进入检查器选项卡时完全可用的文本(即,下面黄色部分中的全部文本)-如果可以(完全展开),我可以从那里.任何帮助将不胜感激.
What I want it exactly text available to me when I hit F12 and got to the inspector tab (ie. the entirety of the text within the yellow section below) - If I could get this (full expanded) I could work from there. Any help would be massively appreciated.
在上面的示例(Google财经)中,索引价格异步更新-我想在分配字符串时捕获这些价格.
In the above example (Google finance), the index prices update asynchronously - I want to capture these at the time at which I assign the string.
推荐答案
对于任何动态加载的数据,您只需检查网页执行的XHR,找到包含相关数据的数据,并进行相同的XHR(两个站点都提供API或不提供API)并解析响应,或者在IE自动化的情况下,您添加了额外的等待循环,直到可以访问目标元素为止,然后从DOM中检索它.
For any dynamically loaded data you just inspect XHRs the webpage does, find the one containing the relevant data, make the same XHR (either site provides API or not) and parse response, or in case of IE automation you add extra wait loop until a target element becomes accessible, then retrieve it from DOM.
在某些情况下,您可以通过Google财经API获取数据.
In that certain case you can get the data via Google Finance API.
方法1.
要发出请求,您必须知道股票代码,可以在网页HTML内容或e中轻松找到该股票代码. G.如果单击CAC 40,则在打开的页面中将显示标题CAC 40(INDEXEURO:PX1).
To make the request you have to know stock symbols, which could be easily find within webpage HTML content, or e. g. if you click on CAC 40, in opened page there will be a title CAC 40 (INDEXEURO:PX1).
该页面的世界市场"表中有以下股票和证券交易所代码:
There are the following stock and stock exchange symbols in the World markets table on that page:
Shanghai SHA:000001
S&P 500 INDEXSP:.INX
Nikkei 225 INDEXNIKKEI:NI225
Hang Seng Index INDEXHANGSENG:HSI
TSEC TPE:TAIEX
EURO STOXX 50 INDEXSTOXX:SX5E
CAC 40 INDEXEURO:PX1
S&P TSX INDEXTSI:OSPTX
S&P/ASX 200 INDEXASX:XJO
BSE Sensex INDEXBOM:SENSEX
SMI INDEXSWX:SMI
ATX INDEXVIE:ATX
IBOVESPA INDEXBVMF:IBOV
SET INDEXBKK:SET
BIST100 INDEXIST:XU100
IBEX INDEXBME:IB
WIG WSE:WIG
TASI TADAWUL:TASI
MERVAL BCBA:IAR
IPC INDEXBMV:ME
IDX Composite IDX:COMPOSITE
将其放入网址:
The response contains JSON data, like this:
[
{
"id": "7521596",
"t": "000001",
"e": "SHA",
"l": "3,222.51",
"l_fix": "3222.51",
"l_cur": "CN¥3,222.51",
"s": "0",
"ltt": "3:01PM GMT+8",
"lt": "Mar 31, 3:01PM GMT+8",
"lt_dts": "2017-03-31T15:01:15Z",
"c": "+12.28",
"c_fix": "12.28",
"cp": "0.38",
"cp_fix": "0.38",
"ccol": "chg",
"pcls_fix": "3210.2368"
},
...
]
您可以使用下面的VBA代码解析响应并输出结果.它要求将 JSON.bas 模块导入到VBA项目中以进行JSON处理.>
You may use the below VBA code to parse response and output result. It requires JSON.bas module to be imported to VBA project for JSON processing.
Sub GoogleFinanceData()
Dim sJSONString As String
Dim vJSON As Variant
Dim sState As String
Dim aData()
Dim aHeader()
' Retrieve Google Finance data
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://finance.google.com/finance/info?q=SHA:000001,INDEXSP:.INX,INDEXNIKKEI:NI225,INDEXHANGSENG:HSI,TPE:TAIEX,INDEXSTOXX:SX5E,INDEXEURO:PX1,INDEXTSI:OSPTX,INDEXASX:XJO,INDEXBOM:SENSEX,INDEXSWX:SMI,INDEXVIE:ATX,INDEXBVMF:IBOV,INDEXBKK:SET,INDEXIST:XU100,INDEXBME:IB,WSE:WIG,TADAWUL:TASI,BCBA:IAR,INDEXBMV:ME,IDX:COMPOSITE", False
.Send
If .Status <> 200 Then Exit Sub
sJSONString = .responseText
End With
' Trim extraneous chars
sJSONString = Mid(sJSONString, InStr(sJSONString, "["))
' Parse JSON string
JSON.Parse sJSONString, vJSON, sState
If sState = "Error" Then Exit Sub
' Convert to table format
JSON.ToArray vJSON, aData, aHeader
' Results output
With Sheets(1)
.Cells.Delete
.Cells.WrapText = False
If UBound(aHeader) >= 0 Then OutputArray .Cells(1, 1), aHeader
Output2DArray .Cells(2, 1), aData
.Columns.AutoFit
End With
End Sub
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
因此,您需要的数据位于l_fix
,c_fix
,cp_fix
列中.
As a result the data you need is located in l_fix
, c_fix
, cp_fix
columns.
方法2.
此外,您还可以通过类似于CAC 40的URL来制作XHR:
Also you can make XHR by the URL like this one for CAC 40:
https://www.google.co.uk/finance/getprices?q=PX1&x=INDEXEURO&i=120&p=20m&f=d,c,v ,o,h,l
尤其是URL用于PX1股票和INDEXEURO股票交易所代码,间隔120秒,周期20分钟,响应数据d,c,v,o,h,l用于DATE(UNIX时间戳),CLOSE,VOLUME,OPEN ,高,低.
Particularly that URL is for PX1 stock and INDEXEURO stock exchange symbols, 120 sec interval, 20 minutes period, response data d,c,v,o,h,l is for DATE (UNIX TimeStamp), CLOSE, VOLUME, OPEN, HIGH, LOW.
响应格式如下:
EXCHANGE%3DINDEXEURO
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=1050
INTERVAL=120
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=120
a1491405000,5098.75,5099.92,5098.75,5099.92,0
1,5100.51,5100.51,5098.09,5098.09,0
2,5099.63,5101.2,5099.29,5100.68,0
3,5099.83,5100.04,5099.07,5099.28,0
4,5098.19,5098.9,5097.71,5098.9,0
5,5098.56,5099.24,5097.99,5099.24,0
6,5097.34,5098.2,5096.14,5098.2,0
7,5096.52,5097.38,5095.66,5097.38,0
8,5093.27,5095.39,5093.27,5095.39,0
9,5094.43,5094.43,5092.07,5093.17,0
10,5088.18,5092.72,5087.68,5092.72,0
应该对列表中的每个股票代码执行XHR,然后将结果合并到表中.
The XHR should be done for each stock symbol in the list, then results should be consolidated into table.
这篇关于使用VBA抓取AJAX页面的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!