Excel VBA Web源代码-如何将多个字段提取到一张工作表 [英] Excel VBA web source code - how to extract multiple fields to one sheet
问题描述
大家下午好.在对先前由QHarr解决的查询进行的后续操作中,我想对源代码中的多个字段(而不是一个字段)运行已解决的查询.
我使用的URL是:现在可以提供Yahoo Finance API用于*的信息.在
关于 GetInfo
方法和CSS选择器的说明:
GetInfo
的类方法使用CSS组合选择器从每个网页提取信息,以定位页面样式.
我们在每个页面上需要的信息位于两个相邻表中,例如:
我不会乱搞多个表,而只是将表主体元素内的所有表单元格定位为使用 tbody td
的选择器组合.
CSS选择器组合是通过 HTMLDocument
的 querySelectorAll
方法应用的,并返回静态的 nodeList
.
返回的 nodeList
项在偶数索引处具有标头,而在奇数索引处具有所需的数据.我只想要信息的前两个表,所以当我给出感兴趣的标头长度的两倍时,就终止了返回的 nodeList
上的循环.我使用从索引1开始的第2步循环来仅检索感兴趣的数据,减去标题.
返回的 nodeList
的示例如下:
参考(VBE>工具>参考):
- Microsoft HTML对象库
Alpha Vantage API:
快速浏览时间序列API
调用显示可以使用字符串
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey
这会产生一个JSON响应,该响应在整个返回的字典的 Time Series(Daily)
子字典中,返回了199个日期.每个日期都有以下信息:
稍稍浏览一下文档,就会发现是否可以捆绑股票行情自动收录,我无法很快看到,以及是否可以通过不同的查询字符串获得更多您感兴趣的初始项目.
还有更多信息,例如,在URL调用中使用 TIME_SERIES_DAILY_ADJUSTED
函数
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=AA&outputsize=full&apikey=yourAPIkey
在这里,您将得到以下信息:
您可以使用JSON解析器来解析JSON响应,例如 JSONConverter.bas ,还有用于csv下载的选项.
* 值得进行一些研究,以了解哪些API提供了最多的商品覆盖率.Alpha Vantage似乎没有覆盖我上面检索到的代码.
Good afternoon guys. In a follow up to a previous query which was very much solved by QHarr, I was wanting to run the solved query against multiple fields from the source code rather than just one.
The URL I am using is: https://finance.yahoo.com/quote/AAPL/?p=AAPL
and the VBA code which takes the 'Previous Close'
price is:
Option Explicit
Sub PreviousClose()
Dim html As HTMLDocument, http As Object, ticker As Range
Set html = New HTMLDocument
Set http = CreateObject("WINHTTP.WinHTTPRequest.5.1")
Dim lastRow As Long, myrng As Range
With ThisWorkbook.Worksheets("Tickers")
lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Set myrng = .Range("A2:A" & lastRow)
For Each ticker In myrng
If Not IsEmpty(ticker) Then
With http
.Open "GET", "https://finance.yahoo.com/quote/" & ticker.Value & "?p=" & ticker.Value, False
.send
html.body.innerHTML = .responseText
End With
On Error Resume Next
ticker.Offset(, 1) = html.querySelector("[data-test=PREV_CLOSE-value]").innertext
On Error GoTo 0
End If
Next
End With
End Sub
Anyway, each field would ideally be in a row right of the ticker used for the stock.
Screenshot of Sheet:
Any help would be very much appreciated.
Thanks.
tl;dr;
The code below works for the given test cases. With much longer lists please see the ToDo
section.
API:
You want to look into an API to provide this info if possible. I believe Alpha Vantage now provide info the Yahoo Finance API used to* . There is a nice JS tutorial here. Alpha Vantage documentation here. At the very bottom of this answer, I have a quick look at the time series functions available via the API.
WEBSERVICE function:
With an API key, you can also potentially use the webservice function in Excel to retrieve and parse data. Example here. Not tested.
XMLHTTPRequest and class:
However, I will show you a way using a class and a loop over URLs. You can improve on this. I use a bare bones class called clsHTTP
to hold the XMLHTTP request object. I give it 2 methods. One, GetHTMLDoc
, to return the request response in an html document, and the other, GetInfo
, to return an array of the items of interest from the page.
Using a class in this way means we save on the overhead of repeatedly creating and destroying the xmlhttp object and provides a nice descriptive set of exposed methods to handle the required tasks.
It is assumed your data is as shown, with header row being row 2.
ToDo:
The immediately obvious development, IMO, is you will want to add some error handling in. For example, you might want to develop the class to handle server errors.
VBA:
So, in your project you add a class module called clsHTTP
and put the following:
clsHTTP
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetHTMLDoc(ByVal URL As String) As HTMLDocument
Dim html As HTMLDocument
Set html = New HTMLDocument
With http
.Open "GET", URL, False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
Set GetHTMLDoc = html
End With
End Function
Public Function GetInfo(ByVal html As HTMLDocument, ByVal endPoint As Long) As Variant
Dim nodeList As Object, i As Long, result(), counter As Long
Set nodeList = html.querySelectorAll("tbody td")
ReDim result(0 To endPoint - 1)
For i = 1 To 2 * endPoint Step 2
result(counter) = nodeList.item(i).innerText
counter = counter + 1
Next
GetInfo = result
End Function
In a standard module (module 1)
Option Explicit
Public Sub GetYahooInfo()
Dim tickers(), ticker As Long, lastRow As Long, headers()
Dim wsSource As Worksheet, http As clsHTTP, html As HTMLDocument
Application.ScreenUpdating = False
Set wsSource = ThisWorkbook.Worksheets("Sheet1") '<== Change as appropriate to sheet containing the tickers
Set http = New clsHTTP
headers = Array("Ticker", "Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", "Beta", "PE Ratio (TTM)", "EPS (TTM)", _
"Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est")
With wsSource
lastRow = GetLastRow(wsSource, 1)
Select Case lastRow
Case Is < 3
Exit Sub
Case 3
ReDim tickers(1, 1): tickers(1, 1) = .Range("A3").Value
Case Is > 3
tickers = .Range("A3:A" & lastRow).Value
End Select
ReDim results(0 To UBound(tickers, 1) - 1)
Dim i As Long, endPoint As Long
endPoint = UBound(headers)
For ticker = LBound(tickers, 1) To UBound(tickers, 1)
If Not IsEmpty(tickers(ticker, 1)) Then
Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/?p=" & tickers(ticker, 1))
results(ticker - 1) = http.GetInfo(html, endPoint)
Set html = Nothing
Else
results(ticker) = vbNullString
End If
Next
.Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
For i = LBound(results) To UBound(results)
.Cells(3 + i, 2).Resize(1, endPoint-1) = results(i)
Next
End With
Application.ScreenUpdating = True
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With
End Function
Results:
Notes on GetInfo
method and CSS selectors:
The class method of GetInfo
extracts the info from each webpage using a css combination selector to target the page styling.
The info we are after on each page is house in two adjacent tables, for example:
Rather than mess around with multiple tables I simply target all the table cells, within table body elements, with a selector combination of tbody td
.
The CSS selector combination is applied via the querySelectorAll
method of HTMLDocument
, returning a static nodeList
.
The returned nodeList
items have headers at even indices and the required data at odd indices. I only want the first two tables of info so I terminate the loop over the returned nodeList
when I gave gone twice the length of the headers of interest. I use a step 2 loop from index 1 to retrieve only the data of interest, minus the headers.
A sample of what the returned nodeList
looks like:
References (VBE > Tools > References):
- Microsoft HTML Object Library
Alpha Vantage API:
A quick look at the time series API
call shows that a string can be used
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey
This yields a JSON response that in the Time Series (Daily)
sub dictionary of the overall returned dictionary, has 199 dates returned. Each date has the following info:
A little digging through the documentation will unveil whether bundling of tickers is possible, I couldn't see this quickly, and whether more of your initial items of interest are available via a different query string.
There is more info, for example, using the TIME_SERIES_DAILY_ADJUSTED
function in the URL call
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=AA&outputsize=full&apikey=yourAPIkey
Here, you then get the following:
You can parse the JSON response using a JSON parser such as JSONConverter.bas and there are also options for csv download.
* Worth doing some research on which APIs provide the most coverage of your items. Alpha Vantage doesn't appear to cover as many as my code above retrieves.
这篇关于Excel VBA Web源代码-如何将多个字段提取到一张工作表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!