Excel VBA web source code - how to extract multiple fields to one sheet
Problem description
Good afternoon, all. As a follow-up to a previous question, which QHarr solved, I want to run the solved query against multiple fields in the page source rather than just one.
The URL I am using is: https://finance.yahoo.com/quote/AAPL/?p=AAPL
The VBA code that takes the 'Previous Close' price is:
```vba
Option Explicit

Sub PreviousClose()
    Dim html As HTMLDocument, http As Object, ticker As Range
    Set html = New HTMLDocument
    Set http = CreateObject("WINHTTP.WinHTTPRequest.5.1")
    Dim lastRow As Long, myrng As Range
    With ThisWorkbook.Worksheets("Tickers")
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Set myrng = .Range("A2:A" & lastRow)
        For Each ticker In myrng
            If Not IsEmpty(ticker) Then
                ' Fetch the quote page for this ticker
                With http
                    .Open "GET", "https://finance.yahoo.com/quote/" & ticker.Value & "?p=" & ticker.Value, False
                    .send
                    html.body.innerHTML = .responseText
                End With
                ' Write Previous Close one column to the right; the cell is left unchanged if the element is missing
                On Error Resume Next
                ticker.Offset(, 1) = html.querySelector("[data-test=PREV_CLOSE-value]").innerText
                On Error GoTo 0
            End If
        Next
    End With
End Sub
```
Ideally, each field would go in its own column, to the right of the ticker in the same row.
[Screenshot of the sheet]
Any help would be very much appreciated.
Thanks.
tl;dr
The code below works for the given test cases. For much longer lists, see the ToDo section.
API:
If possible, you should look into an API to provide this info. I believe Alpha Vantage now provides the info the Yahoo Finance API used to*. There is a nice JS tutorial here, and the Alpha Vantage documentation is here. At the very bottom of this answer, I take a quick look at the time series functions available via the API.
WEBSERVICE function:
With an API key, you could also potentially use Excel's WEBSERVICE function to retrieve and parse data. Example here. Not tested.
XMLHTTPRequest and class:
However, I will show you a way using a class and a loop over URLs, which you can improve on. I use a bare-bones class called clsHTTP to hold the XMLHTTP request object and give it two methods: GetHTMLDoc, which returns the request response in an HTML document, and GetInfo, which returns an array of the items of interest from the page.
Using a class in this way saves the overhead of repeatedly creating and destroying the XMLHTTP object, and it provides a descriptive set of exposed methods for the required tasks.
This assumes your data is laid out as shown, with the header row in row 2 and the tickers in column A from row 3 down.
ToDo:
The most obvious improvement, IMO, is to add some error handling. For example, you might want to extend the class to handle server errors.
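A minimal sketch of that kind of error handling, assuming you add it to the clsHTTP class module (it relies on the class's private `http` object). TryGetHTMLDoc and IsHttpSuccess are hypothetical names, not part of the original answer. The idea is to return Nothing for non-2xx responses or failed requests so the caller can skip that ticker instead of parsing an error page:

```vba
' Hypothetical addition to the clsHTTP class module.
' Returns Nothing instead of a document when the request fails or the status is not 2xx.
Public Function TryGetHTMLDoc(ByVal URL As String) As HTMLDocument
    Dim html As HTMLDocument
    On Error GoTo RequestFailed              ' network-level failures (no connection, bad host, ...)
    With http
        .Open "GET", URL, False
        .send
        If Not IsHttpSuccess(.Status) Then
            Debug.Print "HTTP " & .Status & " " & .statusText & " for " & URL
            Exit Function                    ' result stays Nothing
        End If
        Set html = New HTMLDocument
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With
    Set TryGetHTMLDoc = html
    Exit Function
RequestFailed:
    Debug.Print "Request failed (" & Err.Number & "): " & Err.Description & " for " & URL
    Set TryGetHTMLDoc = Nothing
End Function

' Treats any 2xx status code as success.
Public Function IsHttpSuccess(ByVal statusCode As Long) As Boolean
    IsHttpSuccess = (statusCode >= 200 And statusCode < 300)
End Function
```

In GetYahooInfo, the fetch could then become `Set html = http.TryGetHTMLDoc(...)` followed by `If Not html Is Nothing Then results(ticker - 1) = http.GetInfo(html, endPoint) Else results(ticker - 1) = vbNullString`, so a failed page leaves that row blank rather than stopping the macro.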
VBA:
In your project, add a class module called clsHTTP and put the following in it:
```vba
Option Explicit

' One request object, created once and reused for every URL
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

' Sends a synchronous GET and loads the response into an HTMLDocument
Public Function GetHTMLDoc(ByVal URL As String) As HTMLDocument
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    With http
        .Open "GET", URL, False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
        Set GetHTMLDoc = html
    End With
End Function

' Returns the first endPoint values from the page's table cells.
' The "tbody td" cells alternate label, value, so the values sit at odd indices.
Public Function GetInfo(ByVal html As HTMLDocument, ByVal endPoint As Long) As Variant
    Dim nodeList As Object, i As Long, result(), counter As Long
    Set nodeList = html.querySelectorAll("tbody td")
    ReDim result(0 To endPoint - 1)
    For i = 1 To 2 * endPoint Step 2
        result(counter) = nodeList.item(i).innerText
        counter = counter + 1
    Next
    GetInfo = result
End Function
```
In a standard module (Module1):
```vba
Option Explicit

Public Sub GetYahooInfo()
    Dim tickers(), ticker As Long, lastRow As Long, headers(), results()
    Dim wsSource As Worksheet, http As clsHTTP, html As HTMLDocument
    Dim i As Long, endPoint As Long

    Application.ScreenUpdating = False
    Set wsSource = ThisWorkbook.Worksheets("Sheet1") '<== Change as appropriate to sheet containing the tickers
    Set http = New clsHTTP

    headers = Array("Ticker", "Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", "Beta", "PE Ratio (TTM)", "EPS (TTM)", _
                    "Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est")

    With wsSource
        lastRow = GetLastRow(wsSource, 1)
        Select Case lastRow
        Case Is < 3
            Application.ScreenUpdating = True
            Exit Sub
        Case 3
            ' 1-based, to match the array that Range.Value returns for several tickers
            ReDim tickers(1 To 1, 1 To 1): tickers(1, 1) = .Range("A3").Value
        Case Is > 3
            tickers = .Range("A3:A" & lastRow).Value
        End Select

        ReDim results(0 To UBound(tickers, 1) - 1)
        endPoint = UBound(headers)   ' number of data fields: all headers except "Ticker"

        For ticker = LBound(tickers, 1) To UBound(tickers, 1)
            If Not IsEmpty(tickers(ticker, 1)) Then
                Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/?p=" & tickers(ticker, 1))
                results(ticker - 1) = http.GetInfo(html, endPoint)
                Set html = Nothing
            Else
                results(ticker - 1) = vbNullString   ' same 0-based slot as the branch above
            End If
        Next

        .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
        For i = LBound(results) To UBound(results)
            ' endPoint values per ticker, written from column B onward
            .Cells(3 + i, 2).Resize(1, endPoint) = results(i)
        Next
    End With
    Application.ScreenUpdating = True
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
```
Results:
[Screenshot of the results]
Notes on the GetInfo method and CSS selectors:
The GetInfo class method extracts the info from each webpage using a CSS selector combination that targets elements by the page's structure.
The info we are after on each page is housed in two adjacent tables.
Rather than mess around with multiple tables, I simply target all the table cells within table body elements using the descendant selector combination tbody td.
The CSS selector combination is applied via the querySelectorAll method of HTMLDocument, which returns a static nodeList.
The items in the returned nodeList have headers at even indices and the required data at odd indices. I only want the first two tables of info, so I terminate the loop over the returned nodeList once I have gone through twice the number of headers of interest. I use a Step 2 loop starting at index 1 to retrieve only the data of interest, skipping the headers.
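To see the index arithmetic in isolation, here is a hypothetical helper, OddIndexValues, that applies the same Step 2 loop to a plain VBA array standing in for the nodeList (0-based, alternating header, value). It is only an illustration of the loop in GetInfo, and the sample values are made-up placeholders, not real Yahoo data:

```vba
Option Explicit

' Hypothetical stand-in for GetInfo's loop: cellTexts plays the role of the nodeList,
' holding header, value, header, value, ... from index 0.
Public Function OddIndexValues(ByVal cellTexts As Variant, ByVal endPoint As Long) As Variant
    Dim result() As Variant, i As Long, counter As Long
    ReDim result(0 To endPoint - 1)
    ' i = 1, 3, 5, ..., 2 * endPoint - 1: only the value cells, never the headers
    For i = 1 To 2 * endPoint Step 2
        result(counter) = cellTexts(i)
        counter = counter + 1
    Next
    OddIndexValues = result
End Function

Public Sub DemoOddIndexValues()
    Dim values As Variant
    ' Three header/value pairs, but endPoint = 2, so the "Bid" pair is never read
    values = OddIndexValues(Array("Previous Close", "100.00", "Open", "101.00", "Bid", "99.50"), 2)
    Debug.Print values(0), values(1)
End Sub
```

With endPoint = 16, as in GetYahooInfo, the same loop reads indices 1 through 31, so the page needs at least 32 matching cells; if the layout changes and fewer cells come back, the loop in GetInfo will fail on the missing items.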
A sample of what the returned nodeList looks like:
[Screenshot: sample of the returned nodeList]
References (VBE > Tools > References):
- Microsoft HTML Object Library
Alpha Vantage API:
A quick look at the time series API calls shows that a string such as the following can be used:
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey
This yields a JSON response in which the Time Series (Daily) sub-dictionary of the overall returned dictionary has 199 dates. Each date has the following info:
[Screenshot: fields returned for each date]
A little digging through the documentation will reveal whether tickers can be bundled into one request (I couldn't see this quickly) and whether more of your initial items of interest are available via a different query string.
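Until bundling is confirmed, one request per ticker is the safe assumption. A minimal sketch of building those URLs, using a hypothetical helper AlphaVantageURL that assembles the same query-string format shown in this answer from a function name, a ticker and your API key (yourAPIKey is a placeholder, not a real key):

```vba
Option Explicit

' Hypothetical helper: builds the Alpha Vantage query-string URL used in this answer for one ticker.
' functionName is e.g. "TIME_SERIES_DAILY" or "TIME_SERIES_DAILY_ADJUSTED".
Public Function AlphaVantageURL(ByVal functionName As String, ByVal symbol As String, _
                                ByVal apiKey As String) As String
    AlphaVantageURL = "https://www.alphavantage.co/query?function=" & functionName & _
                      "&symbol=" & symbol & _
                      "&outputsize=full" & _
                      "&apikey=" & apiKey
End Function
```

You could loop over the ticker column as GetYahooInfo does and pass each one in, e.g. `AlphaVantageURL("TIME_SERIES_DAILY", tickers(ticker, 1), "yourAPIKey")`, then fetch the result with the same request object.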
There is more info available, for example, by using the TIME_SERIES_DAILY_ADJUSTED function in the URL call:
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=AA&outputsize=full&apikey=yourAPIkey
Here, you then get the following:
[Screenshot: fields returned by TIME_SERIES_DAILY_ADJUSTED]
You can parse the JSON response with a JSON parser such as JsonConverter.bas, and there are also options for CSV download.
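A rough sketch of the parsing step, assuming JsonConverter.bas (the VBA-JSON module) has been imported and a reference to Microsoft Scripting Runtime is set, since on Windows ParseJson returns Scripting.Dictionary objects. DailySeriesToArray is a hypothetical name. It reads the field names from the first date rather than hard-coding them, and returns a 2D array with a header row that can be written to a sheet in one assignment:

```vba
Option Explicit

' Hypothetical helper. Requires JsonConverter.bas (VBA-JSON) and a reference to
' Microsoft Scripting Runtime. Returns Empty if the "Time Series (Daily)" key is missing
' (e.g. the response is an error or rate-limit message instead of data) or has no dates.
Public Function DailySeriesToArray(ByVal jsonText As String) As Variant
    Dim json As Object, series As Object, dayData As Object
    Dim dates As Variant, fields As Variant
    Dim out() As Variant, r As Long, c As Long

    Set json = JsonConverter.ParseJson(jsonText)
    If Not json.Exists("Time Series (Daily)") Then Exit Function
    Set series = json("Time Series (Daily)")
    If series.Count = 0 Then Exit Function

    dates = series.Keys                      ' 0-based array of date strings
    Set dayData = series(dates(0))
    fields = dayData.Keys                    ' field names taken from the first date

    ' Row 0 is the header row; column 0 holds the date.
    ReDim out(0 To series.Count, 0 To UBound(fields) + 1)
    out(0, 0) = "Date"
    For c = 0 To UBound(fields)
        out(0, c + 1) = fields(c)
    Next

    For r = 0 To UBound(dates)
        Set dayData = series(dates(r))
        out(r + 1, 0) = dates(r)
        For c = 0 To UBound(fields)
            out(r + 1, c + 1) = dayData(fields(c))
        Next
    Next
    DailySeriesToArray = out
End Function
```

After checking `If Not IsEmpty(arr)`, the result can be written out with `ws.Range("A1").Resize(UBound(arr, 1) + 1, UBound(arr, 2) + 1).Value = arr`, where ws and arr are whatever worksheet and variable you use.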
* It is worth researching which APIs provide the most coverage of your items. Alpha Vantage doesn't appear to cover as many as my code above retrieves.