Excel VBA web source code - how to extract multiple fields to one sheet

Problem description

Good afternoon guys. In a follow-up to a previous query, which was solved by QHarr, I want to run the solved query against multiple fields from the source code rather than just one.

The URL I am using is: https://finance.yahoo.com/quote/AAPL/?p=AAPL

and the VBA code which takes the 'Previous Close' price is:

Option Explicit

Sub PreviousClose()
    Dim html As HTMLDocument, http As Object, ticker As Range
    Set html = New HTMLDocument
    Set http = CreateObject("WINHTTP.WinHTTPRequest.5.1")

    Dim lastRow As Long, myrng As Range
    With ThisWorkbook.Worksheets("Tickers")

        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Set myrng = .Range("A2:A" & lastRow)

        For Each ticker In myrng
            If Not IsEmpty(ticker) Then
                With http
                    .Open "GET", "https://finance.yahoo.com/quote/" & ticker.Value & "?p=" & ticker.Value, False
                    .send
                    html.body.innerHTML = .responseText
                End With
                On Error Resume Next
                ticker.Offset(, 1) = html.querySelector("[data-test=PREV_CLOSE-value]").innerText
                On Error GoTo 0
            End If
        Next

    End With
End Sub

Ideally, each field would go in the same row as its ticker, in the columns to the right of it.

Screenshot of Sheet:

Any help would be very much appreciated.
Thanks.

Solution

tl;dr:

The code below works for the given test cases. For much longer lists, please see the ToDo section.

API:

You want to look into an API to provide this info if possible. I believe Alpha Vantage now provides the info the Yahoo Finance API used to*. There is a nice JS tutorial for it, as well as the Alpha Vantage documentation. At the very bottom of this answer, I have a quick look at the time series functions available via the API.

WEBSERVICE function:

With an API key, you can also potentially use the WEBSERVICE function in Excel to retrieve and parse data. Not tested.

XMLHTTPRequest and class:

However, I will show you a way using a class and a loop over URLs. You can improve on this. I use a bare bones class called clsHTTP to hold the XMLHTTP request object. I give it 2 methods. One, GetHTMLDoc, to return the request response in an html document, and the other, GetInfo, to return an array of the items of interest from the page.

Using a class in this way means we save on the overhead of repeatedly creating and destroying the xmlhttp object, and it provides a nice descriptive set of exposed methods to handle the required tasks.

It is assumed your data is as shown, with header row being row 2.

ToDo:

The immediately obvious development, IMO, is you will want to add some error handling in. For example, you might want to develop the class to handle server errors.
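As a rough sketch of what that error handling might look like, the method below could be added to the clsHTTP class given in the VBA section. The name TryGetHTMLDoc is hypothetical and not part of the original answer. It assumes the class-level http object (MSXML2.XMLHTTP) and treats any status other than 200 as a failure, returning Nothing in that case.

```vba
' Hypothetical addition to clsHTTP (not in the original answer).
' Returns Nothing when the request fails instead of parsing an error page.
Public Function TryGetHTMLDoc(ByVal URL As String) As HTMLDocument
    Dim html As HTMLDocument

    On Error GoTo RequestFailed
    With http
        .Open "GET", URL, False
        .send
        ' Assumption: anything other than HTTP 200 counts as a failure.
        If .Status <> 200 Then
            Debug.Print "Request failed: " & URL & " (" & .Status & " " & .statusText & ")"
            Exit Function
        End If
        Set html = New HTMLDocument
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With
    Set TryGetHTMLDoc = html
    Exit Function

RequestFailed:
    ' Network-level errors (for example, no connection) land here.
    Debug.Print "Request error for " & URL & ": " & Err.Description
End Function
```

The caller would then test the result with `If html Is Nothing Then` and skip that ticker rather than calling GetInfo on an empty document.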


VBA:

So, in your project you add a class module called clsHTTP and put the following:

clsHTTP

Option Explicit

Private http As Object

Private Sub Class_Initialize()
    ' One request object is created and reused for every URL
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetHTMLDoc(ByVal URL As String) As HTMLDocument
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    With http
        .Open "GET", URL, False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
        Set GetHTMLDoc = html
    End With
End Function

Public Function GetInfo(ByVal html As HTMLDocument, ByVal endPoint As Long) As Variant
    Dim nodeList As Object, i As Long, result(), counter As Long
    Set nodeList = html.querySelectorAll("tbody td")
    ReDim result(0 To endPoint - 1)
    ' Odd indices hold the values; even indices hold the header text
    For i = 1 To 2 * endPoint Step 2
        result(counter) = nodeList.item(i).innerText
        counter = counter + 1
    Next
    GetInfo = result
End Function

In a standard module (module 1)

Option Explicit

Public Sub GetYahooInfo()
    Dim tickers(), results(), headers()
    Dim ticker As Long, lastRow As Long, i As Long, endPoint As Long
    Dim wsSource As Worksheet, http As clsHTTP, html As HTMLDocument

    Set wsSource = ThisWorkbook.Worksheets("Sheet1") '<== Change as appropriate to sheet containing the tickers
    Set http = New clsHTTP

    headers = Array("Ticker", "Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", "Beta", "PE Ratio (TTM)", "EPS (TTM)", _
                    "Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est")

    With wsSource
        lastRow = GetLastRow(wsSource, 1)
        Select Case lastRow
        Case Is < 3
            Exit Sub    ' no tickers below the header row
        Case 3
            ' A single cell returns a scalar, so build the 1-based 2D array by hand
            ReDim tickers(1 To 1, 1 To 1): tickers(1, 1) = .Range("A3").Value
        Case Is > 3
            tickers = .Range("A3:A" & lastRow).Value
        End Select

        ' Switched off only after the early exit above, so it is always restored
        Application.ScreenUpdating = False

        ReDim results(0 To UBound(tickers, 1) - 1)
        endPoint = UBound(headers)    ' number of data fields, i.e. headers minus "Ticker"

        For ticker = LBound(tickers, 1) To UBound(tickers, 1)
            If Not IsEmpty(tickers(ticker, 1)) Then
                Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/?p=" & tickers(ticker, 1))
                results(ticker - 1) = http.GetInfo(html, endPoint)
                Set html = Nothing
            Else
                ' results is 0-based while tickers is 1-based
                results(ticker - 1) = vbNullString
            End If
        Next

        .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
        For i = LBound(results) To UBound(results)
            ' Each result holds endPoint values, one per data column
            .Cells(3 + i, 2).Resize(1, endPoint) = results(i)
        Next
    End With
    Application.ScreenUpdating = True
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function


Results:


Notes on GetInfo method and CSS selectors:

The GetInfo method of the class extracts the info from each webpage, using a CSS selector combination to target the page styling.

The info we are after on each page is housed in two adjacent tables, for example:

Rather than mess around with multiple tables I simply target all the table cells, within table body elements, with a selector combination of tbody td.

The CSS selector combination is applied via the querySelectorAll method of HTMLDocument, returning a static nodeList.

The returned nodeList items have headers at even indices and the required data at odd indices. I only want the first two tables of info, so I terminate the loop over the returned nodeList once I have gone twice the length of the headers of interest. I use a Step 2 loop from index 1 to retrieve only the data of interest, minus the headers.
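To check the even/odd layout for yourself, a small helper along these lines could print each header next to its value in the Immediate window. This is only a sketch: PrintHeaderValuePairs and pairCount are names made up for illustration, and it assumes the page still exposes the data through tbody td cells.

```vba
' Hypothetical helper (not in the original answer).
' Requires a reference to Microsoft HTML Object Library.
Public Sub PrintHeaderValuePairs(ByVal html As HTMLDocument, ByVal pairCount As Long)
    Dim nodeList As Object, i As Long

    Set nodeList = html.querySelectorAll("tbody td")
    ' Even index = header cell, the following odd index = its value.
    For i = 0 To 2 * pairCount - 2 Step 2
        ' Stop early if the page returned fewer cells than expected.
        If i + 1 >= nodeList.Length Then Exit For
        Debug.Print nodeList.item(i).innerText & " => " & nodeList.item(i + 1).innerText
    Next i
End Sub
```

For example, with an instance of clsHTTP named http, `PrintHeaderValuePairs http.GetHTMLDoc("https://finance.yahoo.com/quote/AAPL/?p=AAPL"), 16` would list the 16 header/value pairs that GetInfo reads.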

A sample of what the returned nodeList looks like:


References (VBE > Tools > References):

  1. Microsoft HTML Object Library


Alpha Vantage API:

A quick look at the time series API call shows that a string can be used

https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey

This yields a JSON response. When I ran it, the Time Series (Daily) sub-dictionary of the overall returned dictionary held 199 dates. Each date has the following info:

A little digging through the documentation will reveal whether bundling of tickers is possible (I couldn't see this quickly) and whether more of your initial items of interest are available via a different query string.

There is more info, for example, using the TIME_SERIES_DAILY_ADJUSTED function in the URL call

https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=AA&outputsize=full&apikey=yourAPIkey

Here, you then get the following:

You can parse the JSON response using a JSON parser such as JSONConverter.bas, and there are also options for CSV download.
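A minimal sketch of the JSON route, assuming JsonConverter.bas (VBA-JSON) has been imported into the project and a reference to Microsoft Scripting Runtime is set. The procedure name PrintDailyCloses is made up for illustration, yourAPIKey is a placeholder, and the "4. close" key is an assumption about the field naming in the TIME_SERIES_DAILY response that should be checked against the actual output.

```vba
' Hypothetical example (not in the original answer).
' Assumes JsonConverter.bas is imported and Microsoft Scripting Runtime is referenced.
Public Sub PrintDailyCloses()
    Dim http As Object, json As Object, series As Object, dateKey As Variant

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey", False
    http.send

    ' ParseJson returns a Dictionary for a JSON object.
    Set json = JsonConverter.ParseJson(http.responseText)
    If Not json.Exists("Time Series (Daily)") Then
        ' For example, an error or rate-limit message instead of data.
        Debug.Print "Unexpected response: " & Left$(http.responseText, 200)
        Exit Sub
    End If

    Set series = json("Time Series (Daily)")
    For Each dateKey In series.Keys
        ' Assumption: the closing price is stored under "4. close".
        Debug.Print dateKey, series(dateKey)("4. close")
    Next dateKey
End Sub
```

Writing the values to a sheet instead of the Immediate window would follow the same pattern as the results loop in GetYahooInfo.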

* Worth doing some research on which APIs provide the most coverage of your items. Alpha Vantage doesn't appear to cover as many as my code above retrieves.
