无法从网页获取特定表格,代码无法正常工作 [英] cannot get a particular table from a webpage, code use to work

查看:64
本文介绍了无法从网页获取特定表格,代码无法正常工作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在excel vba中有一些代码,通过从特定URL上名为"Insider Transactions"的表格中获取所有单元格,从而对数百页进行了网络抓取.以下是一个示例网址:


使用浏览器并将结果设置为每页100个:

以下内容将忽略登录消息.

 选项显式公共子GetData()昏暗即作为对象,剪贴板作为对象,ws作为工作表设置ws = ThisWorkbook.Worksheets("Sheet1")设置剪贴板= GetObject(新建:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")设置ie = CreateObject("InternetExplorer.Application")与即.Visible = True.Navigate2"https://www.gurufocus.com/stock/HIL/insider"而.Busy或.readyState<4:DoEvents:Wend使用.document如果是.querySelectorAll(.login-card").长度>0然后.querySelector(.login-card .el-icon-close").Click万一.querySelector(.el-icon-caret-bottom").点击.querySelector(.aio-popover-item:nth-​​of-type(6)").点击结束于而.Busy或.readyState<4:DoEvents:WendClipboard.SetText .document.querySelector(.data-table").outerHTML剪贴板.PutInClipboardws.Range("A1").PasteSpecial.放弃结束于结束子 

I had some code in excel vba that webscraped hundreds of pages by grabbing all the cells from a table called "Insider Transactions" on particular urls. The following is an example url: https://www.gurufocus.com/stock/HIL/insider

For some reason my code below no longer works. I cannot for the life of me work out why. The class I am trying to grab still seems to be called "normal-table data-table"

I have tried getting rid of the (0) as there appears to be only one table with the class name normal-table data-table now.

Set code is:

Set allCells = doc.body.getElementsByClassName("normal-table data-table")(0).getElementsByTagName("td")

no error messages are given when I run my current code, but it is clear that allCells is not being set to anything because my code doesn't work and allCells.length doesn't return anything. Thanks

解决方案

XMLHTTP:

Faster than a browser and providing more info is xhr.

The data is provided from an API call. You can scrape the token for this and pass in a subsequent request. A few helper functions to get the token and handle results as well as a json parser to handle json response from API.

This requires installing code for jsonparser from jsonconverter.bas in a standard module called JsonConverter and then going VBE>Tools>References>Add a reference to Microsoft Scripting Runtime.

Option Explicit

Public Sub GetInfo()
    Dim json As Object, headers(), ws As Worksheet, i As Long, results()
    Dim re As Object, r As Long, c As Long, dict As Object, p As String, token As String, s As String

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    p = "password_grant_custom\.client"":""(.*?)"""
    Set re = CreateObject("VBScript.RegExp")

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.gurufocus.com/stock/HIL/insider", False
        .send
        token = GetToken(re, .responseText, p)
        If token = "Not found" Then Exit Sub
        .Open "GET", "https://www.gurufocus.com/reader/_api/stocks/NYSE:HIL/insider?page=1&per_page=1000&sort=date%7Cdesc", False
        .setRequestHeader "authorization", "Bearer " & token
        .send
        s = .responseText
        Set json = JsonConverter.ParseJson(.responseText)("data")
        headers = json(1).keys
        ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
        For Each dict In json
            r = r + 1: c = 1
            For i = LBound(headers) To UBound(headers)
                If headers(i) <> "ownership_details" Then
                    results(r, c) = dict(headers(i))
                Else
                    results(r, c) = EmptyDict(dict(headers(i)))
                End If
                c = c + 1
            Next
        Next
    End With
    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub

Public Function EmptyDict(ByVal dict As Object, Optional r As String, Optional key As Variant) As String
    Dim s As String
    For Each key In dict
        If TypeName(dict(key)) = "Dictionary" Then
            r = EmptyDict(dict(key), r, key)
        Else
            s = IIf(key = "D", "Direct ", key)
            r = r & s & " " & dict(key) & Chr$(10)
        End If
    Next
    EmptyDict = r
End Function

Public Function GetToken(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
    With re
        .Global = True
        .pattern = pattern
        If .test(inputString) Then               ' returns True if the regex pattern can be matched agaist the provided string
            GetToken = .Execute(inputString)(0).SubMatches(0)
        Else
            GetToken = "Not found"
        End If
    End With
End Function


Sample of output:


Using browser and also setting results to 100 per page:

The following dimisses login message if present.

Option Explicit
Public Sub GetData()
    Dim ie As Object, clipboard As Object, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .Navigate2 "https://www.gurufocus.com/stock/HIL/insider"

        While .Busy Or .readyState < 4: DoEvents: Wend
        With .document
            If .querySelectorAll(".login-card").Length > 0 Then
                .querySelector(".login-card .el-icon-close").Click
            End If
            .querySelector(".el-icon-caret-bottom").Click
            .querySelector(".aio-popover-item:nth-of-type(6)").Click
        End With
        While .Busy Or .readyState < 4: DoEvents: Wend

        clipboard.SetText .document.querySelector(".data-table").outerHTML
        clipboard.PutInClipboard
        ws.Range("A1").PasteSpecial
        .Quit
    End With
End Sub

这篇关于无法从网页获取特定表格,代码无法正常工作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆