VBA将内部文本从html页面传输到excel [英] VBA to transfer inner text from html page to excel

查看：95 发布时间：2020/11/1 3:25:00 html vba excel web-scraping

本文介绍了VBA将内部文本从html页面传输到excel的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

此图显示了我使用以下宏的内容

但是此代码在打开两个或三个URL之后停止，并且我们看到以下错误消息，
1.运行时错误91
2.对象变量或未设置块的

But this code stops after the opening of two or three URLs and we see the following error message,
1. run-time err 91
2. object variable or with block not set

Sub test()

Dim wb As Object
Dim doc As Object
Dim sURL As String
Dim lastrow As Long
Dim n As Integer
Dim i As Integer
Dim HtmlToText As String
Dim result
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row

For i = 2 To lastrow  'Start the loop on the second row of column A. Until the last URL..

    Set wb = CreateObject("internetExplorer.Application")
    sURL = Cells(i, 1)
    wb.navigate sURL
    wb.Visible = False
    While wb.Busy
      DoEvents
    Wend
    'HTML document
    Set doc = wb.document
    Dim Name As Variant
    Dim Posts As Variant
    Dim Followers As Variant
    Dim Following As Variant
    Dim DivValue As Variant
    Dim DivValueSplit As Variant
    Dim DivValueResult As Variant
    Dim Biography As Variant

    Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText
    Posts = doc.getElementsByClassName("g47SY")(0).innerText
    Followers = doc.getElementsByClassName("g47SY")(1).innerText
    Following = doc.getElementsByClassName("g47SY")(2).innerText
    'dd = web.document.querySelector("div.-vDIg span").innerText
    DivValue = doc.getElementsByClassName("-vDIg")(0).innerText

    'DivValueSplit = Split(DivValue, "<br>")
    'If UBound(DivValueSplit) = 2 Then
    '   DivValueResult = DivValueSplit(1) & DivValueSplit(2)
     '  j = InStr(DivValueResult, "</span>")
      ' Biography = Mid(DivValueResult, 7, j - 7)
    'ElseIf sURL = "https://www.instagram.com/philipplein/" Then
     ' DivValueResult = DivValueSplit(0)
      'j = InStr(DivValueResult, "</h1>")
      'Biography = Mid(DivValueResult, 19, j - 5)
    'Else
     '   DivValueResult = DivValueSplit(1)
      '  j = InStr(DivValueResult, "</span>")
       ' Biography = Mid(DivValueResult, 7, j - 7)
    'End If

    Worksheets("sheet1").Cells(i, 2) = Name
    Worksheets("sheet1").Cells(i, 3) = Followers
    Worksheets("sheet1").Cells(i, 4) = Following
    Worksheets("sheet1").Cells(i, 5) = Posts
    Worksheets("sheet1").Cells(i, 6) = DivValue
    'Biography = Replace(re1, "<span>", "")

    'Cells(i, 2) = HtmlToText
    ' myarray = Split(Data, vbCrLf)
err_clear:



      If Err <> 0 Then
          Err.Clear
          Resume Next
        End If
        wb.Quit
    Next i

End Sub

推荐答案

概述:

两种方法.一个没有打开浏览器，发出 XMLHTTP请求，另一个则使用Internet Explorer.

Two methods. One with no browser opening, issuing XMLHTTP request, the other using Internet Explorer.

如果有一种API方式可以做到这一点，那么我肯定会这么做.以下两种方法当前适用于所有显示的URL.

If there is an API way to get this done I would definitely go with that. The following 2 methods currently work for all your shown URLs.

注意:

这些基于工作表中URL的末尾部分，即人的名字.见底部图片.

These are based on end part of URL in sheet i.e. the person's name. See image at bottom.

这使用自定义类clsHTTP来保存XMLHTTP object.它有2种方法. GetString，用于发出请求并解析出响应的一部分.另一个GetInfo用于获取GetString返回的字符串并解析出感兴趣的元素，并将它们返回到数组中.

This uses a custom class clsHTTP to hold the XMLHTTP object. It has 2 methods. One, GetString, to issue the request and parse out a section of the response. The other, GetInfo, to take the string returned by GetString and parse out the elements of interest and return them in an array.

待办事项:

可以开发该类.它是裸露的骨头.特别是，它可以添加错误处理，例如处理服务器连接丢失.

The class can be developed. It is bare bones. In particular, it could do with error handling added in, for example, to handle loss of server connection.

VBA:

clsHTTP类:

Option Explicit

Private http As Object
Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal URL As String) As String
    Dim sResponse As String
    With http
        .Open "GET", URL, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
        GetString = Split(Split(sResponse, "ProfilePage"":")(1), "comments_disabled")(0)   
    End With
End Function

Public Function GetInfo(ByVal sResponse As String) As Variant
    Dim results(0 To 4)
    'Name, Followers,  Following,Posts,Biography
    On Error Resume Next
    results(0) = Split(sResponse, """full_name"":""")(1)
    results(1) = Split(Split(sResponse, """count"":")(1), "}")(0)
    results(2) = Split(Split(sResponse, """count"":")(2), "}")(0)
    results(3) = Split(Split(sResponse, """count"":")(4), ",")(0)
    results(4) = Split(Split(sResponse, """biography"":""")(1), """,")(0)
    On Error GoTo 0
    GetInfo = results
End Function

标准模块module 1:

Option Explicit
Public Sub GetInfo()
    Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults()
    Set http = New clsHTTP
    Const BASE_URL As String = "https://www.instagram.com/"

    With ThisWorkbook.Worksheets("Sheet1")
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Select Case lastRow
        Case 1
            Exit Sub
        Case 2
            ReDim arr(1, 1): arr(1, 1) = .Range("A2").Value
        Case Else
            arr = .Range("A2:A" & lastRow).Value
        End Select

        ReDim groupResults(0 To lastRow - 2)
        Dim results(0 To 4), counter As Long, i As Long
        With http
            For i = LBound(arr, 1) To UBound(arr, 1)
                If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
                     sResponse = .GetString(BASE_URL & arr(i, 1))
                    groupResults(counter) = .GetInfo(sResponse)
                    sResponse = vbNullString
                    counter = counter + 1
                End If
            Next
        End With

        For i = LBound(groupResults) To UBound(groupResults)
            .Cells(2 + i, "B").Resize(1, UBound(results) + 1) = groupResults(i)
        Next
    End With
End Sub

Internet Explorer:

稍后我会写一些更好的东西，但是以下内容将循环放置在您创建Internet Explorer对象的位置，因此您不必继续创建和销毁对象.它介绍了等待元素出现以及页面加载的过程.

Internet Explorer:

I will write something better a little later but the following places the loop inside of where you have create the Internet Explorer object so you don't keep creating and destroying. It introduces waits for elements to be present and also for page loading.

待办事项:

我会进行一些初始更改:

Some initial changes I would make:

重构代码以具有处理数据提取的独立功能/子程序；
添加方法来管理失败的连接/超时.

VBA:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer, lastRow As Long, arr(), groupResults()

    Const BASE_URL As String = "https://www.instagram.com/"

    With ThisWorkbook.Worksheets("Sheet1")
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Select Case lastRow
        Case 1
            Exit Sub
        Case 2
            ReDim arr(1, 1): arr(1, 1) = .Range("A2").Value
        Case Else
            arr = .Range("A2:A" & lastRow).Value
        End Select

        ReDim groupResults(0 To lastRow - 2)
        Dim results(0 To 4), counter As Long, i As Long
        With IE
            .Visible = True

            For i = LBound(arr, 1) To UBound(arr, 1)
                If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
                    .navigate BASE_URL & arr(i, 1)

                    While .Busy Or .readyState < 4: DoEvents: Wend
                    'Name, Followers,  Following,Posts,Biography
                    Dim aNodeList As Object, ele As Object, t As Date
                    Const MAX_WAIT_SEC As Long = 5

                    t = Timer

                    Do
                        DoEvents
                        On Error Resume Next
                        Set ele = .document.querySelector(".rhpdm")
                        On Error GoTo 0
                        If Timer - t > MAX_WAIT_SEC Then Exit Do
                    Loop While ele Is Nothing

                    '   Application.Wait Now + TimeSerial(0, 0, 2)
                    results(0) = ele.innerText
                    Set aNodeList = .document.querySelectorAll(".g47SY")
                    results(1) = aNodeList.item(0).innerText
                    results(2) = aNodeList.item(1).innerText
                    results(3) = aNodeList.item(2).innerText
                    results(4) = .document.querySelector(".rhpdm ~ span").innerText
                    Set aNodeList = Nothing : Set ele = Nothing
                    groupResults(counter) = results
                    counter = counter + 1
                End If 
            Next        
            .Quit '<== Remember to quit application
        End With

        For i = LBound(groupResults) To UBound(groupResults)
            .Cells(2 + i, "B").Resize(1, UBound(results) + 1) = groupResults(i)
        Next

    End With
End Sub

结果:

这篇关于VBA将内部文本从html页面传输到excel的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

VBA将内部文本从html页面传输到excel [英] VBA to transfer inner text from html page to excel

问题描述

推荐答案

Internet Explorer:

Internet Explorer:

相关文章

前端开发最新文章

热门教程

热门工具

登录关闭

VBA将内部文本从html页面传输到excel [英] VBA to transfer inner text from html page to excel

问题描述

推荐答案

Internet Explorer:

Internet Explorer:

相关文章

前端开发最新文章

热门教程

热门工具

登录 关闭

登录关闭