VBA将内部文本从html页面传输到excel [英] VBA to transfer inner text from html page to excel
问题描述
此图显示了我使用以下宏的内容
但是此代码在打开两个或三个URL之后停止,并且我们看到以下错误消息,
1.运行时错误91
2.对象变量或未设置块的
But this code stops after the opening of two or three URLs and we see the following error message,
1. run-time err 91
2. object variable or with block not set
Sub test()
Dim wb As Object
Dim doc As Object
Dim sURL As String
Dim lastrow As Long
Dim n As Integer
Dim i As Integer
Dim HtmlToText As String
Dim result
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
For i = 2 To lastrow 'Start the loop on the second row of column A. Until the last URL..
Set wb = CreateObject("internetExplorer.Application")
sURL = Cells(i, 1)
wb.navigate sURL
wb.Visible = False
While wb.Busy
DoEvents
Wend
'HTML document
Set doc = wb.document
Dim Name As Variant
Dim Posts As Variant
Dim Followers As Variant
Dim Following As Variant
Dim DivValue As Variant
Dim DivValueSplit As Variant
Dim DivValueResult As Variant
Dim Biography As Variant
Name = doc.getElementsByClassName("AC5d8 notranslate")(0).innerText
Posts = doc.getElementsByClassName("g47SY")(0).innerText
Followers = doc.getElementsByClassName("g47SY")(1).innerText
Following = doc.getElementsByClassName("g47SY")(2).innerText
'dd = web.document.querySelector("div.-vDIg span").innerText
DivValue = doc.getElementsByClassName("-vDIg")(0).innerText
'DivValueSplit = Split(DivValue, "<br>")
'If UBound(DivValueSplit) = 2 Then
' DivValueResult = DivValueSplit(1) & DivValueSplit(2)
' j = InStr(DivValueResult, "</span>")
' Biography = Mid(DivValueResult, 7, j - 7)
'ElseIf sURL = "https://www.instagram.com/philipplein/" Then
' DivValueResult = DivValueSplit(0)
'j = InStr(DivValueResult, "</h1>")
'Biography = Mid(DivValueResult, 19, j - 5)
'Else
' DivValueResult = DivValueSplit(1)
' j = InStr(DivValueResult, "</span>")
' Biography = Mid(DivValueResult, 7, j - 7)
'End If
Worksheets("sheet1").Cells(i, 2) = Name
Worksheets("sheet1").Cells(i, 3) = Followers
Worksheets("sheet1").Cells(i, 4) = Following
Worksheets("sheet1").Cells(i, 5) = Posts
Worksheets("sheet1").Cells(i, 6) = DivValue
'Biography = Replace(re1, "<span>", "")
'Cells(i, 2) = HtmlToText
' myarray = Split(Data, vbCrLf)
err_clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
wb.Quit
Next i
End Sub
推荐答案
概述:
两种方法.一个没有打开浏览器,发出 XMLHTTP请求,另一个则使用Internet Explorer.
Two methods. One with no browser opening, issuing XMLHTTP request, the other using Internet Explorer.
如果有一种API方式可以做到这一点,那么我肯定会这么做.以下两种方法 当前 适用于所有显示的URL.
If there is an API way to get this done I would definitely go with that. The following 2 methods currently work for all your shown URLs.
注意:
这些基于工作表中URL的末尾部分,即人的名字.见底部图片.
These are based on end part of URL in sheet i.e. the person's name. See image at bottom.
这使用自定义类clsHTTP
来保存XMLHTTP object
.它有2种方法. GetString
,用于发出请求并解析出响应的一部分.另一个GetInfo
用于获取GetString
返回的字符串并解析出感兴趣的元素,并将它们返回到数组中.
This uses a custom class clsHTTP
to hold the XMLHTTP object
. It has 2 methods. One, GetString
, to issue the request and parse out a section of the response. The other, GetInfo
, to take the string returned by GetString
and parse out the elements of interest and return them in an array.
待办事项:
可以开发该类.它是裸露的骨头.特别是,它可以添加错误处理,例如处理服务器连接丢失.
The class can be developed. It is bare bones. In particular, it could do with error handling added in, for example, to handle loss of server connection.
VBA:
clsHTTP
类:
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetString(ByVal URL As String) As String
Dim sResponse As String
With http
.Open "GET", URL, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
GetString = Split(Split(sResponse, "ProfilePage"":")(1), "comments_disabled")(0)
End With
End Function
Public Function GetInfo(ByVal sResponse As String) As Variant
Dim results(0 To 4)
'Name, Followers, Following,Posts,Biography
On Error Resume Next
results(0) = Split(sResponse, """full_name"":""")(1)
results(1) = Split(Split(sResponse, """count"":")(1), "}")(0)
results(2) = Split(Split(sResponse, """count"":")(2), "}")(0)
results(3) = Split(Split(sResponse, """count"":")(4), ",")(0)
results(4) = Split(Split(sResponse, """biography"":""")(1), """,")(0)
On Error GoTo 0
GetInfo = results
End Function
标准模块module 1
:
Option Explicit
Public Sub GetInfo()
Dim http As clsHTTP, sResponse As String, lastRow As Long, groupResults()
Set http = New clsHTTP
Const BASE_URL As String = "https://www.instagram.com/"
With ThisWorkbook.Worksheets("Sheet1")
lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("A2").Value
Case Else
arr = .Range("A2:A" & lastRow).Value
End Select
ReDim groupResults(0 To lastRow - 2)
Dim results(0 To 4), counter As Long, i As Long
With http
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
sResponse = .GetString(BASE_URL & arr(i, 1))
groupResults(counter) = .GetInfo(sResponse)
sResponse = vbNullString
counter = counter + 1
End If
Next
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(2 + i, "B").Resize(1, UBound(results) + 1) = groupResults(i)
Next
End With
End Sub
Internet Explorer:
稍后我会写一些更好的东西,但是以下内容将循环放置在您创建Internet Explorer对象的位置,因此您不必继续创建和销毁对象.它介绍了等待元素出现以及页面加载的过程.
Internet Explorer:
I will write something better a little later but the following places the loop inside of where you have create the Internet Explorer object so you don't keep creating and destroying. It introduces waits for elements to be present and also for page loading.
待办事项:
我会进行一些初始更改:
Some initial changes I would make:
- 重构代码以具有处理数据提取的独立功能/子程序;
- 添加方法来管理失败的连接/超时.
VBA:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, lastRow As Long, arr(), groupResults()
Const BASE_URL As String = "https://www.instagram.com/"
With ThisWorkbook.Worksheets("Sheet1")
lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
Select Case lastRow
Case 1
Exit Sub
Case 2
ReDim arr(1, 1): arr(1, 1) = .Range("A2").Value
Case Else
arr = .Range("A2:A" & lastRow).Value
End Select
ReDim groupResults(0 To lastRow - 2)
Dim results(0 To 4), counter As Long, i As Long
With IE
.Visible = True
For i = LBound(arr, 1) To UBound(arr, 1)
If Len(BASE_URL & arr(i, 1)) > Len(BASE_URL) Then
.navigate BASE_URL & arr(i, 1)
While .Busy Or .readyState < 4: DoEvents: Wend
'Name, Followers, Following,Posts,Biography
Dim aNodeList As Object, ele As Object, t As Date
Const MAX_WAIT_SEC As Long = 5
t = Timer
Do
DoEvents
On Error Resume Next
Set ele = .document.querySelector(".rhpdm")
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While ele Is Nothing
' Application.Wait Now + TimeSerial(0, 0, 2)
results(0) = ele.innerText
Set aNodeList = .document.querySelectorAll(".g47SY")
results(1) = aNodeList.item(0).innerText
results(2) = aNodeList.item(1).innerText
results(3) = aNodeList.item(2).innerText
results(4) = .document.querySelector(".rhpdm ~ span").innerText
Set aNodeList = Nothing : Set ele = Nothing
groupResults(counter) = results
counter = counter + 1
End If
Next
.Quit '<== Remember to quit application
End With
For i = LBound(groupResults) To UBound(groupResults)
.Cells(2 + i, "B").Resize(1, UBound(results) + 1) = groupResults(i)
Next
End With
End Sub
结果:
这篇关于VBA将内部文本从html页面传输到excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!