Webscrape inside a for loop - Follow up

Views: 36 | Posted: 2021/5/5 19:05:22 | Tags: excel, vba, web-scraping


Problem description

Following my previous question, "Webscrape VBA with condition", I started automating the procedure for a list of URLs from this website, which I prepared in my Excel document. With 20 or 30 URLs it worked perfectly, but when I increased the number, a "Subscript out of range" error occurred at the ReDim in GetNodesTextAsArray. Do you have any idea why? After some research I tried replacing the ReDim with a For loop, but that didn't change anything.

Public Sub WindInfo()
'VBE > Tools > References:
'1. Microsoft XML, v6.0
'2. Microsoft HTML Object Library
'3. Microsoft Scripting Runtime
Dim xhr As MSXML2.XMLHTTP60: Set xhr = New MSXML2.XMLHTTP60
Dim html As MSHTML.HTMLDocument: Set html = New MSHTML.HTMLDocument
Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
Dim url As String
Dim j As Integer
Dim r As Long


r = 1

For j = 1 To 20

    url = ThisWorkbook.Worksheets("List").Cells(j, 1).Value

    With xhr
        .Open "GET", url, False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim generalities As Object, arrGen(), partsList As Object
    
    

    Set generalities = html.querySelectorAll("#bloc_texte table ~ table li")
    arrGen = GetNodesTextAsArray(generalities)
    
    Dim parts As Object, numberOfParts As Long
    
    Set partsList = html.querySelectorAll("h1 ~ h3, ul ~ h3")
    
    
    If partsList.Length > 0 Then
    
        numberOfParts = html.querySelectorAll("h1 ~ h3, ul ~ h3").Length / 2
    
        Set parts = html.querySelectorAll("h3 + ul")
       
        Dim i As Long, liNodes As Object, arr()
        Dim html2 As MSHTML.HTMLDocument: Set html2 = New MSHTML.HTMLDocument
        
        For i = 0 To numberOfParts - 1
            ws.Cells(r, 1).Resize(1, UBound(arrGen)) = arrGen
            html2.body.innerHTML = parts.Item(i).outerHTML & parts.Item(i + numberOfParts).outerHTML
            Set liNodes = html2.querySelectorAll("li")
            arr = GetNodesTextAsArray(liNodes)
            ws.Cells(r, 5).Resize(1, UBound(arr)) = arr
            r = r + 1
        Next
        
    Else
        arr = GetNodesTextAsArray(html.querySelectorAll("#bloc_texte h1 + ul").Item(1).getElementsByTagName("li"))
        ws.Cells(r, 1).Resize(1, UBound(arrGen)) = arrGen
        ws.Cells(r, 5).Resize(1, UBound(arr)) = arr
        r = r + 1
    End If
    Application.Wait (Now + TimeValue("0:00:01"))
Next

End Sub


Public Function GetNodesTextAsArray(ByVal nodeList As Object) As Variant()
Dim i As Long, results()

ReDim results(1 To nodeList.Length)

   

For i = 0 To nodeList.Length - 1
    results(i + 1) = nodeList.Item(i).innerText
Next i
GetNodesTextAsArray = results
End Function
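
One possible cause, assuming that some of the later URLs return a page where a selector matches nothing: nodeList.Length is then 0, so ReDim results(1 To 0) asks for an upper bound below the lower bound, and VBA raises run-time error 9 (Subscript out of range). Below is a minimal sketch of a guard around the write step. TryWriteNodesText is a hypothetical helper name that is not part of the original code, and it relies on GetNodesTextAsArray exactly as defined above.

```vba
' Hypothetical helper (name and signature are illustrative, not from the original post).
' Writes the text of every node in nodeList into one row starting at target.
' Returns False and writes nothing when the list is missing or empty, so that
' GetNodesTextAsArray is never called with a zero-length node list.
Public Function TryWriteNodesText(ByVal nodeList As Object, ByVal target As Range) As Boolean
    ' A Boolean function returns False by default, so Exit Function reports "nothing written".
    If nodeList Is Nothing Then Exit Function
    If nodeList.Length = 0 Then Exit Function

    Dim arr() As Variant
    arr = GetNodesTextAsArray(nodeList)      ' safe here: Length >= 1, so the array is 1 To Length

    ' A 1-based one-dimensional array fills a single row of UBound(arr) cells.
    target.Resize(1, UBound(arr)).Value = arr
    TryWriteNodesText = True
End Function
```

Inside the loop, a call such as If Not TryWriteNodesText(generalities, ws.Cells(r, 1)) Then Debug.Print "No nodes for: " & url would replace the direct Resize assignment and log which URL produced an empty match. The same check applies to liNodes in the inner loop. Note that .Item(1) in the Else branch can also fail if fewer than two elements match, which would need a separate Nothing check.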
