一切完成后,Scraper引发错误而不是退出浏览器 [英] Scraper throws errors instead of quitting the browser when everything is done

查看:101
本文介绍了一切完成后,Scraper引发错误而不是退出浏览器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了一个抓取器来分析torrent网站中的电影信息.我使用了IEqueryselector.

I've written a scraper to parse movie information from a torrent site. I used IE and queryselector.

我的代码可以解析所有内容.一切完成后,它抛出错误而不是退出浏览器.如果取消错误框,则可以看到结果.

My code does parse everything. It throws errors instead of quitting the browser when everything is done. If I cancel the error box then I can see the results.

这是完整的代码:

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.querySelectorAll(".browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub

我上传了两张图片以显示错误.

I have uploaded two images to show the errors.

这两个错误同时出现.

我正在使用Internet Explorer 11.

I'm using Internet Explorer 11.

如果我尝试如下所示,它将成功地带来结果,而不会出现任何问题.

If I try like below it brings the results successfully with no issues.

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.getElementsByClassName("browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub

引用添加到库中

  1. Microsoft Internet控件
  2. Microsoft HTML对象库

是否有任何要添加到库中的引用以消除错误?

Is there any reference to add to the library to shake off errors?

推荐答案

好,因此该网页存在严重不友好的地方.它一直在为我崩溃.因此,我不得不在脚本引擎/脚本控件中运行一个javascript程序,并且它可以正常工作.

Ok, so there is something seriously unfriendly about that webpage. It kept crashing for me. So I have resorted to running a javascript program within scripting engine/scripting control and it works.

希望您能遵循它.逻辑在添加到ScriptEngine的javascript中.我得到两个节点列表,一个电影列表和一个年份列表.然后我同步浏览每个数组,并将它们作为键值对添加到Microsoft脚本字典中.

I hope you can follow it. The logic is in the javascript added to the ScriptEngine. I get two lists of nodes, one list of films and one list of years; then I step through each array in sync and add them as key value pair to a Microsoft Scripting Dictionary.

Option Explicit

'*Tools->References
'*    Microsoft Scripting Runtime
'*    Microsoft Scripting Control
'*    Microsoft Internet Controls
'*    Microsoft HTML Object Library

Sub Torrent_Data()
    Dim row As Long
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = True
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE:
            DoEvents
        Loop
        Set html = .document
    End With

    Dim dicFilms As Scripting.Dictionary
    Set dicFilms = New Scripting.Dictionary

    Call GetScriptEngine.Run("getMovies", html, dicFilms)

    Dim vFilms As Variant
    vFilms = dicFilms.Keys

    Dim vYears As Variant
    vYears = dicFilms.Items

    Dim lRowLoop As Long
    For lRowLoop = 0 To dicFilms.Count - 1

        Cells(lRowLoop + 1, 1) = vFilms(lRowLoop)
        Cells(lRowLoop + 1, 2) = vYears(lRowLoop)

    Next lRowLoop

    Stop

    IE.Quit
End Sub

Private Function GetScriptEngine() As ScriptControl
    '* see code from this SO Q & A
    ' https://stackoverflow.com/questions/37711073/in-excel-vba-on-windows-how-to-get-stringified-json-respresentation-instead-of
    Static soScriptEngine As ScriptControl
    If soScriptEngine Is Nothing Then
        Set soScriptEngine = New ScriptControl
        soScriptEngine.Language = "JScript"

        soScriptEngine.AddCode "function getMovies(htmlDocument, microsoftDict) { " & _
                                    "var titles = htmlDocument.querySelectorAll('a.browse-movie-title'), i;" & _
                                    "var years = htmlDocument.querySelectorAll('div.browse-movie-year'), j;" & _
                                    "if ( years.length === years.length) {" & _
                                    "for (i=0; i< years.length; ++i) {" & _
                                    "   var film = titles[i].innerText;" & _
                                    "   var year = years[i].innerText;" & _
                                    "   microsoftDict.Add(film, year);" & _
                                    "}}}"

    End If
    Set GetScriptEngine = soScriptEngine
End Function

这篇关于一切完成后,Scraper引发错误而不是退出浏览器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆