A 列中所有 URL 的 Webscrape 循环 [英] Webscrape loop on all URLs in Column A

查看:30
本文介绍了A 列中所有 URL 的 Webscrape 循环的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从 URL 列表中抓取 Facebook 视频标题.

I'm trying to scrape the Facebook Video Titles from a list of URL's.

我的宏用于单个视频,其中 URL 内置于代码中.我希望脚本改为循环遍历 A 列中的每个 URL 并将视频标题输出到 B 列中.有帮助吗?

I've got my macro working for a single video in which the URL is built into the code. I'd like the script to instead loop through each URL in Column A and output the Video Title into Column B. Any help?

当前代码:

Sub ScrapeVideoTitle()    
    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .navigate "https://www.facebook.com/rankertotalnerd/videos/276505496352731/"
        .Visible = True

        Do While appIE.Busy        
            DoEvents
        Loop

        'Add Video Title to Column B
        Range("B2").Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

        appIE.Quit
        Set appIE = Nothing
    End With
End Sub

推荐答案

如果您可以转到 VBE > 工具 > 参考 > 添加对 Microsoft HTML 对象库的引用,您可以执行以下操作:

Provided you can go VBE > Tools > References > Add a reference to Microsoft HTML Object Library you can do the following:

将所有 url 读入一个数组.循环数组并使用 xmlhttp 向页面发出 GET 请求.将响应读入 HTMLDocument 变量并使用 css 选择器提取标题并存储在数组中.在循环结束时将所有结果一次性写入工作表.

Read all the urls into an array. Loop the array and use xmlhttp to issue GET request to page. Read the response into an HTMLDocument variable and use css selector to extract the title and store in an array. At the end of the loop write all results out to sheet in one go.

Option Explicit
Public Sub GetTitles()
    Dim urls(), ws As Worksheet, lastRow As Long, results(), i As Long, html As HTMLDocument

    Set html = New HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    With ws
        lastRow = .Cells(.rows.Count, "A").End(xlUp).Row
        urls = Application.Transpose(.Range("A2:A" & lastRow).Value)
    End With
    ReDim results(1 To UBound(urls))
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            If InStr(urls(i), "http") > 0 Then
                .Open "GET", urls(i), False
                .send
                html.body.innerHTML = .responseText
                results(i) = html.querySelector(".uiHeaderTitle span").innerText
            End If
        Next
    End With
    ws.Cells(2, 2).Resize(UBound(results), 1) = Application.Transpose(results)
End Sub

<小时>

css 选择器与页面的匹配:

这篇关于A 列中所有 URL 的 Webscrape 循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆