VBA - Google 新闻搜索结果的数量 [英] VBA - Number of Google News Search Results

查看:23
本文介绍了VBA - Google 新闻搜索结果的数量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个单元格,其中包含我想在 Google 新闻中搜索的内容.我希望代码返回该搜索的结果数.目前我有这个代码,我在网站的其他地方找到了它并且不使用谷歌新闻,但即便如此,我有时也会得到一个

<块引用>

运行时错误 -2147024891 (80070005)

经过 70 次左右的搜索,我无法再次运行.

Sub HawkishSearch()Dim url As String, lastRow As LongDim XMLHTTP As Object, html As ObjectDim start_time As DateDim end_time As DatelastRow = Range("B" & Rows.Count).End(xlUp).RowDim cookie As StringDim result_cookie As String开始时间 = 时间Debug.Print "start_time:" &开始时间对于 i = 2 到 lastRowurl = "https://www.google.co.in/search?q=" &细胞(i, 2) &"&rnd=" &WorksheetFunction.RandBetween(1, 10000)Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")XMLHTTP.Open "GET", url, FalseXMLHTTP.setRequestHeader "Content-Type", "text/xml"XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"XMLHTTP.send设置 html = CreateObject("htmlfile")html.body.innerHTML = XMLHTTP.ResponseText如果 html.getElementById("resultStats") 是什么,那么str_text = "0 结果"别的str_text = html.getElementById("resultStats").innerText万一单元格(i, 3) = str_text事件下一个end_time = 时间Debug.Print "end_time:" &时间结束Debug.Print完成"&所用时间:" &DateDiff("n", start_time, end_time)MsgBox "完成" &所用时间:" &DateDiff("n", start_time, end_time)结束子

解决方案

最佳选择 (IMO) 是使用

{} 表示您通过键访问的字典,[] 表示您通过索引或 For Each 访问的集合> 循环.

我使用键 totalResults 从 API 返回的初始字典中检索总结果计数.

然后我循环字典(文章)集合并提取故事标题和 URL.

然后您可以在本地窗口中检查结果或打印出来

本地窗口中的结果示例:

<小时>

选项显式公共子 GetStories()Dim 文章作为集合,文章作为对象Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)设置 finalResults = 新集合searchTerm = "奥巴马"使用 CreateObject("MSXML2.XMLHTTP").打开GET",https://newsapi.org/v2/everything?q="&搜索词"&apiKey=yourAPIkey", False.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送设置 json = JsonConverter.ParseJson(.responseText)结束于Debug.Print "总结果 = " &json(总结果")设置文章 = json("文章")对于每篇文章在文章中arr(0) = 文章(标题")arr(1) = 文章("url")finalResults.Add arr下一个停止 '<== 稍后删除我结束子

<小时>

循环:

如果在循环中部署,您可以使用 clsHTTP 类来保存 XMLHTTP 对象.这比创建和销毁更有效.我为这个类提供了一个方法 GetString 来从 API 中检索 JSON 响应,以及一个 GetInfo 方法来解析 JSON 并检索结果计数和 API 结果 URL 和标题.

本地窗口中的结果结构示例:

类 clsHTTP:

选项显式私有 http 作为对象私有子类_Initialize()设置 http = CreateObject("MSXML2.XMLHTTP")结束子公共函数 GetString(ByVal url As String) As String使用 http.打开GET",网址,假.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送GetString = .responseText结束于结束函数公共函数 GetInfo(ByVal json As Object) 作为 VariantDim results()、counter As Long、finalResults(0 To 1)、文章作为对象、文章作为对象finalResults(0) = json("totalResults")设置文章 = json("文章")ReDim 结果(1 到文章数,1 到 2)对于每篇文章在文章中计数器 = 计数器 + 1结果(计数器,1)=文章(标题")结果(计数器,2)=文章(网址")下一个finalResults(1) = 结果获取信息 = 最终结果结束函数

标准模块:

选项显式公共子 GetStories()Dim http As clsHTTP, json As ObjectDim finalResults(), searchTerms(), searchTerm As Long, url As String设置 http = 新 clsHTTP使用 ThisWorkbook.Worksheets("Sheet1")searchTerms = Application.Transpose(.Range("A1:A2")) '<== 更改为包含搜索词的适当范围结束于ReDim finalResults(1 To UBound(searchTerms))对于 searchTerm = LBound(searchTerms, 1) 到 UBound(searchTerms, 1)url = "https://newsapi.org/v2/everything?q=" &searchTerms(searchTerm) &"&apiKey=你的APIkey"设置 json = JsonConverter.ParseJson(http.GetString(url))finalResults(searchTerm) = http.GetInfo(json)设置 json = 无下一个停止'<==稍后删除我结束子'

<小时>

否则:

我会使用以下内容,通过它们的类名获取故事链接.我得到了计数并写了一个集合的链接

选项显式公共子 GetStories()Dim sResponse 作为字符串,html 作为 HTMLDocument,文章作为集合Const BASE_URL As String = "https://news.google.com/"使用 CreateObject("MSXML2.XMLHTTP").打开GET",https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en",假.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送sResponse = StrConv(.responseBody, vbUnicode)结束于设置 html = 新建 HTMLDocument:设置文章 = 新建集合Dim numberOfStories As Long,nodeList As Object,i As Long用 html.body.innerHTML = sResponse设置 nodeList = .querySelectorAll(".VDXfz")numberOfStories = nodeList.LengthDebug.Print "故事数 = " &故事数对于 i = 0 到 nodeList.Length - 1文章.添加替换$(替换$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)下一个结束于调试.打印文章.计数结束子

<小时>

标准 Google 搜索:

以下是标准谷歌搜索的示例,但根据您的搜索词,您不会总是获得相同的 HTML 结构.您需要提供一些失败案例,以帮助我确定是否有可以应用的一致选择器方法.

选项显式公共子 GetResultsCount()Dim sResponse As String, html As HTMLDocument使用 CreateObject("MSXML2.XMLHTTP").打开"GET", "https://www.google.com/search?q=mitsubishi", False.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送sResponse = StrConv(.responseBody, vbUnicode)结束于设置 html = 新建 HTMLDocument用 html.body.innerHTML = sResponseDebug.Print .querySelector("#resultStats").innerText结束于结束子

I have a cell that contains something I would like searched in google news. I want the code to return the number of results for that search. Currently I have this code which I found elsewhere on the site and does not use google news but even then I sometimes get a

runtime error -2147024891 (80070005)

after 70 or so searched and I can't run again.

Sub HawkishSearch()

Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim start_time As Date
Dim end_time As Date

lastRow = Range("B" & Rows.Count).End(xlUp).Row

Dim cookie As String
Dim result_cookie As String

start_time = Time
Debug.Print "start_time:" & start_time

For i = 2 To lastRow

    url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText

If html.getElementById("resultStats") Is Nothing Then
    str_text = "0 Results"
Else
    str_text = html.getElementById("resultStats").innerText
End If
    Cells(i, 3) = str_text
    DoEvents
Next

end_time = Time
Debug.Print "end_time:" & end_time

Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

解决方案

Best option (IMO) is to use the Google News API and register for an API key. You can then use a queryString including your search term and parse the JSON response to get the result count. I do that below and also populate a collection with the article titles and links. I use a JSON parser called JSONConverter.bas which you download and add to your project. You can then go to VBE > Tools > References > add a reference to Microsoft Scripting Runtime.


Sample JSON response from API:

The {} denotes a dictionary which you access by key, the [] denotes a collection which you access by index or by For Each loop over.

I use the key totalResults to retrieve the total results count from the initial dictionary returned by the API.

I then loop the collection of dictionaries (articles) and pull the story titles and URLs.

You can then inspect the results in the locals window or print out

Sample of results in locals window:


Option Explicit

Public Sub GetStories()
    Dim articles As Collection, article As Object
    Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)
    Set finalResults = New Collection
    searchTerm = "Obama"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        Set json = JsonConverter.ParseJson(.responseText)
    End With

    Debug.Print "total results = " & json("totalResults")

    Set articles = json("articles")
    For Each article In articles
       arr(0) = article("title")
       arr(1) = article("url")
       finalResults.Add arr
    Next

    Stop '<== Delete me later

End Sub


Loop:

If deploying in a loop you can use a class clsHTTP to hold the XMLHTTP object. This is more efficient than creating and destroying. I supply this class with a method GetString to retrieve the JSON response from the API, and a GetInfo method to parse the JSON and retrieve the results count and the API results URLs and Titles.

Example of results structure in locals window:

Class clsHTTP:

Option Explicit   
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        GetString = .responseText
    End With
End Function

Public Function GetInfo(ByVal json As Object) As Variant
    Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object

    finalResults(0) = json("totalResults")
    Set articles = json("articles")

    ReDim results(1 To articles.Count, 1 To 2)

    For Each article In articles
        counter = counter + 1
        results(counter, 1) = article("title")
        results(counter, 2) = article("url")
    Next

    finalResults(1) = results
    GetInfo = finalResults
End Function

Standard module:

Option Explicit

Public Sub GetStories()
    Dim http As clsHTTP, json As Object
    Dim finalResults(), searchTerms(), searchTerm As Long, url As String
    Set http = New clsHTTP

    With ThisWorkbook.Worksheets("Sheet1")
        searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
    End With

    ReDim finalResults(1 To UBound(searchTerms))

    For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)

        url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"

        Set json = JsonConverter.ParseJson(http.GetString(url))

        finalResults(searchTerm) = http.GetInfo(json)

        Set json = Nothing

    Next

    Stop '<==Delete me later
End Sub

'


Otherwise:

I would use the following where I grab story links by their class name. I get the count and write the links to a collection

Option Explicit

Public Sub GetStories()
    Dim sResponse As String, html As HTMLDocument, articles As Collection
    Const BASE_URL As String = "https://news.google.com/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument: Set articles = New Collection
    Dim numberOfStories As Long, nodeList As Object, i As Long
    With html
        .body.innerHTML = sResponse
        Set nodeList = .querySelectorAll(".VDXfz")
        numberOfStories = nodeList.Length
        Debug.Print "number of stories = " & numberOfStories
        For i = 0 To nodeList.Length - 1
            articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
        Next
    End With
    Debug.Print articles.Count
End Sub


Standard Google search:

The following works an example standard google search but you will not always get the same HTML structure depending on your search term. You will need to provide some failing cases to help me determine if there is a consistent selector method that can be applied.

Option Explicit
Public Sub GetResultsCount()
    Dim sResponse As String, html As HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.com/search?q=mitsubishi", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Debug.Print .querySelector("#resultStats").innerText
    End With

End Sub

这篇关于VBA - Google 新闻搜索结果的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆