VBA - Google 新闻搜索结果的数量 [英] VBA - Number of Google News Search Results
问题描述
我有一个单元格,其中包含我想在 Google 新闻中搜索的内容.我希望代码返回该搜索的结果数.目前我有这个代码,我在网站的其他地方找到了它并且不使用谷歌新闻,但即便如此,我有时也会得到一个
<块引用>运行时错误 -2147024891 (80070005)
经过 70 次左右的搜索,我无法再次运行.
Sub HawkishSearch()Dim url As String, lastRow As LongDim XMLHTTP As Object, html As ObjectDim start_time As DateDim end_time As DatelastRow = Range("B" & Rows.Count).End(xlUp).RowDim cookie As StringDim result_cookie As String开始时间 = 时间Debug.Print "start_time:" &开始时间对于 i = 2 到 lastRowurl = "https://www.google.co.in/search?q=" &细胞(i, 2) &"&rnd=" &WorksheetFunction.RandBetween(1, 10000)Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")XMLHTTP.Open "GET", url, FalseXMLHTTP.setRequestHeader "Content-Type", "text/xml"XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"XMLHTTP.send设置 html = CreateObject("htmlfile")html.body.innerHTML = XMLHTTP.ResponseText如果 html.getElementById("resultStats") 是什么,那么str_text = "0 结果"别的str_text = html.getElementById("resultStats").innerText万一单元格(i, 3) = str_text事件下一个end_time = 时间Debug.Print "end_time:" &时间结束Debug.Print完成"&所用时间:" &DateDiff("n", start_time, end_time)MsgBox "完成" &所用时间:" &DateDiff("n", start_time, end_time)结束子
最佳选择 (IMO) 是使用
{}
表示您通过键访问的字典,[]
表示您通过索引或 For Each
访问的集合> 循环.
我使用键 totalResults
从 API 返回的初始字典中检索总结果计数.
然后我循环字典(文章)集合并提取故事标题和 URL.
然后您可以在本地窗口中检查结果或打印出来
本地窗口中的结果示例:
<小时>选项显式公共子 GetStories()Dim 文章作为集合,文章作为对象Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)设置 finalResults = 新集合searchTerm = "奥巴马"使用 CreateObject("MSXML2.XMLHTTP").打开GET",https://newsapi.org/v2/everything?q="&搜索词"&apiKey=yourAPIkey", False.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送设置 json = JsonConverter.ParseJson(.responseText)结束于Debug.Print "总结果 = " &json(总结果")设置文章 = json("文章")对于每篇文章在文章中arr(0) = 文章(标题")arr(1) = 文章("url")finalResults.Add arr下一个停止 '<== 稍后删除我结束子
<小时>
循环:
如果在循环中部署,您可以使用 clsHTTP
类来保存 XMLHTTP 对象.这比创建和销毁更有效.我为这个类提供了一个方法 GetString
来从 API 中检索 JSON 响应,以及一个 GetInfo
方法来解析 JSON 并检索结果计数和 API 结果 URL 和标题.
本地窗口中的结果结构示例:
类 clsHTTP:
选项显式私有 http 作为对象私有子类_Initialize()设置 http = CreateObject("MSXML2.XMLHTTP")结束子公共函数 GetString(ByVal url As String) As String使用 http.打开GET",网址,假.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送GetString = .responseText结束于结束函数公共函数 GetInfo(ByVal json As Object) 作为 VariantDim results()、counter As Long、finalResults(0 To 1)、文章作为对象、文章作为对象finalResults(0) = json("totalResults")设置文章 = json("文章")ReDim 结果(1 到文章数,1 到 2)对于每篇文章在文章中计数器 = 计数器 + 1结果(计数器,1)=文章(标题")结果(计数器,2)=文章(网址")下一个finalResults(1) = 结果获取信息 = 最终结果结束函数
标准模块:
选项显式公共子 GetStories()Dim http As clsHTTP, json As ObjectDim finalResults(), searchTerms(), searchTerm As Long, url As String设置 http = 新 clsHTTP使用 ThisWorkbook.Worksheets("Sheet1")searchTerms = Application.Transpose(.Range("A1:A2")) '<== 更改为包含搜索词的适当范围结束于ReDim finalResults(1 To UBound(searchTerms))对于 searchTerm = LBound(searchTerms, 1) 到 UBound(searchTerms, 1)url = "https://newsapi.org/v2/everything?q=" &searchTerms(searchTerm) &"&apiKey=你的APIkey"设置 json = JsonConverter.ParseJson(http.GetString(url))finalResults(searchTerm) = http.GetInfo(json)设置 json = 无下一个停止'<==稍后删除我结束子'
<小时>
否则:
我会使用以下内容,通过它们的类名获取故事链接.我得到了计数并写了一个集合的链接
选项显式公共子 GetStories()Dim sResponse 作为字符串,html 作为 HTMLDocument,文章作为集合Const BASE_URL As String = "https://news.google.com/"使用 CreateObject("MSXML2.XMLHTTP").打开GET",https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en",假.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送sResponse = StrConv(.responseBody, vbUnicode)结束于设置 html = 新建 HTMLDocument:设置文章 = 新建集合Dim numberOfStories As Long,nodeList As Object,i As Long用 html.body.innerHTML = sResponse设置 nodeList = .querySelectorAll(".VDXfz")numberOfStories = nodeList.LengthDebug.Print "故事数 = " &故事数对于 i = 0 到 nodeList.Length - 1文章.添加替换$(替换$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)下一个结束于调试.打印文章.计数结束子
<小时>
标准 Google 搜索:
以下是标准谷歌搜索的示例,但根据您的搜索词,您不会总是获得相同的 HTML 结构.您需要提供一些失败案例,以帮助我确定是否有可以应用的一致选择器方法.
选项显式公共子 GetResultsCount()Dim sResponse As String, html As HTMLDocument使用 CreateObject("MSXML2.XMLHTTP").打开"GET", "https://www.google.com/search?q=mitsubishi", False.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT".发送sResponse = StrConv(.responseBody, vbUnicode)结束于设置 html = 新建 HTMLDocument用 html.body.innerHTML = sResponseDebug.Print .querySelector("#resultStats").innerText结束于结束子
I have a cell that contains something I would like searched in google news. I want the code to return the number of results for that search. Currently I have this code which I found elsewhere on the site and does not use google news but even then I sometimes get a
runtime error -2147024891 (80070005)
after 70 or so searched and I can't run again.
Sub HawkishSearch()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("B" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
If html.getElementById("resultStats") Is Nothing Then
str_text = "0 Results"
Else
str_text = html.getElementById("resultStats").innerText
End If
Cells(i, 3) = str_text
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Best option (IMO) is to use the Google News API and register for an API key. You can then use a queryString including your search term and parse the JSON response to get the result count. I do that below and also populate a collection with the article titles and links. I use a JSON parser called JSONConverter.bas which you download and add to your project. You can then go to VBE > Tools > References > add a reference to Microsoft Scripting Runtime.
Sample JSON response from API:
The {}
denotes a dictionary which you access by key, the []
denotes a collection which you access by index or by For Each
loop over.
I use the key totalResults
to retrieve the total results count from the initial dictionary returned by the API.
I then loop the collection of dictionaries (articles) and pull the story titles and URLs.
You can then inspect the results in the locals window or print out
Sample of results in locals window:
Option Explicit
Public Sub GetStories()
Dim articles As Collection, article As Object
Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)
Set finalResults = New Collection
searchTerm = "Obama"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
Debug.Print "total results = " & json("totalResults")
Set articles = json("articles")
For Each article In articles
arr(0) = article("title")
arr(1) = article("url")
finalResults.Add arr
Next
Stop '<== Delete me later
End Sub
Loop:
If deploying in a loop you can use a class clsHTTP
to hold the XMLHTTP object. This is more efficient than creating and destroying. I supply this class with a method GetString
to retrieve the JSON response from the API, and a GetInfo
method to parse the JSON and retrieve the results count and the API results URLs and Titles.
Example of results structure in locals window:
Class clsHTTP:
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetString(ByVal url As String) As String
With http
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
GetString = .responseText
End With
End Function
Public Function GetInfo(ByVal json As Object) As Variant
Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object
finalResults(0) = json("totalResults")
Set articles = json("articles")
ReDim results(1 To articles.Count, 1 To 2)
For Each article In articles
counter = counter + 1
results(counter, 1) = article("title")
results(counter, 2) = article("url")
Next
finalResults(1) = results
GetInfo = finalResults
End Function
Standard module:
Option Explicit
Public Sub GetStories()
Dim http As clsHTTP, json As Object
Dim finalResults(), searchTerms(), searchTerm As Long, url As String
Set http = New clsHTTP
With ThisWorkbook.Worksheets("Sheet1")
searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
End With
ReDim finalResults(1 To UBound(searchTerms))
For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)
url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"
Set json = JsonConverter.ParseJson(http.GetString(url))
finalResults(searchTerm) = http.GetInfo(json)
Set json = Nothing
Next
Stop '<==Delete me later
End Sub
'
Otherwise:
I would use the following where I grab story links by their class name. I get the count and write the links to a collection
Option Explicit
Public Sub GetStories()
Dim sResponse As String, html As HTMLDocument, articles As Collection
Const BASE_URL As String = "https://news.google.com/"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Set html = New HTMLDocument: Set articles = New Collection
Dim numberOfStories As Long, nodeList As Object, i As Long
With html
.body.innerHTML = sResponse
Set nodeList = .querySelectorAll(".VDXfz")
numberOfStories = nodeList.Length
Debug.Print "number of stories = " & numberOfStories
For i = 0 To nodeList.Length - 1
articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
Next
End With
Debug.Print articles.Count
End Sub
Standard Google search:
The following works an example standard google search but you will not always get the same HTML structure depending on your search term. You will need to provide some failing cases to help me determine if there is a consistent selector method that can be applied.
Option Explicit
Public Sub GetResultsCount()
Dim sResponse As String, html As HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.google.com/search?q=mitsubishi", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Set html = New HTMLDocument
With html
.body.innerHTML = sResponse
Debug.Print .querySelector("#resultStats").innerText
End With
End Sub
这篇关于VBA - Google 新闻搜索结果的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!