Error 429 on scraping from Google search with Google-apps-script


Question

I want to get the number of indexed pages for certain domains. To do that, I want to use the "site:" search operator and extract the number of results from the search result page.

I tried it with a Google Apps Script custom function for Google Sheets:

function sampleFormula_4() {
  const url = "https://www.google.com/search?q=site%3Abenedikt-sahlmueller.de";
  
  try {
    const html = UrlFetchApp.fetch(url).getContentText();
    return html.match(/<div id="result-stats">(.+?)nobr>/)[1].trim();

  } catch (e) {
    Utilities.sleep(5000);
    const html = UrlFetchApp.fetch(url).getContentText();
    return html.match(/<div id="result-stats">(.+?)nobr>/)[1].trim();
  }
}

Google Sheets gives me error 429 (Too Many Requests). I added a sleep of 5000 ms before retrying, but Google Search still returns error 429.

All I need is the number of pages for certain URLs in Google's search results. Maybe there is a better way. I can't use the search-api for this, as those pages are not part of my GSC.

Recommended answer

Most likely Google Search treats requests coming from UrlFetch as automated traffic and therefore blocks them. From the official docs:

What Google considers automated traffic:

  • Sending searches from a robot, computer program, automated service, or search scraper

The same behaviour occurs with tools like wget or curl, for example.
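To confirm that the block, rather than the parsing, is what makes the function fail, the HTTP status can be inspected instead of letting the fetch throw. The following is a minimal sketch, assuming the `muteHttpExceptions` option of `UrlFetchApp.fetch`; `classifyStatus` and `checkSearchStatus` are hypothetical helper names that do not appear in the question.

```javascript
// Hypothetical helper: maps an HTTP status code to a short label.
function classifyStatus(code) {
  if (code === 429) return "blocked: too many requests";
  if (code >= 200 && code < 300) return "ok";
  return "other error: " + code;
}

// Runs only inside Apps Script, where UrlFetchApp is available.
function checkSearchStatus(url) {
  // muteHttpExceptions makes fetch return the response
  // instead of throwing on 4xx/5xx status codes.
  const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  return classifyStatus(response.getResponseCode());
}
```

If this reports a 429 on every attempt, sleeping and retrying as in the question is unlikely to help, because the block applies to the traffic source rather than to a single request.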

The recommended approach is to use a search API.
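The answer does not say which API it means. As one possibility, the sketch below assumes Google's Custom Search JSON API, which reports an estimated result count in `searchInformation.totalResults`. The function names, `YOUR_API_KEY`, and `YOUR_ENGINE_ID` are placeholders introduced here for illustration, and the API needs its own key and a Programmable Search Engine ID.

```javascript
// Builds a Custom Search JSON API request URL for a "site:" query.
// apiKey and engineId are placeholders you must supply yourself.
function buildSiteQueryUrl(domain, apiKey, engineId) {
  const params = { key: apiKey, cx: engineId, q: "site:" + domain };
  const query = Object.keys(params)
    .map(function (k) {
      return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
    })
    .join("&");
  return "https://www.googleapis.com/customsearch/v1?" + query;
}

// Extracts the estimated result count from a parsed API response.
// Returns null when the field is missing.
function extractTotalResults(json) {
  const info = json && json.searchInformation;
  return info && info.totalResults !== undefined
    ? Number(info.totalResults)
    : null;
}

// Runs only inside Apps Script, where UrlFetchApp is available.
function indexedPageCount(domain) {
  const url = buildSiteQueryUrl(domain, "YOUR_API_KEY", "YOUR_ENGINE_ID");
  const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) return null;
  return extractTotalResults(JSON.parse(response.getContentText()));
}
```

In a sheet this could be called as `=indexedPageCount("benedikt-sahlmueller.de")`. The returned figure is an estimate and may differ from the number shown on the regular search results page.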
