How to Bypass Google Recaptcha while scraping with Requests

Views: 47 Published: 2021/12/17 14:20:43 Tags: python web-scraping beautifulsoup python-requests

Question

Python code to request the URL:

import requests

# Send a browser-like User-Agent in an attempt to avoid being blocked
agent = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
# Make the request to the link
response = requests.get("https://www.naukri.com/jobs-in-andhra-pradesh", headers=agent)

Output when printing the HTML:

<!DOCTYPE html>

<html>
  <head>
    <title>Naukri reCAPTCHA</title> # this title is returned instead of the actual title of the URL I requested
    <meta name="robots" content="noindex, nofollow">
        <link rel="stylesheet" href="https://static.naukimg.com/s/4/101/c/common_v62.min.css" />      
        <script src="https://www.google.com/recaptcha/api.js" async defer></script>   
    </head>
</html>
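
The output above can be recognised programmatically before any parsing is attempted. The following is a minimal sketch, assuming that a page which loads Google's reCAPTCHA script or has "reCAPTCHA" in its title (both visible in the output above) is the challenge page rather than the real listing. The function name `looks_like_recaptcha` is hypothetical and the check is only a heuristic.

```python
import re


def looks_like_recaptcha(html: str) -> bool:
    """Heuristic check: does this HTML look like a reCAPTCHA challenge page?

    Assumption: the challenge page either loads Google's reCAPTCHA
    script or mentions "recaptcha" in its <title>, as in the output above.
    """
    lowered = html.lower()
    # The challenge page in the question loads this script.
    if "google.com/recaptcha/api.js" in lowered:
        return True
    # Fall back to the page title, e.g. "Naukri reCAPTCHA".
    match = re.search(r"<title>(.*?)</title>", lowered, re.DOTALL)
    return bool(match and "recaptcha" in match.group(1))
```

With the request from the question, this could be called as `looks_like_recaptcha(response.text)` to decide whether the response is worth parsing.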

Recommended answer

Using Google Cache along with a referer (in the header) will help you bypass the captcha.
Things to note:

Don't send more than 2 requests per second. You may get blocked.
The result you receive is a cached copy. This will not work if you are trying to scrape real-time data.

Example:

import requests

header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "referer": "https://www.google.com/",
}

# Request the Google Cache copy of the page instead of the page itself
r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.naukri.com/jobs-in-andhra-pradesh", headers=header)

This gives:

>>> r.content
[Squeezed 2554 lines]
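
The limit of 2 requests per second mentioned in the notes can be enforced in code when several pages are fetched. The sketch below is one possible way to do it, not the answer author's implementation. `Throttle`, `cache_url` and `fetch_all` are hypothetical names, and the `get` argument is assumed to be any callable that takes a URL, for example a small wrapper around `requests.get` that passes the `header` dictionary from the example.

```python
import time

# Cache URL prefix taken from the example in the answer.
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"
# 0.5 s between requests keeps the rate at or below 2 requests/second.
MIN_INTERVAL = 0.5


def cache_url(page: str) -> str:
    """Build the cache URL for a page given without a scheme,
    e.g. "www.naukri.com/jobs-in-andhra-pradesh"."""
    return CACHE_PREFIX + page


class Throttle:
    """Block until at least `min_interval` seconds separate two calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(pages, get, throttle):
    """Call `get` on the cache URL of each page, pausing between calls."""
    results = []
    for page in pages:
        throttle.wait()
        results.append(get(cache_url(page)))
    return results


# Possible usage (assumes the third-party `requests` package and the
# `header` dictionary from the example above):
#   import requests
#   get = lambda url: requests.get(url, headers=header)
#   responses = fetch_all(["www.naukri.com/jobs-in-andhra-pradesh"],
#                         get, Throttle(MIN_INTERVAL))
```

Passing the fetch function in as `get` keeps the throttling logic independent of the HTTP library, so the pacing can be checked without sending any real requests.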
