如何在抓取请求时绕过Google Recaptcha [英] How to Bypass Google Recaptcha while scraping with Requests
本文介绍了如何在抓取请求时绕过Google Recaptcha的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
请求URL的Python代码:
Python code to request the URL:
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'} #using agent to solve the blocking issue
response = requests.get('https://www.naukri.com/jobs-in-andhra-pradesh', headers=agent)
#making the request to the link
打印html时的输出:
Output when printing the html :
<!DOCTYPE html>
<html>
<head>
<title>Naukri reCAPTCHA</title> #the title in the actual title of the URL that I am requested for
<meta name="robots" content="noindex, nofollow">
<link rel="stylesheet" href="https://static.naukimg.com/s/4/101/c/common_v62.min.css" />
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
</head>
</html>
推荐答案
使用Google Cache
和referer
可以防止这些验证码(请记住不要在1秒内发送超过2个请求.您可能会被阻止:
Using Google Cache
along with a referer
prevents these captcha's (do remember not to send more than 2 requests in 1 seconds. You may get blocked:
header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" ,'referer':'https://www.google.com/'}
r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.naukri.com/jobs-in-andhra-pradesh",headers=header)
这给出了:
>>> r.content
[Squeezed 2554 lines]
这篇关于如何在抓取请求时绕过Google Recaptcha的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文