How to bypass Google captcha with Selenium and Python?
Problem description
I want to know how to bypass Google captcha using Selenium and Python. When I try to scrape something, Google gives me a captcha. Can I bypass Google captcha with Selenium and Python?
As an example, the captcha in question is Google reCAPTCHA, which you can see at this link: https://www.google.com/recaptcha/api2/demo
Recommended answer
To start with, when using Selenium's Python client you should avoid solving or bypassing Google captcha.
Selenium automates browsers. What you achieve with that power is entirely up to you, but it is primarily for automating web applications through browser clients for testing purposes, and of course it is certainly not limited to that.
On the other hand, CAPTCHA (an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart) is a type of challenge–response test used in computing to determine whether the user is human.
So Selenium and CAPTCHA serve two completely different purposes and ideally shouldn't be used to achieve any interrelated tasks.
Having said that, reCAPTCHA can easily detect the network traffic and identify your program as a Selenium-driven bot.
However, there are some generic approaches to avoid getting detected while web scraping:
- The first and foremost attribute by which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
- If you need to send multiple requests to a website, keep changing the User Agent on each request. You can find a detailed discussion in: Way to change Google Chrome user agent in Selenium?
- To simulate human-like behavior you may need to slow down the script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in: How to sleep webdriver in python for milliseconds
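The three approaches above can be sketched roughly as follows. This is a minimal illustration, not the answer author's code: the viewport sizes, the user-agent strings, the delay bounds, and the helper names (pick_viewport, pick_user_agent, human_pause, build_driver, visit) are all hypothetical, and it assumes Chrome with a matching driver is available. It makes no claim that these steps are enough to avoid detection.

```python
import random
import time

# Illustrative values only; none of these come from the original answer.
VIEWPORTS = [(1366, 768), (1440, 900), (1536, 864), (1920, 1080)]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def pick_viewport(rng=random):
    """Return a (width, height) pair chosen at random from VIEWPORTS."""
    return rng.choice(VIEWPORTS)


def pick_user_agent(rng=random):
    """Return a user-agent string chosen at random from USER_AGENTS."""
    return rng.choice(USER_AGENTS)


def human_pause(min_secs=1.0, max_secs=3.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration in [min_secs, max_secs] and return it."""
    secs = rng.uniform(min_secs, max_secs)
    sleep(secs)
    return secs


def build_driver():
    """Start Chrome with a randomly chosen user agent and window size."""
    # Imported lazily so the helpers above can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    width, height = pick_viewport()
    options = Options()
    options.add_argument(f"--user-agent={pick_user_agent()}")
    options.add_argument(f"--window-size={width},{height}")
    return webdriver.Chrome(options=options)


def visit(url):
    """Open url in a fresh browser session, pause briefly, then quit."""
    driver = build_driver()
    try:
        driver.get(url)
        human_pause()
    finally:
        driver.quit()
```

Calling visit("https://www.google.com/recaptcha/api2/demo") would open the demo page with one randomly chosen profile. Because each call builds a new driver, the user agent and window size change per session rather than per HTTP request.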
However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium, and you can find more details in the following discussions:
- How to click on the reCaptcha using Selenium and Java
- CSS selector for reCaptcha checkbok using Selenium and vba excel
- Find the reCAPTCHA element and click on it - Python + Selenium
You can find a couple of related discussions in:
- How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
- Is there a version of selenium that is not detectable ? can selenium be truly undetectable?
- How does recaptcha 3 know I'm using selenium/chromedriver?