How can I bypass the Google CAPTCHA with Selenium and Python?
Question
How can I bypass the Google CAPTCHA using Selenium and Python?
When I try to scrape something, Google gives me a CAPTCHA. Can I bypass the Google CAPTCHA with Selenium and Python?
As an example, it's Google reCAPTCHA. You can see this CAPTCHA via this link: https://www.google.com/recaptcha/api2/demo
Recommended answer
To start with: when using Selenium's Python client, you should avoid trying to solve or bypass the Google CAPTCHA.
Selenium automates browsers. What you do with that power is entirely up to you, but it is primarily intended for automating web applications through browser clients for testing purposes, and of course it is not limited to that.
On the other hand, CAPTCHA (an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart) is a type of challenge-response test used in computing to determine whether the user is human.
So Selenium and CAPTCHA serve two completely different purposes and ideally shouldn't be used to achieve any interrelated tasks.
Having said that, reCAPTCHA can easily detect the network traffic and identify your program as a Selenium driven bot.
However, there are some generic approaches to avoid getting detected while web scraping:
- The first and foremost attribute by which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
- If you need to send multiple requests to a website, keep changing the User Agent on each request. You can find a detailed discussion in "Way to change Google Chrome user agent in Selenium?".
- To simulate humanlike behavior, you may need to slow down the script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in "How to sleep Selenium WebDriver in Python for milliseconds".
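The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answer's own code: the viewport sizes, the user agent strings, the delay range, and the helper names (build_chrome_arguments, human_pause, make_driver) are all hypothetical choices made for the example, and nothing here guarantees that a site will not detect the script.

```python
import random
import time

# Hypothetical example values; none of these come from the original answer.
VIEWPORTS = [(1366, 768), (1440, 900), (1536, 864), (1600, 900)]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def build_chrome_arguments(rng=random):
    """Return Chrome switches for a randomly chosen window size and user agent."""
    width, height = rng.choice(VIEWPORTS)
    user_agent = rng.choice(USER_AGENTS)
    return [f"--window-size={width},{height}", f"--user-agent={user_agent}"]


def human_pause(min_secs=1.0, max_secs=3.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration between min_secs and max_secs; return it."""
    delay = rng.uniform(min_secs, max_secs)
    sleep(delay)
    return delay


def make_driver():
    """Start Chrome with the switches above.

    Requires the selenium package and a local Chrome installation.
    """
    # Imported lazily so the helpers above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

A typical session would call driver = make_driver(), then driver.get(...) on the target page, with human_pause() between actions and driver.quit() at the end. The random pause is meant to supplement explicit waits such as WebDriverWait, not replace them.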
However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium, and you can find more details in the following discussions:
- How to click on the reCAPTCHA using Selenium and Java
- CSS selector for reCAPTCHA checkbox using Selenium and VBA Excel
- Find the reCAPTCHA element and click on it: Python + Selenium
You can find a couple of related discussions in: