How can I bypass the Google CAPTCHA with Selenium and Python?

Problem description

How can I bypass the Google CAPTCHA using Selenium and Python?

When I try to scrape something, Google gives me a CAPTCHA. Can I bypass the Google CAPTCHA with Selenium and Python?

As an example, take Google reCAPTCHA. You can see this CAPTCHA at this link: https://www.google.com/recaptcha/api2/demo

Recommended answer

To start with, when using Selenium's Python client, you should avoid solving or bypassing the Google CAPTCHA.

Selenium automates browsers. What you do with that power is entirely up to you, but it is primarily used to automate web applications through browser clients for testing purposes, and of course it is not limited to that.

On the other hand, CAPTCHA (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether the user is human.

So Selenium and CAPTCHA serve two completely different purposes, and ideally they shouldn't be combined to achieve interrelated tasks.

Having said that, reCAPTCHA can easily detect the network traffic and identify your program as a Selenium-driven bot.

However, there are some generic approaches to avoid getting detected while web scraping:

  • The first and foremost attribute by which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
  • If you need to send multiple requests to a website, keep changing the User Agent on each request. A detailed discussion can be found in "Way to change Google Chrome user agent in Selenium?"
  • To simulate humanlike behavior, you may need to slow down the script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). A detailed discussion can be found in "How to sleep Selenium WebDriver in Python for milliseconds".
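
The three points above could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the answerer's own code: the helper names (chrome_arguments, human_pause), the viewport sizes, and the user-agent strings are all hypothetical placeholders. The only outside facts it relies on are that Chrome accepts the --window-size and --user-agent command-line switches and that time.sleep pauses execution.

```python
import random
import time

# Hypothetical sample values for illustration; none come from the original answer.
VIEWPORTS = [(1366, 768), (1440, 900), (1536, 864), (1600, 900)]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def chrome_arguments(rng=random):
    """Pick a viewport and a user agent, returned as Chrome command-line switches."""
    width, height = rng.choice(VIEWPORTS)
    user_agent = rng.choice(USER_AGENTS)
    return [
        f"--window-size={width},{height}",
        f"--user-agent={user_agent}",
    ]


def human_pause(min_secs=1.0, max_secs=3.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration in [min_secs, max_secs] and return that duration."""
    secs = rng.uniform(min_secs, max_secs)
    sleep(secs)
    return secs
```

With Selenium installed, each string returned by chrome_arguments() could be passed to add_argument() on a ChromeOptions object before the driver is created, and human_pause() could be called between page interactions in place of a fixed time.sleep(secs).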

However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium, and you can find more details in the following discussions:

You can find a couple of related discussions in:
