How does reCAPTCHA 3 know I'm using Selenium/chromedriver?
I'm curious how reCAPTCHA v3 works. Specifically the browser fingerprinting.
When I launch an instance of Chrome through Selenium/chromedriver and test against reCAPTCHA 3 (https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php), I always get a score of 0.1.
When using incognito with a normal instance, I get 0.3.
I've beaten other detection systems by injecting JavaScript to modify the webdriver object, and by recompiling webdriver from source with modified $cdc_ variables.
I can see what looks like some obfuscated POST back to the server, so I'm going to start digging there.
What might it be looking for to determine if I'm running Selenium/chromedriver?
reCAPTCHA
Websites can analyze network traffic and identify your program as a bot. Google has released five reCAPTCHA types to choose from when creating a new site. Four of them are active, and reCAPTCHA v1 has been shut down.
reCAPTCHA versions and types
- reCAPTCHA v3 (verify requests with a score): reCAPTCHA v3 allows you to verify if an interaction is legitimate without any user interaction. It is a pure JavaScript API returning a score, giving you the ability to take action in the context of your site: for instance requiring additional factors of authentication, sending a post to moderation, or throttling bots that may be scraping content.
- reCAPTCHA v2 - "I'm not a robot" Checkbox: The "I'm not a robot" Checkbox requires the user to click a checkbox indicating the user is not a robot. This will either pass the user immediately (with No CAPTCHA) or challenge them to validate whether or not they are human. This is the simplest option to integrate with and only requires two lines of HTML to render the checkbox.
- reCAPTCHA v2 - Invisible reCAPTCHA badge: The invisible reCAPTCHA badge does not require the user to click on a checkbox, instead it is invoked directly when the user clicks on an existing button on your site or can be invoked via a JavaScript API call. The integration requires a JavaScript callback when reCAPTCHA verification is complete. By default only the most suspicious traffic will be prompted to solve a captcha. To alter this behavior edit your site security preference under advanced settings.
- reCAPTCHA v2 - Android: The reCAPTCHA Android library is part of the Google Play services SafetyNet APIs. This library provides native Android APIs that you can integrate directly into an app. You should set up Google Play services in your app and connect to the GoogleApiClient before invoking the reCAPTCHA API. This will either pass the user through immediately (without a CAPTCHA prompt) or challenge them to validate whether they are human.
- reCAPTCHA v1: reCAPTCHA v1 has been shut down since March 2018.
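As a rough illustration of "taking action in the context of your site" with a v3 score, here is a minimal server-side sketch. It assumes the documented siteverify endpoint and its `success`, `score`, and `action` reply fields. The `decide` helper and its thresholds are hypothetical and not part of any Google API.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret, token, timeout=10.0):
    """POST the client token to siteverify and return the parsed JSON reply."""
    body = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=body, timeout=timeout) as resp:
        return json.load(resp)


def decide(result, expected_action, allow_at=0.7, review_at=0.3):
    """Map a siteverify reply to 'allow', 'review', or 'block'.

    The thresholds are hypothetical; tune them against your own traffic.
    """
    if not result.get("success"):
        return "block"
    if result.get("action") != expected_action:
        # The token was issued for a different action than the one we expect.
        return "block"
    score = float(result.get("score", 0.0))
    if score >= allow_at:
        return "allow"
    if score >= review_at:
        return "review"
    return "block"
```

With these example thresholds, the 0.1 score reported for Selenium in the question would be blocked, and the 0.3 score from the incognito session would be sent to review.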
Solution
However, there are some generic approaches to avoid getting detected while web scraping:
- The first attribute a website can use to identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
- If you need to send multiple requests to a website, keep changing the User Agent on each request. You can find a detailed discussion in "Way to change Google Chrome user agent in Selenium?"
- To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in "How to sleep webdriver in python for milliseconds".
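The three suggestions above can be sketched roughly as follows. This is a minimal example, not a recipe known to change a reCAPTCHA v3 score. The window sizes and the helper names (`chrome_arguments`, `human_pause`, `new_driver`) are hypothetical, and the caller supplies the user-agent strings. Chrome reads `--window-size` and `--user-agent` at launch, so rotating them means starting a new browser session.

```python
import random
import time

# Hypothetical pool of window sizes; substitute sizes seen on real desktops.
WINDOW_SIZES = [(1366, 768), (1536, 864), (1920, 1080)]


def chrome_arguments(user_agents, window_sizes=WINDOW_SIZES, rng=random):
    """Pick a random window size and user agent as Chrome command-line switches."""
    width, height = rng.choice(window_sizes)
    return [
        f"--window-size={width},{height}",
        f"--user-agent={rng.choice(user_agents)}",
    ]


def human_pause(min_secs=0.5, max_secs=2.0, rng=random, sleep=time.sleep):
    """Sleep for a random interval and return the delay that was used."""
    delay = rng.uniform(min_secs, max_secs)
    sleep(delay)
    return delay


def new_driver(user_agents):
    """Start Chrome with randomized switches (requires Selenium and chromedriver)."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(user_agents):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

A script would call `new_driver([...])` once per session and `human_pause()` between actions, in addition to any WebDriverWait conditions it already uses.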
Outro
Some food for thought:
- Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
- Unable to use Selenium to automate Chase site login
- Confidence Score of the request using reCAPTCHA v3 API
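For the navigator.webdriver item, one commonly used approach is to register a script through the Chrome DevTools Protocol so that it runs before any page script. The sketch below assumes a Chromium-based Selenium driver that exposes `execute_cdp_cmd`. The helper names are hypothetical, and as the question reports, this kind of patch alone did not raise the reCAPTCHA v3 score.

```python
# JavaScript that makes navigator.webdriver read as undefined.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


def injection_command(script=HIDE_WEBDRIVER_JS):
    """Return the CDP method name and parameters that register the script."""
    return "Page.addScriptToEvaluateOnNewDocument", {"source": script}


def hide_webdriver_flag(driver, script=HIDE_WEBDRIVER_JS):
    """Register the script on a Chromium-based Selenium driver.

    Call this before driver.get(...) so the script runs on the first document.
    """
    method, params = injection_command(script)
    return driver.execute_cdp_cmd(method, params)
```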