How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?
Question
Is there a way to make your Selenium script undetectable in Python using geckodriver?
I'm using Selenium for scraping. Are there any protections we need to use so websites can't detect Selenium?
Answer
The fact that Selenium-driven Firefox / GeckoDriver gets detected doesn't depend on any specific GeckoDriver or Firefox version. Websites themselves can inspect the network traffic and the browser environment and identify the browser client, i.e. the web browser, as WebDriver-controlled.
As per the documentation of the WebDriver Interface in the latest editor's draft of WebDriver - W3C Living Document, the webdriver-active flag, which is initially set to false, is set to true when the user agent is under remote control, i.e. when it is controlled through Selenium.
Note that the NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.
So, webdriver returns true if the webdriver-active flag is set, and false otherwise.
navigator.webdriver, in turn, defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example so that alternate code paths can be triggered during automation.
So, the bottom line is: Selenium identifies itself.
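To see the flag described above from your own script, you can ask the page for navigator.webdriver through Selenium. This is a minimal sketch, assuming Selenium 4 and any driver object that has an execute_script method; the names WEBDRIVER_FLAG_JS, reports_webdriver and demo are hypothetical, and https://example.com is only a placeholder page.

```python
# JavaScript a page (or our own script) can evaluate to read the flag
# defined by the W3C WebDriver spec. Selenium wraps the script in a
# function, so a top-level "return" is allowed.
WEBDRIVER_FLAG_JS = "return navigator.webdriver === true;"


def reports_webdriver(driver) -> bool:
    """Return True if the current page sees navigator.webdriver === true.

    `driver` is assumed to be a Selenium WebDriver (e.g. Firefox driven by
    GeckoDriver); any object with execute_script(script) works.
    """
    return bool(driver.execute_script(WEBDRIVER_FLAG_JS))


def demo() -> None:
    """Launch Firefox and print the flag. Not called automatically."""
    # Imported lazily so reports_webdriver() has no hard Selenium dependency.
    from selenium import webdriver

    # geckodriver must be discoverable (on PATH, or via Selenium Manager
    # in recent Selenium 4 releases).
    driver = webdriver.Firefox()
    try:
        driver.get("https://example.com")  # placeholder page
        print("navigator.webdriver is true:", reports_webdriver(driver))
    finally:
        driver.quit()
```

Calling demo() on a machine with Firefox and GeckoDriver installed should print True, which is exactly the self-identification the spec describes; the helper only reads the flag and does not try to hide it.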
However, some generic approaches to avoid getting detected while web scraping are as follows:
- The first and foremost attribute a website can use to identify your script/program is your monitor size. So it is recommended not to use the conventional viewport.
- If you need to send multiple requests to a website, you need to keep changing the User Agent on each request. Here you can find a detailed discussion on Way to change Google Chrome user agent in Selenium?
- To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
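The three suggestions above can be sketched as small helpers for Firefox. This is a minimal sketch assuming Selenium 4 with Firefox/GeckoDriver; the helper names, the window-size ranges, the User-Agent strings and the delay bounds are hypothetical placeholders, not values from the answer. The linked User Agent discussion is about Chrome; for Firefox this sketch swaps in the general.useragent.override preference. Preferences passed through FirefoxOptions are applied when the browser starts, so here a different User Agent means a new session rather than a change within one session.

```python
import itertools
import random
import time

# Hypothetical User-Agent strings for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:115.0) Gecko/20100101 Firefox/115.0",
]


def random_window_size(rng: random.Random) -> tuple[int, int]:
    """Pick a randomized width/height instead of a fixed, common default."""
    width = rng.randint(1050, 1870)   # placeholder range
    height = rng.randint(700, 1030)   # placeholder range
    return width, height


def user_agent_cycle(agents=USER_AGENTS):
    """Return an endless iterator over the User-Agent strings, one per session."""
    return itertools.cycle(agents)


def human_pause(rng: random.Random, low: float = 0.5, high: float = 2.5) -> float:
    """Sleep a random number of seconds between low and high; return the delay."""
    delay = rng.uniform(low, high)
    time.sleep(delay)  # time.sleep accepts floats, e.g. 0.25 for 250 ms
    return delay


def new_firefox(user_agent: str, size: tuple[int, int]):
    """Start Firefox with an overridden User-Agent and a custom window size."""
    from selenium import webdriver  # lazy import: only needed to launch a browser

    options = webdriver.FirefoxOptions()
    # Firefox preference, applied at startup for this session's profile.
    options.set_preference("general.useragent.override", user_agent)
    driver = webdriver.Firefox(options=options)
    driver.set_window_size(*size)
    return driver
```

For example, each scraping session could take next() from user_agent_cycle() and a size from random_window_size(), pass both to new_firefox(), and call human_pause() between page interactions in addition to explicit WebDriverWait conditions.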