Is there a version of Selenium WebDriver that is not detectable?


Question

I am running ChromeDriver through Selenium on an Ubuntu server behind a residential proxy network. Yet my Selenium session is still being detected. Is there a way to make ChromeDriver and Selenium 100% undetectable?

I have been trying for so long that I have lost track of the many things I have tried, including:

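  1. Trying different versions of Chrome.
  2. Adding several flags and removing some words from the ChromeDriver file.
  3. Running it behind a proxy (including a residential one) in incognito mode.
  4. Loading profiles.
  5. Random mouse movements.
  6. Randomizing everything.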

I am looking for a true version of Selenium that is 100% undetectable, if such a thing has ever existed, or for another way of automating the browser that bot trackers cannot detect.

This is part of the browser start-up code:

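import random

# Display is assumed to come from pyvirtualdisplay (Xvfb wrapper).
from pyvirtualdisplay import Display
from selenium import webdriver

# useragents_desktop (a list of user-agent strings) and pluginfile (the proxy
# extension) are defined elsewhere in the script.
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)

display = Display(visible=0, size=(sx, sn))
display.start()

uag = random.choice(useragents_desktop)

# Assumed to be created here; the original excerpt does not show this line.
chrome_options = webdriver.ChromeOptions()

# this is to prevent ip leaking
preferences = {
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False,
}

chrome_options.add_experimental_option("prefs", preferences)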
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")

wsize = "--window-size={},{}".format(sx - 10, sn - 10)
chrome_options.add_argument(wsize)

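# Extend the existing prefs dict: a second add_experimental_option("prefs", ...)
# call with a new dict would replace the WebRTC settings set earlier.
preferences["profile.managed_default_content_settings.images"] = 2
chrome_options.add_experimental_option("prefs", preferences)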

chrome_options.add_argument("blink-settings=imagesEnabled=true")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("user-agent="+uag)
chrome_options.add_extension(pluginfile)  # this is for the residential proxy
# Selenium 3 call style; Selenium 4 takes options= and a Service object instead.
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)

Recommended answer

Whether a Selenium-driven WebDriver session gets detected does not depend on any specific Selenium, Chrome, or ChromeDriver version. Websites themselves can inspect the network traffic and can identify the browser client, i.e. the web browser, as WebDriver-controlled.

However, some generic approaches to avoid being detected while web scraping are as follows:

  • The first and foremost attribute through which a website can identify your script/program is your monitor size. So it is recommended not to use the conventional viewport.
  • If you need to send multiple requests to a website, keep changing the user agent on each request. You can find a detailed discussion in "Way to change Google Chrome user agent in Selenium?"
  • To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in "How to sleep webdriver in python for milliseconds".
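The three suggestions above can be sketched roughly as follows. This is only an illustration, not a recipe that guarantees anything: the user-agent strings, the size ranges, and the helper names (random_viewport, build_chrome_arguments, human_pause) are all assumptions made up for the example. Only --window-size and --user-agent are real Chrome switches.

```python
import random
import time

# Illustrative user-agent strings only; in the question this pool is the
# useragents_desktop list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def random_viewport(rng=random):
    """Pick a non-default window size (the ranges are arbitrary assumptions)."""
    return rng.randint(1100, 1800), rng.randint(700, 1000)


def build_chrome_arguments(user_agents=USER_AGENTS, rng=random):
    """Return Chrome command-line arguments with a random size and user agent."""
    width, height = random_viewport(rng)
    return [
        "--window-size={},{}".format(width, height),
        "--user-agent={}".format(rng.choice(user_agents)),
    ]


def human_pause(min_s=0.5, max_s=2.5, rng=random, sleep=time.sleep):
    """Sleep for a random interval between actions and return its length."""
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


if __name__ == "__main__":
    for argument in build_chrome_arguments():
        print(argument)
```

Each returned string would be passed to chrome_options.add_argument(...) before the driver is created, and human_pause() would be called between driver actions. Keeping the random source and the sleep function as parameters makes the helpers easy to try out without a browser.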

Antoine Vastel, in his blog post "Detecting Chrome Headless", describes several approaches that distinguish the Chrome browser from a headless Chrome browser.

  • User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With headless Chrome version 59 it has the following value:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36

    • The presence of Chrome headless can be checked with:

    if (/HeadlessChrome/.test(window.navigator.userAgent)) {
        console.log("Chrome headless detected");
    }
    

  • Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. In headless mode, by contrast, the returned array contains no plugins.

    • The presence of plugins can be checked with:

    if(navigator.plugins.length == 0) {
        console.log("It may be Chrome headless");
    }
    

  • Languages: In Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user's preferred languages. However, in headless mode, navigator.languages returns an empty string.

    • The presence of languages can be checked with:

    if(navigator.languages == "") {
         console.log("Chrome headless detected");
    }
    

  • WebGL: WebGL is an API for 3D rendering in an HTML canvas. With this API, it is possible to query the vendor and the renderer of the graphics driver. With a vanilla Chrome on Linux, we obtain the following values for renderer and vendor: Google SwiftShader and Google Inc. In headless mode, we obtain Mesa OffScreen, which is the technology used for rendering without any sort of window system, and Brian Paul, the programmer who started the open source Mesa graphics library.

    • The WebGL vendor and renderer can be checked with:

    var canvas = document.createElement('canvas');
    var gl = canvas.getContext('webgl');

    // getContext and getExtension return null when WebGL or the extension is unavailable.
    var debugInfo = gl && gl.getExtension('WEBGL_debug_renderer_info');
    if (debugInfo) {
        var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
        var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);

        if (vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
            console.log("Chrome headless detected");
        }
    }
    

    • Not all headless Chrome builds have the same values for vendor and renderer. Some keep values that can also be found in the non-headless version. However, Mesa OffScreen and Brian Paul indicate the presence of the headless version.

  • Browser features: The Modernizr library makes it possible to test whether a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.

    • The presence of the hairline feature can be checked with:

    if(!Modernizr["hairline"]) {
        console.log("It may be Chrome headless");
    }
    

  • Missing image: The last item on our list, which also seems to be the most robust, comes from the dimensions of the placeholder Chrome uses when an image cannot be loaded. In a vanilla Chrome, the image has a width and height that depend on the browser zoom but are different from zero. In a headless Chrome, the image has a width and height equal to zero.

    • The missing-image dimensions can be checked with:

    var body = document.getElementsByTagName("body")[0];
    var image = document.createElement("img");
    image.setAttribute("id", "fakeimage");
    // Attach the handler before setting src so the error event cannot be missed.
    image.onerror = function () {
        if (image.width == 0 && image.height == 0) {
            console.log("Chrome headless detected");
        }
    };
    image.src = "http://iloveponeydotcom32188.jg";
    body.appendChild(image);
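To see which of these checks your own Selenium session trips, the synchronous ones can be run from Python through the driver's execute_script method. The following is a minimal sketch under stated assumptions: SIGNAL_SCRIPTS, collect_headless_signals, triggered_signals and FakeDriver are hypothetical names introduced for this example, and FakeDriver only exists so the sketch runs without a browser. The WebGL and missing-image checks are left out to keep it short (the latter is asynchronous).

```python
# JavaScript expressions taken from the checks above, one per signal.
SIGNAL_SCRIPTS = {
    "headless_user_agent": "return /HeadlessChrome/.test(window.navigator.userAgent);",
    "no_plugins": "return navigator.plugins.length == 0;",
    "no_languages": "return navigator.languages == '';",
}


def collect_headless_signals(driver, scripts=SIGNAL_SCRIPTS):
    """Run each check in the current page and return {signal name: bool}."""
    return {name: bool(driver.execute_script(js)) for name, js in scripts.items()}


def triggered_signals(signals):
    """Return the sorted names of the checks that fired."""
    return sorted(name for name, fired in signals.items() if fired)


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver, used only for a dry run."""

    def __init__(self, answers):
        self.answers = answers

    def execute_script(self, script, *args):
        # A real driver would evaluate the script in the browser.
        return self.answers.get(script, False)


if __name__ == "__main__":
    fake = FakeDriver({SIGNAL_SCRIPTS["no_plugins"]: True})
    print(triggered_signals(collect_headless_signals(fake)))
```

With a real session, the driver object from the question would be passed in place of FakeDriver after loading any page. An empty result only means these particular checks did not fire, not that the browser is undetectable.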
    

    You can find a couple of similar discussions in:
