Is there a version of Selenium WebDriver that is not detectable?


Question

I am running ChromeDriver through Selenium on an Ubuntu server behind a residential proxy network. Yet my Selenium session is still being detected. Is there a way to make ChromeDriver and Selenium 100% undetectable?

I have been trying for so long that I have lost track of the many things I have done, including:

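  1. Trying different versions of Chrome.
  2. Adding several flags and removing some words from the ChromeDriver file.
  3. Running it in incognito mode behind a proxy (including residential ones).
  4. Loading profiles.
  5. Random mouse movements.
  6. Randomizing everything.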

I am looking for a true version of Selenium that is 100% undetectable, if that ever existed, or for another way of automating the browser that bot trackers cannot detect.

This is part of the browser start-up code:

import random

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# useragents_desktop (a list of user-agent strings) and pluginfile (the path
# to the proxy extension) are defined elsewhere in the script.

# Random size for the virtual display
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)

display = Display(visible=0, size=(sx, sn))
display.start()

uag = random.choice(useragents_desktop)

chrome_options = webdriver.ChromeOptions()

# All prefs go into one dict: calling add_experimental_option("prefs", ...)
# a second time would replace the first dict instead of merging with it.
preferences = {
    # this is to prevent IP leaking
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False,
    "profile.managed_default_content_settings.images": 2,
}
chrome_options.add_experimental_option("prefs", preferences)

chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")

# Window slightly smaller than the virtual display
chrome_options.add_argument(f"--window-size={sx - 10},{sn - 10}")

chrome_options.add_argument("--blink-settings=imagesEnabled=true")
chrome_options.add_argument("--user-agent=" + uag)
chrome_options.add_extension(pluginfile)  # this is for the residential proxy

# Selenium 4 syntax; Selenium 3 used executable_path= and chrome_options=.
driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"), options=chrome_options)

Recommended answer

The fact that a Selenium-driven WebDriver gets detected does not depend on any specific Selenium, Chrome, or ChromeDriver version. The websites themselves can detect the network traffic and can identify the browser client, i.e. the web browser, as WebDriver-controlled.

However, some generic approaches to avoid getting detected while web scraping are as follows:

  • The first and foremost attribute through which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
  • If you need to send multiple requests to a website, keep changing the user agent on each request. You can find a detailed discussion in "Way to change Google Chrome user agent in Selenium?"
  • To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by adding time.sleep(secs). You can find a detailed discussion in "How to sleep webdriver in python for milliseconds".
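
The last two points can be sketched in Python roughly as follows. This is a minimal illustration and not the answerer's code: USER_AGENTS, pick_user_agent, and human_pause are hypothetical names, the two user-agent strings are placeholders, and the delay bounds are arbitrary assumptions.

```python
import random
import time

# Hypothetical pool of desktop user-agent strings; replace with your own list.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/59.0.3071.115 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/59.0.3071.115 Safari/537.36",
]


def pick_user_agent(pool=USER_AGENTS, rng=random):
    """Return one user-agent string chosen at random from the pool."""
    return rng.choice(pool)


def human_pause(min_seconds=0.5, max_seconds=2.5, rng=random, sleep=time.sleep):
    """Sleep for a random duration within the bounds and return that duration."""
    delay = rng.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

A script could pass "user-agent=" + pick_user_agent() to the Chrome options before starting the driver and call human_pause() between page actions. The sleep parameter exists only so that the pause can be replaced in tests.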

Antoine Vastel, in his blog post "Detecting Chrome Headless", mentions several approaches that distinguish the Chrome browser from a headless Chrome browser:

  • User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With headless Chrome version 59, it has the following value:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36

  • The presence of Chrome headless can be checked with:

    if (/HeadlessChrome/.test(window.navigator.userAgent)) {
        console.log("Chrome headless detected");
    }
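
The same test can also be run on the server against the User-Agent request header. A minimal Python sketch, assuming the header value is available as a string; looks_like_headless_chrome is a hypothetical name:

```python
import re

# Same token that the JavaScript check above looks for.
HEADLESS_PATTERN = re.compile(r"HeadlessChrome")


def looks_like_headless_chrome(user_agent):
    """Return True if the user-agent string contains the HeadlessChrome token."""
    return bool(HEADLESS_PATTERN.search(user_agent or ""))
```

Because the check only reads a string that the client controls, overriding the user agent (as the question's start-up code does) is enough to get past this particular test, which is why it is usually combined with the other checks.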
    

  • Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. By contrast, in headless mode, the returned array contains no plugins.

    • The presence of plugins can be checked with:

    if(navigator.plugins.length == 0) {
        console.log("It may be Chrome headless");
    }
    

  • Languages: In Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user's preferred languages. However, in headless mode, navigator.languages returns an empty string.

    • The presence of languages can be checked with:

    if(navigator.languages == "") {
         console.log("Chrome headless detected");
    }
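
To see what a page script would observe for these two properties in your own session, the values can be read back through Selenium's execute_script. A minimal sketch, assuming driver is any object with an execute_script method (such as a Selenium WebDriver); probe_plugins_and_languages and the keys of the returned dict are hypothetical names:

```python
def probe_plugins_and_languages(driver):
    """Read navigator.plugins.length and navigator.languages from the page."""
    plugin_count = driver.execute_script("return navigator.plugins.length;")
    languages = driver.execute_script("return navigator.languages;")
    return {
        "plugin_count": plugin_count,
        "languages": languages,
        # True when the plugin check above would fire.
        "no_plugins": plugin_count == 0,
        # True for an empty string, an empty list, or a missing value.
        "no_languages": not languages,
    }
```

If either flag comes back True for your automated browser, the corresponding JavaScript check above would flag it as well.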
    

  • WebGL: WebGL is an API for 3D rendering in an HTML canvas. With this API, it is possible to query the vendor of the graphics driver as well as its renderer. With a vanilla Chrome on Linux, we can obtain the following values for renderer and vendor: Google SwiftShader and Google Inc. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without any sort of window system, and Brian Paul, who started the open-source Mesa graphics library.

    • The WebGL vendor and renderer can be checked with:

    var canvas = document.createElement('canvas');
    var gl = canvas.getContext('webgl');
    
    var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
    var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
    var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
    
    if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
        console.log("Chrome headless detected");
    }
    

  • Not all headless Chrome builds have the same vendor and renderer values. Some keep values that can also be found on the non-headless version. However, Mesa OffScreen and Brian Paul indicate the presence of the headless version.
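
The asymmetry described above (a match is a strong hint, a mismatch proves nothing) can be made explicit in a small helper. A sketch in Python, assuming the vendor and renderer strings have already been collected from the page; webgl_suggests_headless is a hypothetical name, and the case-insensitive comparison is an assumption made here to tolerate spelling variants such as "Mesa Offscreen":

```python
# Values reported for headless Chrome in the blog post quoted above.
HEADLESS_VENDOR = "brian paul"
HEADLESS_RENDERER = "mesa offscreen"


def webgl_suggests_headless(vendor, renderer):
    """Return True only when both values match the known headless pair.

    False is inconclusive: some headless builds report the same values
    as a regular Chrome.
    """
    return (
        (vendor or "").strip().lower() == HEADLESS_VENDOR
        and (renderer or "").strip().lower() == HEADLESS_RENDERER
    )
```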

  • Browser features: The Modernizr library makes it possible to test whether a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.

    • The presence of the hairline feature can be checked with:

    if(!Modernizr["hairline"]) {
        console.log("It may be Chrome headless");
    }
    

  • Missing image: The last check on our list, which also seems to be the most robust, comes from the dimensions of the image Chrome uses when an image cannot be loaded. In a vanilla Chrome, the image has a width and height that depend on the zoom of the browser but are different from zero. In a headless Chrome, the image has a width and a height equal to zero.

    • The missing-image dimensions can be checked with:

    var body = document.getElementsByTagName("body")[0];
    var image = document.createElement("img");
    image.src = "http://iloveponeydotcom32188.jg";
    image.setAttribute("id", "fakeimage");
    body.appendChild(image);
    image.onerror = function(){
        if(image.width == 0 && image.height == 0) {
            console.log("Chrome headless detected");
        }
    }
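
The snippets above log two kinds of message: "Chrome headless detected" for the user agent, languages, WebGL, and missing-image checks, and the weaker "It may be Chrome headless" for the plugins and hairline checks. A detector that collects all of the results could combine them along those lines. A minimal Python sketch; the signal names, summarize_signals, and the grouping into strong and weak sets are assumptions drawn from those messages, not part of the blog post:

```python
# Checks whose snippet logs "Chrome headless detected".
STRONG_SIGNALS = {"user_agent", "languages", "webgl", "missing_image"}
# Checks whose snippet logs "It may be Chrome headless".
WEAK_SIGNALS = {"plugins", "hairline"}


def summarize_signals(signals):
    """Map a dict of check name -> bool (True means the check fired) to a verdict."""
    fired = {name for name, hit in signals.items() if hit}
    if fired & STRONG_SIGNALS:
        return "Chrome headless detected"
    if fired & WEAK_SIGNALS:
        return "It may be Chrome headless"
    return "No headless signal"
```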
    

    You can find a couple of similar discussions in:
