Is there a version of Selenium WebDriver that is not detectable?


Problem description

I am running ChromeDriver through Selenium on an Ubuntu server, behind a residential proxy network. Yet my Selenium session is being detected. Is there a way to make ChromeDriver and Selenium 100% undetectable?

I have been trying for so long that I have lost track of the many things I have done, including:

  1. Trying different versions of Chrome.
  2. Adding several flags and removing some words from the ChromeDriver file.
  3. Running it behind a proxy (residential ones also) using incognito mode.
  4. Loading profiles.
  5. Random mouse movements.
  6. Randomising everything.

I am looking for a true version of Selenium that is 100% undetectable, if one exists, or another automation approach that is not detectable by bot trackers.

This is part of the browser startup code:

    import random

    from pyvirtualdisplay import Display
    from selenium import webdriver

    # useragents_desktop (a list of user agent strings) and pluginfile (the
    # path to the proxy extension) are defined elsewhere in the script.

    # Random size for the virtual display
    sx = random.randint(1000, 1500)
    sn = random.randint(3000, 4500)

    display = Display(visible=0, size=(sx, sn))
    display.start()

    uag = random.choice(useragents_desktop)

    chrome_options = webdriver.ChromeOptions()

    preferences = {
        # this is to prevent IP leaking
        "webrtc.ip_handling_policy": "disable_non_proxied_udp",
        "webrtc.multiple_routes_enabled": False,
        "webrtc.nonproxied_udp_enabled": False,
        # 2 = block images
        "profile.managed_default_content_settings.images": 2,
    }
    # Set "prefs" once: a second add_experimental_option("prefs", ...) call
    # replaces the earlier dict instead of merging with it.
    chrome_options.add_experimental_option("prefs", preferences)

    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-impl-side-painting")
    chrome_options.add_argument("--disable-setuid-sandbox")
    chrome_options.add_argument("--disable-seccomp-filter-sandbox")
    chrome_options.add_argument("--disable-breakpad")
    chrome_options.add_argument("--disable-client-side-phishing-detection")
    chrome_options.add_argument("--disable-cast")
    chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
    chrome_options.add_argument("--disable-cloud-import")
    chrome_options.add_argument("--disable-popup-blocking")
    chrome_options.add_argument("--ignore-certificate-errors")
    chrome_options.add_argument("--disable-session-crashed-bubble")
    chrome_options.add_argument("--disable-ipv6")
    chrome_options.add_argument("--allow-http-screen-capture")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_argument("--window-size=%d,%d" % (sx - 10, sn - 10))
    chrome_options.add_argument("--blink-settings=imagesEnabled=true")
    chrome_options.add_argument("--user-agent=" + uag)
    chrome_options.add_extension(pluginfile)  # this is for the residential proxy

    # Selenium 3 style call; Selenium 4 takes options=... and a Service object instead.
    driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)

Solution

The fact that a Selenium-driven WebDriver session gets detected doesn't depend on any specific Selenium, Chrome, or ChromeDriver version. Websites themselves can inspect the network traffic and identify the browser client, i.e. the web browser, as WebDriver-controlled.

However, knowing what sites check for suggests some generic approaches to avoid getting detected while web scraping:

@Antoine Vastel, in his blog post Detecting Chrome Headless, mentions several approaches that distinguish the Chrome browser from a headless Chrome browser.

  • User agent: The user agent attribute is commonly used to detect the user's OS and browser. In headless mode with Chrome version 59, it has the following value:

    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
    

    • A check for the presence of Chrome headless can be done through:

      if (/HeadlessChrome/.test(window.navigator.userAgent)) {
          console.log("Chrome headless detected");
      }
      

  • Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins such as Chrome PDF viewer or Google Native Client. By contrast, in headless mode the returned array contains no plugins.

    • A check for the absence of plugins can be done through:

      if(navigator.plugins.length == 0) {
          console.log("It may be Chrome headless");
      }
      

  • Languages: In Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user's preferred languages. In headless mode, however, navigator.languages returns an empty string.

    • A check for missing languages can be done through:

      if(navigator.languages == "") {
           console.log("Chrome headless detected");
      }
      

  • WebGL: WebGL is an API for 3D rendering in an HTML canvas. With this API, it is possible to query the vendor and the renderer of the graphics driver. With vanilla Chrome on Linux, we obtain the following values for renderer and vendor: Google SwiftShader and Google Inc. In headless mode, we obtain Mesa OffScreen, the technology used for rendering without any sort of window system, and Brian Paul, who started the open source Mesa graphics library.

    • A check on the WebGL vendor and renderer can be done through:

      var canvas = document.createElement('canvas');
      var gl = canvas.getContext('webgl');
      
      var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
      var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
      var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
      
      if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
          console.log("Chrome headless detected");
      }
      

    • Not all headless Chrome builds have the same values for vendor and renderer. Some keep values that can also be found in the non-headless version. However, Mesa OffScreen and Brian Paul indicate the presence of the headless version.

  • Browser features: The Modernizr library makes it possible to test whether a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.

    • A check for the presence of the hairline feature can be done through:

      if(!Modernizr["hairline"]) {
          console.log("It may be Chrome headless");
      }
      

  • Missing image: The last item on our list, which also seems to be the most robust, comes from the dimensions of the placeholder image Chrome uses when an image cannot be loaded. In vanilla Chrome, the image has a width and height that depend on the browser's zoom level but are nonzero. In headless Chrome, the image has a width and height equal to zero.

    • A check based on a missing image can be done through:

      var body = document.getElementsByTagName("body")[0];
      var image = document.createElement("img");
      image.src = "http://iloveponeydotcom32188.jg";
      image.setAttribute("id", "fakeimage");
      body.appendChild(image);
      image.onerror = function() {
          if (image.width == 0 && image.height == 0) {
              console.log("Chrome headless detected");
          }
      };
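
To see which of these signals your own Selenium session exposes, the synchronous checks above can be run through the driver and evaluated in Python. The following is only a minimal sketch: PROBE_JS, collect_signals and headless_indicators are hypothetical names introduced here for illustration, not part of Selenium or of the blog post, and the Modernizr and missing-image checks are left out because they need an extra library or an asynchronous callback. It assumes a driver object that offers Selenium's execute_script method.

```python
from typing import Any, Dict, List

# JavaScript probe mirroring the user agent, plugins, languages and WebGL
# checks listed above. execute_script runs it as a function body, so the
# top-level "return" hands the object back to Python as a dict.
PROBE_JS = """
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var dbg = gl ? gl.getExtension('WEBGL_debug_renderer_info') : null;
return {
    userAgent: navigator.userAgent,
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || []),
    webglVendor: dbg ? gl.getParameter(dbg.UNMASKED_VENDOR_WEBGL) : null,
    webglRenderer: dbg ? gl.getParameter(dbg.UNMASKED_RENDERER_WEBGL) : null
};
"""


def collect_signals(driver: Any) -> Dict[str, Any]:
    """Run the probe in the page currently loaded in the driver."""
    return driver.execute_script(PROBE_JS)


def headless_indicators(signals: Dict[str, Any]) -> List[str]:
    """Return the names of the checks that point to headless Chrome."""
    hits = []
    if "HeadlessChrome" in (signals.get("userAgent") or ""):
        hits.append("user agent")
    if signals.get("pluginCount") == 0:
        hits.append("plugins")
    # An empty or missing language list is treated as suspicious.
    if not signals.get("languages"):
        hits.append("languages")
    if (signals.get("webglVendor") == "Brian Paul"
            and signals.get("webglRenderer") == "Mesa OffScreen"):
        hits.append("webgl")
    return hits
```

With a real session, headless_indicators(collect_signals(driver)) returns the list of checks that fire for the current page. An empty list only means that these four signals look normal, not that the session is undetectable.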
      


