Chrome browser initiated through ChromeDriver gets detected

Question

I am trying to use Selenium ChromeDriver in Python for the website www.mouser.co.uk. However, it is detected as a bot on the very first request.

Does anyone have an explanation for this? Here is the code I am using:

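from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--start-maximized")
browser = webdriver.Chrome('chromedriver.exe', chrome_options=options)
wait = WebDriverWait(browser, 30)
browser.get('https://www.mouser.co.uk')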

Recommended answer

I tried to access the URL https://www.mouser.co.uk/ with certain Chrome options, but the session was detected and redirected to the Pardon Our Interruption page.

  • Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.mouser.co.uk")
myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
driver.execute_script("arguments[0].click();", myElement)

Now, on inspecting the Pardon Our Interruption page, you will find that the <body> tag contains:

  • the class attribute dist-GlobalHeader
  • the class attribute dist-PageWrap
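
The inspection described above can also be done programmatically. What follows is a minimal sketch, not part of the original answer: the helper name `looks_like_distil_block` is hypothetical, and treating a single marker class as sufficient evidence is an assumption rather than anything Distil documents. The two class names are the ones listed above. The sketch uses only the Python standard library and accepts any HTML string.

```python
from html.parser import HTMLParser

# Class names reported above for the "Pardon Our Interruption" page.
DISTIL_MARKERS = {"dist-GlobalHeader", "dist-PageWrap"}


class _ClassCollector(HTMLParser):
    """Collects every CSS class name found on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "class" and value:
                self.classes.update(value.split())


def looks_like_distil_block(html):
    """Return True if the HTML carries at least one marker class.

    Hypothetical helper: one matching class is assumed to be enough.
    """
    collector = _ClassCollector()
    collector.feed(html)
    return bool(collector.classes & DISTIL_MARKERS)
```

In a live session, calling `looks_like_distil_block(driver.page_source)` right after `driver.get(...)` would indicate whether the block page was served instead of the requested one.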

This is a clear indication that the website is protected by the bot management service provider Distil Networks, and that navigation by ChromeDriver is detected and subsequently blocked.

According to the article:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further:

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


Reference

You can find detailed discussions in:

  • Distil detects WebDriver driven Chrome Browsing Context
  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Akamai Bot Manager detects WebDriver driven Chrome Browsing Context
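
The second discussion concerns the `navigator.webdriver` property, which the W3C WebDriver specification defines as true while a browser is under automation. As a minimal sketch for reading the value that page scripts would see, the helper below is offered as an illustration only: the name `read_webdriver_flag` is hypothetical, and `driver` is assumed to be a Selenium WebDriver instance (only its `execute_script` method is used).

```python
def read_webdriver_flag(driver):
    """Return navigator.webdriver as seen by scripts on the current page.

    Hypothetical helper; `driver` is assumed to expose Selenium's
    execute_script(script) method.
    """
    # execute_script runs the JavaScript in the current browsing context
    # and returns the value of the `return` statement to Python.
    return driver.execute_script("return navigator.webdriver")
```

With the `driver` object from the code block above, `read_webdriver_flag(driver)` can be called once a page has loaded.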
