Chrome browser initiated through ChromeDriver gets detected

Problem description

I am trying to use Selenium with ChromeDriver in Python on the website www.mouser.co.uk. However, it is detected as a bot on the very first request.

Does anyone have an explanation for this? Here is the code I am using:

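from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--start-maximized")
browser = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)
wait = WebDriverWait(browser, 30)
browser.get('https://www.mouser.co.uk')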

Recommended answer

I tried to access the URL https://www.mouser.co.uk/ with certain Chrome options, but it was detected and redirected to the Pardon Our Interruption page.

  • Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# Selenium 4 takes the driver path through Service and the options through options=
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
driver.get("https://www.mouser.co.uk")
# Wait up to 30 seconds for the link to become clickable, then click it via JavaScript
myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
driver.execute_script("arguments[0].click();", myElement)

Now, on inspecting the Pardon Our Interruption page, you will find that the <body> tag contains:

  • class attribute dist-GlobalHeader
  • class attribute dist-PageWrap

This is a clear indication that the website is protected by the bot management service provider Distil Networks, and that navigation by ChromeDriver gets detected and subsequently blocked.
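
As a rough illustration of the inspection described above, the following sketch checks a page's HTML for those markers using only the Python standard library. The helper name looks_like_distil_block_page is hypothetical, and matching on the page title is an assumption based on the page name quoted in this answer; only the two class names come from the inspection itself.

```python
from html.parser import HTMLParser

# Class names reported above on the block page's <body>.
DISTIL_BODY_CLASSES = {"dist-GlobalHeader", "dist-PageWrap"}
# Assumption: the block page's <title> contains the page name quoted above.
BLOCK_PAGE_TITLE = "Pardon Our Interruption"


class _BlockPageScanner(HTMLParser):
    """Collects the <body> class names and the <title> text."""

    def __init__(self):
        super().__init__()
        self.body_classes = set()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            # The class attribute may be absent or empty, hence the `or ""`.
            self.body_classes.update((dict(attrs).get("class") or "").split())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_distil_block_page(html):
    """Return True if the HTML carries any of the markers described above."""
    scanner = _BlockPageScanner()
    scanner.feed(html)
    has_marker_class = bool(scanner.body_classes & DISTIL_BODY_CLASSES)
    has_block_title = BLOCK_PAGE_TITLE.lower() in scanner.title.lower()
    return has_marker_class or has_block_title
```

With the driver from the code block above, this could be called as looks_like_distil_block_page(driver.page_source) right after driver.get(...), so a script can report the block instead of waiting 30 seconds for an element that never appears. The markers may change over time, so a negative result does not prove the page was served normally.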

According to the article "There Really Is Something About Distil.it...":

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium as a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".

Reference

You can find a couple of detailed discussions in:
