Chrome browser initiated through ChromeDriver gets detected
Question
I am trying to use Selenium with ChromeDriver in Python on the website www.mouser.co.uk. However, it is detected as a bot on the very first request.
Does anyone have an explanation for this? Here is the code I am using:
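from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--start-maximized")
# `options=` replaces the deprecated `chrome_options=` keyword
browser = webdriver.Chrome('chromedriver.exe', options=options)
wait = WebDriverWait(browser, 30)
browser.get('https://www.mouser.co.uk')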
Recommended answer
I tried to access the URL https://www.mouser.co.uk/ with certain chrome.options, but the session was detected and redirected to the Pardon Our Interruption page.
Code block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.mouser.co.uk")
myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
driver.execute_script("arguments[0].click();", myElement)
Now, on inspecting the Pardon Our Interruption page, you will find that the <body> tag contains:
- class attribute dist-GlobalHeader
- class attribute dist-PageWrap
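As a rough illustration of the check described above, the following sketch scans an HTML string for the two class names listed in this answer. The function name `looks_like_distil_block`, the helper class, and the sample markup are hypothetical and exist only for this example. The class names themselves come from the inspection above and may not match what the site serves today.

```python
from html.parser import HTMLParser

# Class names reported in this answer for the "Pardon Our Interruption" page.
DISTIL_CLASS_MARKERS = {"dist-GlobalHeader", "dist-PageWrap"}


class _ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def looks_like_distil_block(page_source):
    """Return True if any of the marker classes appears in the HTML."""
    collector = _ClassCollector()
    collector.feed(page_source)
    return bool(collector.classes & DISTIL_CLASS_MARKERS)


# Hypothetical sample markup, for illustration only.
blocked = ('<html><body><div class="dist-GlobalHeader"></div>'
           '<div class="dist-PageWrap"></div></body></html>')
normal = '<html><body><div class="content">Products</div></body></html>'

print(looks_like_distil_block(blocked))  # True
print(looks_like_distil_block(normal))   # False
```

In a live session you could pass `driver.page_source` to the function after `driver.get(...)` to tell whether the request landed on the block page or on the real site.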
This is a clear indication that the website is protected by the bot management service provider Distil Networks, and that navigation by ChromeDriver is detected and subsequently blocked.
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Much like a bot firewall, Distil detects patterns and reacts.
Further,
"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium as the tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
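The quote does not say which signals Distil actually relies on. One publicly documented signal is the `navigator.webdriver` property, which the W3C WebDriver specification defines as true when the browser is under automation. Whether Distil uses it is an assumption. The sketch below only shows how a page script, or you, could read that flag. `reports_webdriver_flag` and `FakeDriver` are hypothetical names. `FakeDriver` stands in for a real Selenium driver so the example runs without a browser.

```python
def reports_webdriver_flag(driver):
    """Return True if the page's navigator.webdriver property is truthy."""
    # execute_script is the same Selenium call used in the code block above.
    return bool(driver.execute_script("return navigator.webdriver"))


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (no browser needed)."""

    def __init__(self, flag):
        self._flag = flag

    def execute_script(self, script, *args):
        # A real driver would run the script in the page; this stub just
        # returns the value it was constructed with.
        return self._flag


print(reports_webdriver_flag(FakeDriver(True)))   # True
print(reports_webdriver_flag(FakeDriver(None)))   # False
```

With a real session you would pass the `driver` object created earlier instead of `FakeDriver`. A value of `True` there means any script on the page can see that the browser is automated.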
Reference
You can find a couple of detailed discussions in:
- Distil detects WebDriver driven Chrome Browsing Context
- Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
- Akamai Bot Manager detects WebDriver driven Chrome Browsing Context