Headless Chrome Driver not working for Selenium

Views: 86 Posted: 2021/4/22 19:37:28 Tags: python selenium web-scraping selenium-chromedriver cloudflare


Problem description

I am currently having an issue with my scraper when I set options.add_argument("--headless"). It works perfectly fine when that argument is removed. Could anyone advise how I can achieve the same results in headless mode?

Below is my Python code:

from seleniumwire import webdriver as wireDriver
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
    
chromedriverPath = '/Users/applepie/Desktop/chromedrivermac'

def scraper(search):

    mit = "https://orbit-kb.mit.edu/hc/en-us/search?utf8=✓&query="  # Empty search on mit site
    mit += "+".join(search) + "&commit=Search"
    results = []

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1440,900")
    driver = webdriver.Chrome(options=options, executable_path= chromedriverPath)

    driver.get(mit)
    # Wait 20 seconds for page to load
    timeout = 20
    try:
        WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.CLASS_NAME, "header")))
        search_results = driver.find_element_by_class_name("search-results")
        for result in search_results.find_elements_by_class_name("search-result"):
            resultObject = {
                "url": result.find_element_by_class_name('search-result-link').get_attribute("href")
            }
            results.append(resultObject)
        driver.quit()
    except TimeoutException:
        print("Timed out waiting for page to load")
        driver.quit()

    return results

Here is also a screenshot of what I get when I print(driver.page_source) after get():
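When a headless run behaves differently from a visible one, it can help to save what the browser actually received instead of printing it. The following is a minimal sketch; the helper name dump_page and the output directory are hypothetical, and it relies only on the driver's page_source attribute and save_screenshot method.

```python
from pathlib import Path


def dump_page(driver, out_dir="headless_debug"):
    """Save the current page source and a screenshot for later inspection."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    html_path = out / "page.html"
    png_path = out / "page.png"
    # page_source is the HTML the browser currently holds for the page.
    html_path.write_text(driver.page_source, encoding="utf-8")
    # save_screenshot writes a PNG of the current window to the given path.
    driver.save_screenshot(str(png_path))
    return html_path, png_path
```

Calling dump_page(driver) right after driver.get(...) would leave page.html and page.png in the chosen directory, which makes it easier to see whether the expected search results or a block page came back.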

Recommended answer

This screenshot implies that Cloudflare has detected your requests to the website as coming from an automated bot and is consequently denying you access to the application.
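A scraper can check for this situation itself before it starts looking for elements. The sketch below is only illustrative: the marker strings are assumptions about what a block page might contain, not a list published by Cloudflare, and should be adjusted to match the page source you actually see.

```python
# Hypothetical marker strings; replace them with text from the real block page.
BLOCK_MARKERS = (
    "attention required! | cloudflare",
    "checking your browser before accessing",
    "cf-error-details",
)


def looks_blocked(page_source, markers=BLOCK_MARKERS):
    """Return True if the page source contains any of the block-page markers."""
    text = page_source.lower()
    return any(marker in text for marker in markers)
```

For example, a scraper could call looks_blocked(driver.page_source) right after driver.get(...) and stop early with a clear message, rather than waiting for the full timeout on an element that will never appear.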

In these cases, a potential solution is to use undetected-chromedriver in headless mode to initialize the google-chrome-headless browsing context.

undetected-chromedriver is an optimized Selenium Chromedriver patch that is designed not to trigger anti-bot services such as Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

Code Block:

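import undetected_chromedriver as uc
from selenium import webdriver

options = webdriver.ChromeOptions()
options.headless = True  # run Chrome without a visible window
driver = uc.Chrome(options=options)  # downloads and patches the driver binary
driver.get(url)  # url: the address of the page to load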
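To connect this with the scraper from the question, the following is a rough sketch rather than a tested fix. It assumes a recent undetected-chromedriver release that exposes uc.ChromeOptions, and a Selenium 4 style find_elements(By...) API. The function names build_search_url and scrape_links are hypothetical. Unlike the original code, it waits for the "search-results" container instead of "header", and it passes "--headless" as a plain argument instead of setting options.headless. Whether Cloudflare lets the request through still depends on the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://orbit-kb.mit.edu/hc/en-us/search"  # search page from the question


def build_search_url(terms):
    """Build the search URL, percent-encoding the query parameters."""
    params = {"utf8": "✓", "query": " ".join(terms), "commit": "Search"}
    return BASE_URL + "?" + urlencode(params)


def scrape_links(terms, timeout=20):
    """Return the result links for a search, using a headless patched driver."""
    # Imported lazily so the URL helper works without a browser installed.
    import undetected_chromedriver as uc
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = uc.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--window-size=1440,900")  # no space after the comma
    driver = uc.Chrome(options=options)
    try:
        driver.get(build_search_url(terms))
        # Wait until the results container is present, then search inside it.
        container = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, "search-results"))
        )
        links = container.find_elements(By.CLASS_NAME, "search-result-link")
        return [{"url": link.get_attribute("href")} for link in links]
    except TimeoutException:
        print("Timed out waiting for page to load")
        return []
    finally:
        # Always close the browser, whether or not the wait succeeded.
        driver.quit()
```

Building the URL with urlencode keeps the "✓" and any spaces in the search terms correctly escaped, and the try/finally replaces the two separate driver.quit() calls in the original.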

You can find a couple of relevant detailed discussions in:
