requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied while trying to find broken links through Selenium and Python


Problem Description

I want to find the broken links on my web page by using Selenium + Python. I tried the code below, but it shows me the following error:

requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

Code tried:

for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

Full code:

def test_lsearch(self):
    driver=self.driver
    driver.get("http://www.google.com")
    driver.set_page_load_timeout(10)
    driver.find_element_by_name("q").send_keys("selenium")

    driver.set_page_load_timeout(10)
    el=driver.find_element_by_name("btnK")
    el.click()
    time.sleep(5)

    links=driver.find_elements_by_css_selector("a")
    for link in links:
        r=requests.head(link.get_attribute('href'))
        print(link.get_attribute('href'),r.status_code)

Recommended Answer

This error message...

    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?

...implies that the support for unicode domain names and paths failed within the collected href attribute, i.e. the value handed to requests was None.

This error is defined in models.py as follows:

    # Support for unicode domain names and paths.
    scheme, auth, host, port, path, query, fragment = parse_url(url)
    if not scheme:
        raise MissingSchema("Invalid URL {0!r}: No schema supplied. "
                            "Perhaps you meant http://{0}?".format(url))

Solution

Possibly you are trying to look for the broken links once the search results are available for the keyword selenium on the Google Home Page Search Box. To achieve that you can use the following solution:

  • Code Block:

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://google.co.in/')
# submit the search for the keyword "selenium"
search = driver.find_element_by_name('q')
search.send_keys("selenium")
search.send_keys(Keys.RETURN)
# wait until the result links are visible, then collect the anchor elements
links = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//div[@class='rc']//h3//ancestor::a[1]")))
print("Number of links : %s" % len(links))
# send a HEAD request for each collected href and print its status code
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)

  • Console Output:

    Number of links : 9
    https://www.seleniumhq.org/ 200
    https://www.seleniumhq.org/download/ 200
    https://www.seleniumhq.org/docs/01_introducing_selenium.jsp 200
    https://www.guru99.com/selenium-tutorial.html 200
    https://en.wikipedia.org/wiki/Selenium_(software) 200
    https://github.com/SeleniumHQ 200
    https://www.edureka.co/blog/what-is-selenium/ 200
    https://seleniumhq.github.io/selenium/docs/api/py/ 200
    https://seleniumhq.github.io/docs/ 200
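
All the links above returned 200, i.e. they are reachable. To actually flag broken links you would treat 4xx/5xx responses as failures; here is a minimal sketch that continues from the code block above (the status-code threshold of 400 and the broken list are illustrative assumptions, not part of the original answer):

broken = []
for link in links:
    href = link.get_attribute('href')
    r = requests.head(href)
    if r.status_code >= 400:
        # 4xx/5xx responses are treated as broken links
        broken.append((href, r.status_code))
print("Broken links:", broken)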
    

  • As per your counter question, it would be a bit tough to canonically answer why xpath worked but not tagName from a Selenium perspective. Perhaps you may like to dig deeper into the related discussions.
