"SSL: certificate_verify_failed" error when scraping https://www.thenewboston.com/

So I started learning Python recently using "The New Boston's" videos on YouTube. Everything was going great until I got to his tutorial on making a simple web crawler. While I understood it with no problem, when I run the code I get errors all seemingly based around "SSL: CERTIFICATE_VERIFY_FAILED." I've been searching for an answer since last night trying to figure out how to fix it. It seems no one else in the comments on the video or on his website is having the same problem as me, and even using someone else's code from his website I get the same results. I'll post the code I got from the website, as it gives me the same error, and the one I coded myself is a mess right now.

import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=" + str(page)  # this is the page of popular posts
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # parse the page so it can be searched easily
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'index_singleListingTitles'}):  # all links with class='index_singleListingTitles'
            href = "https://www.thenewboston.com/" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(href)
            print(title)
            # get_single_item_data(href)
        page += 1

trade_spider(1)

The full error is: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

I apologize if this is a dumb question; I'm still new to programming, but I seriously can't figure this out. I was thinking about just skipping this tutorial, but it's bothering me not being able to fix this. Thanks!

Solution

The problem is not in your code but in the web site you are trying to access. When looking at the analysis by SSLLabs you will note:

This server's certificate chain is incomplete. Grade capped to B.

This means that the server configuration is wrong and that not only Python but several other clients will have problems with this site. Some desktop browsers work around this configuration problem by trying to load the missing certificates from the internet or by filling in with cached certificates. But other browsers and applications will fail too, just like Python.
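Python's strictness here comes from its default TLS settings: a minimal sketch of the kind of context the standard ssl module builds (and which requests verifies against), showing the two defaults that turn a missing intermediate certificate into a hard error rather than a silent one:

```python
import ssl

# A default context requires the peer to present a certificate that
# chains to a trusted root, and checks the hostname against it.
ctx = ssl.create_default_context()

# These two defaults are why an incomplete chain fails hard:
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname
```

Desktop browsers that "fix" the chain for you are doing extra work on top of this; Python does not.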

To work around the broken server configuration you might explicitly extract the missing certificates and add them to your trust store, or you might pass a certificate bundle to trust via the verify argument. From the documentation:

You can pass verify the path to a CA_BUNDLE file or directory with certificates of trusted CAs:

>>> requests.get('https://github.com', verify='/path/to/certfile') 

This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.
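Putting both suggestions together, here is a sketch of building such a bundle by hand, assuming you have downloaded the missing intermediate certificate yourself. The file names (root-cas.pem, intermediate.pem, combined.pem) are hypothetical placeholders:

```python
import os

def build_bundle(paths, out_path):
    """Concatenate PEM files into one CA bundle that requests can use."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                out.write(f.read().rstrip() + "\n")
    return out_path

# Hypothetical inputs: your usual trusted roots plus the intermediate
# the server fails to send (fetch it from the CA, not the broken site):
# bundle = build_bundle(["root-cas.pem", "intermediate.pem"], "combined.pem")
# requests.get("https://www.thenewboston.com/", verify=bundle)

# Or point the whole process at it once instead of per call:
os.environ["REQUESTS_CA_BUNDLE"] = "combined.pem"
```

With the environment variable set, existing requests.get calls pick up the bundle without any change to the code.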
