python - getting SSL error when trying to scrape a webpage

Problem description

I'm trying to scrape this webpage using Python: https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php

I've been using the requests package, and the request fails with an SSL error. I can "solve" the issue by setting verify=False, but I've read that this is not secure. In other threads, people suggested pointing requests.get() at the file path of the relevant certificate. I exported the certificate from my browser and tried that, but with no luck. This

requests.get('https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php',verify='C:/Users/ericb/Desktop/fftoolboxscoutfantasysportscom.crt')

still gives an SSL error:

SSLError: HTTPSConnectionPool(host='fftoolbox.scoutfantasysports.com', port=443): Max retries exceeded with url: /football/rankings/PrintVersion.php (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))

requests.get('https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php',cert='C:/Users/ericb/Desktop/fftoolboxscoutfantasysportscom.crt')

yields:

Error: [('PEM routines', 'PEM_read_bio', 'no start line'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'PEM lib')]

I've done a decent amount of web scraping before, but I've never had to deal with certificates until now. How can I get around this? I should also note that I'd like to put my final Python script and any files it uses into a public GitHub repo, but I don't want to do anything that would jeopardize my security, like uploading keys.

Recommended answer

The server is misconfigured: it does not send the intermediate certificate it is supposed to send. See this report: https://www.ssllabs.com/ssltest/analyze.html?d=fftoolbox.scoutfantasysports.com&hideResults=on

Certificates provided: 1 (1776 bytes)

Chain issues: Incomplete

https://sslanalyzer.comodoca.com/?url=fftoolbox.scoutfantasysports.com

Trusted by Microsoft? No (unable to get local issuer certificate) UNTRUSTED

Trusted by Mozilla? No (unable to get local issuer certificate) UNTRUSTED
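The same verification failure can be reproduced with Python's standard library alone, independently of requests. The following is a minimal sketch; check_tls is a hypothetical helper that attempts a handshake against the default trust store and reports OpenSSL's reason when verification fails. For this server, at the time of the question, one would expect a missing-issuer reason similar to the reports above, though the site may have been fixed since.

```python
from __future__ import annotations

import socket
import ssl


def check_tls(host: str, port: int = 443, timeout: float = 10.0) -> tuple[bool, str]:
    """Attempt a TLS handshake against the default trust store.

    Returns (True, "ok") if the certificate chain and hostname verify,
    otherwise (False, reason). Hypothetical helper for illustration.
    """
    # Default context: CERT_REQUIRED, hostname checking, default CA roots.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "ok"
    except ssl.SSLCertVerificationError as exc:
        # verify_message holds OpenSSL's reason, e.g. a missing issuer.
        return False, f"certificate verify failed: {exc.verify_message}"
    except OSError as exc:
        # ssl.SSLError, DNS failures and timeouts are all OSError subclasses.
        return False, f"connection failed: {exc}"
```

For example, print(check_tls("fftoolbox.scoutfantasysports.com")). Browsers can often still open such a page because they fetch or cache the missing intermediate themselves (for instance via the certificate's AIA extension), which OpenSSL, and therefore Python, does not do during verification.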

Using openssl s_client -connect fftoolbox.scoutfantasysports.com:443 -showcerts you can see:

Certificate chain
 0 s:/OU=Domain Control Validated/CN=fftoolbox.scoutfantasysports.com
   i:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2

Only one certificate is listed. The web server should also send the /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2 intermediate certificate, but it does not.
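To spot this problem in openssl s_client -showcerts output without reading it by eye, the "Certificate chain" section can be parsed. This is a minimal sketch with hypothetical helper names. It assumes the " 0 s:" / "   i:" layout shown above (newer OpenSSL versions format the names slightly differently but keep those prefixes), and its lone-leaf rule is a heuristic, not real path validation.

```python
import re
from typing import List, Optional, Tuple

# " 0 s:<subject>" and "   i:<issuer>" lines under "Certificate chain"
SUBJECT_RE = re.compile(r"^\s*\d+\s+s:(.*)$")
ISSUER_RE = re.compile(r"^\s*i:(.*)$")


def parse_chain(output: str) -> List[Tuple[str, str]]:
    """Extract (subject, issuer) pairs in the order the server sent them."""
    chain: List[Tuple[str, str]] = []
    subject: Optional[str] = None
    for line in output.splitlines():
        m = SUBJECT_RE.match(line)
        if m:
            subject = m.group(1).strip()
            continue
        m = ISSUER_RE.match(line)
        if m and subject is not None:
            chain.append((subject, m.group(1).strip()))
            subject = None
    return chain


def chain_warnings(chain: List[Tuple[str, str]]) -> List[str]:
    """Heuristic checks on a parsed chain; not a substitute for validation."""
    warnings = []
    # Each certificate should be issued by the next certificate sent.
    for (_, issuer), (next_subject, _) in zip(chain, chain[1:]):
        if issuer != next_subject:
            warnings.append(f"broken link: {issuer!r} != {next_subject!r}")
    # A lone, non-self-signed certificate usually means a missing intermediate.
    if len(chain) == 1 and chain[0][0] != chain[0][1]:
        warnings.append(f"only the leaf was sent; issuer not provided: {chain[0][1]}")
    return warnings
```

The text can be captured with, for example, openssl s_client -connect fftoolbox.scoutfantasysports.com:443 -showcerts < /dev/null > chain.txt and passed to parse_chain. For the output shown above, chain_warnings reports that only the leaf certificate was sent.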

So you could contact the website and tell them their server is misconfigured. You will not be the only one affected by this, as the second link shows.

Alternatively, you could add the missing certificate locally as fully trusted, but this somewhat lowers your security. You can also download the missing certificate (not the website's certificate, but the intermediate one) and pass verify=/path/to/certificate in your requests.get call.
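A minimal sketch of the verify= option follows, assuming the intermediate certificate has already been downloaded to a local file; the helper names and file paths are hypothetical. Because verify=<path> makes requests use that file instead of its default CA bundle, the sketch appends the intermediate to a copy of the default roots (requests relies on certifi, whose bundle path is returned by certifi.where()) so the chain can still be traced up to a trusted root. It also converts a binary DER export, which browsers often produce, into the PEM text requests expects.

```python
import ssl
from pathlib import Path

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def load_pem(path: Path) -> str:
    """Return the certificate in *path* as PEM text, converting DER if needed."""
    data = path.read_bytes()
    if PEM_MARKER in data:
        # Already PEM; utf-8-sig drops a byte-order mark some tools add.
        return data.decode("utf-8-sig")
    # Binary DER export: base64-encode and wrap it in PEM headers.
    return ssl.DER_cert_to_PEM_cert(data)


def build_ca_bundle(default_bundle: Path, intermediate: Path, out: Path) -> Path:
    """Write the default roots plus the intermediate certificate to *out*.

    The default roots are kept because verify=<path> replaces the bundle
    requests would otherwise use, and the chain must end at a trusted root.
    """
    bundle = default_bundle.read_text(encoding="utf-8")
    if not bundle.endswith("\n"):
        bundle += "\n"
    out.write_text(bundle + load_pem(intermediate), encoding="utf-8")
    return out


def fetch(url: str, ca_bundle: Path):
    """Hypothetical usage helper; needs the third-party requests package."""
    import requests  # imported lazily so the helpers above need only stdlib

    return requests.get(url, verify=str(ca_bundle), timeout=30)
```

For example, bundle = build_ca_bundle(Path(certifi.where()), Path("gd_intermediate.crt"), Path("ca-bundle.pem")) followed by fetch("https://fftoolbox.scoutfantasysports.com/football/rankings/PrintVersion.php", bundle). An intermediate CA certificate is public and contains no private key, so committing it (or the generated bundle) to a public GitHub repository does not expose a secret. The cert= parameter tried in the question does something different: it supplies a client certificate and private key for the server to check, which is why requests tried to read a private key from the .crt file.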
