How to ignore an invalid SSL certificate with requests_html?


Question

So basically I'm trying to scrape JavaScript-generated data from a website. To do this, I'm using the Python library requests_html.

Here is my code:

from requests_html import HTMLSession
session = HTMLSession()

url = 'https://myurl'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
payload = {'mylog': 'root', 'mypass': 'root'}

r = session.post(url, headers=headers, verify=False, data=payload)
r.html.render()
load = r.html.find('#load_span', first=True)

print(load.text)

If I don't call render(), I can connect to the website and the scraped data is empty (which is expected, since the content is generated by JavaScript), but when I do call it, I get this error:

pyppeteer.errors.PageError: net::ERR_CERT_COMMON_NAME_INVALID at https://myurl

net::ERR_CERT_WEAK_SIGNATURE_ALGORITHM

I assume the verify=False parameter of session.post is ignored by render(). How can I make render() ignore the certificate error as well?

If you want to reproduce the error:

from requests_html import HTMLSession

session = HTMLSession()

url = 'https://wrong.host.badssl.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

r = session.post(url, headers=headers, verify=False)

r.html.render()

load = r.html.find('#content', first=True)

print(load)

Recommended answer

The only way is to set the ignoreHTTPSErrors parameter in pyppeteer. The problem is that requests_html doesn't provide any way to set this parameter; in fact, there is an issue about it. My advice is to ping the developers again by adding a comment to that issue.
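For illustration, here is a minimal sketch of setting ignoreHTTPSErrors when driving pyppeteer directly, bypassing requests_html altogether. The helper names launch_options and fetch_rendered_html are hypothetical, and the URL is the reproduction host from the question; treat this as one possible approach under those assumptions rather than the library's recommended usage.

```python
import asyncio


def launch_options(verify: bool) -> dict:
    """Map a requests-style verify flag onto pyppeteer launch options (hypothetical helper)."""
    # ignoreHTTPSErrors=True tells Chromium to load pages despite certificate errors.
    return {"headless": True, "ignoreHTTPSErrors": not verify}


async def fetch_rendered_html(url: str, verify: bool = False) -> str:
    """Open url in headless Chromium and return the rendered HTML (hypothetical helper)."""
    # Imported here so the helpers above can be defined without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(**launch_options(verify))
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.content()
    finally:
        # Always shut the browser down, even if navigation fails.
        await browser.close()


if __name__ == "__main__":
    html = asyncio.run(fetch_rendered_html("https://wrong.host.badssl.com"))
    print(len(html))
```

Note that this only loads the page in the browser; it does not reuse cookies or a login performed earlier through a requests session.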

Or maybe you can contribute this feature yourself with a pull request.

Another way is to use Selenium.


I added verify=False as a feature through a pull request, which has been accepted. It is now possible to ignore the SSL error :)

It is not a parameter of get(); set it when you instantiate the object:

session = HTMLSession(verify=False)
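
Putting this together with the reproduction case from the question, a complete script might look like the sketch below. It assumes an installed requests_html version that includes the accepted pull request; the function name render_ignoring_cert_errors is hypothetical and only there to keep the example self-contained.

```python
def render_ignoring_cert_errors(url, selector):
    """Fetch url, render its JavaScript, and return the first element matching selector."""
    # Imported here so the function can be defined without requests_html installed.
    from requests_html import HTMLSession

    # verify=False is passed to the session constructor, not to get() or post().
    # As far as I can tell, the session then also applies it to the headless
    # browser that render() starts.
    session = HTMLSession(verify=False)
    try:
        r = session.get(url)
        r.html.render()
        return r.html.find(selector, first=True)
    finally:
        # Closes the HTTP session and the browser, if one was started.
        session.close()


if __name__ == "__main__":
    element = render_ignoring_cert_errors("https://wrong.host.badssl.com", "#content")
    print(element.text if element is not None else "no match")
```

Expect requests to emit an InsecureRequestWarning for the unverified request. Disabling verification is only sensible for hosts you already trust, such as an internal device with a self-signed certificate.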
