How do I scrape an https page?


Question

I'm using a Python script with lxml and requests to scrape a web page. My goal is to grab an element from a page and download it, but the content is on an HTTPS page and I get an error when I try to access it. I assume I have to include some kind of certificate or authentication, but I'm struggling to find the right resources. I'm using:

page = requests.get("https://[example-page.com]", auth=('[username]','[password]'))

and the error is:

requests.exceptions.SSLError: [Errno 185090050] _ssl.c:340: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib

Recommended answer

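Adding verify=False to the GET request solves the issue. Note that this turns off TLS certificate verification for that request, so requests no longer checks that the server is who it claims to be.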

page = requests.get("https://[example-page.com]", auth=('[username]','[password]'), verify=False)
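Because verify=False skips certificate checks, an alternative is to keep verification on and tell requests explicitly which CA bundle to use. The sketch below is only an illustration, not part of the original answer. It assumes the error comes from the default bundle being missing or unreadable, which the question does not confirm. The helper names build_session and fetch are hypothetical; certifi is the package that ships the CA bundle requests uses by default.

```python
import certifi
import requests


def build_session(username, password, ca_bundle=None):
    """Return a requests.Session that keeps certificate verification on.

    ca_bundle is a path to a PEM file of trusted CA certificates. When it is
    omitted, the bundle shipped with the certifi package is used (assumption:
    that bundle is installed and readable on this machine).
    """
    session = requests.Session()
    # HTTP Basic auth credentials, as in the question's requests.get call.
    session.auth = (username, password)
    # A string value for verify is treated as the path to a CA bundle.
    session.verify = ca_bundle or certifi.where()
    return session


def fetch(session, url, timeout=10):
    """GET a page through the session and raise on HTTP error status codes."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response
```

With these helpers, the call from the question would become session = build_session('[username]', '[password]') followed by page = fetch(session, "https://[example-page.com]"). If the site uses a private or self-signed certificate, passing its PEM file as ca_bundle is one way to trust it without disabling verification.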
