Disable SSL certificate verification in Scrapy


Question


I am currently struggling with an issue I am having with Scrapy. Whenever I use Scrapy to scrape an HTTPS site where the certificate's CN value matches the server's domain name, Scrapy works great! On the other hand, though, whenever I try scraping a site where the certificate's CN value does NOT match the server's domain name, I get the following:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 415, in dataReceived
    self._write(bytes)
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 554, in _write
    sent = self._tlsConnection.send(toSend)
  File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 926, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback
    return wrapped(connection, where, ret)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1154, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/usr/local/lib/python2.7/dist-packages/service_identity/pyopenssl.py", line 30, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/usr/local/lib/python2.7/dist-packages/service_identity/_common.py", line 235, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.


I have looked through as much documentation as I can, and as far as I can tell Scrapy does not have a way to disable SSL certificate verification. Even the documentation for the Scrapy Request object (which I would assume is where this functionality would lie) has no reference:

http://doc.scrapy.org/en/1.0/topics/request-response.html#scrapy.http.Request
https://github.com/scrapy/scrapy/blob/master/scrapy/http/request/__init__.py


There are also no Scrapy settings which address the issue:

http://doc.scrapy.org/en/1.0/topics/settings.html


Short of running Scrapy from source and modifying the source as needed, does anyone have any ideas for how I can disable the SSL certificate verification?

Thanks!

Answer


From the documentation you linked for the settings, it looks like you would be able to modify the DOWNLOAD_HANDLERS setting.

From the documentation:

"""
    A dict containing the request download handlers enabled by default in
    Scrapy. You should never modify this setting in your project, modify
    DOWNLOAD_HANDLERS instead.
"""

DOWNLOAD_HANDLERS_BASE = {
    'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
    'http': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    'https': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler',
}


Then in your settings, something like this:

""" 
    Configure your download handlers with something custom to override
    the default https handler
"""
DOWNLOAD_HANDLERS = {
    'https': 'my.custom.downloader.handler.https.HttpsDownloaderIgnoreCNError',
}


So by defining a custom handler for the https protocol, you should be able to handle the error you're getting and allow Scrapy to continue with its business.
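As a minimal sketch of what such a handler might look like (assuming Scrapy 1.x internals: `HTTP11DownloadHandler` and `ScrapyClientContextFactory`; the module path `my.custom.downloader.handler.https` and the class name `HttpsDownloaderIgnoreCNError` come from the settings snippet above, while `CustomContextFactory` is a hypothetical helper name), you could pair the stock HTTPS handler with a pyOpenSSL context that skips certificate and hostname verification entirely:

```python
# my/custom/downloader/handler/https.py
# Sketch only -- relies on Scrapy 1.x / Twisted / pyOpenSSL internals,
# which may differ in your installed versions.

from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory
from scrapy.core.downloader.handlers.http11 import HTTP11DownloadHandler


class CustomContextFactory(ScrapyClientContextFactory):
    """TLS context factory that performs no certificate checks,
    so CN/hostname mismatches no longer abort the handshake."""

    def getContext(self, hostname=None, port=None):
        ctx = ScrapyClientContextFactory.getContext(self)
        # Accept any certificate: the verify callback always returns True.
        ctx.set_verify(
            SSL.VERIFY_NONE,
            lambda conn, cert, errnum, errdepth, ok: True,
        )
        return ctx


class HttpsDownloaderIgnoreCNError(HTTP11DownloadHandler):
    """HTTPS download handler wired to the lax context factory above."""

    def __init__(self, settings):
        super(HttpsDownloaderIgnoreCNError, self).__init__(settings)
        # _contextFactory is the attribute the stock handler uses for TLS.
        self._contextFactory = CustomContextFactory()
```

With this module on your Python path, the `DOWNLOAD_HANDLERS` entry shown earlier would route all `https` requests through it. Note the obvious trade-off: disabling verification means any certificate, including a forged one, will be accepted.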
