Disable SSL certificate verification in Scrapy
Question
I am currently struggling with an issue I am having with Scrapy. Whenever I use Scrapy to scrape an HTTPS site where the certificate's CN value matches the server's domain name, Scrapy works great! On the other hand, whenever I try scraping a site where the certificate's CN value does NOT match the server's domain name, I get the following:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 415, in dataReceived
    self._write(bytes)
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 554, in _write
    sent = self._tlsConnection.send(toSend)
  File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 1270, in send
    result = _lib.SSL_write(self._ssl, buf, len(buf))
  File "/usr/local/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 926, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
  --- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback
    return wrapped(connection, where, ret)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1154, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/usr/local/lib/python2.7/dist-packages/service_identity/pyopenssl.py", line 30, in verify_hostname
    obligatory_ids=[DNS_ID(hostname)],
  File "/usr/local/lib/python2.7/dist-packages/service_identity/_common.py", line 235, in __init__
    raise ValueError("Invalid DNS-ID.")
exceptions.ValueError: Invalid DNS-ID.
I have looked through as much documentation as I can, and as far as I can tell Scrapy does not have a way to disable SSL certificate verification. Even the documentation for the Scrapy Request object (which I would assume is where this functionality would lie) has no reference:
http://doc.scrapy.org/en/1.0/topics/request-response.html#scrapy.http.Request
https://github.com/scrapy/scrapy/blob/master/scrapy/http/request/__init__.py
There are also no Scrapy settings which address the issue:
http://doc.scrapy.org/en/1.0/topics/settings.html
Short of running Scrapy from source and modifying the source as needed, does anyone have any ideas for how I can disable the SSL certificate verification?
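(For reference, "disabling verification" at the plain-Python level means relaxing the SSL context's two checks: the certificate chain and the hostname/CN match that is failing in the traceback above. This is a generic stdlib illustration, not Scrapy-specific:)

```python
import ssl

# Default client-side context: verifies the certificate chain AND that the
# certificate's CN/SAN matches the hostname (the check failing above).
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED

# Relax both checks. Order matters: check_hostname must be turned off
# before verify_mode can be set to CERT_NONE.
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
```

The question is essentially asking where to apply the equivalent of those two lines inside Scrapy's HTTPS machinery.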
Thanks!
Recommended Answer
From the documentation you linked for the settings, it looks like you would be able to modify the DOWNLOAD_HANDLERS setting.
From the documentation:
"""
A dict containing the request download handlers enabled by default in
Scrapy. You should never modify this setting in your project, modify
DOWNLOAD_HANDLERS instead.
"""
DOWNLOAD_HANDLERS_BASE = {
    'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
    'http': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    'https': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
    's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler',
}
Then in your settings, something like this:
"""
Configure your download handlers with something custom to override
the default https handler
"""
DOWNLOAD_HANDLERS = {
    'https': 'my.custom.downloader.handler.https.HttpsDownloaderIgnoreCNError',
}
So by defining a custom handler for the https protocol, you should be able to handle the error you're getting and allow Scrapy to continue with its business.