'NoneType' object has no attribute '_app_data' in scrapy/twisted/openssl

Question

During the scraping process with Scrapy, an error appears in my logs from time to time. It doesn't seem to come from anywhere in my code, and it looks like something inside twisted/openssl. Any ideas what causes this and how to get rid of it?

Here is the stack trace:

[Launcher,27487/stderr] Error during info_callback
    Traceback (most recent call last):
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/protocols/tls.py", line 415, in dataReceived
        self._write(bytes)
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/protocols/tls.py", line 554, in _write
        sent = self._tlsConnection.send(toSend)
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1270, in send
        result = _lib.SSL_write(self._ssl, buf, len(buf))
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 926, in wrapper
        callback(Connection._reverse_mapping[ssl], where, return_code)
    --- <exception caught here> ---
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback
        return wrapped(connection, where, ret)
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1157, in _identityVerifyingInfoCallback
        transport = connection.get_app_data()
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1589, in get_app_data
        return self._app_data
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1148, in __getattr__
        return getattr(self._socket, name)
    exceptions.AttributeError: 'NoneType' object has no attribute '_app_data'

Answer

At first glance, it appears as though this is due to a bug in scrapy. Scrapy defines its own Twisted "context factory": https://github.com/scrapy/scrapy/blob/ad36de4e6278cf635509a1ade30cca9a506da682/scrapy/core/downloader/contextfactory.py#L21-L28

This code instantiates ClientTLSOptions with the context it intends to return. A side effect of instantiating this class is that an "info callback" is installed on that context. The info callback requires that the Twisted TLS implementation has been set as "app data" on the connection. However, since nothing ever uses the ClientTLSOptions instance (it is discarded immediately), the app data is never set.
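
To make the side effect concrete, here is a deliberately simplified Python model. FakeContext, FakeConnection, FakeClientTLSOptions and context_factory_discarding_options are hypothetical stand-ins, not the real pyOpenSSL, Twisted or Scrapy classes. They only mimic the behaviour described above: constructing the options object registers a callback on the context, the object is thrown away, and a connection built straight from that context never receives app data.

```python
# Simplified, hypothetical model of the side effect. These classes are
# illustrative stand-ins, not the real pyOpenSSL/Twisted/Scrapy APIs.


class FakeContext:
    """Plays the role of an SSL context: it just remembers an info callback."""

    def __init__(self):
        self.info_callback = None

    def set_info_callback(self, callback):
        self.info_callback = callback


class FakeConnection:
    """Plays the role of a TLS connection created from a context."""

    def __init__(self, context):
        self.context = context
        self.app_data = None  # no TLS transport attached yet

    def set_app_data(self, data):
        self.app_data = data


class FakeClientTLSOptions:
    def __init__(self, hostname, ctx):
        self._hostname = hostname
        self._ctx = ctx
        # Side effect: the constructor registers a callback on the context.
        ctx.set_info_callback(self._info_callback)

    def _info_callback(self, connection, where, ret):
        # The callback expects the TLS transport to be stored as app data.
        return connection.app_data

    def client_connection_for_tls(self, tls_transport):
        # Intended path: the options object builds the connection itself
        # and attaches the transport as app data.
        connection = FakeConnection(self._ctx)
        connection.set_app_data(tls_transport)
        return connection


def context_factory_discarding_options(hostname):
    """Mirrors the misuse: build the options, keep only the context."""
    ctx = FakeContext()
    FakeClientTLSOptions(hostname, ctx)  # instance discarded immediately
    return ctx


ctx = context_factory_discarding_options("example.com")
print(ctx.info_callback is not None)  # True: the callback is installed anyway
conn = FakeConnection(ctx)  # connection built directly from the bare context
print(conn.app_data)  # None: nothing ever called set_app_data
```

In the model, only client_connection_for_tls attaches app data, which is roughly the role Twisted's real ClientTLSOptions plays when it is the object asked to create the TLS connection. A context factory that hands out only the bare context skips that step.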

When the info callback comes back around to get the Twisted TLS implementation (necessary to do part of its job) it instead finds there is no app data and fails with the exception you've reported.
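
The exact wording of the exception can be reproduced with another hypothetical model. LegacyConnection imitates only the two behaviours visible in the traceback: get_app_data returns self._app_data, and __getattr__ forwards unknown attribute lookups to self._socket, which is None when the connection runs over memory BIOs. identity_verifying_info_callback and FakeTransport are illustrative names; the assumption that the callback only touches app data when verification fails follows the answer's remark that the error appears whenever a certificate fails to verify.

```python
# Hypothetical model of the failure in the traceback. LegacyConnection copies
# only the attribute behaviour visible there; it is not pyOpenSSL's class.


class LegacyConnection:
    def __init__(self, socket=None):
        self._socket = socket  # None when TLS runs over memory BIOs
        # _app_data is deliberately left unset until set_app_data is called.

    def set_app_data(self, data):
        self._app_data = data

    def get_app_data(self):
        # If _app_data was never set, normal lookup fails and Python
        # falls back to __getattr__ below.
        return self._app_data

    def __getattr__(self, name):
        # Forward unknown attributes to the wrapped socket (None here).
        return getattr(self._socket, name)


class FakeTransport:
    """Hypothetical TLS transport that records verification failures."""

    def __init__(self):
        self.failures = []

    def failVerification(self, reason):
        self.failures.append(reason)


def identity_verifying_info_callback(connection, handshake_done, hostname_ok):
    """Simplified callback: app data is only needed when verification fails."""
    if handshake_done and not hostname_ok:
        transport = connection.get_app_data()
        transport.failVerification("hostname mismatch")


bare = LegacyConnection()  # app data never set, as in the bug
identity_verifying_info_callback(bare, handshake_done=True, hostname_ok=True)  # no error

try:
    identity_verifying_info_callback(bare, handshake_done=True, hostname_ok=False)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute '_app_data'

transport = FakeTransport()
wired = LegacyConnection()
wired.set_app_data(transport)  # what the intended code path would do
identity_verifying_info_callback(wired, handshake_done=True, hostname_ok=False)
print(transport.failures)  # ['hostname mismatch']
```

The message mentions NoneType rather than the connection class because the failed lookup is forwarded to the None socket, which matches the last frames of the traceback.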

The side effect of ClientTLSOptions is a little bit unpleasant, but I think this is clearly a Scrapy bug caused by misuse/abuse of ClientTLSOptions. I don't think this code could ever have been very well tested, since this error will happen every single time a certificate fails to verify.

I suggest reporting the bug to Scrapy. Hopefully they can fix their use of ClientTLSOptions and eliminate this error for you.
