“NoneType"对象在scrapy wistedopenssl 中没有属性“_app_data" [英] 'NoneType' object has no attribute '_app_data' in scrapy wistedopenssl
Problem description
During the scraping process using scrapy, an error appears in my logs from time to time. It doesn't seem to come from anywhere in my code, and looks like it's something inside twisted/openssl. Any ideas what causes this and how to get rid of it?
Stack trace here:
[Launcher,27487/stderr] Error during info_callback
Traceback (most recent call last):
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/protocols/tls.py", line 415, in dataReceived
self._write(bytes)
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/protocols/tls.py", line 554, in _write
sent = self._tlsConnection.send(toSend)
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1270, in send
result = _lib.SSL_write(self._ssl, buf, len(buf))
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 926, in wrapper
callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback
return wrapped(connection, where, ret)
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1157, in _identityVerifyingInfoCallback
transport = connection.get_app_data()
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1589, in get_app_data
return self._app_data
File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1148, in __getattr__
return getattr(self._socket, name)
exceptions.AttributeError: 'NoneType' object has no attribute '_app_data'
Recommended answer
At first glance, it appears as though this is due to a bug in scrapy. Scrapy defines its own Twisted "context factory": https://github.com/scrapy/scrapy/blob/ad36de4e6278cf635509a1ade30cca9a506da682/scrapy/core/downloader/contextfactory.py#L21-L28
This code instantiates ClientTLSOptions with the context it intends to return. A side effect of instantiating this class is that an "info callback" is installed on the context. The info callback requires that the Twisted TLS implementation has been set as "app data" on the connection. However, since nothing ever uses the ClientTLSOptions instance (it is discarded immediately), the app data is never set.
When the info callback comes back around to get the Twisted TLS implementation (necessary to do part of its job) it instead finds there is no app data and fails with the exception you've reported.
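The failure mode described above can be modeled in a few lines of plain Python (a simplified sketch, not the real pyOpenSSL or Twisted code): pyOpenSSL's Connection falls back to its wrapped socket for unknown attributes, and for Twisted's in-memory TLS connections that socket is None, so reading app data that was never set raises exactly the AttributeError in the traceback.

```python
# Simplified model of the bug (stand-in classes, not real pyOpenSSL code).

class Connection:
    """Stand-in for OpenSSL.SSL.Connection."""
    def __init__(self, socket=None):
        self._socket = socket  # None for Twisted's in-memory TLS connections

    def __getattr__(self, name):
        # Mirrors pyOpenSSL's fallback: delegate unknown attributes
        # to the wrapped socket.
        return getattr(self._socket, name)

    def set_app_data(self, data):
        self._app_data = data

    def get_app_data(self):
        return self._app_data


conn = Connection()

# Nothing ever called conn.set_app_data(transport), so the info
# callback's lookup fails just like in the report:
try:
    conn.get_app_data()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute '_app_data'

# Once app data is set (what a retained ClientTLSOptions would arrange
# during a real handshake), the lookup succeeds:
conn.set_app_data("the Twisted transport")
print(conn.get_app_data())
```

The key detail is that `_app_data` never exists on the connection at all, so attribute lookup falls through to the (absent) socket, which is why the error names NoneType rather than the connection itself.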
The side effect of ClientTLSOptions is a little bit unpleasant, but I think this is clearly a scrapy bug caused by mis-use/abuse of ClientTLSOptions. I don't think this code could ever have been very well tested, since this error will happen every single time a certificate fails to verify.
I suggest reporting the bug to Scrapy. Hopefully they can fix their use of ClientTLSOptions and eliminate this error for you.
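The shape of such a fix can be sketched with the same kind of simplified model (hypothetical names, not Scrapy's or Twisted's actual code): the object that installs the info callback must stay alive and attach the transport as app data on each connection before the handshake, so the callback never finds None.

```python
# Sketch of the corrected pattern (stand-in classes, hypothetical names).

class Context:
    """Stand-in for an OpenSSL context."""
    info_callback = None


class Connection:
    """Stand-in for OpenSSL.SSL.Connection with app-data support."""
    _app_data = None

    def set_app_data(self, data):
        self._app_data = data

    def get_app_data(self):
        return self._app_data


class TLSOptions:
    """Model of ClientTLSOptions: installs the info callback AND is the
    component responsible for wiring the transport onto connections."""
    def __init__(self, hostname, context):
        self.hostname = hostname
        context.info_callback = self._info_callback

    def start_tls(self, connection, transport):
        # The step scrapy's factory skipped: attach the transport as
        # app data before any callback can fire.
        connection.set_app_data(transport)

    def _info_callback(self, connection):
        # Safe now: start_tls() ran before the handshake began.
        return connection.get_app_data()


ctx = Context()
opts = TLSOptions("example.com", ctx)  # instance kept, not discarded
conn = Connection()
opts.start_tls(conn, "the transport")
print(ctx.info_callback(conn))  # the transport
```

The point is not the exact API but the invariant: whoever installs a callback that reads app data must also guarantee the app data is set first, which a discarded instance cannot do.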