Scrapy proxy middleware without a project
Question
I am using Scrapy's runspider method to run a spider that I've set up and defined without a project. I am configuring my custom settings and downloader middlewares to enable the HTTP proxy middleware as follows:
custom_settings = {
    'DOWNLOADER_MIDDLEWARES': {
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
    }
}
and then setting it on my request with:

request.meta['proxy'] = "proxy-ip:proxy-port"
yield request
but the spider does not run, failing with:

File "/usr/lib/python2.7/dist-packages/twisted/internet/abstract.py", line 522, in isIPv6Address
    if '%' in addr:
TypeError: argument of type 'NoneType' is not iterable
What am I doing wrong?
Recommended answer
After a lot of digging (not much logging going on in Scrapy, I'm afraid), I found that this problem can be caused by not specifying the scheme in the proxy address; i.e., Scrapy expects the proxy to be passed as a URI, so in your case, instead of:
request.meta['proxy'] = "proxy-ip:proxy-port" # doesn't work
you want this:
request.meta['proxy'] = "http://proxy-ip:proxy-port" # does work
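If your proxy addresses come from a config file or a proxy list as bare "host:port" strings, you can normalize them before they reach request.meta. A minimal sketch, assuming such bare strings; ensure_proxy_uri is a hypothetical helper, not part of Scrapy's API:

```python
def ensure_proxy_uri(proxy, scheme="http"):
    """Prepend a scheme if the proxy is a bare "host:port" string.

    Hypothetical helper: Scrapy's HttpProxyMiddleware expects a full
    URI in request.meta['proxy'], so bare addresses must be completed
    with a scheme before use.
    """
    if "://" in proxy:
        return proxy  # already a full URI, leave it alone
    return "%s://%s" % (scheme, proxy)

# e.g. ensure_proxy_uri("1.2.3.4:8080") -> "http://1.2.3.4:8080"
```

A request would then set request.meta['proxy'] = ensure_proxy_uri(addr) before being yielded.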
(As far as I can make out, the http is just ignored, but without it the rest can't be parsed by urlparse.)