How to set different scrapy-settings for different spiders?


Question

I want to enable an HTTP proxy for some spiders, and disable it for other spiders.

Can I do something like this?

# settings.py
proxy_spiders = ['a1', 'b2']

if spider in proxy_spiders:  # how to get the spider name here???
    HTTP_PROXY = 'http://127.0.0.1:8123'
    DOWNLOADER_MIDDLEWARES = {
        'myproject.middlewares.RandomUserAgentMiddleware': 400,
        'myproject.middlewares.ProxyMiddleware': 410,
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    }
else:
    DOWNLOADER_MIDDLEWARES = {
        'myproject.middlewares.RandomUserAgentMiddleware': 400,
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    }

If the code above doesn't work, is there any other way to do this?
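As a side note: settings.py is evaluated once for the whole project, so it cannot branch on which spider is running. Recent Scrapy versions (1.0+) instead support per-spider overrides through the spider's custom_settings class attribute. A minimal sketch (the classes would subclass scrapy.Spider in a real project; they are kept dependency-free here so the snippet stands alone, and the middleware path is the one from the question):

```python
# Per-spider settings override via the `custom_settings` class attribute
# (Scrapy 1.0+). In a real project these classes subclass scrapy.Spider.

class ProxySpider:
    name = 'a1'
    # These entries override the project-wide settings.py for this
    # spider only, so the proxy middleware runs just for it.
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'myproject.middlewares.ProxyMiddleware': 410,
        },
        'HTTP_PROXY': 'http://127.0.0.1:8123',
    }

class PlainSpider:
    name = 'b2'
    # No custom_settings: this spider uses the project defaults.
```

Scrapy merges custom_settings over the project settings when the spider starts, which is exactly the "different settings for different spiders" the question asks for.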

Answer

You can define your own proxy middleware, something straightforward like this:

from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware

class ConditionalProxyMiddleware(HttpProxyMiddleware):
    def process_request(self, request, spider):
        # Only apply the proxy when the spider opts in via `use_proxy`.
        if getattr(spider, 'use_proxy', None):
            return super().process_request(request, spider)

Then set the attribute use_proxy = True on the spiders that should have the proxy enabled. Don't forget to disable the default proxy middleware and enable your modified one in DOWNLOADER_MIDDLEWARES.
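Stripped of Scrapy itself, the conditional dispatch the answer describes boils down to the following. The request and spider classes here are stand-ins for Scrapy's real objects (real HttpProxyMiddleware sets request.meta['proxy'] in the same way), so the sketch runs on its own:

```python
# Minimal stand-ins for Scrapy's Request and Spider objects.
class FakeRequest:
    def __init__(self, url):
        self.url = url
        self.meta = {}

class ConditionalProxyMiddleware:
    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Route through the proxy only for spiders that opt in.
        if getattr(spider, 'use_proxy', False):
            request.meta['proxy'] = self.proxy_url

class ProxySpider:
    name = 'a1'
    use_proxy = True   # this spider opts in to the proxy

class PlainSpider:
    name = 'b2'        # no use_proxy attribute: proxy stays off

mw = ConditionalProxyMiddleware('http://127.0.0.1:8123')
req1, req2 = FakeRequest('http://example.com'), FakeRequest('http://example.com')
mw.process_request(req1, ProxySpider())   # req1.meta gets the proxy
mw.process_request(req2, PlainSpider())   # req2.meta is left untouched
```

The same opt-in check is what getattr(spider, 'use_proxy', None) performs inside the real middleware above.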
