How can I use different pipelines for different spiders in a single Scrapy project

Question

I have a Scrapy project which contains multiple spiders. Is there any way I can define which pipelines to use for which spider? Not all the pipelines I have defined are applicable to every spider.

Thanks

Recommended answer

Building on the solution from Pablo Hoffman, you can use the following decorator on the process_item method of a Pipeline object so that it checks the pipeline attribute of your spider to decide whether that step should be executed. For example:

import functools
import logging  # the original snippet used an undefined `log` name for the level


def check_spider_pipeline(process_item_method):
    """Run process_item only if this pipeline class is in spider.pipeline."""

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):

        # message template for debugging; '%%s' leaves one '%s' slot to fill below
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=logging.DEBUG)
            return item

    return wrapper

For this decorator to work correctly, the spider must have a pipeline attribute holding a container of the Pipeline classes that you want to use to process the item, for example:

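import scrapy

from myproject import pipelines  # "myproject" is a placeholder package name


class MySpider(scrapy.Spider):  # BaseSpider in older Scrapy releases
    name = 'myspider'  # placeholder; Scrapy requires a spider name

    pipeline = {
        pipelines.Save,
        pipelines.Validate,
    }

    def parse(self, response):
        # insert scrapy goodness here
        item = {'url': response.url}  # placeholder item
        return item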

Then in a pipelines.py file:

class Save(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do saving here
        return item

class Validate(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do validating here
        return item
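
To see the skipping behaviour without running a crawl, here is a minimal sketch that exercises the decorator against stand-in spiders. StubSpider, FullSpider, SaveOnlySpider and run_pipelines are hypothetical helpers written for this illustration and are not part of Scrapy; the decorator is a condensed version of the one above and assumes the spider exposes a logger attribute.

```python
import functools
import logging


def check_spider_pipeline(process_item_method):
    """Condensed version of the decorator: skip steps not listed on the spider."""
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        if self.__class__ in spider.pipeline:
            spider.logger.debug("executing %s", self.__class__.__name__)
            return process_item_method(self, item, spider)
        spider.logger.debug("skipping %s", self.__class__.__name__)
        return item
    return wrapper


class Validate:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["validated"] = True
        return item


class Save:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["saved"] = True
        return item


class StubSpider:
    """Hypothetical stand-in for a spider: only what the decorator touches."""
    logger = logging.getLogger("stub")
    pipeline = frozenset()


class FullSpider(StubSpider):
    pipeline = {Validate, Save}


class SaveOnlySpider(StubSpider):
    pipeline = {Save}


def run_pipelines(spider):
    # Every step is always called, in a fixed project-wide order;
    # the decorator decides per spider whether the step does any work.
    item = {}
    for step in (Validate(), Save()):
        item = step.process_item(item, spider)
    return item


full_result = run_pipelines(FullSpider())
save_only_result = run_pipelines(SaveOnlySpider())
print(full_result)       # {'validated': True, 'saved': True}
print(save_only_result)  # {'saved': True}
```

Note that a skipped step still receives the item and hands it on unchanged, so later steps in the chain are unaffected.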

All Pipeline objects should still be defined in ITEM_PIPELINES in settings, in the correct order. It would be nice to change this so that the order could be specified on the spider, too.
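
As a rough sketch of that settings entry, assuming a recent Scrapy release where ITEM_PIPELINES is a dict mapping each pipeline's import path to an integer order (lower values run first). The package name "myproject" and the order values are placeholders chosen for illustration.

```python
# settings.py (sketch): register every pipeline project-wide.
# "myproject" is a placeholder package name; lower numbers run first,
# so items are validated before they are saved.
ITEM_PIPELINES = {
    "myproject.pipelines.Validate": 100,
    "myproject.pipelines.Save": 200,
}
```

Recent Scrapy releases also accept a custom_settings class attribute on a spider, which can carry its own ITEM_PIPELINES dict; that may be a simpler route when the order itself has to differ per spider.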
