scrapy: access spider class variable in pipeline __init__

Question

I know you can access spider variables in process_item(), but how can I access them in the pipeline's __init__ function?

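from scrapy.spiders import CrawlSpider


class SiteSpider(CrawlSpider):
    def __init__(self, *args, **kwargs):
        # Let CrawlSpider finish its own setup before adding attributes.
        super().__init__(*args, **kwargs)
        self.id = 10


class MyPipeline(object):
    def __init__(self):
        ...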

I also need to access CUSTOM_SETTINGS_VARIABLE in MyPipeline.

Recommended answer

You can't access the spider instance in __init__, because the pipeline is initialized when the engine starts. You should also assume that your pipeline may handle multiple spiders, not just one.

That said, you can hook the spider_opened signal to access the spider instance when the spider starts. Settings such as CUSTOM_SETTINGS_VARIABLE can be read from crawler.settings in from_crawler and passed to __init__:

from scrapy import signals


class MyPipeline(object):

    def __init__(self, mysetting):
        # do stuff with the arguments...
        self.mysetting = mysetting

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        instance = cls(settings['CUSTOM_SETTINGS_VARIABLE'])
        crawler.signals.connect(instance.spider_opened, signal=signals.spider_opened)
        return instance

    def spider_opened(self, spider):
        # do stuff with the spider: initialize resources, etc.
        spider.log("[MyPipeline] Initializing resources for %s" % spider.name)

    def process_item(self, item, spider):
        return item
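
To tie this back to the question, here is a minimal sketch of reading the spider's id once the spider is available. It assumes the open_spider method that Scrapy calls on item pipelines when a spider is opened, used here instead of connecting the signal by hand. The class name SpiderAwarePipeline and the spider_id attribute are hypothetical, introduced only for illustration.

```python
class SpiderAwarePipeline:
    """Sketch: defer spider-dependent setup until the spider is opened."""

    def __init__(self, mysetting):
        # Only settings are available here; no spider exists yet.
        self.mysetting = mysetting
        self.spider_id = None  # hypothetical attribute, filled in later

    @classmethod
    def from_crawler(cls, crawler):
        # .get() returns None when the setting is not defined.
        return cls(crawler.settings.get("CUSTOM_SETTINGS_VARIABLE"))

    def open_spider(self, spider):
        # Read spider attributes here rather than in __init__.
        # getattr with a default keeps the pipeline usable with
        # spiders that do not define `id`.
        self.spider_id = getattr(spider, "id", None)

    def process_item(self, item, spider):
        return item
```

With the SiteSpider from the question, spider_id would be set to 10 when the spider opens; a spider without an id attribute leaves it as None. Pipeline hook signatures can differ between Scrapy releases, so check the item pipeline documentation for the version in use.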
