Injecting arguments in scrapy's pipeline


Question

I have a custom pipeline with some arguments that I need to inject in the constructor, like:

class MyPipeline(object):
    def __init__(self, some_argument):
        self.some_argument = some_argument
...

The script (let's call it run_crawler.py) from which I start the crawling process is:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
...
process.start()

And in settings.py:

ITEM_PIPELINES = {
    'crawler.pipelines.SomePipeline': 100,
    'crawler.pipelines.MyPipeline': 300
}

I guess this is a silly question, but I've been unable to find in the docs how to instantiate MyPipeline with custom arguments. Could someone please point me in the right direction?

In particular, I don't know how I should (or whether I should at all) modify run_crawler.py to instantiate the custom argument for MyPipeline. I'm guessing it should be something like:

process = CrawlerProcess(get_project_settings())

process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
...
some_argument = ... # instantiate my custom argument
# this is made up; it's what I've been unable to find how to do properly
my_pipeline = MyPipeline(some_argument)
process.pipelines.append(my_pipeline, ...)

process.start()

Answer

You can use Scrapy's from_crawler method. The Scrapy docs have a good description and example:

class MongoPipeline(object):

    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

"如果存在,则调用此类方法从爬虫创建管道实例.它必须返回管道的新实例."

This way you can create a new instance of the pipeline depending on the crawler or spider settings.
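
Applied to the question's MyPipeline, a minimal sketch could look like the following. SOME_ARGUMENT is a made-up settings key used purely for illustration; any name works as long as the pipeline and the script agree on it:

# pipelines.py -- sketch only; SOME_ARGUMENT is a hypothetical settings key
class MyPipeline(object):
    def __init__(self, some_argument):
        self.some_argument = some_argument

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this itself when building the pipeline, so the
        # argument is injected from the settings, not from run_crawler.py
        return cls(some_argument=crawler.settings.get('SOME_ARGUMENT'))

And run_crawler.py can set the value on the settings object before the process starts; there is no need to instantiate or append the pipeline manually:

# run_crawler.py -- sketch, assuming the hypothetical SOME_ARGUMENT key above
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.set('SOME_ARGUMENT', 'some value')  # hypothetical key

process = CrawlerProcess(settings)
process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
process.start()

With this in place, the ITEM_PIPELINES entry in settings.py stays exactly as it is, and Scrapy handles the instantiation.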
