Injecting arguments in scrapy's pipeline
Question
I have a custom pipeline with some arguments that I need to inject in the constructor, like:
class MyPipeline(object):
    def __init__(self, some_argument):
        self.some_argument = some_argument
    ...
The script (let's call it run_crawler.py) from which I start the crawling process is:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
...
process.start()
And in settings.py:
ITEM_PIPELINES = {
    'crawler.pipelines.SomePipeline': 100,
    'crawler.pipelines.MyPipeline': 300
}
I guess this is a silly question, but I've been unable to find in the docs how to instantiate MyPipeline with custom arguments. Could someone please point me in the right direction?
In particular, I don't know how I should (or whether I should at all) modify run_crawler.py to instantiate the custom argument for MyPipeline. I'm guessing it should be something like:
process = CrawlerProcess(get_project_settings())
process.crawl(SomeCrawler)
process.crawl(AnotherCrawler)
...
some_argument = ... # instantiate my custom argument
# this is made up, it's what I've been unable to find how to do properly
my_pipeline = MyPipeline(some_argument)
process.pipelines.append(my_pipeline, ...)
process.start()
Answer
You can use the scrapy from_crawler method. The scrapy docs have a good description and example:
class MongoPipeline(object):

    collection_name = 'scrapy_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )
"如果存在,则调用此类方法从爬虫创建管道实例.它必须返回管道的新实例."
This way you can create a new instance of the pipeline depending on the crawler or spider settings.
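Applied to the question, the same pattern would look roughly like the sketch below. The setting name SOME_ARGUMENT is made up here for illustration (it is not part of scrapy or the original post); any key defined in settings.py, or set on the settings object passed to CrawlerProcess, can be read the same way inside from_crawler:

# pipelines.py -- minimal sketch, assuming a hypothetical SOME_ARGUMENT setting
class MyPipeline(object):
    def __init__(self, some_argument):
        self.some_argument = some_argument

    @classmethod
    def from_crawler(cls, crawler):
        # scrapy calls this classmethod instead of the plain constructor,
        # so the argument can be pulled out of the crawler's settings here
        return cls(some_argument=crawler.settings.get('SOME_ARGUMENT'))


# run_crawler.py -- the value can be injected into the settings before starting
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.set('SOME_ARGUMENT', 'whatever value you need')  # hypothetical key

process = CrawlerProcess(settings)
process.crawl(SomeCrawler)  # SomeCrawler is the spider from the question
process.start()

A per-spider override via the spider's custom_settings attribute ends up in the same place, since from_crawler receives the settings already merged for that crawler.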