What is the relationship between the crawler object and the spider and pipeline objects?
Question
I'm working with Scrapy. I have a pipeline that starts with:
import dataset   # third-party "dataset" package, as used in the snippet
import settings  # assumed project-local settings module (not shown in the question)

class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "table" parameter
        table = getattr(crawler.spider, "table")
        return cls(table)

    def __init__(self, table):
        try:
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            table_name = table[0:3]  # first 3 letters
            self.my_table = db[table_name]
        except Exception:
            raise  # placeholder: the snippet in the question is truncated before the except clause
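To see how a value set on the spider reaches the pipeline, here is a Scrapy-free sketch of the wiring: the framework builds a crawler around a spider, then asks the pipeline class to build itself from that crawler via `from_crawler`. All class names below are illustrative stand-ins, not the real Scrapy classes.

```python
class StubSpider:
    name = "products"
    table = "products_2024"  # attribute the pipeline will look up

class StubCrawler:
    def __init__(self, spider):
        self.spider = spider  # the real Crawler exposes .spider similarly

class TablePipeline:
    @classmethod
    def from_crawler(cls, crawler):
        # Same lookup as in the question's pipeline
        table = getattr(crawler.spider, "table")
        return cls(table)

    def __init__(self, table):
        self.table_name = table[0:3]  # first 3 letters, as in the question

crawler = StubCrawler(StubSpider())
pipeline = TablePipeline.from_crawler(crawler)
print(pipeline.table_name)  # prints "pro"
```

The key point is that the pipeline never talks to the spider directly; the crawler is the intermediary that holds both.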
I've been reading through https://doc.scrapy.org/en/latest/topics/api.html#crawler-api, which contains:
The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it’s the only way for extensions to access them and hook their functionality into Scrapy.
but I still do not understand the from_crawler method and the crawler object. What is the relationship between the crawler object and the spider and pipeline objects? How and when is a crawler instantiated? Is a spider a subclass of crawler? I've asked Passing scrapy instance (not class) attribute to pipeline, but I don't understand how the pieces fit together.
Answer
Crawler is actually one of the most important objects in the Scrapy architecture. It is the central piece of the crawl execution logic, which "glues" a lot of the other pieces together:
The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it's the only way for extensions to access them and hook their functionality into Scrapy.
A crawler, or multiple crawlers, are controlled by a CrawlerRunner or CrawlerProcess instance.
Now, the from_crawler method, which is available on lots of Scrapy components, is simply a way for those components to get access to the crawler instance that is running that particular component.
Also, look at the actual implementations of Crawler, CrawlerRunner and CrawlerProcess.
And, what I personally found helpful in order to better understand how Scrapy works internally was to run a spider from a script - check out these detailed step-by-step instructions.