What is the relationship between the crawler object and the spider and pipeline objects?


Question

I'm working with scrapy. I have a pipeline that starts with:

import logging

import dataset  # third-party library: https://dataset.readthedocs.io/


class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "table"
        # spider attribute (e.g. set via `scrapy crawl myspider -a table=...`)
        table = getattr(crawler.spider, "table")
        return cls(table)

    def __init__(self, table):
        try:
            # `settings` is this project's settings module; SETTINGS_PATH
            # is a constant defined there, not part of Scrapy itself.
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            table_name = table[0:3]  # first 3 letters
            self.my_table = db[table_name]
        except Exception:
            logging.exception("Could not set up the database table")
            raise

I've been reading through https://doc.scrapy.org/en/latest/topics/api.html#crawler-api , which contains:

The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it’s the only way for extensions to access them and hook their functionality into Scrapy.

But I still do not understand the from_crawler method and the crawler object. What is the relationship between the crawler object and the spider and pipeline objects? How and when is a crawler instantiated? Is a spider a subclass of crawler? I've asked Passing scrapy instance (not class) attribute to pipeline, but I don't understand how the pieces fit together.

Answer

Crawler is actually one of the most important objects in the Scrapy architecture. It is the central part of the crawl execution logic, which "glues" a lot of other parts together:

The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it’s the only way for extensions to access them and hook their functionality into Scrapy.

One or more crawlers are controlled by a CrawlerRunner or CrawlerProcess instance.

Now, the from_crawler method, which is available on lots of Scrapy components, is just a way for these components to get access to the crawler instance that is running that particular component.
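This pattern can be sketched without Scrapy at all. The Crawler and Spider classes below are simplified stand-ins, not Scrapy's real implementations; they only illustrate the instantiation order: the crawler already holds the spider by the time it builds the pipeline through from_crawler, which is why the pipeline can read spider attributes there.

```python
class Spider:
    # Stand-in for scrapy.Spider; "table" mimics an attribute set
    # via `scrapy crawl myspider -a table=books_2021`.
    def __init__(self, table):
        self.table = table


class Crawler:
    # Stand-in for scrapy.crawler.Crawler: it "glues" the spider and
    # the components (pipelines, middlewares, extensions) together.
    def __init__(self, spider):
        self.spider = spider

    def build_component(self, component_cls):
        # Scrapy calls from_crawler (when defined) instead of calling
        # the component's constructor directly.
        return component_cls.from_crawler(self)


class DynamicSQLlitePipeline:
    @classmethod
    def from_crawler(cls, crawler):
        # The crawler gives access to the running spider and its attributes.
        return cls(getattr(crawler.spider, "table"))

    def __init__(self, table):
        self.table_name = table[0:3]  # first 3 letters, as in the question


crawler = Crawler(Spider(table="books_2021"))
pipeline = crawler.build_component(DynamicSQLlitePipeline)
print(pipeline.table_name)  # boo
```

So a spider is not a subclass of crawler; the crawler is a container that owns one spider instance plus the components built for it, and from_crawler is the factory hook through which each component receives that container.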

Also, look at the actual implementations of Crawler, CrawlerRunner and CrawlerProcess.

And, what I personally found helpful in order to better understand how Scrapy works internally was to run a spider from a script - check out these detailed step-by-step instructions.
