Scrapy Crawl Spider Only Scrape Certain Number Of Layers

Question

Hi, I want to crawl all the pages of a web site using the Scrapy CrawlSpider class (documentation here).

# Imports as used in older Scrapy versions (scrapy.contrib), matching the original snippet
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class MySpider(CrawlSpider):
    name = 'abc.com'
    allowed_domains = ['abc.com']
    start_urls = ['http://www.abc.com']

    rules = (
        # trailing comma so that rules is a tuple, not a bare Rule
        Rule(SgmlLinkExtractor(allow=('item\.php', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        ...

(1) So, this spider will start from the page defined in start_urls, www.abc.com, parse it automatically, and then follow every single link on www.abc.com that matches the rule, right? I am wondering whether there is a way to scrape only a certain number of layers, say only the first layer (links derived directly from www.abc.com)?

(2) Since I have defined in allowed_domains that only abc.com URLs should be scraped, I don't need to redefine that in the rules, do I? That is, do something like this:

Rule(SgmlLinkExtractor(allow=('item\.php', )), allow_domains="www.abc.com", callback='parse_item')

(3) If I am using CrawlSpider, what happens if I don't define any rules in the spider class? Will it follow all the pages, or will it not follow any at all because no rule has been 'met'?

Answer

  1. Set the DEPTH_LIMIT setting (a minimal sketch follows after this list):

DEPTH_LIMIT

Default: 0

The maximum depth that will be allowed to crawl for any site. If zero, no limit will be imposed.

  • No, you don't need to add an additional URL check. If you don't specify allow_domains at the Rule level, it will extract only URLs with the abc.com domain.
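
Putting both points together, here is a minimal sketch (mine, not the answerer's code) of how the spider from the question could be limited to the first layer. It assumes a reasonably recent Scrapy release, where LinkExtractor replaces SgmlLinkExtractor and DEPTH_LIMIT can be set per spider through custom_settings; on older versions, put DEPTH_LIMIT = 1 in settings.py instead. The FirstLayerSpider class name is made up for illustration.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class FirstLayerSpider(CrawlSpider):
    name = 'abc.com'
    allowed_domains = ['abc.com']          # off-site links are dropped because of this
    start_urls = ['http://www.abc.com']

    # Depth 0 is the start page, depth 1 is the links found directly on it;
    # DEPTH_LIMIT = 1 stops the crawl from going any deeper.
    custom_settings = {'DEPTH_LIMIT': 1}

    rules = (
        # No allow_domains here: allowed_domains above already restricts
        # the crawl to abc.com URLs.
        Rule(LinkExtractor(allow=(r'item\.php',)), callback='parse_item'),
    )

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)

Setting DEPTH_LIMIT in custom_settings keeps the restriction local to this spider instead of affecting every spider in the project.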

Hope that helps.
