Order of crawling in Scrapy
Question
I wrote a basic CrawlSpider in Scrapy, but I want to understand the order in which URLs are crawled: FIFO or LIFO?
I want the crawler to crawl all the links on the start URL page first and only then move on to other URLs, but that does not seem to be the order it follows.
How can I do this?
Recommended answer
By default, Scrapy uses a LIFO queue for storing pending requests, which means it crawls in depth-first order (DFO). This order is more convenient in most cases. If you do want to crawl in true breadth-first order (BFO), add the following to your project settings:
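# Recent Scrapy releases name the queue module scrapy.squeues; very old releases used scrapy.squeue.
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'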
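To see why the queue type changes the crawl order, here is a minimal sketch in plain Python that does not use Scrapy at all. The link graph and the crawl_order helper are made up for illustration. It is also a simplification: Scrapy's real scheduler additionally takes request priorities into account and downloads pages concurrently, so the order observed in an actual crawl is usually less strict than this.

```python
from collections import deque

# Hypothetical link graph: page -> links found on that page.
LINKS = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def crawl_order(start, fifo):
    """Return the visit order when pending pages sit in a FIFO or LIFO queue."""
    pending = deque([start])
    seen = {start}
    order = []
    while pending:
        # FIFO takes the oldest pending page, LIFO takes the newest one.
        page = pending.popleft() if fifo else pending.pop()
        order.append(page)
        for link in LINKS[page]:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return order


print("LIFO:", crawl_order("start", fifo=False))
print("FIFO:", crawl_order("start", fifo=True))
```

With the FIFO queue, both links found on the start page ("a" and "b") are visited before any page one level deeper, which is the behaviour asked for in the question. With the LIFO queue, the crawl follows the most recently discovered link down to its end before coming back.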