Force my scrapy spider to stop crawling
Problem Description
Is there a way to stop crawling when a specific condition is true (e.g. scrap_item_id == predefine_value)? My problem is similar to Scrapy - how to identify already scraped urls, but I want to 'force' my Scrapy spider to stop crawling after it discovers the last scraped item.
Recommended Answer
In the latest version of Scrapy, available on GitHub, you can raise a CloseSpider exception to manually close a spider.
The 0.14 release notes mention: "Added CloseSpider exception to manually close spiders (r2691)"
Example from the documentation:
def parse_page(self, response):
    if 'Bandwidth exceeded' in response.body:
        raise CloseSpider('bandwidth_exceeded')
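Applied to the original question, the same mechanism can stop the crawl once a previously scraped item id reappears. A minimal sketch follows; the spider name, start URL, CSS selectors, and the LAST_SCRAPED_ID value are all assumptions for illustration, not part of the original answer:

    import scrapy
    from scrapy.exceptions import CloseSpider

    # Assumed: the id of the last item scraped in the previous run.
    LAST_SCRAPED_ID = 12345

    class StopOnSeenSpider(scrapy.Spider):
        name = 'stop_on_seen'
        start_urls = ['http://example.com/items']  # placeholder URL

        def parse(self, response):
            for row in response.css('div.item'):
                item_id = int(row.css('::attr(data-id)').get())
                if item_id == LAST_SCRAPED_ID:
                    # Scrapy lets in-flight requests finish, then closes the
                    # spider; the reason string appears in the crawl stats.
                    raise CloseSpider('reached_last_scraped_item')
                yield {'id': item_id}

Note that raising CloseSpider does not abort instantly: requests already scheduled are still processed before the spider shuts down, so a few extra items may be yielded after the condition fires.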
See also: http://readthedocs.org/docs/scrapy/en/latest/topics/exceptions.html?highlight=closeSpider