Force my scrapy spider to stop crawling

Views: 88 | Published: 2021/6/25 20:44:22 | Tags: python, scrapy

Question

Is there a way to stop crawling when a specific condition is true (for example, scrap_item_id == predefine_value)? My problem is similar to "Scrapy - how to identify already scraped urls", but I want to force my Scrapy spider to stop crawling as soon as it discovers the last item that was already scraped.

Recommended answer

In the latest version of Scrapy, available on GitHub, you can raise a CloseSpider exception to manually close a spider.

The 0.14 release notes mention: "Added CloseSpider exception to manually close spiders (r2691)".

An example from the docs:

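from scrapy.exceptions import CloseSpider

def parse_page(self, response):
    # response.body is bytes in current Scrapy releases, so test against a bytes literal
    if b'Bandwidth exceeded' in response.body:
        raise CloseSpider('bandwidth_exceeded')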

See also: http://readthedocs.org/docs/scrapy/en/latest/topics/exceptions.html?highlight=closeSpider
