Terminate Scrapy if a condition is met


Question

I have written a scraper in Python using Scrapy. It contains 100 start_urls.

I want to terminate the scraping process once a condition is met, i.e. stop scraping as soon as a particular div is found. By terminate I mean it should stop scraping all the URLs.

Is this possible?

Recommended answer

What you're looking for is the CloseSpider exception.

Add the following line somewhere at the top of your source file:

from scrapy.exceptions import CloseSpider

When you detect that your termination condition is met, simply do something like

        raise CloseSpider('termination condition met')

in your callback method (instead of returning or yielding an Item or Request).

Note that requests that are still in progress (HTTP request sent, response not yet received) will still be parsed. No new requests will be processed, though.
