Scrapy not responding to CloseSpider exception


Question

I've implemented a solution that relies on Scrapy to run multiple spiders simultaneously. Based on what I've read here (http://doc.scrapy.org/en/latest/topics/exceptions.html), in order to gracefully signal a spider that it's time to die, I should raise a CloseSpider exception as follows:

from scrapy.spiders import CrawlSpider
from scrapy.exceptions import CloseSpider

class SomeSpider(CrawlSpider):
    def parse_items(self, response):
        # to_be_killed is a flag managed elsewhere by my own code.
        if self.to_be_killed:
            raise CloseSpider(reason="Received kill signal")

However, while the code does raise the exception when it hits that branch, the spider keeps processing requests for a long time afterwards. I need it to immediately stop what it's doing.

I realize that Scrapy is built around an asynchronous framework, but is there any way that I can force the spider to shut down without generating any additional outbound requests?

Recommended answer

So I ended up using a hacky solution to work around the problem. Rather than trying to terminate the spider instantly, which doesn't play well with the Twisted framework, I wrote a downloader middleware that refuses any request coming from a spider that I have flagged for closing.

So:

from scrapy.exceptions import IgnoreRequest


class SpiderStatusMiddleware:

    def process_request(self, request, spider):
        # Short-circuit any request from a spider that has been flagged
        # for shutdown, before it ever reaches the downloader.
        if spider.to_be_killed or not spider.active:
            # The old scrapy.log module has been removed; each spider
            # now carries its own logger.
            spider.logger.debug(
                "Spider has been killed, ignoring request to %s", request.url
            )
            raise IgnoreRequest()

        return None

NOTE: to_be_killed and active are both flags that I defined in my spider class and that are managed by my own code.
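
For completeness, a minimal sketch of how the pieces could be wired together. The module path, spider name, and middleware priority below are illustrative assumptions, not part of the original answer:

# myproject/spiders/some_spider.py (illustrative)
from scrapy.spiders import CrawlSpider

class SomeSpider(CrawlSpider):
    name = "some_spider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Flags checked by SpiderStatusMiddleware; my own control code
        # flips to_be_killed (or clears active) to stop the spider.
        self.to_be_killed = False
        self.active = True

# myproject/settings.py (illustrative) -- register the middleware so
# process_request runs for every outgoing request; 543 is an arbitrary
# priority in the usual downloader-middleware range.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.SpiderStatusMiddleware": 543,
}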

