Scrapy not responding to CloseSpider exception
Question
I've implemented a solution that relies on Scrapy to run multiple spiders simultaneously. Based on what I've read here (http://doc.scrapy.org/en/latest/topics/exceptions.html), in order to gracefully signal a spider that it's time to die, I should raise a CloseSpider exception as follows:
from scrapy.exceptions import CloseSpider
from scrapy.spiders import CrawlSpider


class SomeSpider(CrawlSpider):
    def parse_items(self, response):
        # to_be_killed is a flag set elsewhere by my own code
        if self.to_be_killed:
            raise CloseSpider(reason="Received kill signal")
However, while the code does raise the exception when it reaches that branch, the spider keeps processing requests for a long time afterwards. I need it to stop what it's doing immediately.
I realize that Scrapy is built around an asynchronous framework, but is there any way to force the spider to shut down without generating any additional outbound requests?
Recommended answer
So I ended up using a hacky workaround to bypass the problem. Rather than terminating the spider immediately in a way that doesn't play well with the Twisted framework, I wrote a DownloaderMiddleware that refuses any request coming from a spider that I have asked to close.
So:
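from scrapy.exceptions import IgnoreRequest


class SpiderStatusMiddleware:
    def process_request(self, request, spider):
        # Refuse every request from a spider that has been asked to stop.
        if spider.to_be_killed or not spider.active:
            # spider.logger replaces scrapy.log, which was deprecated in Scrapy 1.0.
            spider.logger.debug("Spider has been killed, ignoring request to %s", request.url)
            raise IgnoreRequest()
        # Returning None lets the request continue through the middleware chain.
        return None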
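The answer does not show how the middleware is enabled. In a standard Scrapy project that is normally done through the DOWNLOADER_MIDDLEWARES setting. A minimal sketch, assuming the class lives in a hypothetical module named myproject.middlewares and using an arbitrary priority of 543:

```python
# settings.py (sketch): the module path "myproject.middlewares" is an
# assumption for illustration; use the module where the class is defined.
# The number is the middleware's position in the chain; 543 is an arbitrary
# choice here and may need adjusting relative to other middlewares.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.SpiderStatusMiddleware": 543,
}
```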
NOTE: to_be_killed and active are both flags that I defined in my spider class; they are managed by my own code.
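The answer does not show how these flags are declared. One possible arrangement, sketched here as an assumption rather than the author's actual code, is a small mixin that gives every spider safe defaults. The names KillSwitchMixin, request_kill and should_ignore are hypothetical.

```python
class KillSwitchMixin:
    """Hypothetical mixin holding the two flags the middleware checks."""

    to_be_killed = False  # becomes True once a shutdown has been requested
    active = True         # assumed to be cleared by the controlling code

    def request_kill(self):
        """Mark this spider so that its subsequent requests are refused."""
        self.to_be_killed = True


def should_ignore(spider):
    """Same condition the middleware evaluates in process_request."""
    return spider.to_be_killed or not spider.active
```

A spider could then be declared as class SomeSpider(KillSwitchMixin, CrawlSpider), with the controlling code calling request_kill() on the spider instance. Class-level defaults also mean the middleware will not fail with an AttributeError on a spider that has never been asked to stop.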