ReactorNotRestartable error in while loop with scrapy


Question

I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code:

from time import sleep
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.xlib.pydispatch import dispatcher

result = None

def set_result(item):
    result = item

while True:
    process = CrawlerProcess(get_project_settings())
    dispatcher.connect(set_result, signals.item_scraped)

    process.crawl('my_spider')
    process.start()

    if result:
        break
    sleep(3)

It works the first time, then I get the error. I create a new process variable on each iteration, so what's the problem?

Recommended answer

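By default, CrawlerProcess's .start() will stop the Twisted reactor it creates when all crawlers have finished. A Twisted reactor that has been stopped cannot be started again, which is why the second call to .start() raises ReactorNotRestartable, even on a fresh CrawlerProcess.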

If you create process in each iteration, you should call process.start(stop_after_crawl=False), which leaves the reactor running after the crawl finishes.

Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The Scrapy docs have an example of doing that.
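The CrawlerRunner approach could look roughly like the sketch below. It is not the example from the docs: ResultCollector and crawl_until_result are hypothetical names introduced for illustration, while 'my_spider' and the 3-second delay come from the question. It also assumes the project settings do not require a particular TWISTED_REACTOR (if they do, that reactor would probably need to be installed before `reactor` is imported). The idea is to start the reactor once and run the retry loop inside it.

```python
class ResultCollector:
    """Hypothetical helper: remembers the first item sent by item_scraped."""

    def __init__(self):
        self.result = None

    def on_item_scraped(self, item, response, spider):
        # Storing the item on the instance avoids the problem in the question,
        # where "result = item" only bound a local name inside set_result().
        if self.result is None:
            self.result = item


def crawl_until_result(spider_name="my_spider", delay=3):
    """Re-run the spider until it yields an item, starting the reactor once."""
    # Imports are kept local so that defining this helper does not install
    # a Twisted reactor as a side effect.
    from scrapy import signals
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings
    from twisted.internet import defer, reactor, task

    collector = ResultCollector()
    runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl_loop():
        while collector.result is None:
            # A fresh crawler per attempt, with the handler attached to it.
            crawler = runner.create_crawler(spider_name)
            crawler.signals.connect(
                collector.on_item_scraped, signal=signals.item_scraped
            )
            yield runner.crawl(crawler)
            if collector.result is None:
                # Non-blocking pause; time.sleep() would block the reactor.
                yield task.deferLater(reactor, delay, lambda: None)

    def start():
        d = crawl_loop()
        # Stop the reactor whether the loop succeeded or failed.
        d.addBoth(lambda _: reactor.stop())

    reactor.callWhenRunning(start)
    reactor.run()  # runs exactly once per Python process
    return collector.result
```

Called from a script inside the Scrapy project, crawl_until_result() blocks until an item has been scraped and then returns it. Because the reactor cannot be restarted, the function can be called only once per Python process.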
