ReactorNotRestartable error in a while loop with Scrapy
Problem description
I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code:
from time import sleep

from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.xlib.pydispatch import dispatcher

result = None

def set_result(item):
    # Without `global`, the assignment would only bind a local name.
    global result
    result = item

while True:
    process = CrawlerProcess(get_project_settings())
    dispatcher.connect(set_result, signals.item_scraped)
    process.crawl('my_spider')
    process.start()  # raises ReactorNotRestartable on the second iteration
    if result:
        break
    sleep(3)
It works the first time, then I get the error. I create the process variable on each iteration, so what's the problem?
Recommended answer
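By default, CrawlerProcess's .start() stops the Twisted reactor it creates once all crawlers have finished. A stopped Twisted reactor cannot be started again, so the second call to process.start() raises ReactorNotRestartable even though process is a new object: the reactor is a single global shared by the whole Python process.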
If you create process on each iteration, call process.start(stop_after_crawl=False) so that the reactor is not stopped after the crawl. Keep in mind that start() then blocks until the reactor is stopped by some other means.
Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The docs have an example of doing that.
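As a rough illustration of the CrawlerRunner approach, the sketch below starts the reactor once and repeats the crawl inside it until an item has been scraped. It is not the example from the docs. The names FirstItem and crawl_until_item are made up for this sketch, and 'my_spider' and the 3-second delay are taken from the question. It also assumes that the project's TWISTED_REACTOR setting, if any, matches the default reactor that the import installs; if it does not, the configured reactor would have to be installed first.

```python
class FirstItem:
    """Hypothetical helper: remembers the first item delivered to it."""

    def __init__(self):
        self.item = None

    def store(self, item, **kwargs):
        # Scrapy also sends response, spider, etc.; they are ignored here.
        if self.item is None:
            self.item = item


def crawl_until_item(spider_name="my_spider", delay=3.0):
    """Re-run the spider inside ONE reactor run until an item is scraped."""
    # Imported lazily: importing twisted.internet.reactor installs the
    # default reactor as a side effect.
    from twisted.internet import defer, reactor, task
    from scrapy import signals
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    runner = CrawlerRunner(get_project_settings())
    found = FirstItem()

    @defer.inlineCallbacks
    def loop():
        while found.item is None:
            crawler = runner.create_crawler(spider_name)
            crawler.signals.connect(found.store, signal=signals.item_scraped)
            yield runner.crawl(crawler)  # wait for this crawl to finish
            if found.item is None:
                # Non-blocking replacement for sleep(delay).
                yield task.deferLater(reactor, delay, lambda: None)

    def start():
        d = loop()
        d.addErrback(lambda failure: failure.printTraceback())
        d.addBoth(lambda _: reactor.stop())  # stop exactly once, at the end

    reactor.callWhenRunning(start)
    reactor.run()  # blocks until reactor.stop() is called above
    return found.item
```

Calling crawl_until_item() from a script inside the Scrapy project should return the first scraped item. Because the reactor is started and stopped only once, the function itself can be called only once per Python process, for the same reason the original loop failed.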