Scrapy throws error ReactorNotRestartable when running on AWS Lambda

Problem description

I have deployed a Scrapy project that starts a crawl whenever an AWS Lambda API request comes in.

It works perfectly for the first API call, but subsequent calls fail with a ReactorNotRestartable error.

As far as I can tell, the AWS Lambda environment does not kill the process between invocations, so the reactor, which was stopped at the end of the first crawl, is still present in memory and cannot be started again.
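
The mechanism can be reproduced without Scrapy. The sketch below uses Python's asyncio as a stand-in for the Twisted reactor; this substitution is an assumption for illustration only, since Scrapy does not use asyncio here. A module-level event loop, like Twisted's global reactor, survives between warm invocations of the same Lambda container. Closing the loop at the end of the first call plays the role of CrawlerProcess stopping the reactor, and a closed loop cannot be run again. The names handler and crawl are hypothetical.

```python
import asyncio

# Module-level state lives as long as the Lambda container does, so it is shared
# by every warm invocation, much like Twisted's global reactor.
LOOP = asyncio.new_event_loop()


async def crawl():
    # Hypothetical placeholder for the real crawl.
    await asyncio.sleep(0)
    return "crawled"


def handler(event, context):
    try:
        return LOOP.run_until_complete(crawl())
    finally:
        # Plays the role of CrawlerProcess stopping the reactor after the crawl:
        # a closed asyncio loop, like a stopped Twisted reactor, cannot be run again.
        LOOP.close()
```

The first call returns "crawled". A second call in the same process raises RuntimeError("Event loop is closed"), which is the asyncio counterpart of ReactorNotRestartable.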

The lambda log error is as follows:

```
Traceback (most recent call last):
  File "/var/task/aws-lambda.py", line 42, in run_company_details_scrapy
    process.start()
  File "./lib/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "./lib/twisted/internet/base.py", line 1242, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "./lib/twisted/internet/base.py", line 1222, in startRunning
    ReactorBase.startRunning(self)
  File "./lib/twisted/internet/base.py", line 730, in startRunning
    raise error.ReactorNotRestartable()
ReactorNotRestartable
```

The lambda handler function is:

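```python
from scrapy.crawler import CrawlerProcess
# CompanyDetailsSpidySpider is the project's own spider class; import it from its module.

def run_company_details_scrapy(event, context):
    process = CrawlerProcess()
    process.crawl(CompanyDetailsSpidySpider)
    process.start()  # runs the Twisted reactor, then stops it when the crawl ends
```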

As a workaround, I kept the reactor from stopping by passing the stop_after_crawl flag to start():

```python
process.start(stop_after_crawl=False)
```

The problem is that the reactor then keeps running and the handler never returns, so I have to wait until the Lambda call times out.

I have tried other solutions, but none of them seem to work. Can anyone guide me on how to solve this problem?

Recommended answer

You could try using https://pypi.python.org/pypi/crochet to coordinate use of a reactor running in a non-main thread from the Lambda-controlled main thread.

Crochet will do the threaded reactor initialization for you and provides tools that make it easy to call code in the reactor thread from the main thread (and get the results back).

This might be more in line with the expectations Lambda has of your code.
