Scrapy throws error ReactorNotRestartable when running on AWS Lambda

Question

I have deployed a Scrapy project that crawls whenever a Lambda API request comes in.

It works perfectly on the first API call, but subsequent calls fail with a ReactorNotRestartable error.

As far as I can tell, AWS Lambda does not kill the process between invocations, so the reactor from the previous call is still present in memory.
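A minimal pure-Python toy model of this failure mode may help. It is not Twisted's code: `OneShotReactor`, `ReactorNotRestartable` (here a local stand-in class) and `handler` are hypothetical names, and the only assumption is the one stated above, that Lambda reuses the same process, and therefore the same module-level state, for a warm invocation. Like a Twisted reactor, the toy object refuses to start a second time.

```python
# Toy model (NOT Twisted's real implementation): a one-shot "reactor" held in
# module-level state, which a warm Lambda container reuses between invocations.


class ReactorNotRestartable(Exception):
    """Hypothetical stand-in for twisted.internet.error.ReactorNotRestartable."""


class OneShotReactor:
    def __init__(self):
        self._started_once = False

    def run(self):
        # Mimic Twisted: a reactor that has already run cannot be started again.
        if self._started_once:
            raise ReactorNotRestartable()
        self._started_once = True
        # ... the crawl would run here, after which the reactor stops ...


# Module scope: created once per container, NOT once per invocation.
reactor = OneShotReactor()


def handler(event, context):
    reactor.run()
    return "crawled"


if __name__ == "__main__":
    print(handler({}, None))  # first (cold) invocation succeeds
    try:
        handler({}, None)  # second (warm) invocation reuses the same reactor
    except ReactorNotRestartable:
        print("ReactorNotRestartable on the second invocation")
```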

The Lambda log shows the following error:

Traceback (most recent call last):
  File "/var/task/aws-lambda.py", line 42, in run_company_details_scrapy
    process.start()
  File "./lib/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "./lib/twisted/internet/base.py", line 1242, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "./lib/twisted/internet/base.py", line 1222, in startRunning
    ReactorBase.startRunning(self)
  File "./lib/twisted/internet/base.py", line 730, in startRunning
    raise error.ReactorNotRestartable()
ReactorNotRestartable

The Lambda handler function is:

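from scrapy.crawler import CrawlerProcess
from myproject.spiders import CompanyDetailsSpidySpider  # hypothetical module path

def run_company_details_scrapy(event, context):
    process = CrawlerProcess()
    process.crawl(CompanyDetailsSpidySpider)
    process.start()  # blocks until the crawl finishes, then stops the reactor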

As a workaround, I avoided stopping the reactor by passing a flag to the start function:

process.start(stop_after_crawl=False)

But the problem with this was that process.start() then never returned, so I had to wait until the Lambda call timed out.

I tried other solutions, but none of them seem to work. Can anyone guide me on how to solve this problem?

Answer

You could try using crochet (https://pypi.python.org/pypi/crochet) to coordinate use of a reactor running in a non-main thread from the Lambda-controlled main thread.

Crochet will do the threaded reactor initialization for you and provides tools that make it easy to call code in the reactor thread from the main thread (and get the results).

This might be more in line with what Lambda expects of your code.