Scrapy randomly crashing with celery in django


Problem description

I am running my Scrapy project within Django on an Ubuntu server. The problem is that Scrapy randomly crashes, even when only one spider is running.

Below is a snippet of the traceback. As a non-expert, I have googled

_SIGCHLDWaker Scrapy

but couldn't comprehend the solutions I found for the snippet below:

--- <exception caught here> ---
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 602, in _doReadOrWrite
    why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'

I am not familiar with Twisted, and despite trying to understand it, it seems very unfriendly to me.

Below is the full traceback:

[2015-10-10 14:17:13,652: INFO/Worker-4] Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, RandomUserAgentMiddleware, ProxyMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
[2015-10-10 14:17:13,655: INFO/Worker-4] Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
[2015-10-10 14:17:13,656: INFO/Worker-4] Enabled item pipelines: MadePipeline
[2015-10-10 14:17:13,656: INFO/Worker-4] Spider opened
[2015-10-10 14:17:13,657: INFO/Worker-4] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
Unhandled Error
Traceback (most recent call last):
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/log.py", line 101, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/log.py", line 84, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 602, in _doReadOrWrite
    why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'

Here is what I did, following a suggestion I found. I have also tried something like this tutorial, but it results in a different problem for which I couldn't get a traceback.

For completeness, here is a snippet of my spider:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.signalmanager import SignalManager
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher


class ComberSpider(CrawlSpider):

    name = "amazon"
    allowed_domains = ["amazon.com"]
    rules = (Rule(LinkExtractor(allow=r'corporations/.+/-*50/[0-9]+\.html',
                                restrict_xpaths="//a[@class='next']"),
                  callback="parse_items", follow=True),
             )

    def __init__(self, *args, **kwargs):
        super(ComberSpider, self).__init__(*args, **kwargs)
        self.query = kwargs.get('query')
        self.job_id = kwargs.get('job_id')
        # connect a spider_closed handler for per-job bookkeeping
        SignalManager(dispatcher.Any).connect(self.closed_handler,
                                              signal=signals.spider_closed)
        self.start_urls = (
            "http://www.amazon.com/corporations/%s/------------"
            "--------50/1.html" % self.query.strip().replace(" ", "_").lower(),
        )
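
The snippet connects self.closed_handler to the spider_closed signal, but the handler itself is not shown in the question. Purely as a hypothetical sketch of what such a handler can look like (the method body below is an assumption, not the asker's code), a spider_closed handler receives the spider instance and typically records that the job finished:

    # Hypothetical continuation of ComberSpider above; the real
    # closed_handler is not shown in the question.
    def closed_handler(self, spider):
        # record that the job identified by job_id has finished
        self.log("Job %s finished" % self.job_id)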

Recommended answer

This is a known Scrapy issue. See the issue report thread for details and possible workarounds.
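
The underlying cause discussed there is that the Twisted reactor ends up being started from a Celery worker thread rather than a process's main thread, which breaks the reactor's signal-waker machinery (hence the _SIGCHLDWaker error). A workaround commonly suggested in that thread is to launch every crawl in a fresh child process, so the reactor always starts in that process's main thread. Below is a minimal sketch under that assumption; the task name, the run_crawl helper, and the myapp.spiders import path are all hypothetical, and billiard is Celery's own multiprocessing fork, whose Process can be spawned from daemonized workers:

from billiard import Process  # Celery's multiprocessing fork
from celery import shared_task
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myapp.spiders import ComberSpider  # hypothetical import path


def run_crawl(query, job_id):
    # Runs in a fresh child process, so the Twisted reactor starts in
    # that process's main thread and can install its signal handlers.
    process = CrawlerProcess(get_project_settings())
    process.crawl(ComberSpider, query=query, job_id=job_id)
    process.start()  # blocks until the crawl finishes


@shared_task
def crawl(query, job_id):
    # Spawn the crawl in a child process instead of running the reactor
    # inside the Celery worker itself.
    p = Process(target=run_crawl, args=(query, job_id))
    p.start()
    p.join()

Spawning a process per task adds some overhead, but it also sidesteps the fact that a Twisted reactor cannot be restarted within the same process.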
