Why am I getting '_SIGCHLDWaker' object has no attribute 'doWrite' in Scrapy?


Question

I am using Scrapy spiders inside Celery, and I am randomly getting this kind of error:

Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 602, in _doReadOrWrite
    why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'

I am using:

celery==3.1.19
Django==1.9.4
Scrapy==1.3.0

This is how I run Scrapy inside Celery:

from billiard import Process
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class MyCrawlerScript(Process):
    def __init__(self, **kwargs):
        Process.__init__(self)
        # get_project_settings() takes no arguments; it locates the project
        # via the SCRAPY_SETTINGS_MODULE environment variable
        settings = get_project_settings()
        self.crawler = CrawlerProcess(settings)
        self.spider_name = kwargs.get('spider_name')
        self.kwargs = kwargs

    def run(self):
        # runs in a separate billiard-forked process; 'qwargs' is passed
        # through to the spider as a keyword argument
        self.crawler.crawl(self.spider_name, qwargs=self.kwargs)
        self.crawler.start()

def my_crawl_manager(**kwargs):
    crawler = MyCrawlerScript(**kwargs)
    crawler.start()
    crawler.join()

Inside a Celery task, I am calling:

my_crawl_manager(spider_name='my_spider', url='www.google.com/any-url-here')
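For reference, a minimal sketch of such a task (the decorator and task name here are illustrative assumptions, not from the original code; only the my_crawl_manager call is):

from celery import shared_task

@shared_task
def crawl_task(url):
    # Illustrative wrapper: delegates to the billiard-Process-based manager
    # so the crawl runs outside the Celery worker process itself
    my_crawl_manager(spider_name='my_spider', url=url)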

Does anyone have any idea why this is happening?

P.S.: I have asked another question, Why I am Getting KeyError in Scrapy? I don't know if they are somehow related.

Answer

I had the same issue. I'm working within a complex application that uses asyncio, multiprocessing, Twisted, and Scrapy all together.

The solution for me was to use asyncioreactor, installing the alternate reactor before any Scrapy imports:

from twisted.internet import asyncioreactor
asyncioreactor.install()

# Scrapy (and anything else that imports the Twisted reactor) must be
# imported only after this point
from scrapy.crawler import CrawlerProcess
# ...
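A quick way to confirm which reactor ended up installed (this check is an addition for illustration, not part of the original answer):

from twisted.internet import asyncioreactor
asyncioreactor.install()

from twisted.internet import reactor
# Prints 'AsyncioSelectorReactor' once asyncioreactor is installed;
# on Linux the default would otherwise be 'EPollReactor'
print(type(reactor).__name__)

Note that asyncioreactor.install() raises ReactorAlreadyInstalledError if the default reactor has already been pulled in, which is why the install has to be the very first Twisted-related statement executed in the process.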

