How to use threading in Scrapy/Twisted, i.e. how to make async calls to blocking code in response callbacks?

Question

I need to run some multi-threaded or multi-process work in Scrapy (because I have a library that makes blocking calls) and, once it completes, hand a Request back to the Scrapy engine.

I need something like this:

def blocking_call(self, html):
    # ....
    # do some work in blocking call
    return Request(url)

def parse(self, response):
    return self.blocking_call(response.body)

How can I do that? I think I should use the Twisted reactor and a Deferred object. But a Scrapy parse callback must return only None, a Request, or a BaseItem object.

Recommended answer

If you want to return a Deferred that fires after your blocking operation has finished running in one of the reactor's thread pool threads, use deferToThreadPool:

from twisted.internet.threads import deferToThreadPool
from twisted.internet import reactor

...

    def parse(self, response):
        return deferToThreadPool(
            reactor, reactor.getThreadPool(), self.blocking_call, response.body)
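
To make the offloading pattern concrete without needing Scrapy or Twisted installed, here is a minimal sketch of the same idea using only the standard library. It swaps in concurrent.futures.ThreadPoolExecutor for the reactor's thread pool and a Future for the Deferred, so it is an analogy rather than the Twisted API. The body of blocking_call, the helper parse_in_thread, and the sample HTML are all hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(html):
    """Hypothetical stand-in for a library call that blocks."""
    time.sleep(0.1)  # simulate blocking I/O or heavy work
    return {"length": len(html), "thread": threading.current_thread().name}


def parse_in_thread(pool, html):
    """Submit the blocking work to the pool and return a Future right away."""
    return pool.submit(blocking_call, html)


results = []
main_thread = threading.current_thread().name

with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as pool:
    future = parse_in_thread(pool, b"<html></html>")
    # The callback is invoked once the blocking call has finished.
    future.add_done_callback(lambda f: results.append(f.result()))
# Leaving the with-block waits for the submitted work to complete.

print(results)
```

One difference worth keeping in mind: here the done-callback runs in the worker thread, whereas callbacks added to a Deferred returned by deferToThreadPool run back in the reactor thread, which is what makes it safe to touch Scrapy objects from them.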
