How to use threading in Scrapy/Twisted, i.e. how to make async calls to blocking code in response callbacks?
Question
I need to run some multi-threaded or multiprocessing work in Scrapy (because I use a library that makes blocking calls) and, once it completes, hand a Request back to the Scrapy engine.
I need something like this:
def blocking_call(self, html):
    # ....
    # do some work in blocking call
    return Request(url)

def parse(self, response):
    return self.blocking_call(response.body)
How can I do that? I think I should use the Twisted reactor and a Deferred object. But a Scrapy parse callback must return only None, a Request, or a BaseItem object.
Recommended answer
If you want to return a Deferred that fires after your blocking operation has finished running in one of the reactor's thread pool threads, use deferToThreadPool:
from twisted.internet.threads import deferToThreadPool
from twisted.internet import reactor
...
def parse(self, response):
    return deferToThreadPool(
        reactor, reactor.getThreadPool(), self.blocking_call, response.body)
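For readers who want to watch the offloading pattern run without Scrapy or Twisted installed, here is a minimal standard-library sketch. It is an analogue, not Twisted's API: asyncio's loop.run_in_executor stands in for deferToThreadPool, the awaited future stands in for the Deferred, and blocking_call, heartbeat and main are hypothetical names used only for illustration. The heartbeat task is there just to show that the event loop keeps turning while the worker thread is blocked.

```python
import asyncio
import threading
import time


def blocking_call(html):
    """Hypothetical stand-in for a library call that blocks."""
    time.sleep(0.2)  # simulate blocking work
    # Report the payload size and the id of the thread that did the work.
    return len(html), threading.get_ident()


async def parse(body):
    """Run blocking_call in the default thread pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_call, body)


async def main():
    ticks = 0

    async def heartbeat():
        # Counts event-loop wake-ups while the blocking call is in flight.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    size, worker_id = await parse(b"<html></html>")
    beat.cancel()
    return size, worker_id, ticks


size, worker_id, ticks = asyncio.run(main())
print("bytes handled:", size)
print("ran off the main thread:", worker_id != threading.get_ident())
print("event loop ticks during the call:", ticks)
```

Twisted gives a comparable guarantee: the Deferred returned by deferToThreadPool fires back in the reactor thread once the pool thread finishes, so any callback attached to it runs where the rest of the spider code runs.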