Async query database for keys to use in multiple requests
Problem description
I want to asynchronously query a database for keys, then make requests to several URLs for each key.

I have a function that returns a Deferred from the database whose value is the key for several requests. Ideally, I would call this function and return a generator of Deferreds from start_requests.
```python
@inlineCallbacks
def get_request_deferred(self):
    d = yield engine.execute(select([table]))  # async
    d.addCallback(make_url)
    d.addCallback(Request)
    return d

def start_requests(self):
    ????
```
But attempting this in several ways raises

builtins.AttributeError: 'Deferred' object has no attribute 'dont_filter'

which I take to mean that start_requests must return Request objects, not Deferreds whose values are Request objects. The same seems to be true of spider middleware's process_start_requests().
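That reading of the traceback matches how Scrapy consumes start_requests: the engine iterates it and immediately reads Request attributes such as dont_filter, so any non-Request object in the stream fails. A minimal stdlib-only stand-in (FakeRequest, FakeDeferred, and schedule_all are illustrative names, not Scrapy APIs) reproduces the same failure mode:

```python
class FakeRequest:
    """Stand-in for scrapy.Request: carries a url and a dont_filter flag."""
    def __init__(self, url, dont_filter=False):
        self.url = url
        self.dont_filter = dont_filter


class FakeDeferred:
    """Stand-in for a twisted Deferred: it has no dont_filter attribute."""
    pass


def schedule_all(start_requests):
    """Loosely mimic the engine: iterate and touch each request's attributes."""
    scheduled = []
    for req in start_requests:
        # the real engine reads dont_filter when deciding whether to dedupe
        # the request; a Deferred in the stream fails right here
        _ = req.dont_filter
        scheduled.append(req.url)
    return scheduled


print(schedule_all(iter([FakeRequest('http://example.com/a')])))
try:
    schedule_all(iter([FakeDeferred()]))
except AttributeError as e:
    print(e)  # same shape of error as the traceback above
```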
Alternatively, I can make initial requests to, say, http://localhost/ and change them to the real URL once the key is available from the database through downloader middleware's process_request(). However, process_request only returns a Request object; it cannot yield Requests to multiple pages using the key: attempting yield Request(url) raises

AssertionError: Middleware myDownloaderMiddleware.process_request must return None, Response or Request, got generator
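The assertion quoted above comes from Scrapy's downloader middleware manager, which type-checks whatever process_request returns. A rough stdlib sketch of that contract (check_middleware_output and the Fake* classes are illustrative stand-ins, not Scrapy code) shows why returning a generator trips it:

```python
class FakeRequest:
    pass


class FakeResponse:
    pass


def check_middleware_output(result):
    # Rough stand-in for Scrapy's check: process_request may return
    # None, a Response, or a Request -- never a generator.
    assert result is None or isinstance(result, (FakeResponse, FakeRequest)), \
        'must return None, Response or Request, got %s' % type(result).__name__


check_middleware_output(None)           # allowed
check_middleware_output(FakeRequest())  # allowed


def gen():
    yield FakeRequest()


try:
    check_middleware_output(gen())  # a generator, even of Requests, is rejected
except AssertionError as e:
    print(e)  # must return None, Response or Request, got generator
```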
What is the cleanest solution to
- get keys asynchronously from the database,
- for each key, generate several requests?
You can let the callback for the Deferred object pass the urls to a generator of some sort. The generator will then convert any received urls into scrapy Request objects and yield them. Below is an example using the code you linked (not tested):
```python
import scrapy
from queue import Queue
from pdb import set_trace as st
from twisted.internet.defer import Deferred


class ExampleSpider(scrapy.Spider):
    name = 'example'

    def __init__(self):
        self.urls = Queue()
        self.stop = False
        self.requests = self.request_generator()
        self.deferred = self.deferred_generator()

    def deferred_generator(self):
        d = Deferred()
        d.addCallback(self.deferred_callback)
        yield d

    def request_generator(self):
        while not self.stop:
            url = self.urls.get()  # blocks until the callback supplies a url
            yield scrapy.Request(url=url, callback=self.parse)

    def start_requests(self):
        # start_requests must return an iterable of Requests,
        # so hand the engine the generator itself
        return self.requests

    def parse(self, response):
        st()
        # when you need to parse the next url from the callback
        yield next(self.requests)

    def deferred_callback(self, url):
        self.urls.put(url)
        if no_more_urls():  # placeholder predicate, not defined in the answer
            self.stop = True
```
Don't forget to stop the request generator when you're done.
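One common way to do that stop cleanly (a sketch with plain stdlib objects, independent of Scrapy) is to push a sentinel value through the same queue, so the blocked queue.get() wakes up and the generator ends instead of waiting forever:

```python
from queue import Queue

STOP = object()  # sentinel: tells the generator there are no more urls


def url_generator(urls):
    while True:
        url = urls.get()  # blocks until the callback puts something
        if url is STOP:
            return        # end the generator instead of looping forever
        yield url


urls = Queue()
for u in ('http://example.com/a', 'http://example.com/b'):
    urls.put(u)
urls.put(STOP)

print(list(url_generator(urls)))  # ['http://example.com/a', 'http://example.com/b']
```

Comparing identity with `is STOP` (rather than equality) guarantees no legitimate URL string can ever be mistaken for the sentinel.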