Storing scrapy queue in a database


Question

I would like to store the URL queue in a database so that it can be shared by multiple instances of scrapy running on different machines.

I've found that there is a queue API I need to implement, after which I can register my queue using the SPIDER_QUEUE_CLASS setting. However, I cannot find anywhere what the API to be implemented looks like. Is it still available?

Answer

Scrapy's internal queues use queuelib:

Queuelib's goals are speed and simplicity. It was originally part of the Scrapy framework and was split out into its own library.

Its "API" is rather simple; see how push and pop are used by Scrapy. Your implementation might look like:

DatabaseFifoQueue = _serializable_queue(myqueue.DatabaseQueue,
                                        _pickle_serialize, pickle.loads)
...
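To make the fragment above concrete, here is a minimal sketch of what a `DatabaseQueue` class might look like. The push/pop/close/`__len__` methods mirror the interface queuelib-style queues expose; everything else (the class name, the SQLite backend, the table layout) is an assumption for illustration. A real multi-machine setup would point this at a shared server database rather than a local SQLite file.

```python
import sqlite3


class DatabaseQueue:
    """Illustrative FIFO queue backed by SQLite (assumed design, not
    Scrapy's actual implementation). Items are stored as bytes; the
    serializer wrapper handles encoding/decoding."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, item BLOB)"
        )

    def push(self, item):
        # Append a serialized item to the tail of the queue.
        self.conn.execute("INSERT INTO queue (item) VALUES (?)", (item,))
        self.conn.commit()

    def pop(self):
        # Remove and return the oldest item, or None if the queue is empty.
        row = self.conn.execute(
            "SELECT id, item FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]

    def close(self):
        self.conn.close()

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

Note that for true cross-machine sharing, pop would also need to claim an item atomically (e.g. with a transaction or row lock) so two scrapy instances never dequeue the same URL.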
