How to get the number of requests in queue in scrapy?


Problem Description

I am using scrapy to crawl some websites. How to get the number of requests in the queue?

I have looked at the scrapy source code and find scrapy.core.scheduler.Scheduler may lead to my answer. See: https://github.com/scrapy/scrapy/blob/0.24/scrapy/core/scheduler.py

Two questions:

  1. How to access the scheduler in my spider class?
  2. What do self.dqs and self.mqs mean in the scheduler class?

Recommended Answer

This took me a while to figure out, but here's what I used:

self.crawler.engine.slot.scheduler

That is the scheduler instance. You can then call len() on it (which invokes its __len__() method), or if you just need a true/false check for pending requests, do something like this:

self.crawler.engine.slot.scheduler.has_pending_requests()

Beware that there could still be running requests even though the queue is empty. To check how many requests are currently in progress, use:

len(self.crawler.engine.slot.inprogress)
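To illustrate the attribute chain (engine.slot.scheduler, engine.slot.inprogress) without running a live crawl, here is a sketch using stand-in objects. FakeScheduler, FakeSlot, and FakeEngine are hypothetical names invented for this example; they only mirror the layout of Scrapy 0.24's ExecutionEngine, and the pending/in-progress requests are plain strings rather than real Request objects:

```python
class FakeScheduler:
    """Minimal stand-in for scrapy.core.scheduler.Scheduler."""
    def __init__(self, pending):
        self.pending = list(pending)

    def __len__(self):
        # The real Scheduler.__len__ returns the combined size of its
        # disk queue (self.dqs) and memory queue (self.mqs) -- which is
        # what those two attributes from question 2 hold.
        return len(self.pending)

    def has_pending_requests(self):
        return len(self) > 0


class FakeSlot:
    """Stand-in for the engine slot holding the scheduler and in-flight requests."""
    def __init__(self, scheduler, inprogress):
        self.scheduler = scheduler
        self.inprogress = set(inprogress)


class FakeEngine:
    def __init__(self, slot):
        self.slot = slot


# Two requests queued, one currently being downloaded.
engine = FakeEngine(FakeSlot(FakeScheduler(["req1", "req2"]), {"req3"}))

queued = len(engine.slot.scheduler)                    # requests waiting in the queue
pending = engine.slot.scheduler.has_pending_requests() # True while the queue is non-empty
running = len(engine.slot.inprogress)                  # requests currently in flight

print(queued, pending, running)  # → 2 True 1
```

Inside a spider callback, the same expressions apply with `self.crawler.engine` in place of `engine`.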
