How to get the number of requests in queue in scrapy?
Question
I am using scrapy to crawl some websites. How do I get the number of requests in the queue?
I have looked at the scrapy source code and found that scrapy.core.scheduler.Scheduler may lead to my answer. See: https://github.com/scrapy/scrapy/blob/0.24/scrapy/core/scheduler.py
Two questions:
- How do I access the scheduler in my spider class?
- What do self.dqs and self.mqs mean in the scheduler class?
Recommended answer
This took me a while to figure out, but here's what I used:
self.crawler.engine.slot.scheduler
That is the instance of the scheduler. You can then call its __len__() method, or, if you just need a true/false check for pending requests, do something like this:
self.crawler.engine.scheduler_cls.has_pending_requests(self.crawler.engine.slot.scheduler)
Beware that there could still be requests in progress even though the queue is empty. To check how many requests are currently running, use:
len(self.crawler.engine.slot.inprogress)
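Putting the pieces together, here is a minimal stand-in sketch in plain Python (no running crawl; `FakeScheduler` and `FakeSlot` are hypothetical mock classes, not real scrapy objects) that mimics the interface the answer relies on, so you can see how the `len()` call, the `has_pending_requests()` check, and the in-progress count behave:

```python
class FakeScheduler:
    """Hypothetical stand-in for scrapy's Scheduler interface."""

    def __init__(self, pending):
        # Queued requests that have not been sent to the downloader yet.
        # (In real scrapy these live in the disk queue self.dqs and the
        # memory queue self.mqs; here a plain list stands in for both.)
        self._pending = list(pending)

    def __len__(self):
        # Number of requests still waiting in the queue.
        return len(self._pending)

    def has_pending_requests(self):
        # True while anything remains queued.
        return len(self) > 0


class FakeSlot:
    """Hypothetical stand-in for engine.slot."""

    def __init__(self, scheduler, inprogress):
        self.scheduler = scheduler
        # Requests currently being downloaded (may be non-empty even
        # when the scheduler queue is already empty).
        self.inprogress = set(inprogress)


slot = FakeSlot(FakeScheduler(["req1", "req2", "req3"]), {"req0"})

queued = len(slot.scheduler)                      # queue length via __len__()
running = len(slot.inprogress)                    # requests currently running
pending = slot.scheduler.has_pending_requests()   # true/false check
print(queued, running, pending)
```

In a real spider you would replace `slot` with `self.crawler.engine.slot`, as shown in the answer above; the mock only illustrates why `len(...)` and the boolean check give different information than the in-progress count.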