Scrapy Limit Requests For Testing
Question
I've been searching the Scrapy documentation for a way to limit the number of requests my spiders are allowed to make. During development I don't want to sit here and wait for my spiders to finish an entire crawl; even though the crawls are pretty focused, they can still take quite a while.
I want the ability to say, "After x requests to the site I'm scraping, stop generating new requests."
Before I try to come up with my own solution, I was wondering whether there is a setting for this I may have missed, or some other way to do it using the framework.
I was considering implementing a downloader middleware that would keep track of the number of requests being processed and stop passing them to the downloader once a limit has been reached. But like I said, I'd rather use a mechanism already in the framework if possible.
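For reference, the middleware idea described above could be sketched roughly as follows. This is a hypothetical illustration, not Scrapy's recommended approach; `RequestLimitMiddleware` and the `REQUEST_LIMIT` setting name are made up for this sketch, and it assumes Scrapy's standard downloader-middleware hooks (`from_crawler`, `process_request`):

```python
# Hypothetical downloader middleware: count outgoing requests and drop any
# beyond a configured cap. Names here (RequestLimitMiddleware, REQUEST_LIMIT)
# are illustrative, not part of Scrapy.
try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:  # lets the sketch run standalone without Scrapy installed
    class IgnoreRequest(Exception):
        pass


class RequestLimitMiddleware:
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Read the cap from the crawler settings (REQUEST_LIMIT is made up).
        return cls(crawler.settings.getint("REQUEST_LIMIT", 100))

    def process_request(self, request, spider):
        self.count += 1
        if self.count > self.limit:
            # Raising IgnoreRequest tells Scrapy to drop this request.
            raise IgnoreRequest(f"request limit of {self.limit} reached")
        return None  # continue normal downloader processing
```

It would be enabled via `DOWNLOADER_MIDDLEWARES` in the project settings, but as the answer below shows, rolling your own is unnecessary here.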
Any ideas? Thanks.
Answer
You are looking for the CLOSESPIDER_PAGECOUNT setting of the CloseSpider extension:
An integer which specifies the maximum number of responses to crawl. If the spider crawls more than that, the spider will be closed with the reason closespider_pagecount. If zero (or not set), spiders won't be closed by number of crawled responses.
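For a quick development run, the setting can be passed on the command line with `-s` instead of editing the project settings (`myspider` below is a placeholder for your spider's name):

```shell
# Close the spider after roughly 10 crawled responses:
scrapy crawl myspider -s CLOSESPIDER_PAGECOUNT=10
```

Equivalently, set `CLOSESPIDER_PAGECOUNT = 10` in settings.py. Note that the CloseSpider extension shuts the spider down gracefully, so a few requests that are already in flight may still complete after the limit is hit.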