Scrapy Limit Requests For Testing

Problem Description

I've been searching the Scrapy documentation for a way to limit the number of requests my spiders are allowed to make. During development I don't want to sit here and wait for my spiders to finish an entire crawl; even though the crawls are pretty focused, they can still take quite a while.

I want the ability to say, "After x requests to the site I'm scraping, stop generating new requests."

I was wondering if there is a setting for this that I may have missed, or some other way to do it using the framework, before I try to come up with my own solution.

I was considering implementing a downloader middleware that would keep track of the number of requests being processed and stop passing them to the downloader once a limit has been reached. But like I said, I'd rather use a mechanism already in the framework if possible.
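For reference, a minimal sketch of what such a middleware might look like; the class name RequestLimitMiddleware and the MAX_REQUESTS setting are made up for illustration and are not part of Scrapy:

# Hypothetical sketch, not a built-in Scrapy component.
from scrapy.exceptions import IgnoreRequest

class RequestLimitMiddleware:
    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.count = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Read the limit from the (made-up) MAX_REQUESTS setting; 0 means "no limit".
        return cls(crawler.settings.getint("MAX_REQUESTS", 0))

    def process_request(self, request, spider):
        if self.max_requests and self.count >= self.max_requests:
            # Drop the request instead of passing it on to the downloader.
            raise IgnoreRequest(f"Request limit of {self.max_requests} reached")
        self.count += 1
        return None  # continue normal processing

It would still have to be registered in DOWNLOADER_MIDDLEWARES and configured by hand, which is part of why a built-in mechanism would be preferable.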

Any ideas? Thanks.

Recommended Answer

You are looking for the CLOSESPIDER_PAGECOUNT setting of the CloseSpider extension:

An integer which specifies the maximum number of responses to crawl. If the spider crawls more than that, the spider will be closed with the reason closespider_pagecount. If zero (or not set), spiders won't be closed by number of crawled responses.
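For example, to stop a crawl after roughly 10 responses during development, the setting can be passed on the command line (the spider name and the value 10 are placeholders):

scrapy crawl myspider -s CLOSESPIDER_PAGECOUNT=10

or set per spider via custom_settings:

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["https://example.com/"]  # placeholder
    # Close the spider once 10 responses have been crawled.
    custom_settings = {"CLOSESPIDER_PAGECOUNT": 10}

    def parse(self, response):
        pass

Note that the limit counts responses received, and the shutdown is graceful, so a few in-flight requests may still complete after the threshold is reached.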
