How to handle a 429 Too Many Requests response in Scrapy?


Question

I'm trying to run a scraper whose output log ends as follows:

2017-04-25 20:22:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <429 http://www.apkmirror.com/apk/instagram/instagram-instagram/instagram-instagram-9-0-0-34920-release/instagram-9-0-0-4-android-apk-download/>: HTTP status code is not handled or not allowed
2017-04-25 20:22:22 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-25 20:22:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16048410,
 'downloader/request_count': 32902,
 'downloader/request_method_count/GET': 32902,
 'downloader/response_bytes': 117633316,
 'downloader/response_count': 32902,
 'downloader/response_status_count/200': 121,
 'downloader/response_status_count/429': 32781,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 25, 18, 22, 22, 710446),
 'log_count/DEBUG': 32903,
 'log_count/INFO': 32815,
 'request_depth_max': 2,
 'response_received_count': 32902,
 'scheduler/dequeued': 32902,
 'scheduler/dequeued/memory': 32902,
 'scheduler/enqueued': 32902,
 'scheduler/enqueued/memory': 32902,
 'start_time': datetime.datetime(2017, 4, 25, 17, 54, 36, 621481)}
2017-04-25 20:22:22 [scrapy.core.engine] INFO: Spider closed (finished)

In short, of the 32,902 requests only 121 are successful (response code 200), whereas the remainder receive a 429 'Too Many Requests' (cf. https://httpstatuses.com/429).

Are there any recommended ways to get around this? To start with, I'd like to have a look at the details of the 429 response rather than just ignoring it, as it may contain a Retry-After header indicating how long to wait before making a new request.
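For reference, something like the following seems to be what it would take for the callback to see the 429 at all (a minimal, untested sketch: handle_httpstatus_list is Scrapy's per-spider whitelist of status codes, and the spider name here is just a placeholder):

import scrapy

class ApkMirrorSpider(scrapy.Spider):
    name = 'apkmirror'  # placeholder name
    # Let 429 responses reach parse() instead of being dropped
    # by HttpErrorMiddleware.
    handle_httpstatus_list = [429]
    start_urls = ['http://www.apkmirror.com/']

    def parse(self, response):
        if response.status == 429:
            # Retry-After holds either seconds to wait or an HTTP date.
            retry_after = response.headers.get('Retry-After')
            self.logger.info('Got 429, Retry-After: %s', retry_after)
            return
        # ... normal parsing of 200 responses goes here ...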

Also, if the requests are made using Privoxy and Tor as described in http://blog.michaelyin.info/2014/02/19/scrapy-socket-proxy/, it may be possible to implement retry middleware which makes Tor change its IP address when this occurs. Are there any public examples of such code?
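Roughly the kind of middleware I have in mind (an untested sketch: it assumes the stem library, Tor's ControlPort enabled on 9051, and subclasses Scrapy's built-in RetryMiddleware; it would be registered in DOWNLOADER_MIDDLEWARES in place of the stock retry middleware):

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message
from stem import Signal
from stem.control import Controller

class TooManyRequestsRetryMiddleware(RetryMiddleware):
    """Ask Tor for a fresh circuit before retrying a 429 response."""

    def _new_tor_identity(self):
        # Assumes ControlPort 9051; authenticate() may need a password
        # depending on the torrc configuration.
        with Controller.from_port(port=9051) as controller:
            controller.authenticate()
            controller.signal(Signal.NEWNYM)

    def process_response(self, request, response, spider):
        if response.status == 429:
            self._new_tor_identity()
            reason = response_status_message(response.status)
            return self._retry(request, reason, spider) or response
        return super().process_response(request, response, spider)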

Answer

Wow, your scraper is going really fast, over 30,000 requests in 30 minutes. That's more than 10 requests per second.

Such a high volume will trigger rate limiting on bigger sites and will completely bring down smaller sites. Don't do that.

This may even be too fast for Privoxy and Tor themselves, so they are also candidates for the source of those 429 replies.

Solutions:

  1. Start slow. Reduce the concurrency settings and increase DOWNLOAD_DELAY so that you do at most 1 request per second, then increase these values step by step and see what happens. It might sound paradoxical, but you may get more items and more 200 responses by going slower. (A settings sketch follows this list.)

  2. If you are scraping a big site, try rotating proxies. In my experience the Tor network can be a bit heavy-handed for this, so you might try a proxy service like the one Umair suggests.
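A minimal settings.py sketch of point 1 (the numbers are just conservative starting values to loosen step by step; AutoThrottle is Scrapy's optional built-in extension that does that stepwise adjustment automatically):

# settings.py -- start conservatively, then loosen step by step
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 1  # at most ~1 request per second

# Optionally let Scrapy adapt the delay to server responsiveness
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 60

# Have the retry middleware retry 429s instead of ignoring them
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]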
