How to avoid HTTP error 429 (Too Many Requests) in Python

Question

I am trying to use Python to log in to a website and gather information from several webpages, and I get the following error:

Traceback (most recent call last):
  File "extract_test.py", line 43, in <module>
    response=br.open(v)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code

I used time.sleep() and it works, but it seems unintelligent and unreliable. Is there any other way to dodge this error?

Here's my code:

import re
import cookielib  # Python 2 name; the Python 3 equivalent is http.cookiejar

import mechanize

# Placeholder URLs (the real script opens seven pages)
first = "http://example.com/page1"
second = "http://example.com/page2"
third = "http://example.com/page3"
fourth = "http://example.com/page4"

urls_list = [first, second, third, fourth]

br = mechanize.Browser()

# Cookie jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Log in credentials
br.open("http://example.com")
br.select_form(nr=0)
br["username"] = "username"
br["password"] = "password"
br.submit()

for url in urls_list:
    response = br.open(url)
    html = response.read()
    # Search the downloaded page for the pattern of interest
    print re.findall("Some String", html)

Solution

Receiving a status 429 is not really an error; it is the other server "kindly" asking you to stop spamming requests. Your rate of requests has been too high, and the server is not willing to accept it.

You should not seek to "dodge" this, let alone try to circumvent the server's security settings by spoofing your IP. Simply respect the server's answer and send fewer requests.
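One way to send fewer requests is to enforce a minimum pause between consecutive calls. The following is only a sketch: the `Throttle` class and its `min_interval` parameter are hypothetical names introduced here, and the right interval depends on the limit the server actually applies, which the question does not state.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests.

    `clock` and `sleep` are injectable so the class can be tested
    without real waiting; by default they use the time module.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call to wait(), if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In the loop from the question, this would amount to creating `throttle = Throttle(2.0)` once and calling `throttle.wait()` right before each `br.open(url)`. Unlike a bare `time.sleep()`, it only sleeps for the part of the interval that has not already elapsed while the previous page was being processed.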

If the server is set up properly, the 429 response will usually also carry a "Retry-After" header. This header tells you how long to wait before making another call, either as a number of seconds or as an HTTP date. The proper way to deal with this "problem" is to read the header and sleep your process for that long.
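A minimal sketch of reading the header and sleeping, written against the Python 3 standard library (`urllib`) rather than mechanize. The helper names `parse_retry_after` and `open_with_retry`, the `max_attempts` limit, and the 5-second fallback used when the header is missing are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=5.0, now=None):
    """Convert a Retry-After header value into seconds to wait.

    The value may be a number of seconds ("120") or an HTTP date.
    `default` is an assumed fallback for a missing or unparseable header.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without a zone as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def open_with_retry(url, opener=urllib.request.urlopen,
                    max_attempts=3, sleep=time.sleep):
    """Open `url`, waiting as instructed whenever the server answers 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 429, or a 429 on the last attempt
            if err.code != 429 or attempt == max_attempts:
                raise
            sleep(parse_retry_after(err.headers.get("Retry-After")))
```

Calling `open_with_retry("http://example.com/page1")` would then retry a throttled request after the delay the server asked for. The same idea should carry over to mechanize, whose traceback above shows that it also raises an HTTP error object for a 429, but how that object exposes the response headers is not shown in the question, so check that before adapting the sketch.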

You can find more information on status 429 here: https://www.rfc-editor.org/rfc/rfc6585#page-3
