How to avoid HTTP error 429 (Too Many Requests) in Python

Question

I am trying to use Python to log in to a website and gather information from several web pages, and I get the following error:


Traceback (most recent call last):
  File "extract_test.py", line 43, in <module>
    response=br.open(v)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code


I used time.sleep() and it works, but it seems unintelligent and unreliable. Is there any other way to dodge this error?

Here is my code:

import mechanize
import cookielib
import re
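# mechanize needs absolute URLs, including the scheme
first = "http://example.com/page1"
second = "http://example.com/page2"
third = "http://example.com/page3"
fourth = "http://example.com/page4"
# I have seven URLs I want to open (only four are shown here)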

urls_list=[first,second,third,fourth]

br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options 
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Log in credentials
br.open("http://example.com")
br.select_form(nr=0)
br["username"] = "username"
br["password"] = "password"
br.submit()

for url in urls_list:
    response = br.open(url)
    # re.findall needs both a pattern and the text to search
    print re.findall("Some String", response.read())


Recommended answer

Receiving a status 429 is not a fault in your code; it is the other server "kindly" asking you to please stop spamming requests. Your rate of requests has been too high, and the server is not willing to accept this.

You should not seek to "dodge" this, or try to circumvent the server's security settings by spoofing your IP. Simply respect the server's answer by not sending too many requests.
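
One way to keep the request rate down without scattering fixed time.sleep() calls through the script is a small throttle that enforces a minimum gap between requests. The following is only a sketch in Python 3: the class name Throttle, the method wait, and the 2-second interval in the usage note are illustrative assumptions, not something the target server specifies.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait().

    Hypothetical helper for illustration; min_interval is a guess that
    has to be tuned to whatever rate the target server tolerates.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable so it can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In the question's loop, this would mean creating throttle = Throttle(2.0) once and calling throttle.wait() right before each br.open(url). Unlike a fixed sleep, it only pauses for whatever part of the interval has not already elapsed while the previous page was being processed.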

If the server is set up properly, the 429 response will also carry a "Retry-After" header. This header tells you how long to wait before making another request, either as a number of seconds or as an HTTP date. The proper way to deal with this "problem" is to read the header and sleep your process for that long.
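
Reading the header and sleeping can be sketched as follows. This is a minimal Python 3 example built on the standard library's urllib rather than the mechanize code from the question. The function names parse_retry_after and open_with_retry, the 5-second fallback, and the limit of four attempts are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=5.0):
    """Convert a Retry-After header value into a delay in seconds.

    The value may be a whole number of seconds or an HTTP date.
    `default` (an assumed fallback) is used when the header is
    missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def open_with_retry(url, max_attempts=4,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Open url, sleeping as instructed whenever the server answers 429."""
    for attempt in range(max_attempts):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # Give up on other errors, or once the attempts are used up.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(parse_retry_after(err.headers.get("Retry-After")))
```

The same pattern should carry over to the mechanize loop in the question: wrap br.open(url) in the try block and catch whichever HTTP error type mechanize raises. The opener and sleep parameters exist only so the logic can be exercised without real network calls.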

You can find more information on status 429 here: http://tools.ietf.org/html/rfc6585#page-3
