Python Requests: requests.exceptions.TooManyRedirects: Exceeded 30 redirects


Problem


I was trying to crawl this page using the python-requests library:

import requests
from lxml import etree,html

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'
r = requests.get(url)
tree = etree.HTML(r.text)
print tree

but I got the above error (TooManyRedirects). I tried the allow_redirects parameter, but got the same error:

r = requests.get(url, allow_redirects=True)

I even tried sending headers and data along with the URL, though I'm not sure this is the correct way to do it:

headers = {'content-type': 'text/html'}
payload = {'ie':'UTF8','node':'976419031'}
r = requests.post(url,data=payload,headers=headers,allow_redirects=True)

How do I resolve this error? Out of curiosity I even tried BeautifulSoup 4, and I got a different but similar kind of error:

page = BeautifulSoup(urllib2.urlopen(url))

urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently
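For reference, requests raises this as requests.exceptions.TooManyRedirects once a redirect chain exceeds the session's max_redirects limit, which defaults to 30. A minimal sketch (not the asker's code; the URL below is hypothetical) of catching it, with the limit lowered so a looping site fails faster:

```python
import requests

# Lowering max_redirects on a Session makes a redirect loop
# fail faster than the default limit of 30.
session = requests.Session()
session.max_redirects = 5

try:
    session.get('http://example.invalid/looping-page')
except requests.exceptions.TooManyRedirects:
    print('gave up after 5 redirects')
except requests.exceptions.RequestException as exc:
    # DNS failures, timeouts, etc. end up here.
    print('request failed:', exc)
```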

Solution

Amazon is redirecting your request to http://www.amazon.in/b?ie=UTF8&node=976419031, which in turn redirects to http://www.amazon.in/electronics/b?ie=UTF8&node=976419031, after which you have entered a loop:

>>> loc = url
>>> seen = set()
>>> while True:
...     r = requests.get(loc, allow_redirects=False)
...     loc = r.headers['location']
...     if loc in seen: break
...     seen.add(loc)
...     print loc
... 
http://www.amazon.in/b?ie=UTF8&node=976419031
http://www.amazon.in/electronics/b?ie=UTF8&node=976419031
>>> loc
http://www.amazon.in/b?ie=UTF8&node=976419031

So your original URL A redirects to a new URL B, which redirects to C, which redirects back to B, and so on.
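That cycle can be modeled offline. This sketch is an illustration, not the answer's code: the `follow_redirects` helper and the A/B/C mapping are made up to mirror the chain described, where each dictionary lookup stands in for a request made with allow_redirects=False.

```python
def follow_redirects(redirects, start, limit=30):
    """Walk a url -> Location mapping until a loop or a final URL.

    Returns the chain of Location values visited and whether a
    loop was detected.
    """
    seen = set()
    loc = start
    chain = []
    while loc in redirects:
        if loc in seen:
            return chain, True  # loop detected
        seen.add(loc)
        loc = redirects[loc]
        chain.append(loc)
        if len(chain) > limit:
            raise RuntimeError('Exceeded %d redirects' % limit)
    return chain, False

# The B -> C -> B cycle described above, modeled as a mapping:
hops = {'A': 'B', 'B': 'C', 'C': 'B'}
chain, looped = follow_redirects(hops, 'A')
```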

Apparently Amazon does this based on the User-Agent header; once it recognizes a browser it sets a cookie that subsequent requests should send back. The following works:

>>> s = requests.Session()
>>> s.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'
>>> r = s.get(url)
>>> r
<Response [200]>

This creates a session (for ease of re-use and for cookie persistence) and sets a copy of a Chrome user-agent string. The request succeeds (returns a 200 response).
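Put together as a script, the fix might look like the sketch below. The `fetch` helper is an assumption for illustration, and the user-agent string is just an example of a browser-like value; any common browser UA should do.

```python
import requests

# One Session shared across requests, so cookies set by earlier
# responses are resent automatically on later requests.
session = requests.Session()
session.headers['User-Agent'] = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/34.0.1847.131 Safari/537.36'
)

def fetch(url):
    # raise_for_status() turns 4xx/5xx responses into exceptions
    # instead of silently returning an error page.
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```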

