Python 3.7 urllib.request doesn't follow redirect URL


Question


I'm using Python 3.7 with urllib. Everything works fine, but it does not seem to redirect automatically when it receives an HTTP redirect response (307).

This is the error I get:

ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect

I have to handle it with a try-except and manually send another request to the new Location. It works fine, but I don't like it.

This is the piece of code I use to perform the request:

    req = urllib.request.Request(url)
    req.add_header('Authorization', auth)
    req.add_header('Content-Type', 'application/json; charset=utf-8')
    req.data = jdati  # JSON payload
    self.logger.debug(req.headers)
    self.logger.info(req.data)
    resp = urllib.request.urlopen(req)

url is an HTTPS resource, and I set headers with some authorization info and the content type. req.data is a JSON payload.

From the urllib documentation I understood that redirects are performed automatically by the library itself, but that doesn't work for me. It always raises an HTTP 307 error and doesn't follow the redirect URL. I've also tried using an opener that specifies the default redirect handler, but with the same result:

    opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)
    req = urllib.request.Request(url)
    req.add_header('Authorization', auth)
    req.add_header('Content-Type', 'application/json; charset=utf-8')
    req.data = jdati  # JSON payload
    resp = opener.open(req)

What could be the problem?

Solution

The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:

If the 307 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Back to the question: because data has been assigned, get_method returns POST (that is how the method is implemented). Since the request method is POST and the response code is 307, an HTTPError is raised instead of the redirect being followed, in line with the specification quoted above. In Python's urllib, the exception is raised by HTTPRedirectHandler.redirect_request in the urllib.request module.
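The method switch can be checked without sending anything over the network. The following is a minimal sketch that only inspects a Request object; the URL is reused from the experiment below as a placeholder and is never contacted.

```python
import urllib.request

# Placeholder URL; the Request object is only inspected, never sent.
url = 'http://httpbin.org/status/307'

req = urllib.request.Request(url)
method_without_data = req.get_method()  # no body attached yet

req.data = b'hello'                     # attaching a body ...
method_with_data = req.get_method()     # ... changes the default method

print(method_without_data, method_with_data)  # GET POST
```

Passing method='GET' (or any other verb) to Request overrides this default, since get_method prefers an explicitly set method.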

For an experiment, try the following code:

import urllib.error
import urllib.parse
import urllib.request


url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling
try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.status != 307:
        raise  # not a status code that can be handled here
    # Location may be relative, so resolve it against the original URL
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)  # plain GET, body not re-sent
    print('Redirected -> %s' % redirected_url)  # the original redirected url
print('Response URL -> %s ' % resp.url)  # the final url

Running the code as is may produce the following:

Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get 

Note that the subsequent redirect to /get was followed automatically, because that request was a GET request. Commenting out the req.data assignment line removes the "Redirected" output line, since the whole redirect chain is then followed automatically.
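If handling the exception by hand is not desirable, one option is to opt in to automatic 307 handling for requests that carry a body. The sketch below is not part of urllib: PostRedirectHandler is a hypothetical subclass name. It assumes that re-sending the same method, body and headers (including Authorization) to the redirect target is acceptable for the service in question.

```python
import urllib.request


class PostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: also follow 307 for requests that carry a body."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.data is not None:
            # Re-send the same method, body and headers to the new location.
            # Assumption: forwarding credentials such as Authorization to the
            # redirect target is acceptable for this service.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=req.headers,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method=req.get_method(),
            )
        # Every other case keeps the standard behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# A subclass passed to build_opener replaces the default redirect handler.
opener = urllib.request.build_opener(PostRedirectHandler)
# resp = opener.open(req)  # req built as in the question
```

By the time redirect_request is called, the standard handler has already resolved the Location header into an absolute URL, so newurl can be used directly.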

Two other things are worth noting in the experiment's exception handling block. First, e.read() can be called to retrieve the response body that the server sent with the HTTP 307 response (since data was posted, the response may carry a short entity worth processing). Second, urljoin is needed because the Location header may hold a relative URL (or simply omit the host) for the subsequent resource.
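To illustrate why urljoin is used, here is a small sketch that resolves a few possible Location values against the experiment's URL. The Location values are made up for illustration, and no request is sent.

```python
import urllib.parse

base = 'http://httpbin.org/status/307'

# Illustrative Location header values: absolute path, relative path,
# and a full URL on another (placeholder) host.
locations = ['/redirect/1', 'redirect/1', 'http://example.org/next']

resolved = {loc: urllib.parse.urljoin(base, loc) for loc in locations}
for loc, target in resolved.items():
    print('%s -> %s' % (loc, target))
# /redirect/1 -> http://httpbin.org/redirect/1
# redirect/1 -> http://httpbin.org/status/redirect/1
# http://example.org/next -> http://example.org/next
```

A full URL in Location passes through unchanged, so calling urljoin unconditionally is safe.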

Also, as a matter of interest, this specific question has been asked multiple times before, and I am rather surprised that those earlier questions never received any answers.
