How to check a redirected web page address without downloading it in Python

Views: 396 | Published: 2018/7/10 15:17:29 | Tags: python http http-headers urllib2 httplib


Question

For a given URL, how can I detect the final internet location after HTTP redirects, without downloading the final page (e.g. with a HEAD request), using Python? I am trying to write a mass downloader, and my downloading mechanism needs to know the internet location of a page before downloading it.

I ended up doing this; I hope it helps other people. I am still open to other methods.

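# Python 3 version (Python 2 used the urlparse and httplib modules)
from urllib.parse import urlsplit, urljoin
import http.client

def getFinalUrl(url, max_redirects=10):
    "Navigates through redirections with HEAD requests to get the final URL."
    for _ in range(max_redirects):
        parsed = urlsplit(url)
        # Use HTTPS when the URL asks for it
        if parsed.scheme == "https":
            conn = http.client.HTTPSConnection(parsed.netloc)
        else:
            conn = http.client.HTTPConnection(parsed.netloc)
        path = parsed.path or "/"
        if parsed.query:
            path += "?" + parsed.query  # keep the query string
        conn.request("HEAD", path)
        response = conn.getresponse()
        location = response.getheader("Location")  # case-insensitive lookup
        conn.close()
        if 300 <= response.status < 400 and location:
            # Location may be relative, so resolve it against the current URL
            url = urljoin(url, location)
        else:
            return url
    raise RuntimeError("Too many redirects")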

Recommended answer

I strongly suggest you use the requests library. It is well written and actively maintained. Requests can do anything you need, like prefetch/
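The answer never shows how requests reports the final address itself. A minimal sketch, assuming requests 2.x: `requests.head()` does not follow redirects unless `allow_redirects=True` is passed, `Response.url` holds the URL the redirects ended at, and `Response.history` lists the intermediate redirect responses. The helper name `final_url` and the timeout value are illustrative assumptions, not part of the original answer.

```python
import requests


def final_url(url, timeout=10):
    """Follow redirects with HEAD requests only and report where they end.

    Returns (final_url, redirect_chain), where redirect_chain lists the URL
    of every intermediate 3xx response in order. `final_url` is a
    hypothetical helper, not part of requests.
    """
    # requests.head() does not follow redirects by default, so opt in.
    r = requests.head(url, allow_redirects=True, timeout=timeout)
    # Each entry in r.history is a redirect response; its .url is the
    # address that issued the redirect.
    chain = [resp.url for resp in r.history]
    return r.url, chain
```

For example, `final_url('https://github.com/kennethreitz/requests/tarball/master')` would return the tarball's final location together with the chain of redirecting URLs, without downloading the archive. Some servers answer HEAD differently from GET, so a downloader may still need a streamed GET as a fallback.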

From the requests documentation (http://docs.python-requests.org/en/latest/user/advanced/):


By default, when you make a request, the body of the response is downloaded immediately. You can override this behavior and defer downloading the response body until you access the Response.content attribute with the prefetch parameter:

import requests
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
# requests 1.0 replaced prefetch=False with stream=True
r = requests.get(tarball_url, stream=True)


At this point only the response headers have been downloaded and the connection remains open, hence allowing us to make content retrieval conditional:

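TOO_LONG = 10 * 1024 * 1024  # hypothetical size limit in bytes
length = r.headers.get('content-length')  # header may be absent
if length is not None and int(length) < TOO_LONG:
  content = r.content
  ...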

You can further control the workflow by using the Response.iter_content and Response.iter_lines methods, or by reading from the underlying urllib3.HTTPResponse at Response.raw.
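To make the iter_content option concrete, here is a minimal sketch, assuming a recent requests 2.x release (where a Response can be used in a `with` block), that reads at most `limit` bytes of a body and then drops the connection. The name `read_prefix` and its parameters are hypothetical, chosen for illustration.

```python
import requests


def read_prefix(url, limit=1024, chunk_size=256, timeout=10):
    """Return (final_url, first_bytes), reading at most `limit` body bytes.

    `read_prefix` is an illustrative helper, not part of requests.
    """
    # stream=True: only the headers are fetched until the body is iterated.
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        buf = bytearray()
        for chunk in r.iter_content(chunk_size=chunk_size):
            buf += chunk
            if len(buf) >= limit:
                break  # stop early instead of reading the whole body
        # r.url reflects any redirects that were followed.
        return r.url, bytes(buf[:limit])
```

Leaving the `with` block closes the response, so the remainder of the body is not read from the socket (apart from whatever was already buffered). Because `r.url` is the post-redirect address, the same call also tells a mass downloader where the page actually lives.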

