How to un-shorten (resolve) a URL using Python when the final URL is https?

Problem description

I am looking to un-shorten (resolve) a URL in Python when the final URL is https. I have seen the question "How can I un-shorten a URL using python?" (as well as similar ones); however, as noted in a comment on the accepted answer, that solution only works when the URL does not redirect to https.

For reference, the code in that question (which works fine when redirecting to http urls) is:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # changed to process chains of short urls
    else:
        return url

(Note: for obvious bandwidth reasons, I want to do this by requesting only the headers [i.e. like the http-only version above], not the content of the whole page.)

Solution

You can get the scheme from the parsed URL and use HTTPSConnection instead of HTTPConnection when parsed.scheme is https.
You can also use the requests library to do this very simply.

>>> import requests
>>> r = requests.head('http://bit.ly/IFHzvO', allow_redirects=True)
>>> print(r.url)
https://www.google.com
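
For the first suggestion (picking the connection class from the scheme), here is a rough Python 3 sketch of how the function from the question might be adapted. It is not the answerer's code: the helper name `open_connection`, the `max_redirects` limit, and the `timeout` value are assumptions added for illustration, and it uses `http.client` and `urllib.parse` as the comment in the original snippet suggests for Py3k. It still sends only HEAD requests.

```python
import http.client
from urllib.parse import urljoin, urlparse


def open_connection(parsed, timeout=10):
    """Return an HTTPS or HTTP connection depending on the URL scheme.

    Hypothetical helper; the timeout default is an assumption.
    """
    if parsed.scheme == "https":
        return http.client.HTTPSConnection(parsed.netloc, timeout=timeout)
    return http.client.HTTPConnection(parsed.netloc, timeout=timeout)


def unshorten_url(url, max_redirects=10):
    """Follow redirects using HEAD requests and return the last URL reached.

    max_redirects is an assumed safety limit against redirect loops;
    the original recursive version has no such limit.
    """
    for _ in range(max_redirects):
        parsed = urlparse(url)
        conn = open_connection(parsed)
        resource = parsed.path or "/"
        if parsed.query:
            resource += "?" + parsed.query
        try:
            conn.request("HEAD", resource)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        if status // 100 == 3 and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            return url
    return url
```

Calling `unshorten_url('http://bit.ly/IFHzvO')` would then follow the chain across both http and https hops. Some servers reject HEAD requests, in which case this sketch simply returns the URL it stopped at.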
