How to un-shorten (resolve) a URL using Python when the final URL is https?
Problem description
I am looking to un-shorten (resolve) a URL in Python when the final URL is https. I have seen the question "How can I un-shorten a URL using python?" (as well as similar others); however, as noted in a comment on the accepted answer, that solution only works when the URL does not redirect to https.
For reference, the code in that question (which works fine when redirecting to http urls) is:
# This is for Py2k. For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource )
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location')) # changed to process chains of short urls
    else:
        return url
(Note: for obvious bandwidth reasons, I want to do this by requesting only the headers [i.e. like the http-only version above], not the content of the whole page.)
You can get the scheme from the url and then use HTTPSConnection if the parsed.scheme is https.
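A minimal Python 3 sketch of that idea, assuming the standard-library http.client and urllib.parse modules mentioned in the question's comment. The helper names connection_for and head_location, the fetch parameter, and the max_redirects cap are illustrative assumptions, not part of the original code. It still sends only HEAD requests, so no page bodies are downloaded.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def connection_for(url, timeout=10):
    """Return an unopened connection object matching the URL's scheme."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    if parts.scheme == "http":
        return http.client.HTTPConnection(parts.netloc, timeout=timeout)
    raise ValueError("unsupported scheme: %r" % parts.scheme)


def head_location(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    conn = connection_for(url, timeout)
    try:
        conn.request("HEAD", resource)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def unshorten_url(url, max_redirects=10, fetch=head_location):
    """Follow redirects with HEAD requests and return the final URL.

    fetch is a hypothetical hook (defaults to head_location) so the
    redirect-following logic can be exercised without network access.
    """
    for _ in range(max_redirects):
        status, location = fetch(url)
        if status // 100 != 3 or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url  # gave up after max_redirects hops; last URL seen
```

Using a loop with a hop limit instead of recursion is a design choice here: it keeps a redirect cycle from running forever. Some servers answer HEAD differently from GET, so results may vary by shortener.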
You can also use the requests library to do this very simply.
>>> import requests
>>> r = requests.head('http://bit.ly/IFHzvO', allow_redirects=True)
>>> print(r.url)
https://www.google.com